fix(glob): bound slow-path gitignore matching memory #1380
Open
Simon (simonhj) wants to merge 1 commit into
Scanning a large monorepo could abort with a heap-OOM SIGABRT. On the slow path (taken whenever any `!` negation is present), `globWithGitIgnore` built one `ignore` instance holding every nested `.gitignore`'s cwd-anchored patterns. Its first match call JIT-compiled all of them into V8 code-space, which is capped near 250-300MB regardless of `--max-old-space-size`, and the process crashed before the scan ran. This change reworks only that slow path. It builds one matcher per `.gitignore` from its raw lines, dedups identical files by content so that a repo of N packages sharing one boilerplate `.gitignore` compiles a single matcher instead of N, and applies the matchers along each candidate's ancestor chain. A 300-package tree now grows code-space by ~1MB instead of crossing the cliff. The no-negation fast path is left exactly as before. Because the slow path now applies each `.gitignore` relative to its own directory, its nested-gitignore matching is also more git-faithful than the old cwd-anchored translation (a bare filename matches at any depth; a file under an excluded directory is not re-included). This is intentional and verified against `git check-ignore`; the fast path keeps the prior semantics. The change also guards against scan targets outside cwd, which the old slow path threw on.
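Content-based dedup only works because each matcher is now applied to paths relative to its own directory. Under the old cwd-anchored translation, the same boilerplate in 300 directories produced 300 different pattern sets. Below is a minimal TypeScript sketch of such a cache, assuming it is keyed on the raw file contents. `MatcherCache` and `compileLines` are hypothetical names, and `compileLines` stands in for building an `ignore` instance so the sketch has no third-party dependencies.

```typescript
// Content-keyed cache: identical .gitignore bodies share one compiled matcher.
// `M` is whatever the compile step produces (in the PR, an `ignore` instance);
// it is left generic here so the sketch stays dependency-free.
class MatcherCache<M> {
  private readonly byContent = new Map<string, M>()
  private readonly compile: (body: string) => M
  compiles = 0

  constructor(compile: (body: string) => M) {
    this.compile = compile
  }

  get(body: string): M {
    const hit = this.byContent.get(body)
    if (hit !== undefined) return hit
    const matcher = this.compile(body)
    this.byContent.set(body, matcher)
    this.compiles++
    return matcher
  }
}

// Hypothetical stand-in for the real compile step: keep non-empty,
// non-comment lines.
const compileLines = (body: string): string[] =>
  body.split(/\r?\n/).filter(l => l.trim() !== '' && !l.startsWith('#'))

// 300 packages share one boilerplate .gitignore; the root has its own.
const cache = new MatcherCache(compileLines)
const matcherByDir = new Map<string, string[]>()
const BOILERPLATE = '# generated\nnode_modules/\ndist/\n*.log\n'
for (let i = 0; i < 300; i++) {
  matcherByDir.set(`packages/pkg-${i}`, cache.get(BOILERPLATE))
}
matcherByDir.set('', cache.get('!important.log\n*.tmp\n'))

console.log(cache.compiles, matcherByDir.size) // 2 301
```

Every package directory still gets its own map entry, but all 300 entries point at the same compiled object, so compiled-regex cost scales with distinct contents rather than directory count.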
Author: @claude review once
`socket fix`/`socket scan` could abort with a heap-OOM SIGABRT on large monorepos. On the slow path (taken when any gitignore or projectIgnore pattern is negated), `globWithGitIgnore` built one `ignore` instance holding every nested `.gitignore`'s cwd-anchored patterns and tested each streamed path against it. The first match call JIT-compiled all of them into V8 code-space, which is capped near 250-300MB regardless of `--max-old-space-size`.

This reworks only that slow path: one matcher per `.gitignore`, built from its raw lines and deduped by content, applied along each candidate's ancestor chain. A repo where N packages share one boilerplate `.gitignore` now compiles a single matcher instead of N, so compiled-regex memory is bounded by the number of distinct `.gitignore` contents rather than by file count. The no-negation fast path is unchanged.

Because the slow path now applies each `.gitignore` relative to its own directory, its nested-gitignore matching is more git-faithful than the prior cwd-anchored translation: a bare filename matches at any depth, and a file under an excluded directory is not re-included by a deeper negation. Both behaviors are verified against `git check-ignore`. Built-in ignored directories and discovered virtualenvs are still pruned on the slow path, and scan targets outside cwd no longer throw.

Tested: the existing glob suite plus regression coverage for the OOM (a 300-package tree), the two nested-gitignore behaviors, and slow-path exclusion of venvs and built-in ignored dirs.
Note
Medium Risk
Changes core file-discovery ignore logic on the negated-gitignore path, which can alter which files are scanned versus the prior global matcher, though behavior is tightened toward git and covered by new tests.
Overview
Fixes heap OOM / SIGABRT on large monorepos during `socket fix`/`socket scan` when any gitignore (or project ignore) line is negated (`!`), which forces the streaming slow path.

On that path, `globWithGitIgnore` no longer builds one global `ignore` instance from every nested `.gitignore` flattened into cwd-anchored patterns, which could JIT-compile hundreds of thousands of regexes into V8 code-space. It now compiles one matcher per distinct `.gitignore` body, caches matchers by content, and walks each candidate’s ancestor directories, so shared boilerplate across hundreds of packages stays bounded. The no-negation fast path is unchanged.

Slow-path matching is also more git-like: patterns apply relative to each `.gitignore`’s directory, parent-directory excludes block deeper `!` re-includes, and built-in / venv pruning still goes through fast-glob’s ignore list. Adds regression tests for code-space growth, nested ignore semantics, and venv/`coverage` exclusion on the slow path.

Reviewed by Cursor Bugbot for commit d7463cb.