
fix(glob): bound slow-path gitignore matching memory #1380

Open
Simon (simonhj) wants to merge 1 commit into v1.x from simon/glob-content-dedup-v1x


@simonhj Simon (simonhj) commented Jun 26, 2026


`socket fix` / `socket scan` could abort with a heap-OOM SIGABRT on large monorepos. On the slow path (taken when any gitignore or projectIgnore pattern is negated), `globWithGitIgnore` built one `ignore` instance holding every nested `.gitignore`'s cwd-anchored patterns and tested each streamed path against it. The first match call JIT-compiled all of those patterns into V8 code-space, which is capped near 250-300 MB regardless of `--max-old-space-size`.
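For readers unfamiliar with what sends a scan down the slow path, here is a minimal sketch of a negation check. It assumes the patterns are available as raw gitignore lines; `hasNegatedPattern` is a hypothetical name for illustration, not the check this PR uses.

```typescript
// Hypothetical helper for illustration; not the actual check in globWithGitIgnore.
function hasNegatedPattern(lines: readonly string[]): boolean {
  // In gitignore syntax a line negates when it begins with "!".
  // "\!" is an escaped literal "!" and "#" starts a comment, so neither counts.
  return lines.some(line => line.startsWith("!"));
}

// No negation anywhere: the fast path would be eligible.
const fastPathOk = !hasNegatedPattern(["node_modules", "# build output", "dist/", "\\!literal"]);

// A single "!" line is enough to require the slow path.
const needsSlowPath = hasNegatedPattern(["dist/", "!dist/keep.js"]);
```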

This reworks only that slow path: one matcher per .gitignore, built from its raw lines and deduped by content, applied along each candidate's ancestor chain. A repo where N packages share one boilerplate .gitignore now compiles a single matcher instead of N, so compiled-regex memory is bounded by the number of distinct .gitignore contents rather than file count. The no-negation fast path is unchanged.
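The content-dedup idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `compileMatcher`, `matcherFor`, and the toy matching rule (a path segment equals a listed name) are all hypothetical stand-ins for the real, more expensive pattern compilation.

```typescript
// Hypothetical types and names; a sketch of caching matchers by .gitignore content.
type Matcher = (relPath: string) => boolean;

let compileCount = 0;

// Stand-in for the expensive step. Toy semantics only: a path is matched
// when any of its segments equals one of the listed names.
function compileMatcher(content: string): Matcher {
  compileCount++;
  const names = content
    .split("\n")
    .map(line => line.trim())
    .filter(line => line !== "" && !line.startsWith("#"));
  return relPath => relPath.split("/").some(seg => names.indexOf(seg) !== -1);
}

// Cache keyed by the raw file body, so identical files share one matcher.
const matcherByContent = new Map<string, Matcher>();

function matcherFor(content: string): Matcher {
  let matcher = matcherByContent.get(content);
  if (matcher === undefined) {
    matcher = compileMatcher(content);
    matcherByContent.set(content, matcher);
  }
  return matcher;
}

// 300 packages share one boilerplate .gitignore; the root has its own.
const boilerplate = "node_modules\ndist\n";
const matcherByDir = new Map<string, Matcher>();
for (let i = 0; i < 300; i++) {
  matcherByDir.set(`packages/pkg-${i}`, matcherFor(boilerplate));
}
matcherByDir.set("", matcherFor("coverage\n.cache\n"));
```

In this sketch 301 directories map to only two compiled matchers, which is the sense in which cost follows distinct contents rather than file count.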

Because the slow path now applies each `.gitignore` relative to its own directory, its nested-gitignore matching is more git-faithful than the prior cwd-anchored translation: a bare filename matches at any depth, and a file under an excluded directory is not re-included by a deeper negation. Both behaviors were verified against `git check-ignore`. Built-in ignored directories and discovered virtualenvs are still pruned on the slow path, and scan targets outside cwd no longer throw.
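The two semantics described above can be illustrated with a toy ancestor-chain walk. This sketch is an assumption-laden simplification, not the PR's implementation: it supports only bare names (no globs, anchors, or trailing slashes), and `parseRules`, `isIgnored`, and `rulesByDir` are hypothetical names.

```typescript
// Toy model: each rule is a bare name, optionally negated with "!".
interface Rule { name: string; negated: boolean }

function parseRules(content: string): Rule[] {
  return content
    .split("\n")
    .map(line => line.trim())
    .filter(line => line !== "" && !line.startsWith("#"))
    .map(line => line.startsWith("!")
      ? { name: line.slice(1), negated: true }
      : { name: line, negated: false });
}

// rulesByDir maps a directory ("" is the root) to the rules of its .gitignore.
function isIgnored(path: string, rulesByDir: Map<string, Rule[]>): boolean {
  const segments = path.split("/");
  // Test each prefix of the path, shallowest first.
  for (let end = 1; end <= segments.length; end++) {
    const basename = segments[end - 1];
    let ignored = false;
    // Apply every ancestor .gitignore, outermost first; the last match wins.
    for (let depth = 0; depth < end; depth++) {
      const rules = rulesByDir.get(segments.slice(0, depth).join("/"));
      if (rules === undefined) continue;
      for (const rule of rules) {
        if (rule.name === basename) ignored = !rule.negated;
      }
    }
    // Once a prefix is excluded, nothing deeper can re-include the path.
    if (ignored) return true;
  }
  return false;
}

const rulesByDir = new Map<string, Rule[]>([
  ["", parseRules("dist\nsecret.txt\n")],
  ["pkg", parseRules("!secret.txt\n")],
]);

const bareNameAnyDepth = isIgnored("other/deep/secret.txt", rulesByDir);
const reincludedByNested = isIgnored("pkg/deep/secret.txt", rulesByDir);
const blockedByParentDir = isIgnored("pkg/dist/secret.txt", rulesByDir);
```

Here the root `secret.txt` rule matches at any depth, `pkg/.gitignore` re-includes it beneath `pkg`, and the excluded `dist` directory still wins over that negation.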

Tested: the existing glob suite plus regression coverage for the OOM (a 300-package tree), the two nested-gitignore behaviors, and slow-path exclusion of venvs and built-in ignored dirs.


Note

Medium Risk
Changes core file-discovery ignore logic on the negated-gitignore path, which can alter which files are scanned versus the prior global matcher, though behavior is tightened toward git and covered by new tests.

Overview
Fixes heap OOM / SIGABRT on large monorepos during socket fix / socket scan when any gitignore (or project ignore) line is negated (!), which forces the streaming slow path.

On that path, globWithGitIgnore no longer builds one global ignore instance from every nested .gitignore flattened into cwd-anchored patterns (which could JIT-compile hundreds of thousands of regexes into V8 code-space). It now compiles one matcher per distinct .gitignore body, caches matchers by content, and walks each candidate’s ancestor directories so shared boilerplate across hundreds of packages stays bounded. The no-negation fast path is unchanged.

Slow-path matching is also more git-like: patterns apply relative to each .gitignore’s directory, parent directory excludes block deeper ! re-includes, and built-in / venv pruning still goes through fast-glob’s ignore list. Adds regression tests for code-space growth, nested ignore semantics, and venv/coverage exclusion on the slow path.

Reviewed by Cursor Bugbot for commit d7463cb.

Scanning a large monorepo could abort with a heap-OOM SIGABRT. On the slow path (any `!` negation present), globWithGitIgnore built one `ignore` instance holding every nested .gitignore's cwd-anchored patterns; its first match call JIT-compiled all of them into V8 code-space (capped near 250-300MB regardless of --max-old-space-size) and crashed before the scan ran.

Rework only that slow path: build one matcher per .gitignore from its raw lines, dedup identical files by content so a repo of N packages sharing one boilerplate .gitignore compiles a single matcher instead of N, and apply them along each candidate's ancestor chain. A 300-package tree now grows code-space ~1MB instead of crossing the cliff. The no-negation fast path is left exactly as before.

Because the slow path now applies each .gitignore relative to its own directory, its nested-gitignore matching is also more git-faithful than the old cwd-anchored translation (a bare filename matches at any depth; a file under an excluded directory is not re-included). This is intentional and verified against `git check-ignore`; the fast path keeps the prior semantics. Also guards against scan targets outside cwd, which the old slow path threw on.
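One way to picture the outside-cwd guard is a containment check that yields a cwd-relative path only when the target actually lives under cwd. This is a sketch under assumptions, not the PR's code: `relativeInside` is a hypothetical helper, and it assumes both inputs are already normalized absolute POSIX paths.

```typescript
// Hypothetical helper: returns the path of `target` relative to `cwd`,
// or undefined when `target` is not inside `cwd`.
// Assumes normalized absolute POSIX paths (no "..", no duplicate slashes).
function relativeInside(cwd: string, target: string): string | undefined {
  const base = cwd.endsWith("/") ? cwd : cwd + "/";
  if (target === cwd || target + "/" === base) return "";
  return target.startsWith(base) ? target.slice(base.length) : undefined;
}

const inside = relativeInside("/repo", "/repo/pkg/a.ts");
const sibling = relativeInside("/repo", "/repo-other/a.ts");
const parent = relativeInside("/repo/pkg", "/repo/a.ts");
```

A caller could skip gitignore matching for any target where the result is `undefined` instead of passing an out-of-tree path to a matcher.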
@simonhj

Author

Claude (@claude) review once
