fix(glob): bound gitignore matching memory to prevent scan OOM #1377

Closed
Simon (simonhj) wants to merge 3 commits into v1.x from simon/glob-gitignore-oom-fix

@simonhj Simon (simonhj) commented Jun 23, 2026

socket fix and socket scan abort with FATAL ERROR: CALL_AND_RETRY_LAST … heap out of memory on large monorepos that contain many nested .gitignore files.

Cause

globWithGitIgnore discovers every nested .gitignore, unions all their translated patterns into one set, and, on the non-negated code path, hands that entire set to fast-glob's native ignore option. fast-glob re-compiles and re-tests its whole ignore array inside every directory scan, so a union of tens of thousands of patterns exhausts V8 code space and aborts the process. Raising --max-old-space-size does not help, because the allocation is regex executable code, not the data heap.
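To make the cost argument above concrete, here is a toy model that only counts how many times patterns are compiled under the two strategies the description contrasts. It is a sketch under stated assumptions: `compile`, `compilePerDirectory`, and `compileOnce` are hypothetical names, the patterns are treated as literal prefixes, and nothing here reproduces fast-glob's internals or V8's memory accounting.

```typescript
// Toy model: counts RegExp constructions only. It does not model fast-glob or V8.
let compilations = 0;

// Hypothetical stand-in for "compile one ignore pattern".
function compile(pattern: string): RegExp {
  compilations += 1;
  // Escape regex metacharacters and treat the pattern as a literal path prefix.
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped);
}

// Strategy A: every directory visit rebuilds the full matcher list.
function compilePerDirectory(patterns: string[], directories: string[]): number {
  compilations = 0;
  for (let i = 0; i < directories.length; i += 1) {
    for (const pattern of patterns) {
      compile(pattern);
    }
  }
  return compilations;
}

// Strategy B: build the matcher list once and reuse it for every entry.
function compileOnce(patterns: string[], directories: string[]): number {
  compilations = 0;
  const compiled = patterns.map((pattern) => compile(pattern));
  for (const dir of directories) {
    compiled.some((re) => re.test(dir));
  }
  return compilations;
}

const patterns = Array.from({ length: 1000 }, (_, i) => `pkg-${i}/dist/`);
const directories = Array.from({ length: 50 }, (_, i) => `pkg-${i}/src`);

const perDirectory = compilePerDirectory(patterns, directories);
const once = compileOnce(patterns, directories);
console.log({ perDirectory, once });
```

In this model the per-directory strategy scales with patterns × directories, while the reuse strategy scales with the pattern count alone.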

Fix

Match the high-cardinality gitignore set through a single reused ignore instance (which compiles each rule once and memoizes it) applied per streamed entry, and hand fast-glob only the small bounded set it needs to prune directories during the walk. The negated-pattern path already worked this way; this unifies both paths and removes the asymmetry that left the common, non-negated case crashing.

Two parity details versus fast-glob's native ignore matching are preserved:

  • Case sensitivity tracks caseSensitiveMatch (default case-sensitive, matching git) rather than the ignore package's case-insensitive default, so dist/ no longer also ignores a differently-cased Dist/.
  • The cwd-relative path is normalized to POSIX separators before matching, so it still matches the forward-slash-anchored patterns on Windows.
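The two parity details can be illustrated with a small stand-in matcher. This is a minimal sketch, not the PR's code: `toPosix` and `isUnderIgnoredDir` are hypothetical helpers, and the matcher only handles a bare directory name such as `dist`, whereas the real change relies on the ignore package configured through its `ignorecase` option.

```typescript
// Hypothetical helper: convert Windows backslashes to POSIX forward slashes.
function toPosix(relativePath: string): string {
  return relativePath.replace(/\\/g, '/');
}

// Hypothetical stand-in for a `dist/`-style rule: true when any directory
// segment of the path equals dirName, honoring the case-sensitivity flag.
function isUnderIgnoredDir(
  relativePath: string,
  dirName: string,
  caseSensitive: boolean,
): boolean {
  // Drop the final segment (the file name) so only directories are compared.
  const dirs = toPosix(relativePath).split('/').slice(0, -1);
  const wanted = caseSensitive ? dirName : dirName.toLowerCase();
  return dirs.some((segment) =>
    (caseSensitive ? segment : segment.toLowerCase()) === wanted,
  );
}

// A Windows-style relative path still matches once it is normalized.
const windowsHit = isUnderIgnoredDir('packages\\a\\dist\\package.json', 'dist', true);
// Case-sensitive matching leaves a differently-cased Dist/ alone.
const sensitiveMiss = isUnderIgnoredDir('packages/a/Dist/package.json', 'dist', true);
// Case-insensitive matching would also ignore Dist/.
const insensitiveHit = isUnderIgnoredDir('packages/a/Dist/package.json', 'dist', false);
console.log({ windowsHit, sensitiveMiss, insensitiveHit });
```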

Tests

  • A regression test builds a 100k-pattern nested-.gitignore tree and asserts the walk completes with the correct manifests; the pre-fix path exhausts a constrained worker heap at that count.
  • A case-sensitivity test asserts dist/ ignores dist/ but leaves Dist/ alone.
  • The existing glob suite stays green.
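A fixture of the kind the regression test describes could be generated along these lines. This is a scaled-down sketch under assumptions: `buildGitignoreTree`, the `pkg-N` layout, and the pattern names are hypothetical and not taken from glob-oom.test.mts, and the tree has one .gitignore per subdirectory rather than deeper nesting.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical fixture builder: one .gitignore and one manifest per subdirectory.
function buildGitignoreTree(
  dirCount: number,
  patternsPerDir: number,
): { root: string; totalPatterns: number } {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), 'glob-oom-'));
  let totalPatterns = 0;
  for (let d = 0; d < dirCount; d += 1) {
    const dir = path.join(root, `pkg-${d}`);
    fs.mkdirSync(dir, { recursive: true });
    const patterns: string[] = [];
    for (let p = 0; p < patternsPerDir; p += 1) {
      // Unique directory rules so the union grows with every .gitignore.
      patterns.push(`ignored-${d}-${p}/`);
    }
    fs.writeFileSync(path.join(dir, '.gitignore'), patterns.join('\n') + '\n');
    fs.writeFileSync(path.join(dir, 'package.json'), '{}\n');
    totalPatterns += patterns.length;
  }
  return { root, totalPatterns };
}

// Small numbers here; the test in the PR uses roughly 100k patterns in total.
const fixture = buildGitignoreTree(10, 5);
const manifests = fs
  .readdirSync(fixture.root)
  .filter((name) => fs.existsSync(path.join(fixture.root, name, 'package.json')));
// Clean up the temporary tree once the manifest list has been collected.
fs.rmSync(fixture.root, { recursive: true, force: true });
console.log(fixture.totalPatterns, manifests.length);
```

Raising `dirCount` and `patternsPerDir` scales the union of patterns without changing the number of manifests per directory, so the expected result stays easy to assert.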

Note

Medium Risk
Core file-discovery path for scan/fix; behavior is intentionally unified (always streaming) with parity fixes for case and Windows paths, but regressions in ignore semantics could miss or over-exclude manifests.

Overview
Fixes socket scan / socket fix heap OOM on monorepos with many nested .gitignore files by changing how globWithGitIgnore applies the unioned ignore set.

Instead of passing tens of thousands of gitignore-derived patterns into fast-glob’s ignore option (which recompiles the full list on every directory scan), the walk now streams paths, applies a single reused ignore instance per entry, and gives fast-glob only the small defaultIgnore + CLI additionalIgnores set for directory pruning. The old fast path and negated-vs-non-negated split are removed so all scans use this streaming + ig path.

Parity: ignore matching respects case-sensitive git/fast-glob defaults via ignorecase, and cwd-relative paths are normalizePath’d before matching so Windows backslashes still hit forward-slash patterns.

Adds a 100k-pattern regression test (glob-oom.test.mts) and a case-sensitivity test; path-resolve comments clarify that streaming filters bound result memory, not ignore-pattern memory.

Reviewed by Cursor Bugbot for commit 90890ee.

socket fix and socket scan aborted with
"FATAL ERROR: CALL_AND_RETRY_LAST ... heap out of memory" (SIGABRT) on
large monorepos. globWithGitIgnore discovers every nested .gitignore and
unions their patterns; the non-negated code path handed that whole set to
fast-glob's native ignore option. fast-glob re-compiles and re-tests its
entire ignore array inside each directory scan, so a set of tens of
thousands of patterns exhausts V8 code space, which raising
--max-old-space-size does not relieve.

Route the high-cardinality gitignore set through a single reused ignore
instance (which compiles each rule once and memoizes it) and hand fast-glob
only the small bounded set it needs to prune directories during the walk.
The negated-pattern path already worked this way; this unifies both paths
and removes the asymmetry that left the common case crashing.

Add a regression test that builds a 100k-pattern nested-.gitignore tree and
asserts the walk completes with the correct manifests, and correct a
comment in getPackageFilesForScan that overstated what the streaming filter
prevents.

Routing the non-negated path through the ignore package introduced two
parity gaps versus fast-glob's native ignore matching:

- The ignore package defaults to case-insensitive matching, while fast-glob
  (caseSensitiveMatch defaults to true) and git match case-sensitively. Build
  the matcher with ignorecase derived from caseSensitiveMatch so a `dist/`
  entry no longer also ignores a differently-cased `Dist/` sibling.
- path.relative yields backslash-separated paths on Windows, which never
  match the forward-slash-anchored patterns. Normalize the relative path with
  normalizePath before ig.ignores(), matching how the patterns are anchored.

Add a case-sensitivity regression test (dist/ vs Dist/).
@simonhj Simon (simonhj) marked this pull request as ready for review June 25, 2026 08:42
@simonhj
Author

Closing, was a red herring.
