fix: stop no-dash SSN branch matching digit runs inside hex IDs#163
Open
sidmohan0 wants to merge 2 commits into
Open
fix: stop no-dash SSN branch matching digit runs inside hex IDs#163sidmohan0 wants to merge 2 commits into
sidmohan0 wants to merge 2 commits into
Conversation
A bare nine-digit run embedded in a longer alphanumeric token (random hex IDs, UUID segments) matched the SSN pattern because its boundaries only excluded adjacent digits. In practice this let randomly generated server IDs trip the Claude Code PII firewall hook and block entire sessions. Tighten the no-dash branch only: the run must not be followed by a letter, and must start at a non-alphanumeric boundary or right after a two-letter token prefix (preserving v4.4.0 country-code parity, e.g. DE123456789). The dashed branch keeps its existing boundaries.
Review found the two-letter-prefix exception matched any two-letter token (ID, PO, ...), and via IGNORECASE also lowercase "de" — a hex byte, which would reopen the random-hex-ID false positive. The v4.4.0 parity test only pins uppercase DE (DE_VAT_ID overlap), so restrict the lookbehind to a case-sensitive DE prefix and cover generic and lowercase prefixes in the regression test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
datafog-hookPII firewall block an entire preview session whose randomly generated server ID contained such a run.DEtoken prefix — the one letter-prefixed shape pinned for v4.4.0DE_VAT_IDparity (test_ssn_detection_keeps_v44_behavior_for_country_prefixed_digits). The prefix is case-sensitive because lowercasedeis a hex byte and would reopen the hex-ID false positive. The dashed branch is unchanged.Review notes
An adversarial regex review confirmed: fixed-width lookbehinds are valid, IGNORECASE/VERBOSE interactions are correct, no catastrophic-backtracking risk (tested on 200k-char hex blobs), and the dashed branch is semantically identical to before. Its one HIGH finding — the initial generic two-letter-prefix exception matching
ID.../PO.../lowercasede...— is addressed in the second commit by scoping to case-sensitiveDE.Test plan
ID,PO, lowercasede) are not flaggedDE-prefix regression test pass unchangeddevlocally (missing spaCy model, environmental)