fix: stop treating bare digit runs as SSN/PHONE by default (#158) #162

Open
sidmohan0 wants to merge 1 commit into dev from fix/4.7.1-ssn-phone-precision

@sidmohan0

Contributor

Closes #158.

The bug

While dogfooding the Claude Code hook in real agent sessions, I saw browser/MCP tool output like {"tabId": <9-digit>, "tabGroupId": <10-digit>} trigger SSN/PHONE warnings on nearly every tool call. Any bare 9-digit integer matched the SSN pattern and any bare 10-digit integer matched PHONE, so sessions touching tab ids, row ids, epoch timestamps, or ticket numbers got a constant stream of advisory noise. That noise trains users to ignore the firewall entirely, the one failure mode a security tool can't survive.
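
For reviewers who want to see the failure mode in isolation, here is a minimal sketch. The two patterns are hypothetical stand-ins for the pre-fix bare-digit matching (the real patterns may differ), and the ids are made up to fit the shape of the payload described above.

```python
import json
import re

# Hypothetical stand-ins for the old bare-digit patterns (assumption, not the
# library's actual regexes): exactly 9 or exactly 10 digits, not part of a longer run.
BARE_SSN = re.compile(r"(?<!\d)\d{9}(?!\d)")
BARE_PHONE = re.compile(r"(?<!\d)\d{10}(?!\d)")

# Made-up ids in the same shape as the tool output in the report.
payload = json.dumps({"tabId": 123456789, "tabGroupId": 1234567890})

ssn_hits = BARE_SSN.findall(payload)
phone_hits = BARE_PHONE.findall(payload)

print(ssn_hits)    # the tab id is reported as an SSN
print(phone_hits)  # the tab group id is reported as a PHONE
```

Neither value is PII, yet both are flagged, which is the noise this PR removes by default.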

The fix

strict_numeric (default True) on scan()/redact():

  • SSN requires a dash or space delimiter (NNN-NN-NNNN / NNN NN NNNN). Space delimiters are newly supported.
  • PHONE requires a separator, parentheses, or a +country prefix.
  • Delimited/formatted numbers still match exactly as before; the pre-existing 000/666 area, 00 group, and 0000 serial checks are unchanged.
  • strict_numeric=False restores undelimited matching (v4.4.0 parity) as an opt-in.
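
The rules in the list above can be sketched as regexes. This is an illustration under assumptions, not the patterns in this PR: `find_ssn` and `find_phone` are hypothetical helpers, the SSN sketch assumes both gaps use the same delimiter, and the phone sketch assumes `-`, `.`, or space as separators and a 1 to 3 digit country code.

```python
import re

# Hypothetical patterns for illustration only.
# Strict: a dash or space is required, captured so both gaps match (assumption).
SSN_STRICT = re.compile(
    r"(?<!\d)(?!000|666)\d{3}([- ])(?!00)\d{2}\1(?!0000)\d{4}(?!\d)"
)
# Loose: same checks, but the delimiter may be empty (undelimited matching).
SSN_LOOSE = re.compile(
    r"(?<!\d)(?!000|666)\d{3}([- ]?)(?!00)\d{2}\1(?!0000)\d{4}(?!\d)"
)

# Strict phone: needs a +country prefix, parentheses, or separators.
PHONE_STRICT = re.compile(
    r"(?<![\d+])(?:"
    r"\+\d{1,3}[-. ]?\d{3}[-. ]?\d{3}[-. ]?\d{4}"  # +country prefix
    r"|\(\d{3}\)[-. ]?\d{3}[-. ]?\d{4}"            # parenthesised area code
    r"|\d{3}[-. ]\d{3}[-. ]\d{4}"                  # separator between groups
    r")(?!\d)"
)


def find_ssn(text: str, strict_numeric: bool = True) -> list[str]:
    """Return SSN-shaped substrings; bare 9-digit runs only when not strict."""
    pattern = SSN_STRICT if strict_numeric else SSN_LOOSE
    return [m.group(0) for m in pattern.finditer(text)]


def find_phone(text: str) -> list[str]:
    """Return phone-shaped substrings under the strict rules only."""
    return [m.group(0) for m in PHONE_STRICT.finditer(text)]


tool_output = '{"tabId": 123456789, "tabGroupId": 1234567890}'
print(find_ssn(tool_output), find_phone(tool_output))
print(find_ssn("SSN 123-45-6789"), find_ssn("123456789", strict_numeric=False))
```

The lookarounds `(?<!\d)` and `(?!\d)` keep a match from starting or ending inside a longer digit run, so longer ids are not partially matched either.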

Threaded through both agent adapters (they run strict). The hook README and plugin README document an ^\d{9}$|^\d{10}$ allowlist pattern as belt-and-braces.
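
To show what the documented allowlist pattern does and does not cover, here is a small sketch. It assumes the allowlist is tested against the matched text of each finding; `is_allowlisted` is a hypothetical helper, and how the hook actually applies the pattern is not shown in this PR.

```python
import re

# The allowlist pattern documented in the READMEs.
ALLOW = re.compile(r"^\d{9}$|^\d{10}$")


def is_allowlisted(matched_text: str) -> bool:
    """Hypothetical check: True when the finding is a bare 9- or 10-digit run."""
    return ALLOW.search(matched_text) is not None


for candidate in ["123456789", "1234567890", "123-45-6789", "12345678901"]:
    print(candidate, is_allowlisted(candidate))
```

Because both alternatives are anchored, only whole-value bare digit runs are suppressed; a delimited SSN such as 123-45-6789 is never allowlisted by this pattern.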

The exact #158 payload now yields zero findings; a delimited SSN still matches.

Scope

This is a behavior change shipped as a patch with a prominent CHANGELOG note — flagging in case you'd rather signal it as 4.8.0. It deliberately reverses the v4.4.0 bare-9-digit SSN parity that was restored earlier (that parity is exactly what #158 is complaining about); parity is preserved as strict_numeric=False.

Broader structural validation (SSA area/group ranges, NANP area/exchange must start 2-9) is deferred to the v5 validator layer (DFPY-110) — it would reject the invalid placeholder values several test fixtures use, which is a larger change than a hotfix warrants.
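
To make the deferred NANP rule concrete, here is a rough sketch of the check named above (area code and exchange must start with 2-9). `nanp_plausible` is a hypothetical name, not part of the v5 validator layer, and the sample numbers are made up rather than taken from the fixtures.

```python
import re


def nanp_plausible(number: str) -> bool:
    """Sketch: area code and exchange must each start with a digit 2-9."""
    digits = re.sub(r"\D", "", number)       # drop separators, parens, '+'
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]                  # strip the NANP country code
    if len(digits) != 10:
        return False
    area_first, exchange_first = digits[0], digits[3]
    return area_first in "23456789" and exchange_first in "23456789"


print(nanp_plausible("(212) 555-0123"))  # valid shape
print(nanp_plausible("555-123-4567"))    # exchange starts with 1
```

A placeholder such as 555-123-4567 fails because its exchange starts with 1, which is the kind of fixture breakage that makes this a larger change than a hotfix.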

Test plan

  • New tests/test_numeric_precision.py (12 tests: bare-not-matched, delimited-still-matched, opt-in parity, the #158 JSON payload)
  • Updated fixtures that encoded the old bare-digit behavior (corpus ssn-no-dashes/phone-plain-digits/passport-log, regex parametrize flips, DE-VAT parity test, allowlist-timestamp test) — dropping only bare-numeric expectations, preserving all else (e.g. PERSON)
  • Full suite: 640 passed (3 skipped spacy-import failures are pre-existing/environmental)
  • pre-commit clean
  • CI green

Structured tool output (tab ids, row ids, timestamps) contains bare
nine- and ten-digit integers that matched the SSN and PHONE patterns,
producing a constant stream of false-positive warnings that train users
to ignore the firewall. SSN now requires a dash or space delimiter and
PHONE requires a separator, parentheses, or a +country prefix by
default. Delimited/formatted numbers still match; pass
strict_numeric=False to restore undelimited matching (v4.4.0 parity).

Threads strict_numeric through scan/redact and both agent adapters;
updates corpus fixtures and regex tests that encoded the old bare-digit
behavior. Broader SSA/NANP structural validation is deferred to the v5
validator layer.
Development

Successfully merging this pull request may close these issues.

False positives: numeric IDs in structured tool output flagged as SSN/PHONE