fix: stop treating bare digit runs as SSN/PHONE by default (#158) by sidmohan0 · Pull Request #162 · DataFog/datafog-python

sidmohan0 · 2026-07-03T00:07:27Z

Closes #158.

The bug

Dogfooding the Claude Code hook in real agent sessions, browser/MCP tool output like {"tabId": <9-digit>, "tabGroupId": <10-digit>} triggered SSN/PHONE warnings on nearly every tool call. Any bare 9-digit integer matched the SSN pattern and any bare 10-digit matched PHONE, so sessions touching tab ids, row ids, epoch timestamps, or ticket numbers got a constant stream of advisory noise — which trains users to ignore the firewall entirely, the one failure mode a security tool can't survive.

The fix

strict_numeric (default True) on scan()/redact():

SSN requires a dash or space delimiter (NNN-NN-NNNN / NNN NN NNNN). Space delimiters are newly supported.
PHONE requires a separator, parentheses, or a +country prefix.
Delimited/formatted numbers still match exactly as before; the pre-existing 000/666 area, 00 group, and 0000 serial checks are unchanged.
strict_numeric=False restores undelimited matching (v4.4.0 parity) as an opt-in.
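The delimiter rule for SSN can be sketched roughly as follows. This is a minimal illustration, not DataFog's actual implementation: the regexes, the `find_ssn` helper, and the requirement that both delimiters be the same character are assumptions made for the example. Only the `strict_numeric` name, its default, and the 000/666 area, 00 group, and 0000 serial checks come from the description above.

```python
import re

# Hypothetical patterns for illustration only; not DataFog's real regexes.
# Strict: area, group, and serial must be joined by a dash or a space
# (the backreference \1 assumes both delimiters are the same character).
SSN_STRICT = re.compile(
    r"(?<!\d)(?!000|666)\d{3}([- ])(?!00)\d{2}\1(?!0000)\d{4}(?!\d)"
)
# Loose: delimiters are optional, so a bare 9-digit run also matches.
SSN_LOOSE = re.compile(
    r"(?<!\d)(?!000|666)\d{3}[- ]?(?!00)\d{2}[- ]?(?!0000)\d{4}(?!\d)"
)


def find_ssn(text: str, strict_numeric: bool = True) -> list[str]:
    """Return SSN-like substrings; bare digit runs only match when strict_numeric=False."""
    pattern = SSN_STRICT if strict_numeric else SSN_LOOSE
    return [m.group(0) for m in pattern.finditer(text)]


payload = '{"tabId": 123456789, "tabGroupId": 1234567890}'
strict_hits = find_ssn(payload)                       # bare ids are ignored
loose_hits = find_ssn(payload, strict_numeric=False)  # the 9-digit id matches again
delimited_hits = find_ssn("ssn 123-45-6789")          # delimited form still matches
```

The PHONE rule would follow the same shape, with the strict pattern requiring a separator, parentheses, or a +country prefix.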

strict_numeric is threaded through both agent adapters (they run strict). The hook README and plugin README document an ^\d{9}$|^\d{10}$ allowlist pattern as a belt-and-braces measure.
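To show what that allowlist pattern accepts, here is a small sketch. The pattern itself is quoted from the description above; the `is_allowlisted` helper and the assumption that the allowlist is checked against a whole candidate value are hypothetical, since how the hook applies allowlist entries is not shown here.

```python
import re

# Pattern quoted in the PR description: a value that is entirely 9 or 10 digits.
BARE_ID = re.compile(r"^\d{9}$|^\d{10}$")


def is_allowlisted(value: str) -> bool:
    """Hypothetical helper: True if the whole value is a bare 9- or 10-digit run."""
    return BARE_ID.match(value) is not None


bare_nine = is_allowlisted("123456789")     # bare tab id: suppressed
bare_ten = is_allowlisted("1234567890")     # bare 10-digit id: suppressed
delimited = is_allowlisted("123-45-6789")   # formatted SSN: not suppressed
```

Because the pattern is anchored at both ends, it only suppresses values that consist of nothing but the digit run, so a delimited SSN or a longer number is not affected.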

The exact #158 payload now yields zero findings; a delimited SSN still matches.

Scope

This is a behavior change shipped as a patch with a prominent CHANGELOG note — flagging in case you'd rather signal it as 4.8.0. It deliberately reverses the v4.4.0 bare-9-digit SSN parity that was restored earlier (that parity is exactly what #158 is complaining about); parity is preserved as strict_numeric=False.

Broader structural validation (SSA area/group ranges, NANP area/exchange must start 2-9) is deferred to the v5 validator layer (DFPY-110) — it would reject the invalid placeholder values several test fixtures use, which is a larger change than a hotfix warrants.
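For a sense of what the deferred NANP check might look like, here is a rough sketch. It is not the planned v5 validator: the function name and the digit-stripping approach are assumptions, and it covers only the "area/exchange must start 2-9" rule mentioned above.

```python
import re


def nanp_structurally_valid(number: str) -> bool:
    """Hypothetical check: NANP area code and exchange must each start with 2-9."""
    digits = re.sub(r"\D", "", number)  # drop separators, parentheses, '+'
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]             # strip the +1 country code
    if len(digits) != 10:
        return False
    area_first, exchange_first = digits[0], digits[3]
    return area_first in "23456789" and exchange_first in "23456789"


plausible = nanp_structurally_valid("(212) 555-0123")
placeholder = nanp_structurally_valid("123-456-7890")  # area code starts with 1
```

A common placeholder such as 123-456-7890 fails this check, which illustrates why enabling it would force changes to fixtures that use such values.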

Test plan

New tests/test_numeric_precision.py (12 tests: bare-not-matched, delimited-still-matched, opt-in parity, the #158 JSON payload)
Updated fixtures that encoded the old bare-digit behavior (corpus ssn-no-dashes/phone-plain-digits/passport-log, regex parametrize flips, DE-VAT parity test, allowlist-timestamp test) — dropping only bare-numeric expectations, preserving all else (e.g. PERSON)
Full suite: 640 passed (3 skipped spacy-import failures are pre-existing/environmental)
pre-commit clean
CI green

Structured tool output (tab ids, row ids, timestamps) contains bare nine- and ten-digit integers that matched the SSN and PHONE patterns, producing a constant stream of false-positive warnings that train users to ignore the firewall. SSN now requires a dash or space delimiter and PHONE requires a separator, parentheses, or a +country prefix by default. Delimited/formatted numbers still match; pass strict_numeric=False to restore undelimited matching (v4.4.0 parity). Threads strict_numeric through scan/redact and both agent adapters; updates corpus fixtures and regex tests that encoded the old bare-digit behavior. Broader SSA/NANP structural validation is deferred to the v5 validator layer.

sidmohan0 wants to merge 1 commit into dev from fix/4.7.1-ssn-phone-precision