feat: add Claude Code hook adapter (datafog-hook) #152

Merged
sidmohan0 merged 5 commits into dev from feat/claude-code-hook on Jul 2, 2026

@sidmohan0

Contributor

What this is

A datafog-hook console entry point that turns DataFog into an offline PII firewall for Claude Code agent sessions, speaking the hooks protocol (JSON stdin → JSON stdout). First artifact of the agent-surfaces growth track discussed alongside the v5 roadmap.

  • PreToolUse gates egress tools (Bash|WebFetch|WebSearch|Write|Edit|mcp__.*): PII in tool input → ask (default) or deny with a reason the model can act on.
  • UserPromptSubmit / PostToolUse inject non-blocking warnings so the model avoids re-propagating PII it has seen.
  • Core-only dependencies (no extras needed); measured ~70ms per invocation including process startup — fast enough to run on every tool call, which is the moat over spaCy-based alternatives.
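
The three hook events above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the regexes are hypothetical stand-ins for DataFog's detectors, `handle_event` and `count_findings` are made-up names, and the JSON field names (`hook_event_name`, `tool_input`, `prompt`, `tool_response`, `hookSpecificOutput`, `permissionDecision`) reflect my reading of the Claude Code hooks protocol and should be checked against its documentation.

```python
import json
import re
import sys

# Hypothetical stand-in patterns; DataFog's real detectors are not shown here.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def count_findings(text):
    """Return {entity_type: count} for every pattern that matches."""
    counts = {}
    for name, pattern in PATTERNS.items():
        n = len(pattern.findall(text))
        if n:
            counts[name] = n
    return counts


def handle_event(event, mode="ask"):
    """Build a hook response for one event, or None to stay silent."""
    kind = event.get("hook_event_name")
    if kind == "PreToolUse":
        text = json.dumps(event.get("tool_input", {}))
    elif kind == "UserPromptSubmit":
        text = str(event.get("prompt", ""))
    elif kind == "PostToolUse":
        text = json.dumps(event.get("tool_response", ""))
    else:
        return None

    counts = count_findings(text)
    if not counts:
        return None

    # Report entity types and counts only, never the matched text.
    summary = ", ".join(f"{k} x{v}" for k, v in sorted(counts.items()))
    if kind == "PreToolUse":
        return {"hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": mode,  # "ask" or "deny"
            "permissionDecisionReason": f"Possible PII in tool input: {summary}",
        }}
    # Prompt and post-tool events only add a non-blocking warning.
    return {"hookSpecificOutput": {
        "hookEventName": kind,
        "additionalContext": f"Possible PII seen ({summary}); avoid repeating it.",
    }}


def main():
    """JSON on stdin, JSON (or nothing) on stdout."""
    response = handle_event(json.load(sys.stdin))
    if response is not None:
        sys.stdout.write(json.dumps(response))
```

A console entry point would call `main()`. The `mode` argument corresponds to the ask/deny choice described for PreToolUse.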

What it protects (honest framing)

The realistic threat is second-order leakage: PII pasted innocently during debugging that the agent later hardcodes into a committed test fixture, a gh issue, or an MCP call. The Write/Edit/Bash/MCP gates catch the re-emission at the tool boundary. Inbound PII (user hands the agent a bank statement) is warned about but not preventable at the hook layer — documented plainly in the README's Limitations section, with redact-before-sharing as the guidance.

Design decisions

  • Fail-open everywhere: a hook bug must never brick a session (verified including adversarial recursion-bomb payloads).
  • Never echoes matched PII in its own output — findings reported as type counts only, since hook output lands in transcripts.
  • High-precision default entity set (EMAIL, PHONE, CREDIT_CARD, SSN); noisy-in-code types (IP_ADDRESS, DOB, ZIP) are opt-in.
  • deny holds in --dangerously-skip-permissions mode (verified empirically in a live bypass-mode session). ask degrades with permission mode — documented as the key configuration gotcha, since permissions-relaxed sessions are exactly where a firewall matters most.
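
The fail-open decision above could look something like the wrapper below. It is a sketch under assumptions, not the PR's implementation: `run_fail_open` is a hypothetical name, and it assumes that exiting with status 0 and no output lets the tool call proceed unchanged.

```python
import json


def run_fail_open(handler, stdin, stdout):
    """Run handler on the JSON event from stdin; never raise, always return 0."""
    try:
        event = json.load(stdin)
        response = handler(event)
        if response is not None:
            # Serialize fully before writing so a failure cannot leave
            # half a JSON document on stdout.
            payload = json.dumps(response)
            stdout.write(payload)
    except Exception:
        # Malformed input, detector bugs, RecursionError on deep nesting:
        # stay silent rather than break the session.
        pass
    return 0
```

Catching `Exception` covers `RecursionError`, which is a subclass of `RuntimeError`. The trade-off is the one the bullet states: a crashing hook lets a call through unscanned instead of blocking the session.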

Field test results

Live-tested end to end: dry demo intercepts, a real session where ask was silently swallowed by relaxed permissions (→ gotcha docs), and a deny-mode bypass-permissions session where the PII curl was blocked before any network call. At one point the hook denied its own developer's verification command — the payload contained the test PII.

Review

Reviewed by python-reviewer agent: no CRITICAL findings; 2 HIGH (RecursionError paths breaking fail-open on adversarial nesting) fixed with regression tests; MEDIUM (scan-budget starvation via padding field) fixed via per-string budget; LOWs fixed (entity-filter fallback, explicit engine="regex").
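
To illustrate the two classes of fix named above, here is one way to walk a nested payload without recursion while capping each string separately. The function name and the limits are hypothetical, chosen for illustration; the PR's actual traversal and budget values are not shown on this page.

```python
def iter_strings(payload, per_string_budget=10_000, max_nodes=50_000):
    """Yield (truncated) string values from nested JSON data without recursion."""
    stack = [payload]
    visited = 0
    while stack and visited < max_nodes:
        node = stack.pop()
        visited += 1
        if isinstance(node, str):
            # Each string gets its own cap, so one huge padding field
            # cannot use up the budget before other fields are scanned.
            yield node[:per_string_budget]
        elif isinstance(node, dict):
            stack.extend(node.values())  # dict keys are not scanned in this sketch
        elif isinstance(node, (list, tuple)):
            stack.extend(node)
```

The explicit stack means nesting depth cannot raise `RecursionError`, and `max_nodes` bounds the total work on very large payloads.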

Test plan

  • 18 unit tests passing (tests/test_claude_code_hook.py) — protocol contract, PII-never-echoed, nested/adversarial payloads, fail-open, env config
  • black/isort/flake8 clean
  • Live end-to-end verification in real Claude Code sessions (ask mode, deny mode, bypass-permissions mode)
  • CI green on this PR

sidmohan0 added 3 commits July 2, 2026 11:15
Offline PII firewall for agent tool calls. Speaks the Claude Code hooks
protocol: PreToolUse gates egress tools (ask/deny on EMAIL, PHONE,
CREDIT_CARD, SSN findings), UserPromptSubmit and PostToolUse inject
non-blocking warnings. Core-only dependencies, ~70ms per invocation,
fail-open by design, never echoes matched PII in its own output.
sidmohan0 merged commit d44d6b0 into dev on Jul 2, 2026
26 checks passed
sidmohan0 mentioned this pull request on Jul 2, 2026
4 tasks