feat: add Claude Code hook adapter (datafog-hook) #152
Merged
Offline PII firewall for agent tool calls. Speaks the Claude Code hooks protocol: PreToolUse gates egress tools (ask/deny on EMAIL, PHONE, CREDIT_CARD, SSN findings), UserPromptSubmit and PostToolUse inject non-blocking warnings. Core-only dependencies, ~70ms per invocation, fail-open by design, never echoes matched PII in its own output.
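To make the protocol shape concrete, here is a minimal sketch of a PreToolUse gate, not the code in this PR. It assumes the hook receives an event with a `tool_input` object on stdin and answers with a `hookSpecificOutput` object carrying `permissionDecision` and `permissionDecisionReason`; those field names reflect my reading of the Claude Code hooks documentation and should be checked against it. The `PATTERNS`, `decide`, and `run` names are hypothetical, and the two regexes are simplified stand-ins for DataFog's detectors.

```python
import json
import re
import sys

# Simplified stand-in patterns for illustration; NOT DataFog's actual detectors.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def decide(event: dict, mode: str = "ask") -> dict:
    """Return a hook response for one event; an empty dict means no opinion."""
    # Crude flattening for the sketch; a real scanner would walk each string.
    text = json.dumps(event.get("tool_input", {}))
    found = sorted(name for name, rx in PATTERNS.items() if rx.search(text))
    if not found:
        return {}
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": mode,  # "ask" or "deny"
            # Entity types only: the matched text is never echoed back.
            "permissionDecisionReason": "PII detected in tool input: "
            + ", ".join(found),
        }
    }


def run(stdin=sys.stdin, stdout=sys.stdout) -> int:
    """Read one event as JSON, write one JSON response, always exit 0."""
    try:
        response = decide(json.load(stdin))
    except Exception:
        response = {}  # fail open: a hook error must not block the session
    json.dump(response, stdout)
    return 0
```

A console entry point would simply call `run()`. Returning an empty object on any error is what "fail-open" means in this sketch: a broken hook lets the tool call through rather than stalling the agent.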
What this is
A `datafog-hook` console entry point that turns DataFog into an offline PII firewall for Claude Code agent sessions, speaking the hooks protocol (JSON stdin → JSON stdout). First artifact of the agent-surfaces growth track discussed alongside the v5 roadmap.
PreToolUse (`Bash|WebFetch|WebSearch|Write|Edit|mcp__.*`): PII in tool input → `ask` (default) or `deny` with a reason the model can act on.
What it protects (honest framing)
The realistic threat is second-order leakage: PII pasted innocently during debugging that the agent later hardcodes into a committed test fixture, a `gh issue`, or an MCP call. The Write/Edit/Bash/MCP gates catch the re-emission at the tool boundary. Inbound PII (user hands the agent a bank statement) is warned about but not preventable at the hook layer; this is documented plainly in the README's Limitations section, with redact-before-sharing as the guidance.
Design decisions
`deny` holds in `--dangerously-skip-permissions` mode (verified empirically in a live bypass-mode session). `ask` degrades with permission mode; this is documented as the key configuration gotcha, since permissions-relaxed sessions are exactly where a firewall matters most.
Field test results
Live-tested end to end: dry demo intercepts, a real session where `ask` was silently swallowed by relaxed permissions (→ gotcha docs), and a `deny`-mode bypass-permissions session where the PII curl was blocked before any network call. At one point the hook denied its own developer's verification command because the payload contained the test PII.
Review
Reviewed by python-reviewer agent: no CRITICAL findings; 2 HIGH (RecursionError paths breaking fail-open on adversarial nesting) fixed with regression tests; MEDIUM (scan-budget starvation via padding field) fixed via per-string budget; LOWs fixed (entity-filter fallback, explicit `engine="regex"`).
Test plan
`tests/test_claude_code_hook.py`: protocol contract, PII-never-echoed, nested/adversarial payloads, fail-open, env config
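The nested/adversarial and fail-open items can be illustrated with a short sketch. It is an assumption about one reasonable approach, not the PR's actual implementation: payloads are walked with an explicit stack so deep nesting cannot raise `RecursionError`, each string gets its own scan budget so a large padding field cannot starve later fields, and any unexpected error is treated as "no finding". The names `iter_strings`, `has_pii`, `MAX_CHARS_PER_STRING`, and `MAX_NODES` are hypothetical, and the SSN regex is a stand-in detector.

```python
import re
from typing import Any, Iterator

MAX_CHARS_PER_STRING = 4096  # hypothetical per-string scan budget
MAX_NODES = 10_000           # hypothetical cap on visited nodes

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # stand-in detector, not DataFog's


def iter_strings(payload: Any) -> Iterator[str]:
    """Yield every string in a JSON-like payload without recursion."""
    stack = [payload]
    visited = 0
    while stack and visited < MAX_NODES:
        node = stack.pop()
        visited += 1
        if isinstance(node, str):
            # Truncate per string, so one huge field cannot use up the budget
            # that later fields need.
            yield node[:MAX_CHARS_PER_STRING]
        elif isinstance(node, dict):
            stack.extend(node.keys())
            stack.extend(node.values())
        elif isinstance(node, (list, tuple)):
            stack.extend(node)


def has_pii(payload: Any) -> bool:
    """Fail open: any unexpected error counts as 'no finding'."""
    try:
        return any(SSN.search(s) for s in iter_strings(payload))
    except Exception:
        return False
```

The trade-off in this sketch is that PII located past the per-string cap, or beyond the node cap, goes unseen; the caps bound the work per invocation at the cost of coverage on pathological inputs.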