Release v4.7.0 #160
Merged
…ion refs and entity labels
docs: refresh README for 4.6
docs: give Claude Code plugin install top billing
Adds allowlist (exact values) and allowlist_patterns (full-match regexes) to scan/redact and threads them through both agent adapters: DATAFOG_HOOK_ALLOWLIST / DATAFOG_HOOK_ALLOWLIST_PATTERNS env vars for the Claude Code hook, and allowlist/allowlist_patterns params for the LiteLLM guardrail. Motivated by a day of dogfooding: Unix timestamps and numeric IDs match the PHONE pattern, and intentional identifiers (our own support email, doc placeholders) should be exemptable. Also accepts Presidio-style entity names (EMAIL_ADDRESS, US_SSN) as input aliases via the existing canonical type map, ships a py.typed marker so downstream type checkers see our annotations, and backports the upstream-review fixes to the in-repo litellm adapter (guardrail spans recorded on the returned dict, redaction reported as an intervention). Finally, corrects an entity-name documentation error introduced in #156: the scan API returns DATE and ZIP_CODE (DOB/ZIP are input aliases).
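A minimal sketch of the allowlist semantics described above, assuming findings are plain (entity_type, text) records: allowlist is an exact, case-sensitive set lookup, and allowlist_patterns uses re.fullmatch, so a pattern has to cover the whole entity text rather than a substring. Finding and apply_allowlist are hypothetical names for illustration, not DataFog's actual API, and the EMAIL label is illustrative.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for one detected entity (not DataFog's real type)."""
    entity_type: str
    text: str


def apply_allowlist(
    findings: Iterable[Finding],
    allowlist: Sequence[str] = (),
    allowlist_patterns: Sequence[str] = (),
) -> List[Finding]:
    """Drop findings exempted by an exact value or a full-match pattern."""
    exact = set(allowlist)  # case-sensitive, no Unicode normalization
    patterns = [re.compile(p) for p in allowlist_patterns]
    kept: List[Finding] = []
    for f in findings:
        if f.text in exact:
            continue  # exact allowlist hit: suppress this finding
        if any(rx.fullmatch(f.text) for rx in patterns):
            continue  # pattern must cover the whole text, not a substring
        kept.append(f)
    return kept


if __name__ == "__main__":
    findings = [
        Finding("PHONE", "1719859200"),           # Unix timestamp misread as a phone
        Finding("EMAIL", "support@example.com"),  # our own address, exempt on purpose
        Finding("EMAIL", "alice@example.org"),
    ]
    kept = apply_allowlist(
        findings,
        allowlist=["support@example.com"],
        allowlist_patterns=[r"\d{10}"],
    )
    print(kept)  # [Finding(entity_type='EMAIL', text='alice@example.org')]
```

A pattern like \d{10} would also exempt a genuine 10-digit phone number, so in this sketch full-match patterns are only as safe as they are narrow.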
Review findings: reject quantified groups containing nested quantifiers at compile time (they risk catastrophic backtracking on attacker-influenced entity text), cap pattern length at 512 chars, and skip pattern matching for entities longer than 512 chars (fail-safe: the finding is kept). Match semantics are documented as case-sensitive with no Unicode normalization; allowlist entries are operator configuration, never end-user input. Adds regression tests for the rejection heuristic, the smart-engine path, and the redact(entities=..., allowlist=...) guard. Replaces a walrus assignment with a plain one in the litellm adapter.
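The review-driven guards can be sketched as a compile step plus a match step. This is an assumed, hand-rolled lint rather than the heuristic the PR actually ships: it flags a group that is followed by a quantifier and already contains a quantifier (e.g. (a+)+), enforces the 512-char pattern cap, and treats entity text over 512 chars as not allowlisted so the finding is kept. All function and constant names here are hypothetical.

```python
import re
from typing import List, Sequence

MAX_PATTERN_LEN = 512  # pattern length cap stated in the PR
MAX_ENTITY_LEN = 512   # entities longer than this skip pattern matching

# Brace quantifiers such as {n}, {n,}, {,m}, {n,m}; a bare "{}" is a literal.
_BRACE_QUANT = re.compile(r"\{[0-9]*(?:,[0-9]*)?\}")


def _quantifier_at(pattern: str, i: int) -> bool:
    """Return True if a regex quantifier starts at pattern[i]."""
    if i >= len(pattern):
        return False
    if pattern[i] in "*+?":
        return True
    if pattern[i] == "{":
        m = _BRACE_QUANT.match(pattern, i)
        return m is not None and m.group() != "{}"
    return False


def has_nested_quantifier(pattern: str) -> bool:
    """Heuristic lint: is some quantified group also quantified inside?

    Flags shapes like (a+)+, (\\d*)* and (?:x|y+){2,}. Conservative, not a parser.
    """
    stack: List[bool] = []  # one flag per open group: quantifier seen inside?
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character (works inside classes too)
            continue
        if in_class:
            in_class = ch != "]"
            i += 1
            continue
        if ch == "[":
            in_class = True
            i += 1
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1  # a leading "]" is a literal class member
            continue
        if ch == "(":
            stack.append(False)
            i += 1
            if i < n and pattern[i] == "?":
                i += 1  # "(?:", "(?P<name>", lookarounds: this "?" is not a quantifier
            continue
        if ch == ")":
            inner = stack.pop() if stack else False
            i += 1
            if inner and _quantifier_at(pattern, i):
                return True  # quantified group that already contains a quantifier
            if inner and stack:
                stack[-1] = True  # the enclosing group now contains a quantifier
            continue
        if stack and _quantifier_at(pattern, i):
            stack[-1] = True
        i += 1
    return False


def compile_allowlist_patterns(patterns: Sequence[str]) -> List[re.Pattern]:
    """Validate and compile operator-supplied patterns, failing at load time."""
    compiled = []
    for p in patterns:
        if len(p) > MAX_PATTERN_LEN:
            raise ValueError(f"allowlist pattern exceeds {MAX_PATTERN_LEN} chars")
        if has_nested_quantifier(p):
            raise ValueError(f"allowlist pattern has nested quantifiers: {p!r}")
        compiled.append(re.compile(p))  # case-sensitive, no Unicode normalization
    return compiled


def is_allowlisted(text: str, compiled: Sequence[re.Pattern]) -> bool:
    """Full-match check; over-long entity text is never exempted (fail-safe)."""
    if len(text) > MAX_ENTITY_LEN:
        return False  # skip matching entirely: the finding is kept
    return any(rx.fullmatch(text) for rx in compiled)
```

The lint is deliberately conservative: it also rejects harmless shapes such as (a+)?, and it does not catch other backtracking sources such as overlapping alternation like (a|a)+, which is why the length caps still matter.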
feat: allowlist support, presidio entity aliases, py.typed (4.7.0)
What ships in 4.7.0
Additive minor release, engine + adapters (#159):
- … (EMAIL_ADDRESS, US_SSN) for config migration
- guardrail_status="guardrail_intervened" (was "success"); see CHANGELOG

Release mechanics
Same checklist as 4.6.0: merge commit (not squash), then I verify merged tree == this branch, dispatch stable dry_run=true, confirm "Would have published: 4.7.0", dispatch dry_run=false, verify PyPI + tag, and fast-forward dev.

Why tonight
Releasing now starts litellm's 3-day package quarantine clock: 4.7.0 clears it ~July 5, so the scheduled CI-pin push to BerriAI/litellm#31991 can pin 4.7.0 directly and drop the pyright suppression in one commit.
Test plan