fix(custody): harden STH witness plug — no client-triggered halt, fixed-length prefix, one-time bootstrap#254
Merged
Conversation
…ed-length prefix, one-time bootstrap
The X-Loopctl-Last-Known-STH witness plug operated on raw, unauthenticated
client input yet could halt an entire tenant, evade divergence detection, and
grant unbounded validation opt-out. Three private advisories are addressed.
FIX A (custody-01, GHSA-w786-f588-2943 — HIGH): a client-supplied signature
prefix mismatch no longer calls Tenants.halt_custody/1. A raw mismatch almost
always means the caller's cached STH is stale, not that the log forked, so any
agent-role caller could persistently DoS the tenant via CheckCustodyHalt. The
plug now returns 409 witness_divergence, hands back the server's current STH in
the x-loopctl-current-sth response header so a legitimate client can resync, and
emits a [:loopctl, :witness, :divergence] telemetry event plus a Logger.warning
for monitoring. Genuine byzantine halts remain with the FallbackController
trust-violation paths and the signed-STH verifier (untouched).
FIX B (custody-02, GHSA-cxrw-xg8f-mx8v — MEDIUM): the server prefix was sliced
to the client prefix's length, so an empty ("<pos>:") or 1-char prefix compared
"" == "" (or a single char) and evaded divergence detection. The client prefix
is now validated to be exactly 22 base64url chars (the canonical length of
base64url(first 16 signature bytes)) up front — else 412 witness_header_malformed
— and the comparison window is derived solely from the fixed-length server value
via Plug.Crypto.secure_compare/2, never from client input.
FIX C (custody-03, GHSA-36g5-mcrh-rcrm — MEDIUM): X-Loopctl-STH-Bootstrap: true
granted validation grace on every request with no state, permanently opting the
caller out of witness enforcement. Bootstrap grace is now one-time per API key:
a new nullable api_keys.sth_bootstrap_consumed_at column is stamped (via
AdminRepo) on the first bootstrap and any later bootstrap from the same key is
rejected 412 witness_bootstrap_already_consumed. An absent/superadmin key is
handled gracefully (grace without stamping, no crash).
Also standardizes all precondition rejections to HTTP 412 (they used the 428
:precondition_required atom while the JSON body and moduledoc declared 412) and
adds an :enforce plug opt so the dedicated tests exercise enforcement without
Application.put_env. Adds test/loopctl_web/plugs/validate_witness_header_test.exs
covering all three fixes.
Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.
mkreyman
added a commit
that referenced
this pull request
Jul 2, 2026
…ss, oversized-position crash (#256) * fix(custody): close witness-plug bootstrap race, future-position bypass, oversized-position crash Follow-up to #254, which merged the round-1 witness hardening before the enhanced-review gate's fixes landed — so master still had three defects the gate reproduced: - Bootstrap grace was TOCTOU-racy (read-then-write): concurrent first requests for the same key both got the "one-time" grace. Now atomic and DB-authoritative via a conditional UPDATE guarded by `sth_bootstrap_consumed_at IS NULL`, branching on the row count. - The divergence check could be skipped by claiming a chain position beyond the tenant's latest sealed STH (the nil branch passed through with no comparison). A future/unverifiable position now returns the same 409 resync response. - An oversized position (> bigint range) reached the query and raised a Postgrex encode error (500); position is now range-checked before the query and rejected as 412 malformed. Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm. * fix(custody): close gate follow-ups — public STH endpoint crash, bootstrap strand, SQL-in-log Enhanced-review (Sonnet-5/xhigh) of the witness fix-forward found three issues: - HIGH: the bigint range-check was applied in the witness plug but NOT in the public, unauthenticated AuditSthController (`GET /audit/sth/:id?at=...`). An oversized `at` reached Postgrex and raised DBConnection.EncodeError — a class the DB-error sanitizer does not handle — so it re-raised as an uncaught 500, repeatable with no auth or rate limit. Fix: make AuditChain.get_sth_at_position defensively return nil for a non-integer / out-of-bigint-range position (single source of truth via AuditChain.max_chain_position/0, which the plug now references), protecting every caller. Adds the previously-missing controller test suite. - HIGH: the atomic bootstrap consume's rescue wrapped BOTH the grace-consuming UPDATE and the follow-on STH read, so a transient read failure after the UPDATE committed would swallow the error and permanently strand the key (grace burned, 412 forever). Fix: read the STH BEFORE consuming the grace; after a committed UPDATE only pure header-setting remains, so nothing can fail post-consume. - LOW: the rescue logged `inspect(exception)`, which prints %Postgrex.Error{}'s raw `:query` SQL. Fix: log only the struct name + SQLSTATE (US-27.3 sanitized- logging convention). Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
X-Loopctl-Last-Known-STHwitness plug (LoopctlWeb.Plugs.ValidateWitnessHeader) sits in the:authenticatedpipeline and validates raw, unauthenticated client input — yet it could halt an entire tenant, be evaded, and be permanently opted-out. This PR fixes three chain-of-custody vulnerabilities (private advisories) and adds a dedicated test suite.Fixes
FIX A — custody-01 (GHSA-w786-f588-2943, HIGH): unauthenticated client can force a persistent tenant-wide custody halt
Previously, a client-supplied signature-prefix mismatch called
Tenants.halt_custody/1, so any agent-role caller could persistently DoS the whole tenant (viaCheckCustodyHalt) just by sending a bogus/stale prefix. A raw mismatch is not proof of a fork — it almost always means the caller's cached STH is stale.409 witness_divergenceand includes the server's current STH in thex-loopctl-current-sthresponse header so a legitimate client can resync.[:loopctl, :witness, :divergence]telemetry event +Logger.warningfor monitoring (noLogger.error+ halt).FallbackControllertrust-violation paths and the signed-STH verifier (untouched).FIX B — custody-02 (GHSA-cxrw-xg8f-mx8v, MEDIUM): divergence check bypassable via empty/short prefix
The server prefix was sliced to
String.length(client_prefix), so an empty prefix ("<pos>:") made"" == ""always pass, and a 1-char prefix compared a single char — evading divergence detection.base64url(first 16 signature bytes)), else412 witness_header_malformed.Plug.Crypto.secure_compare/2— never from client input.FIX C — custody-03 (GHSA-36g5-mcrh-rcrm, MEDIUM): witness enforcement permanently opt-out via bootstrap header
X-Loopctl-STH-Bootstrap: truegranted grace on every request with no state, permanently skipping validation.api_keys.sth_bootstrap_consumed_at :utc_datetime_usec(RLS onapi_keysleft untouched; stamped viaAdminRepo).x-loopctl-current-sthcache hint); a second bootstrap from the same key is rejected412 witness_bootstrap_already_consumed.Migration
priv/repo/migrations/20260701150000_add_sth_bootstrap_consumed_at_to_api_keys.exs—alter table(:api_keys) add :sth_bootstrap_consumed_at. Reversiblechange/0(verified down/up on dev DB). Does not alter the existingapi_keysRLS.Notable adjustments
:precondition_required(428) atom while the JSON body and moduledoc declared 412 (a latent HTTP-vs-body inconsistency).:enforceplug opt so the dedicated tests exercise enforcement withoutApplication.put_env(forbidden by repo test conventions).Tests
New
test/loopctl_web/plugs/validate_witness_header_test.exs(11 tests) covering: mismatch → 409 + resync header + tenant not halted (A); empty/short/over-length/non-base64url prefixes → 412 (B); first bootstrap grants+stamps, second is rejected 412, absent-key no-crash (C).Full gate green locally:
compile --warnings-as-errors,format --check,credo --strict,deps.audit,dialyzer(0 errors), andmix test→ 3111 tests, 0 failures, 40 excluded.Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.