Skip to content

fix(custody): harden STH witness plug — no client-triggered halt, fixed-length prefix, one-time bootstrap#254

Merged
mkreyman merged 1 commit into
masterfrom
fix/custody-witness-header
Jul 1, 2026
Merged

fix(custody): harden STH witness plug — no client-triggered halt, fixed-length prefix, one-time bootstrap#254
mkreyman merged 1 commit into
masterfrom
fix/custody-witness-header

Conversation

@mkreyman

@mkreyman mkreyman commented Jul 1, 2026

Copy link
Copy Markdown
Owner

Summary

The X-Loopctl-Last-Known-STH witness plug (LoopctlWeb.Plugs.ValidateWitnessHeader) sits in the :authenticated pipeline and validates raw, unauthenticated client input — yet it could halt an entire tenant, be evaded, and be permanently opted-out. This PR fixes three chain-of-custody vulnerabilities (private advisories) and adds a dedicated test suite.

Fixes

FIX A — custody-01 (GHSA-w786-f588-2943, HIGH): unauthenticated client can force a persistent tenant-wide custody halt

Previously, a client-supplied signature-prefix mismatch called Tenants.halt_custody/1, so any agent-role caller could persistently DoS the whole tenant (via CheckCustodyHalt) just by sending a bogus/stale prefix. A raw mismatch is not proof of a fork — it almost always means the caller's cached STH is stale.

  • No longer halts custody from this plug.
  • Returns 409 witness_divergence and includes the server's current STH in the x-loopctl-current-sth response header so a legitimate client can resync.
  • Emits a [:loopctl, :witness, :divergence] telemetry event + Logger.warning for monitoring (no Logger.error + halt).
  • Genuine byzantine halts remain with the FallbackController trust-violation paths and the signed-STH verifier (untouched).

FIX B — custody-02 (GHSA-cxrw-xg8f-mx8v, MEDIUM): divergence check bypassable via empty/short prefix

The server prefix was sliced to String.length(client_prefix), so an empty prefix ("<pos>:") made "" == "" always pass, and a 1-char prefix compared a single char — evading divergence detection.

  • The client prefix must now be exactly 22 base64url chars (the canonical length of base64url(first 16 signature bytes)), else 412 witness_header_malformed.
  • The comparison window is derived solely from the fixed-length server value via Plug.Crypto.secure_compare/2 — never from client input.

FIX C — custody-03 (GHSA-36g5-mcrh-rcrm, MEDIUM): witness enforcement permanently opt-out via bootstrap header

X-Loopctl-STH-Bootstrap: true granted grace on every request with no state, permanently skipping validation.

  • Bootstrap grace is now one-time per API key.
  • New migration adds nullable api_keys.sth_bootstrap_consumed_at :utc_datetime_usec (RLS on api_keys left untouched; stamped via AdminRepo).
  • First bootstrap stamps consumption and grants grace (keeping the x-loopctl-current-sth cache hint); a second bootstrap from the same key is rejected 412 witness_bootstrap_already_consumed.
  • Absent/superadmin key is handled gracefully (grace without stamping, no crash).

Migration

priv/repo/migrations/20260701150000_add_sth_bootstrap_consumed_at_to_api_keys.exsalter table(:api_keys) add :sth_bootstrap_consumed_at. Reversible change/0 (verified down/up on dev DB). Does not alter the existing api_keys RLS.

Notable adjustments

  • Standardized all precondition rejections to actual HTTP 412 — the plug previously used the :precondition_required (428) atom while the JSON body and moduledoc declared 412 (a latent HTTP-vs-body inconsistency).
  • Added an :enforce plug opt so the dedicated tests exercise enforcement without Application.put_env (forbidden by repo test conventions).

Tests

New test/loopctl_web/plugs/validate_witness_header_test.exs (11 tests) covering: mismatch → 409 + resync header + tenant not halted (A); empty/short/over-length/non-base64url prefixes → 412 (B); first bootstrap grants+stamps, second is rejected 412, absent-key no-crash (C).

Full gate green locally: compile --warnings-as-errors, format --check, credo --strict, deps.audit, dialyzer (0 errors), and mix test3111 tests, 0 failures, 40 excluded.

Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.

…ed-length prefix, one-time bootstrap

The X-Loopctl-Last-Known-STH witness plug operated on raw, unauthenticated
client input yet could halt an entire tenant, evade divergence detection, and
grant unbounded validation opt-out. Three private advisories are addressed.

FIX A (custody-01, GHSA-w786-f588-2943 — HIGH): a client-supplied signature
prefix mismatch no longer calls Tenants.halt_custody/1. A raw mismatch almost
always means the caller's cached STH is stale, not that the log forked, so any
agent-role caller could persistently DoS the tenant via CheckCustodyHalt. The
plug now returns 409 witness_divergence, hands back the server's current STH in
the x-loopctl-current-sth response header so a legitimate client can resync, and
emits a [:loopctl, :witness, :divergence] telemetry event plus a Logger.warning
for monitoring. Genuine byzantine halts remain with the FallbackController
trust-violation paths and the signed-STH verifier (untouched).

FIX B (custody-02, GHSA-cxrw-xg8f-mx8v — MEDIUM): the server prefix was sliced
to the client prefix's length, so an empty ("<pos>:") or 1-char prefix compared
"" == "" (or a single char) and evaded divergence detection. The client prefix
is now validated to be exactly 22 base64url chars (the canonical length of
base64url(first 16 signature bytes)) up front — else 412 witness_header_malformed
— and the comparison window is derived solely from the fixed-length server value
via Plug.Crypto.secure_compare/2, never from client input.

FIX C (custody-03, GHSA-36g5-mcrh-rcrm — MEDIUM): X-Loopctl-STH-Bootstrap: true
granted validation grace on every request with no state, permanently opting the
caller out of witness enforcement. Bootstrap grace is now one-time per API key:
a new nullable api_keys.sth_bootstrap_consumed_at column is stamped (via
AdminRepo) on the first bootstrap and any later bootstrap from the same key is
rejected 412 witness_bootstrap_already_consumed. An absent/superadmin key is
handled gracefully (grace without stamping, no crash).

Also standardizes all precondition rejections to HTTP 412 (they used the 428
:precondition_required atom while the JSON body and moduledoc declared 412) and
adds an :enforce plug opt so the dedicated tests exercise enforcement without
Application.put_env. Adds test/loopctl_web/plugs/validate_witness_header_test.exs
covering all three fixes.

Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.
@mkreyman mkreyman merged commit bc7e552 into master Jul 1, 2026
9 checks passed
mkreyman added a commit that referenced this pull request Jul 2, 2026
…ss, oversized-position crash (#256)

* fix(custody): close witness-plug bootstrap race, future-position bypass, oversized-position crash

Follow-up to #254, which merged the round-1 witness hardening before the
enhanced-review gate's fixes landed — so master still had three defects the gate
reproduced:

- Bootstrap grace was TOCTOU-racy (read-then-write): concurrent first requests
  for the same key both got the "one-time" grace. Now atomic and
  DB-authoritative via a conditional UPDATE guarded by
  `sth_bootstrap_consumed_at IS NULL`, branching on the row count.
- The divergence check could be skipped by claiming a chain position beyond the
  tenant's latest sealed STH (the nil branch passed through with no comparison).
  A future/unverifiable position now returns the same 409 resync response.
- An oversized position (> bigint range) reached the query and raised a Postgrex
  encode error (500); position is now range-checked before the query and
  rejected as 412 malformed.

Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.

* fix(custody): close gate follow-ups — public STH endpoint crash, bootstrap strand, SQL-in-log

Enhanced-review (Sonnet-5/xhigh) of the witness fix-forward found three issues:

- HIGH: the bigint range-check was applied in the witness plug but NOT in the
  public, unauthenticated AuditSthController (`GET /audit/sth/:id?at=...`). An
  oversized `at` reached Postgrex and raised DBConnection.EncodeError — a class
  the DB-error sanitizer does not handle — so it re-raised as an uncaught 500,
  repeatable with no auth or rate limit. Fix: make AuditChain.get_sth_at_position
  defensively return nil for a non-integer / out-of-bigint-range position (single
  source of truth via AuditChain.max_chain_position/0, which the plug now
  references), protecting every caller. Adds the previously-missing controller
  test suite.
- HIGH: the atomic bootstrap consume's rescue wrapped BOTH the grace-consuming
  UPDATE and the follow-on STH read, so a transient read failure after the UPDATE
  committed would swallow the error and permanently strand the key (grace burned,
  412 forever). Fix: read the STH BEFORE consuming the grace; after a committed
  UPDATE only pure header-setting remains, so nothing can fail post-consume.
- LOW: the rescue logged `inspect(exception)`, which prints %Postgrex.Error{}'s
  raw `:query` SQL. Fix: log only the struct name + SQLSTATE (US-27.3 sanitized-
  logging convention).

Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant