Skip to content

fix(custody): close witness-plug bootstrap race, future-position bypass, oversized-position crash#256

Merged
mkreyman merged 2 commits into
masterfrom
fix/witness-plug-hardening
Jul 2, 2026
Merged

fix(custody): close witness-plug bootstrap race, future-position bypass, oversized-position crash#256
mkreyman merged 2 commits into
masterfrom
fix/witness-plug-hardening

Conversation

@mkreyman

@mkreyman mkreyman commented Jul 2, 2026

Copy link
Copy Markdown
Owner

Summary

Closes three chain-of-custody defects in the STH witness plug that were merged to master via #254 before the enhanced-review gate's fixes landed (the gate reproduced all three empirically). Private advisories: GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.

Fixes

  • High — bootstrap grace TOCTOU race. One-time bootstrap consumption was read-then-write, so two concurrent first requests for the same key both got the grace. Now atomic: a single conditional UPDATE ... WHERE sth_bootstrap_consumed_at IS NULL, granting only when the row count is 1, else 412 witness_bootstrap_already_consumed.
  • High — divergence check skippable via a future position. Claiming a chain position beyond the tenant's latest sealed STH made get_sth_at_position return nil, and the nil branch passed through with no comparison at all. A future/unverifiable position now returns the same 409 resync response (with the current-STH header), never a silent pass. The server is the sole STH source, so an honest agent can never legitimately hold a position beyond the latest — no false positives.
  • Medium — oversized position crash. A position beyond bigint range reached the query and raised a Postgrex encode error (500). Now range-checked before the query → 412 malformed.

Tests

14 plug tests including the atomic single-winner bootstrap, future-position 409 resync (tenant not halted), and oversized-position 412. Full suite green (3115 tests, 0 failures); credo/dialyzer clean.

mkreyman added 2 commits July 1, 2026 19:03
…ss, oversized-position crash

Follow-up to #254, which merged the round-1 witness hardening before the
enhanced-review gate's fixes landed — so master still had three defects the gate
reproduced:

- Bootstrap grace was TOCTOU-racy (read-then-write): concurrent first requests
  for the same key both got the "one-time" grace. Now atomic and
  DB-authoritative via a conditional UPDATE guarded by
  `sth_bootstrap_consumed_at IS NULL`, branching on the row count.
- The divergence check could be skipped by claiming a chain position beyond the
  tenant's latest sealed STH (the nil branch passed through with no comparison).
  A future/unverifiable position now returns the same 409 resync response.
- An oversized position (> bigint range) reached the query and raised a Postgrex
  encode error (500); position is now range-checked before the query and
  rejected as 412 malformed.

Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.
…strap strand, SQL-in-log

Enhanced-review (Sonnet-5/xhigh) of the witness fix-forward found three issues:

- HIGH: the bigint range-check was applied in the witness plug but NOT in the
  public, unauthenticated AuditSthController (`GET /audit/sth/:id?at=...`). An
  oversized `at` reached Postgrex and raised DBConnection.EncodeError — a class
  the DB-error sanitizer does not handle — so it re-raised as an uncaught 500,
  repeatable with no auth or rate limit. Fix: make AuditChain.get_sth_at_position
  defensively return nil for a non-integer / out-of-bigint-range position (single
  source of truth via AuditChain.max_chain_position/0, which the plug now
  references), protecting every caller. Adds the previously-missing controller
  test suite.
- HIGH: the atomic bootstrap consume's rescue wrapped BOTH the grace-consuming
  UPDATE and the follow-on STH read, so a transient read failure after the UPDATE
  committed would swallow the error and permanently strand the key (grace burned,
  412 forever). Fix: read the STH BEFORE consuming the grace; after a committed
  UPDATE only pure header-setting remains, so nothing can fail post-consume.
- LOW: the rescue logged `inspect(exception)`, which prints %Postgrex.Error{}'s
  raw `:query` SQL. Fix: log only the struct name + SQLSTATE (US-27.3 sanitized-
  logging convention).

Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.
@mkreyman mkreyman merged commit 4bfba3d into master Jul 2, 2026
9 checks passed
@mkreyman mkreyman deleted the fix/witness-plug-hardening branch July 2, 2026 01:30
mkreyman added a commit that referenced this pull request Jul 2, 2026
…ssion test (#257)

Round-3 enhanced-review follow-ups that landed after #256 was merged, so they
are re-applied here on top of master:

- The sanitized bootstrap-consume log used postgres.code (the human atom, e.g.
  :unique_violation) mislabelled as "sqlstate". Reuse LoopctlWeb.DBError.sqlstate/1
  (which reads the real 5-char postgres.pg_code) so the line correlates with the
  rest of the app's DB-error logs. Still leaks no query text.
- Add a plug regression test: a DB error during the atomic consume (forced via a
  malformed api_key id) fails closed with 412 and a SQL-free log, pinning the
  fail-closed rescue + sanitized logging against future regressions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant