fix(custody): close witness-plug bootstrap race, future-position bypass, oversized-position crash by mkreyman · Pull Request #256 · mkreyman/loopctl

mkreyman · 2026-07-02T01:04:52Z

Summary

Closes three chain-of-custody defects in the STH witness plug that were merged to master via #254 before the enhanced-review gate's fixes landed (the gate reproduced all three empirically). Private advisories: GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.

Fixes

High — bootstrap grace TOCTOU race. One-time bootstrap consumption was read-then-write, so two concurrent first requests for the same key both got the grace. Now atomic: a single conditional UPDATE ... WHERE sth_bootstrap_consumed_at IS NULL, granting only when the row count is 1, else 412 witness_bootstrap_already_consumed.
High — divergence check skippable via a future position. Claiming a chain position beyond the tenant's latest sealed STH made get_sth_at_position return nil, and the nil branch passed through with no comparison at all. A future/unverifiable position now returns the same 409 resync response (with the current-STH header), never a silent pass. The server is the sole STH source, so an honest agent can never legitimately hold a position beyond the latest — no false positives.
Medium — oversized position crash. A position beyond bigint range reached the query and raised a Postgrex encode error (500). Now range-checked before the query → 412 malformed.

Tests

14 plug tests including the atomic single-winner bootstrap, future-position 409 resync (tenant not halted), and oversized-position 412. Full suite green (3115 tests, 0 failures); credo/dialyzer clean.

…ss, oversized-position crash Follow-up to #254, which merged the round-1 witness hardening before the enhanced-review gate's fixes landed — so master still had three defects the gate reproduced: - Bootstrap grace was TOCTOU-racy (read-then-write): concurrent first requests for the same key both got the "one-time" grace. Now atomic and DB-authoritative via a conditional UPDATE guarded by `sth_bootstrap_consumed_at IS NULL`, branching on the row count. - The divergence check could be skipped by claiming a chain position beyond the tenant's latest sealed STH (the nil branch passed through with no comparison). A future/unverifiable position now returns the same 409 resync response. - An oversized position (> bigint range) reached the query and raised a Postgrex encode error (500); position is now range-checked before the query and rejected as 412 malformed. Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.

…strap strand, SQL-in-log Enhanced-review (Sonnet-5/xhigh) of the witness fix-forward found three issues: - HIGH: the bigint range-check was applied in the witness plug but NOT in the public, unauthenticated AuditSthController (`GET /audit/sth/:id?at=...`). An oversized `at` reached Postgrex and raised DBConnection.EncodeError — a class the DB-error sanitizer does not handle — so it re-raised as an uncaught 500, repeatable with no auth or rate limit. Fix: make AuditChain.get_sth_at_position defensively return nil for a non-integer / out-of-bigint-range position (single source of truth via AuditChain.max_chain_position/0, which the plug now references), protecting every caller. Adds the previously-missing controller test suite. - HIGH: the atomic bootstrap consume's rescue wrapped BOTH the grace-consuming UPDATE and the follow-on STH read, so a transient read failure after the UPDATE committed would swallow the error and permanently strand the key (grace burned, 412 forever). Fix: read the STH BEFORE consuming the grace; after a committed UPDATE only pure header-setting remains, so nothing can fail post-consume. - LOW: the rescue logged `inspect(exception)`, which prints %Postgrex.Error{}'s raw `:query` SQL. Fix: log only the struct name + SQLSTATE (US-27.3 sanitized- logging convention). Refs private advisories GHSA-w786-f588-2943, GHSA-cxrw-xg8f-mx8v, GHSA-36g5-mcrh-rcrm.

…ssion test (#257) Round-3 enhanced-review follow-ups that landed after #256 was merged, so they are re-applied here on top of master: - The sanitized bootstrap-consume log used postgres.code (the human atom, e.g. :unique_violation) mislabelled as "sqlstate". Reuse LoopctlWeb.DBError.sqlstate/1 (which reads the real 5-char postgres.pg_code) so the line correlates with the rest of the app's DB-error logs. Still leaks no query text. - Add a plug regression test: a DB error during the atomic consume (forced via a malformed api_key id) fails closed with 412 and a SQL-free log, pinning the fail-closed rescue + sanitized logging against future regressions.

mkreyman added 2 commits July 1, 2026 19:03

mkreyman merged commit 4bfba3d into master Jul 2, 2026
9 checks passed

mkreyman deleted the fix/witness-plug-hardening branch July 2, 2026 01:30

mkreyman mentioned this pull request Jul 2, 2026

fix(custody): correct SQLSTATE log label + bootstrap-rescue regression test #257

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(custody): close witness-plug bootstrap race, future-position bypass, oversized-position crash#256

fix(custody): close witness-plug bootstrap race, future-position bypass, oversized-position crash#256
mkreyman merged 2 commits into
masterfrom
fix/witness-plug-hardening

mkreyman commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mkreyman commented Jul 2, 2026

Summary

Fixes

Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant