feat(consensus): mock_chain_validation replay build + memIAVL state-sync restore fixes by bdchatham · Pull Request #3663 · sei-protocol/sei-chain

bdchatham · 2026-06-29T19:54:37Z

Summary

Makes the mock_chain_validation replay build a first-class, reviewed capability, plus two prod-reachable memIAVL state-sync restore fixes it depends on. The build replays real chain history (e.g. an AppHash-breaking memIAVL→flatKV migration) as a non-validating node, checked by an out-of-band logical-digest comparator instead of consensus.

Three self-contained commits: feat(consensus) (the policy), fix(memiavl) (the restore fixes), ci(ecr) (the image build).

Production behavior is unchanged

ConsensusPolicy is a build-tag-selected type whose swallowing variant compiles only under the mock tags. The default build returns every error unchanged, so every consensus halt is byte-for-byte preserved — confirmed at link level and by the test suites.
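
For readers unfamiliar with the pattern, here is a minimal single-file sketch of how a tag-selected policy might be shaped. It is an illustration under assumptions, not the PR's code: `haltPolicy`, `replayPolicy`, and `checkAppHash` are hypothetical names, the sentinels are local stand-ins for the ones in sei-tendermint/types, and in a real tree each variant would sit in its own file behind a `//go:build` constraint rather than side by side.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in sentinels for this sketch only.
var (
	ErrAppHash  = errors.New("wrong app hash")
	ErrDataHash = errors.New("wrong data hash")
)

// haltPolicy models the default build: every error comes back unchanged,
// so the caller halts exactly as it did before the policy existed.
type haltPolicy struct{}

func (haltPolicy) HandleError(err error) error { return err }

// replayPolicy models the mock_chain_validation variant (hypothetical name).
type replayPolicy struct{}

// HandleError swallows ErrAppHash (returns nil) and returns anything else unchanged.
func (replayPolicy) HandleError(err error) error {
	if errors.Is(err, ErrAppHash) {
		return nil
	}
	return err
}

// checkAppHash is a hypothetical call site that wraps the sentinel with detail.
func checkAppHash(got, want string) error {
	if got != want {
		return fmt.Errorf("got %s, want %s: %w", got, want, ErrAppHash)
	}
	return nil
}
```

Because the swallowing variant matches with `errors.Is`, a wrapped sentinel is treated the same as a bare one, while the zero-sized default type adds no state and no branch to the production path.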

What the replay build relaxes

Under mock_chain_validation the policy swallows the consensus checks a replayed/migrated chain can't reproduce (app-state and validator-set/commit hashes). Peer-supplied block content still HALTS — the transaction merkle root and evidence integrity — so a malicious peer can't poison the audit input. A build-failing test guard enforces that boundary.
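
One way such a boundary guard could be expressed is sketched below. This is an assumption about the shape of the check, not the test in the PR: `checkAllowlist` is a hypothetical helper, and the sentinels are local stand-ins that reuse names mentioned in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in sentinels; names follow the thread, definitions are illustrative.
var (
	ErrAppHash                  = errors.New("wrong app hash")
	ErrLastCommitVerify         = errors.New("last commit verify failed")
	ErrDataHash                 = errors.New("wrong data hash")
	ErrEvidenceHash             = errors.New("wrong evidence hash")
	ErrPerEvidenceValidateBasic = errors.New("invalid evidence")
)

// neverSwallow lists the peer-supplied content checks that must always halt.
var neverSwallow = []error{ErrDataHash, ErrEvidenceHash, ErrPerEvidenceValidateBasic}

// checkAllowlist (hypothetical) fails if any must-halt sentinel appears
// in the swallow allowlist.
func checkAllowlist(allow []error) error {
	for _, n := range neverSwallow {
		for _, a := range allow {
			if errors.Is(n, a) {
				return fmt.Errorf("%v must HALT but is in the swallow allowlist", n)
			}
		}
	}
	return nil
}
```

A test that calls a check like this fails the build as soon as someone adds a content-integrity sentinel to the allowlist, and anything not listed keeps halting by default.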

Two changes that warrant your explicit sign-off:

ErrLastCommitVerify is swallowed, so buildLastCommitInfo builds best-effort commit info (approximate per-index pairing). LastCommitInfo feeds staking rewards/downtime, never EVM state.
mock_block_validation widened to also swallow ErrUpgradeBeforeTrigger and the state-sync appHash check.

memIAVL restore (prod-reachable, not tag-gated)

Re-bootstrapping from an S3 snapshot exposed two restore bugs, both fixed: an interrupted/re-offered restore crashlooped (restore is now idempotent — stale-tmp cleanup, adopt-an-existing-snapshot, reject non-directories), and Close() leaked the import lock on error (now released on every path).
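
The idempotent restore described above can be sketched roughly as follows. This is a simplified illustration, not the memIAVL code: `restoreSnapshot` and `exerciseRestore` are hypothetical, the `-tmp` suffix and file names are assumptions, and the real import does far more than write one file.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// restoreSnapshot (hypothetical) stages data in a tmp dir, then publishes it
// under root/name with a rename.
func restoreSnapshot(root, name string) error {
	tmpDir := filepath.Join(root, name+"-tmp")
	finalDir := filepath.Join(root, name)

	// Stale-tmp cleanup: an interrupted restore may have left a partial dir.
	if err := os.RemoveAll(tmpDir); err != nil {
		return err
	}
	// Stat first, so a re-offered restore is a no-op instead of a failure.
	info, err := os.Stat(finalDir)
	switch {
	case err == nil && info.IsDir():
		return nil // adopt the already-completed snapshot
	case err == nil:
		return fmt.Errorf("%s exists but is not a directory", finalDir)
	case !errors.Is(err, fs.ErrNotExist):
		return err
	}
	if err := os.MkdirAll(tmpDir, 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(tmpDir, "nodes"), []byte("data"), 0o644); err != nil {
		return err
	}
	return os.Rename(tmpDir, finalDir)
}

// exerciseRestore runs the re-offer, stale-tmp, and non-directory cases
// inside a throwaway temp dir.
func exerciseRestore() error {
	root, err := os.MkdirTemp("", "restore-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	for i := 0; i < 2; i++ { // the second call models a re-offered snapshot
		if err := restoreSnapshot(root, "snapshot-10"); err != nil {
			return fmt.Errorf("attempt %d: %w", i+1, err)
		}
	}
	// Leftover tmp dir from an interrupted run must not block a new restore.
	if err := os.MkdirAll(filepath.Join(root, "snapshot-11-tmp"), 0o755); err != nil {
		return err
	}
	if err := restoreSnapshot(root, "snapshot-11"); err != nil {
		return err
	}
	// A regular file at the snapshot path must be rejected, not adopted.
	if err := os.WriteFile(filepath.Join(root, "snapshot-12"), nil, 0o644); err != nil {
		return err
	}
	if err := restoreSnapshot(root, "snapshot-12"); err == nil {
		return errors.New("regular file at snapshot path was adopted")
	}
	return nil
}
```

Adopting an existing directory is only sound if the directory can appear solely through a completed rename, which is the invariant the reviewers discuss further down the thread.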

Review & test

Blinded multi-specialist review with an assigned dissenter, iterated to unanimous RESOLVED; all automated-review threads (Cursor, Codex, seidroid, ai-review) resolved. go build + go test pass under all three configs — default, mock_chain_validation, mock_block_validation — verified by a full sweep.

cursor · 2026-06-29T19:55:34Z

PR Summary

High Risk
Touches consensus validation, upgrade halting, and state-sync verification (security-sensitive), plus production memIAVL restore idempotency; mock builds deliberately relax checks that must not ship in default binaries.

Overview
Introduces a mock_chain_validation replay build where ConsensusPolicy can swallow selected validation failures (AppHash, commit/validator drift, ErrUpgradeBeforeTrigger, etc.) while peer-supplied data/evidence checks still halt in that tag. Production/default builds route the same sentinels through HandleError unchanged; call sites updated include upgrade BeginBlocker, blocksync commit verify, buildLastCommitInfo (best-effort votes when commit size ≠ valset under mock), and state-sync verifyApp.

mock_chain_validation now uses an explicit swallow allowlist (including ErrLastCommitVerify, previously excluded). mock_block_validation also swallows ErrUpgradeBeforeTrigger. Tests are split by build tags so halt vs swallow behavior is asserted per binary.

memIAVL state-sync/snapshot paths are made idempotent on re-offer: stale import temps and current-tmp are cleared, existing snapshot dirs can be adopted on rename conflicts, and import Close always releases the file lock.

ECR/nightly images split mock_chain_validation from mock_balances + mock_chain_validation so faithful history replay vs chaos/benchmark replays get distinct tags.

Reviewed by Cursor Bugbot for commit 58b2263.

github-actions · 2026-06-29T19:56:56Z

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`✅ passed`	`✅ passed`	`✅ passed`	Jun 30, 2026, 2:42 PM

codecov · 2026-06-29T19:58:58Z

Codecov Report

❌ Patch coverage is 42.10526% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.50%. Comparing base (b8c3929) to head (58b2263).
⚠️ Report is 5 commits behind head on main.

Files with missing lines	Patch %	Lines
sei-db/state_db/sc/memiavl/import.go	26.08%	13 Missing and 4 partials ⚠️
sei-db/state_db/sc/memiavl/db.go	0.00%	10 Missing and 1 partial ⚠️
sei-tendermint/internal/state/execution.go	50.00%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3663      +/-   ##
==========================================
- Coverage   59.17%   58.50%   -0.67%     
==========================================
  Files        2262     2196      -66     
  Lines      187055   180318    -6737     
==========================================
- Hits       110690   105498    -5192     
+ Misses      66427    65452     -975     
+ Partials     9938     9368     -570

Flag	Coverage Δ
sei-chain-pr	`75.24% <78.26%> (?)`
sei-db	`70.41% <ø> (ø)`
sei-db-state-db	`?`
sei-db-state-db-pr	`75.22% <17.64%> (?)`

Files with missing lines	Coverage Δ
sei-cosmos/x/upgrade/abci.go	`80.88% <100.00%> (+0.28%)`	⬆️
sei-tendermint/internal/blocksync/reactor.go	`68.01% <100.00%> (+0.17%)`	⬆️
sei-tendermint/internal/statesync/syncer.go	`68.40% <100.00%> (+0.37%)`	⬆️
sei-tendermint/types/consensus_policy.go	`100.00% <100.00%> (ø)`
sei-tendermint/internal/state/execution.go	`82.28% <50.00%> (+0.37%)`	⬆️
sei-db/state_db/sc/memiavl/db.go	`65.71% <0.00%> (-0.62%)`	⬇️
sei-db/state_db/sc/memiavl/import.go	`69.10% <26.08%> (-4.76%)`	⬇️

... and 99 files with indirect coverage changes

seidroid

A well-isolated, build-tag-gated ConsensusPolicy that leaves the default/production build byte-for-byte unchanged, plus two prod-reachable memIAVL state-sync restore fixes that look correct. No production blockers; a few shadow-build-scoped notes and a missing unit test are worth addressing.

Findings: 0 blocking | 5 non-blocking | 2 posted inline

Blockers

None at the file/PR level.

Non-blocking

Missing unit test for buildLastCommitInfo's newly-reachable best-effort path (commit/valset size mismatch under mock_chain_validation). The PR acknowledges this as a reviewer-sanctioned deferred follow-up; it's shadow-tag-only and doesn't touch the EVM oracle, but the bounds-checked loop is exactly the kind of off-by-one that a small table test would lock down.
Cursor's second-opinion review file (cursor-review.md) was empty — that pass produced no output. Codex's review was present and its two findings are incorporated below.
mock_block_validation relaxation surface is widened by this PR in two places. ErrUpgradeBeforeTrigger is explicitly and intentionally added (documented). However the syncer.go ErrAppHash routing also silently extends mock_block_validation to swallow state-sync snapshot appHash mismatches — an undocumented side effect (see inline comment).
2 suggestion(s)/nit(s) flagged inline on specific lines.

claude

I didn't find any bugs, but this PR introduces a new build-tag-selected ConsensusPolicy framework, swallows ErrLastCommitVerify under mock_chain_validation, gates buildLastCommitInfo's panic invariant on TolerateLastCommitMismatch(), and routes x/upgrade + state-sync + blocksync through the policy — even with default-build isolation, the consensus-safety surface and the two flagged semantic changes warrant a human review.

Extended reasoning...

Overview

The PR touches 16 files across consensus-critical and state-sync subsystems: a new build-tag-selected ConsensusPolicy framework in sei-tendermint/types (default / mock_block_validation / mock_chain_validation variants), routing call sites in x/upgrade BeginBlocker, blocksync reactor's last-commit verify, statesync syncer's appHash check, and execution.go's buildLastCommitInfo; plus prod-reachable memIAVL state-sync restore fixes (idempotent restore, defer-release of import flock). New sentinel ErrUpgradeBeforeTrigger added.

Security risks

The PR is consensus-relaxing by design under shadow tags. The default-build isolation is the load-bearing safety guarantee: ConsensusPolicy.HandleError returns every error unchanged, TolerateLastCommitMismatch() is false, and the swallow code is not linked into a default binary. Two production-reachable changes still need scrutiny: (1) buildLastCommitInfo no longer unconditionally panics on commit/valset size mismatch — the panic is now gated by a policy call (still always-panics in default, but the invariant moved from a hard guard to a policy contract); (2) the memIAVL restore fixes affect every node doing state-sync, not just shadow builds. No injection / auth / crypto surface changes.

Level of scrutiny

High. This is consensus-critical and security-sensitive code. Even with airtight build-tag isolation, the policy framework introduces a new contract (HandleError + TolerateLastCommitMismatch) that future call sites must use correctly, and the swallow-allowlist semantics under mock_chain_validation determine what a replay-shadow can mask. The author themselves flagged two semantic changes for explicit sign-off and acknowledged a deferred unit test for the newly-reachable best-effort path in buildLastCommitInfo.

Other factors

The PR has thorough test coverage for the policy matrix across all three build tags, a per-sentinel swallow-matrix test, wrapped-error and label-mapping tests, and the memIAVL restore changes are accompanied by file-existence/ENOTEMPTY handling and defer-based lock release. The description notes an independent multi-specialist review converged to unanimous resolution. Only cursor[bot] has commented; no human review yet. Given the scope and the consensus-relaxing semantics, a human consensus/security reviewer should give explicit sign-off.

bdchatham · 2026-06-29T20:18:29Z

Thanks — addressed the non-blocking items in the latest push (force-update to 641b2577d):

Missing buildLastCommitInfo test → added TestBuildLastCommitInfo_ToleratesCommitValSetMismatch (build-tagged mock_chain_validation): a commit shorter than the validator set builds best-effort CommitInfo (sized by the valset, present sig applied, absent slots not-signed) without panicking. Locks down the bounds-checked loop.
SwallowMatrix comment/messaging nit → the test now asserts HandleError matches the allowlist exactly with an explicit semantic guard that ErrDataHash/ErrEvidenceHash/ErrPerEvidenceValidateBasic are never in it; a not-allowlisted sentinel now fails with "want HALT (not in the swallow allowlist)", aligning the message with the halt-by-default design.
mock_block_validation widening → the PR body now explicitly calls out both surfaces: ErrUpgradeBeforeTrigger and the statesync/syncer.go ErrAppHash routing (consistent with mock_block's existing AppHash relaxation, flagged for transparency).
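
For reference, the best-effort pairing exercised by the new buildLastCommitInfo test can be illustrated with a small sketch. This is not the execution.go code: `voteInfo` and `buildVotes` are hypothetical, validators are reduced to strings, and the `tolerate` flag stands in for what the thread calls TolerateLastCommitMismatch().

```go
package main

// voteInfo is a simplified stand-in for one entry of the commit info.
type voteInfo struct {
	Validator string
	Signed    bool
}

// buildVotes pairs commit signatures with validators by index.
// With tolerate=false a size mismatch panics, as in the default build.
// With tolerate=true the result is sized by the validator set: extra
// signatures are dropped and missing slots are marked not-signed.
func buildVotes(validators []string, sigPresent []bool, tolerate bool) []voteInfo {
	if len(sigPresent) != len(validators) && !tolerate {
		panic("commit size does not match validator set size")
	}
	votes := make([]voteInfo, len(validators))
	for i, v := range validators {
		// The bounds check keeps a short commit from indexing out of range.
		signed := i < len(sigPresent) && sigPresent[i]
		votes[i] = voteInfo{Validator: v, Signed: signed}
	}
	return votes
}
```

The pairing is positional, so when the sizes differ a signature may be attributed to a different validator than the one that produced it; that is the approximation flagged for sign-off.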

The empty cursor-review.md was a tooling artifact on your side (no output produced), not a finding.

seidroid

A well-structured, thoroughly-tested change that makes the build-tag-gated mock_chain_validation replay-shadow policy first-class and lands two prod-reachable memIAVL state-sync restore fixes. Default/production behavior is provably unchanged (zero-sized, tag-selected ConsensusPolicy); the only substantive note is a defensive gap in the idempotent-restore path that adopts an existing snapshot-<h> path without confirming it is a directory.

Findings: 0 blocking | 6 non-blocking | 1 posted inline

Blockers

None at the file/PR level.

Non-blocking

Inconsistency between the two memIAVL fixes: RewriteSnapshot (db.go) keeps os.Rename and only adopts the existing target on fs.ErrExist/ENOTEMPTY, so renaming onto a regular file (ENOTDIR) still falls through to the error/cleanup path — the safe behavior. Close (import.go) instead stat-checks first and unconditionally adopts any existing path. Consider aligning the two so a non-directory target is treated as an error in both, or at least document why the divergence is intentional.
REVIEW_GUIDELINES.md (pulled from the base branch) is empty/blank, so no repo-specific review standards could be applied in this pass.
The Cursor second-opinion file (cursor-review.md) is empty — that review pass produced no output. The OpenAI Codex pass did produce output (the import.go IsDir concern), which is incorporated here.
Codex noted it could not run tests because the Go 1.25.6 toolchain download was network-blocked in its environment; tests were not independently re-run here either. The PR claims go build/go test pass for default and both mock tags, plus memIAVL package tests — worth confirming CI shows green for all three build configurations.
Minor: the deferred buildLastCommitInfo best-effort test the PR description lists as a follow-up appears to already be included (execution_lastcommit_shadow_test.go); the description's 'deferred follow-up' note is now stale and could be updated.
1 suggestion(s)/nit(s) flagged inline on specific lines.

seidroid

A well-isolated change that makes the mock_chain_validation replay-shadow build first-class and lands two prod-reachable memIAVL state-sync restore fixes. Default-build behavior is preserved at every routed call site; the only notable finding is a shadow-build-only signature-attribution concern in buildLastCommitInfo that does not affect production or the audit oracle.

Findings: 0 blocking | 5 non-blocking | 1 posted inline

Blockers

None at the file/PR level.

Non-blocking

REVIEW_GUIDELINES.md was empty/missing, so no repo-specific review standards could be applied.
The Cursor second-opinion review (cursor-review.md) was empty — that pass produced no output.
The PR explicitly defers a unit test for buildLastCommitInfo's best-effort path beyond the size-mismatch case (e.g. the by-address attribution behavior). Reviewers classified this low-risk/cuttable; reasonable to track as follow-up.
Default-build isolation, the load-bearing safety guarantee, rests on link-level exclusion of the swallow path and the build-tag test suites. Consider a CI check asserting the default binary does not link the swallowing ConsensusPolicy variants, to keep this guarantee from silently regressing.
1 suggestion(s)/nit(s) flagged inline on specific lines.

seidroid

A carefully-scoped change that routes a fixed allowlist of consensus-validation failures through a build-tag-selected ConsensusPolicy (swallowed only under mock_chain_validation/mock_block_validation) plus two prod-reachable memIAVL state-sync restore fixes. The load-bearing guarantee — byte-for-byte unchanged production behavior — holds: every routed default-build call site retains its original halt. No blockers; only minor non-blocking notes.

Findings: 0 blocking | 4 non-blocking | 1 posted inline

Blockers

None at the file/PR level.

Non-blocking

cursor-review.md is empty (Cursor produced no output) and codex-review.md reports no material findings; REVIEW_GUIDELINES.md is empty, so no repo-specific standards were applied. Noting per the review process.
Behavioral change to the mock_block_validation build: ErrUpgradeBeforeTrigger is now swallowed there too (consensus_policy_mock_block_validation.go). This is intentional per the PR description but widens that tag's relaxation set — worth explicit maintainer sign-off.
Good defensive testing: the mock_chain_validation swallow-matrix test asserts ErrDataHash/ErrEvidenceHash/ErrPerEvidenceValidateBasic are never in the allowlist, and the allowlist is halt-by-default for future sentinels.
1 suggestion(s)/nit(s) flagged inline on specific lines.

seidroid

A well-scoped, well-tested change that promotes the mock_chain_validation replay build to a reviewed capability and lands two prod-reachable memIAVL state-sync restore fixes; production default-build behavior is preserved (all routed call sites still halt) and the swallow allowlist is halt-by-default with a build-failing guard protecting peer-content integrity checks. No blocking issues found.

Findings: 0 blocking | 3 non-blocking | 0 posted inline

Blockers

None at the file/PR level.

Non-blocking

Both second-opinion passes produced no findings to merge: the Codex review states "No material findings" (and notes it could not run tests due to a Go 1.24 vs required 1.25.6 mismatch with no network access), and the Cursor review file is empty. REVIEW_GUIDELINES.md is also empty, so no repo-specific standards were applied.
memIAVL Close()/RewriteSnapshot adopt an existing snapshot- directory as complete without validating its contents. This is sound given the import flock serialization + atomic-rename invariant (the dir only exists if a prior Close fully renamed it), but it relies entirely on that invariant holding; worth a brief note for future maintainers who might reach these paths outside the flock-protected restore flow.
Under mock_chain_validation, ErrTooMuchEvidence is in the swallow allowlist while ErrEvidenceHash/ErrPerEvidenceValidateBasic correctly still halt. Swallowing the evidence-quantity bound is acceptable for a non-validating replay build whose oracle is the out-of-band digest comparator, but it is the one allowlist entry not strictly tied to migration/validator-set drift — confirm this is intentional.

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 4952ead.

seidroid

A carefully scoped, well-tested change that makes the mock_chain_validation replay build first-class and lands two prod-reachable memIAVL state-sync restore fixes. Production behavior is byte-for-byte unchanged (the swallowing ConsensusPolicy compiles only under mock tags), and the memIAVL idempotency/lock-release fixes are correct. No blocking issues found.

Findings: 0 blocking | 3 non-blocking | 0 posted inline

Blockers

None at the file/PR level.

Non-blocking

Second-opinion passes produced no usable output: Cursor's review file (cursor-review.md) is empty, and Codex (codex-review.md) reported no findings but explicitly could not run the targeted Go tests because the sandbox couldn't download Go 1.25.6. This review therefore rests on the PR's own CI run and manual analysis; consider confirming the green CI run (default + both mock tags) before merge.
Behavioral widening worth an explicit sign-off (already called out in the PR description): routing the state-sync appHash check in syncer.go through HandleError means the mock_block_validation tag now also swallows ErrAppHash mismatches during state-sync verifyApp, not just block validation. This is consistent with that tag's existing ErrAppHash relaxation and affects only a mock build, but it is a real expansion of that tag's scope.
On the swallow path in statesync/syncer.go, a swallowed appHash mismatch continues silently with no log line (logging only happens on the halting branch). This is intentional per the design comment, but a single debug/info breadcrumb on the swallow path would aid post-mortem of a replay that diverged.

seidroid

The mock_chain_validation policy refactor and memIAVL restore fixes are well-structured and keep production behavior unchanged, but routing the state-sync appHash check through the policy leaves the untagged TestSyncer_verifyApp "invalid hash" case failing under both mock tags, and a safety-critical package doc is now stale.

Findings: 1 blocking | 3 non-blocking | 2 posted inline

Blockers

None at the file/PR level.
1 blocking issue(s) flagged inline on specific lines.

Non-blocking

Cursor's second-opinion review file (cursor-review.md) was empty — that pass produced no output, so its perspective is absent from this synthesis.
PR description claims go test passes for default, mock_chain_validation, and mock_block_validation, but the statesync package's TestSyncer_verifyApp would fail under both mock tags (see inline). Either the tagged statesync tests weren't actually run, or the claim should be scoped to compilation only — worth clarifying.
1 suggestion(s)/nit(s) flagged inline on specific lines.

seidroid

A well-engineered, thoroughly-tested change that makes the mock_chain_validation replay build a first-class capability and lands two prod-reachable memIAVL state-sync restore fixes; default-build (production) behavior is preserved byte-for-byte at every routed call site, and no correctness or security issues were found. Verdict is neutral only because the two second-opinion passes added no usable signal (Cursor empty; Codex could not run tests) and a couple of minor non-blocking suggestions remain.

Findings: 0 blocking | 5 non-blocking | 1 posted inline

Blockers

None at the file/PR level.

Non-blocking

Cursor's second-opinion review file (cursor-review.md) is empty — that pass produced no output, so only this review plus Codex's (which also found nothing) cover the change.
Codex's review reported "No material findings" but explicitly could NOT run go test: its sandbox has Go 1.24.13 while the module requires 1.25.6 and the toolchain download was blocked. So Codex's pass did not exercise the test suites; rely on CI for the default + both-tag runs.
Two deliberate semantic relaxations are called out in the PR body and warrant explicit maintainer sign-off (not code defects): (a) ErrLastCommitVerify is now swallowed under mock_chain_validation, making buildLastCommitInfo build positionally-approximate LastCommitInfo; (b) mock_block_validation now also relaxes ErrUpgradeBeforeTrigger and the state-sync ErrAppHash routing.
Minor: the DefaultConsensusPolicy() constructor name is slightly misleading — under mock build tags it returns the swallowing policy, not a "default/production" one. Pre-existing naming, but worth a doc note since this PR adds more call sites.
1 suggestion(s)/nit(s) flagged inline on specific lines.

Superseded: latest AI review found no blocking issues.

seidroid

The PR cleanly gates all consensus-relaxing behavior behind build tags — the default (production) build's HandleError provably returns every error unchanged, so prod halting semantics are byte-for-byte preserved, and tests are well-scoped per build. The one substantive finding is a narrow but prod-reachable idempotency gap in the memIAVL restore path (stale current-tmp not cleared), confirming Codex's P1.

Findings: 0 blocking | 5 non-blocking | 1 posted inline

Blockers

None at the file/PR level.

Non-blocking

Restore idempotency is still incomplete for the symlink swap step (Codex P1, valid): updateCurrentSymlink (db.go:1247) creates current-tmp via os.Symlink and never clears a stale one. If a prior restore crashed after creating current-tmp but before the atomic rename to current, a re-offer now adopts the existing snapshot-<h> dir and then fails in updateCurrentSymlink with EEXIST — re-introducing a crashloop the PR aims to eliminate. Window is narrow, but the path is prod-reachable (not tag-gated). Suggest os.Remove(currentTmpPath(dir)) before the symlink, or clearing it alongside the tmp-dir cleanup on entry. Affects both import.go Close() and db.go RewriteSnapshot adopt arms.
Cursor second-opinion review (cursor-review.md) produced no output; only Codex's single P1 was available to merge.
mock_block_validation's swallow set is widened to include ErrUpgradeBeforeTrigger, changing that existing tag's behavior. Intentional and documented, and it does not affect the default build, but worth an explicit reviewer sign-off since it alters an established build's semantics.
Confirmed no production behavior change: default ConsensusPolicy.HandleError returns err unchanged, and buildLastCommitInfo is behavior-identical when commit size matches the validator set (the normal path), so the rewrite only diverges under mock builds.
1 suggestion(s)/nit(s) flagged inline on specific lines.
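
The stale current-tmp suggestion in the first item above could look roughly like the sketch below. It is an illustration only: `swapCurrent` and `swapAfterCrash` are hypothetical, the link names mirror the thread, and the real updateCurrentSymlink may differ in its details.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
)

// swapCurrent points dir/current at target using a temporary symlink
// followed by a rename over the final name.
func swapCurrent(dir, target string) error {
	tmp := filepath.Join(dir, "current-tmp")
	// Clear a link left by an interrupted swap; without this, os.Symlink
	// fails because the name already exists.
	if err := os.Remove(tmp); err != nil && !errors.Is(err, fs.ErrNotExist) {
		return err
	}
	if err := os.Symlink(target, tmp); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(dir, "current"))
}

// swapAfterCrash simulates a crash that left current-tmp behind, retries
// the swap, and reports where current ends up pointing.
func swapAfterCrash() (string, error) {
	dir, err := os.MkdirTemp("", "symlink-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	// Leftover link from the interrupted attempt.
	if err := os.Symlink("snapshot-1", filepath.Join(dir, "current-tmp")); err != nil {
		return "", err
	}
	if err := swapCurrent(dir, "snapshot-2"); err != nil {
		return "", err
	}
	return os.Readlink(filepath.Join(dir, "current"))
}
```

Removing the temporary link unconditionally is safe here because it is never the published name; only the rename makes the new target visible.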

…ay builds

Adds a build-tag-gated ConsensusPolicy ("mock_chain_validation") so a non-validating node can replay real chain history despite the consensus divergence inherent to replaying an AppHash-breaking storage migration (memIAVL->flatKV) against a validator set it cannot reproduce bit-for-bit.

The default build is unchanged: DefaultConsensusPolicy().HandleError preserves every panic/halt, verified byte-identical.

Under the mock_chain_validation tag the policy swallows an explicit ALLOWLIST of the sentinels that drift for migration / validator-set / commit reasons. Peer-supplied block-content integrity still HALTS -- ErrDataHash (the tx merkle root feeding the EVM execution this build compares), ErrEvidenceHash, ErrPerEvidenceValidateBasic -- so a malformed/lying peer cannot poison the audit input. Allowlist => a sentinel added later halts by default.

Previously-unguarded checks route through the policy (inert in the default build), all via the single HandleError(err) verb:
- blocksync/reactor.go: VerifyCommitLight -> policy.HandleError
- state/execution.go: buildLastCommitInfo routes a commit/validator-set size mismatch through the policy (default panics; mock_chain_validation builds best-effort commit info, which only feeds staking rewards/downtime)
- statesync/syncer.go: state-sync snapshot appHash check -> policy (also extends mock_block_validation, which already relaxes ErrAppHash)
- x/upgrade/abci.go: binary-updated-before-trigger panic -> policy

Tests: SwallowMatrix pins the allowlist + the never-swallow content-integrity guard; TestBlockValidateBasic probes the active policy; a build-tagged test pins buildLastCommitInfo's best-effort path (commit/valset size mismatch, no panic).

…every path

Re-bootstrapping a node from an S3 state-sync snapshot exposed two issues: restore re-entry after an interrupted import crashlooped (now stat-first idempotent -- clears the stale tmp dir on entry, adopts an already-completed snapshot-<h>, and rejects a non-directory at that path), and Close released the import flock only on the success path, so an error leaked it and a same-process re-offer failed with ErrLocked (Close now releases via defer on every path).

The import flock serializes restore against any rewrite, so the existing-dir arms are defensive idempotency, not race handling.
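
The lock-release half of this fix follows a common Go pattern, sketched below under assumptions. An in-process mutex stands in for the file lock, and `importLock`, `importer`, `newImporter`, and `errDiskFull` are hypothetical names; only ErrLocked is taken from the commit message, and its definition here is a stand-in.

```go
package main

import (
	"errors"
	"sync"
)

// ErrLocked stands in for the error returned when the import lock is held.
var ErrLocked = errors.New("import lock already held")

// errDiskFull is a hypothetical failure used to exercise the error path.
var errDiskFull = errors.New("disk full")

// importLock is an in-process stand-in for the import flock.
type importLock struct{ mu sync.Mutex }

type importer struct{ lock *importLock }

// newImporter takes the lock or reports ErrLocked without blocking.
func newImporter(l *importLock) (*importer, error) {
	if !l.mu.TryLock() {
		return nil, ErrLocked
	}
	return &importer{lock: l}, nil
}

// Close finalizes the import. The deferred Unlock runs whether finalize
// succeeds or fails, so an error can no longer leak the lock.
func (i *importer) Close(finalize func() error) error {
	defer i.lock.mu.Unlock()
	return finalize()
}
```

With the release deferred at the top of Close, a failed finalize step leaves the lock free, so a re-offer in the same process can acquire it instead of hitting ErrLocked.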

The mock_chain_validation image backs the migrate_evm replay node, which must execute real chain history faithfully; mock_balances stubs balances (and panics on pacific-1), corrupting execution. The standalone mock_balances image (mock-<tag>) remains for benchmark chains that submit synthetic transactions.

seidroid

A carefully-scoped PR that makes the build-tag-gated mock_chain_validation replay policy first-class and lands two prod-reachable memIAVL state-sync restore fixes. Production (default-build) behavior is unchanged at the link level and the changes are well-tested across all three build configs; no blocking issues found.

Findings: 0 blocking | 4 non-blocking | 0 posted inline

Blockers

None at the file/PR level.

Non-blocking

Second-opinion passes produced no actionable output: the Codex review reported "No material findings" (and noted it could not run the Go tests due to a Go 1.24 vs required 1.25.6 toolchain mismatch in its sandbox), and the Cursor review file was empty. Neither surfaced findings to merge.
buildLastCommitInfo's best-effort path under mock_chain_validation (swallowing ErrLastCommitVerify) produces positionally-approximate Signatures↔Validators pairing. This is documented and only affects the non-production replay build (LastCommitInfo feeds staking rewards/downtime, not EVM state), but it is a genuine semantic relaxation worth explicit reviewer sign-off as the PR itself flags.
The mock_block_validation swallow set is widened (ErrUpgradeBeforeTrigger + the state-sync ErrAppHash routing). Only affects mock builds, but it changes that tag's long-standing behavior — confirm this widening is intended for all consumers of mock_block_validation images.
In RewriteSnapshot/Close adoption arms, an already-present snapshot- directory is adopted without validating its contents; this is safe given os.Rename atomicity and import-flock serialization (a snapshot- dir only exists from a completed atomic rename), but relies on those invariants holding.

seidroid

A well-tested, build-tag-gated change that makes the mock_chain_validation replay build first-class and fixes two prod-reachable memIAVL restore bugs; default-build (production) consensus/halt semantics are preserved across every touched call site, with the relaxations compiling only under the mock tags. No blockers; a few non-blocking consistency notes, chiefly an ECR image semantics divergence flagged by Codex.

Findings: 0 blocking | 6 non-blocking | 2 posted inline

Blockers

None at the file/PR level.

Non-blocking

RewriteSnapshot idempotency inconsistency (Codex #1): the pre-existing early-return at sei-db/state_db/sc/memiavl/db.go:772 returns nil when snapshot- already exists WITHOUT calling updateCurrentSymlink, while the new adopt-existing-snapshot arm (db.go:830) and import.go's Close() both DO update the symlink. If a prior RewriteSnapshot crashed after os.Rename but before updateCurrentSymlink, a retry hits the early-return and leaves current pointing at the older snapshot. Impact is bounded (restart reloads current + WAL replay, so it is not a state-correctness bug), and the early-return predates this PR, but given the new idempotency work the two paths should converge — have the early-return also call updateCurrentSymlink for consistency.
Cursor second-opinion review (cursor-review.md) is empty — that pass produced no output.
REVIEW_GUIDELINES.md is empty/blank, so no repo-specific review standards were available to apply.
Cross-module dependency: sei-cosmos/x/upgrade/abci.go now imports sei-tendermint/types (ErrUpgradeBeforeTrigger / DefaultConsensusPolicy). Behavior is preserved (default policy still panics), but it adds a new sei-cosmos→sei-tendermint coupling for a single sentinel — worth a maintainer nod that this layering is acceptable.
2 suggestion(s)/nit(s) flagged inline on specific lines.

…nchmark variants

mock_chain_validation-nightly now builds with GO_BUILD_TAGS=mock_chain_validation only (faithful real-history replay; real txs execute against real balances), matching the on-demand mock_chain_validation-<ref> image. A new mock_chain_validation-mock_balances-nightly image carries mock_balances + mock_chain_validation for benchmark/chaos replays that run on a fresh chain and need pre-funded accounts.

chaos-lab forked-state replays must repoint from mock_chain_validation-nightly to mock_chain_validation-mock_balances-nightly.

seidroid

A carefully build-tag-gated change that makes the mock_chain_validation replay build a first-class capability plus two prod-reachable memIAVL restore fixes; production halting semantics are preserved (DefaultConsensusPolicy returns errors unchanged) and the changes are well-tested. No blocking issues found.

Findings: 0 blocking | 5 non-blocking | 0 posted inline

Blockers

None at the file/PR level.

Non-blocking

Second-opinion reviews produced no actionable findings: codex-review.md reports no material findings but notes it could NOT run targeted tests (sandbox had Go 1.24.13 and could not fetch the required Go 1.25.6 toolchain), so its pass did not independently verify the test suite; cursor-review.md is empty (no output).
REVIEW_GUIDELINES.md on the base branch is empty, so no repo-specific review standards were available to apply.
The mock_chain_validation swallow set is now an explicit allowlist rather than 'ValidationErrors() minus exclusions'. Correctness depends on TestConsensusPolicy_MockChainValidation_SwallowMatrix and TestValidationErrors_Count staying in lockstep with ValidationErrors(); a new sentinel that is silently safe-by-default (halts) is the intended behavior, which the matrix test enforces — worth keeping that guard prominent so future sentinel additions don't accidentally drift.
In the mock-only swallow path of buildLastCommitInfo, the per-index Signatures/Validators pairing is positional and only approximate when commitSize != valSetLen (extra signatures dropped, missing validators marked not-signed). This is documented and only affects LastCommitInfo (staking rewards/downtime) inside the non-validating audit build, never EVM state or any production path — flagging only for reviewer awareness.
The replay build relaxes ErrLastCommitVerify and the state-sync appHash check; these warrant the explicit sign-off the PR author requested from consensus owners, since they widen what a non-validating replay node will accept — acceptable given the build is non-production and checked out-of-band by the logical-digest comparator.
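The positional pairing noted for buildLastCommitInfo can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `VoteInfo` and `pairByIndex` are hypothetical names, and validators and signatures are reduced to strings and booleans. It shows only the mismatch behavior the review describes: extra signatures are dropped and validators without a matching index are marked not-signed.

```go
package main

import "fmt"

// VoteInfo is a simplified, hypothetical stand-in for a per-validator
// entry in the last-commit info.
type VoteInfo struct {
	Validator string
	Signed    bool
}

// pairByIndex pairs signed[i] with validators[i] by position only.
// Signatures past the end of the validator set are dropped; validators
// past the end of the signature list are marked not-signed.
func pairByIndex(validators []string, signed []bool) []VoteInfo {
	votes := make([]VoteInfo, len(validators))
	for i, v := range validators {
		votes[i] = VoteInfo{
			Validator: v,
			Signed:    i < len(signed) && signed[i],
		}
	}
	return votes
}

func main() {
	// Fewer signatures than validators: "c" has no entry, so it is not-signed.
	fmt.Println(pairByIndex([]string{"a", "b", "c"}, []bool{true, false}))

	// More signatures than validators: the third signature is dropped.
	fmt.Println(pairByIndex([]string{"a", "b"}, []bool{true, true, true}))
}
```

Because the pairing is purely positional, a reordered validator set would attribute signatures to the wrong validators. That matches the review's point that the result is approximate and should only feed the non-validating audit build.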

* main:
  feat(seid): ConfigManager selection seam (PLT-775 PR1) (#3671)
  fix(evmrpc): limit listener max open connections, configurable via max_open_connections (PLT-704) (#3637)
  LittDB: Keymap threading improvements (#3645)
  integrate hashlogger (#3647)
  fix(metrics): Prometheus metrics output (#3640)
  [codex] Harden multiversion iterator validation (#3656)
  feat(consensus): mock_chain_validation replay build + memIAVL state-sync restore fixes (#3663)
  chore: replace OLD red SeiLogo banner in README with new 2026 Sei lockup (#3670)
  Require absolute path for evmone lib (#3668)
  fix(evmrpc): apply getLogs maxLog cap during merge instead of after (PLT-687) (#3666)
  feat(evmrpc): pre-decode request size admission control (PLT-295) (#3648)
  Make autobahn block production check wait for progress (#3667)
  fix(sei-tendermint): prevent readRoutine goroutine leak on /websocket when writeChan is full (PLT-707) (#3664)
  Per-block littidx flush + single shard (gated on #3645) (#3660)
  fix(evmrpc): bound debug_traceStateAccess memory and add trace admission control (PLT-360) (#3653)
  [codex] bump go-ethereum to v1.15.7-sei-17 (#3657)
  Upodate checkout GHA step across all workflows (#3659)
  Add GoReleaser release pipeline for static seid binaries (#3425)
  Parallelize littidx eth_getLogs across blocks (#3652)

bdchatham added the non-app-hash-breaking label Jun 29, 2026

seidroid Bot approved these changes Jun 29, 2026

Comment thread sei-tendermint/internal/statesync/syncer.go

Comment thread sei-tendermint/types/consensus_policy_mock_chain_validation_test.go Outdated

claude Bot reviewed Jun 29, 2026

bdchatham force-pushed the feat/shadow-replay-consensus branch from 165c9e3 to 641b257 Compare June 29, 2026 20:17

seidroid Bot approved these changes Jun 29, 2026

Comment thread sei-db/state_db/sc/memiavl/import.go Outdated

claude Bot reviewed Jun 29, 2026

Comment thread sei-tendermint/types/consensus_policy_mock_block_validation.go

Comment thread .github/workflows/ecr.yml

bdchatham force-pushed the feat/shadow-replay-consensus branch from 641b257 to 7fe93ad Compare June 29, 2026 21:03

seidroid Bot approved these changes Jun 29, 2026

Comment thread sei-tendermint/internal/state/execution.go

bdchatham commented Jun 29, 2026

Comment thread sei-db/state_db/sc/memiavl/db.go Outdated

Comment thread sei-tendermint/internal/state/execution.go Outdated

Comment thread sei-tendermint/types/consensus_policy.go Outdated

Comment thread sei-tendermint/internal/state/execution.go Outdated

bdchatham force-pushed the feat/shadow-replay-consensus branch from 7fe93ad to 74035bd Compare June 29, 2026 21:30

bdchatham changed the title ~~feat(consensus): mock_chain_validation replay-shadow build + memIAVL state-sync restore fixes~~ feat(consensus): mock_chain_validation replay build + memIAVL state-sync restore fixes Jun 29, 2026

cursor Bot reviewed Jun 29, 2026

Comment thread sei-db/state_db/sc/memiavl/db.go

seidroid Bot approved these changes Jun 29, 2026

Comment thread sei-tendermint/internal/state/execution.go Outdated

bdchatham force-pushed the feat/shadow-replay-consensus branch from 74035bd to cb5fe23 Compare June 29, 2026 21:35

seidroid Bot approved these changes Jun 29, 2026

bdchatham force-pushed the feat/shadow-replay-consensus branch from cb5fe23 to 4952ead Compare June 29, 2026 21:58

cursor Bot reviewed Jun 29, 2026

Comment thread sei-db/state_db/sc/memiavl/import.go

seidroid Bot approved these changes Jun 29, 2026

bdchatham force-pushed the feat/shadow-replay-consensus branch 2 times, most recently from 08eaf03 to c9f9b42 Compare June 29, 2026 22:10

seidroid Bot previously requested changes Jun 29, 2026

Comment thread sei-tendermint/internal/statesync/syncer.go

Comment thread sei-tendermint/types/consensus_policy.go Outdated

bdchatham commented Jun 29, 2026

Comment thread sei-cosmos/x/upgrade/abci.go Outdated

Comment thread sei-tendermint/internal/blocksync/reactor.go Outdated

claude Bot reviewed Jun 29, 2026

Comment thread sei-tendermint/internal/statesync/syncer.go

Comment thread sei-tendermint/types/consensus_policy.go Outdated

bdchatham force-pushed the feat/shadow-replay-consensus branch from c9f9b42 to 1707513 Compare June 29, 2026 23:18

seidroid Bot approved these changes Jun 29, 2026

Comment thread sei-db/state_db/sc/memiavl/db.go

bdchatham force-pushed the feat/shadow-replay-consensus branch 2 times, most recently from aad1310 to 47eae02 Compare June 29, 2026 23:40

seidroid Bot approved these changes Jun 29, 2026

Comment thread sei-db/state_db/sc/memiavl/import.go

claude Bot reviewed Jun 29, 2026

Comment thread sei-tendermint/types/consensus_policy.go Outdated

bdchatham force-pushed the feat/shadow-replay-consensus branch from 47eae02 to 17a6e19 Compare June 29, 2026 23:51

bdchatham added 3 commits June 29, 2026 16:53

bdchatham force-pushed the feat/shadow-replay-consensus branch from 17a6e19 to b88231b Compare June 29, 2026 23:53

seidroid Bot approved these changes Jun 29, 2026

bdchatham requested review from amir-deris, blindchaser, cody-littley and masih June 30, 2026 00:07

Merge branch 'main' into feat/shadow-replay-consensus

f1d638e

seidroid Bot approved these changes Jun 30, 2026

Comment thread .github/workflows/ecr.yml

Comment thread sei-tendermint/internal/state/execution.go

sei-will approved these changes Jun 30, 2026

blindchaser approved these changes Jun 30, 2026

seidroid Bot approved these changes Jun 30, 2026

bdchatham added this pull request to the merge queue Jun 30, 2026

Merged via the queue into main with commit 89fcffe Jun 30, 2026
70 checks passed

bdchatham deleted the feat/shadow-replay-consensus branch June 30, 2026 15:39
