Skip to content

feat(autobahn): Add epoch.Registry to maintain committee/stake (CON-358)#3632

Open
wen-coding wants to merge 104 commits into
mainfrom
wen/autobahn_epoch_registry
Open

feat(autobahn): Add epoch.Registry to maintain committee/stake (CON-358)#3632
wen-coding wants to merge 104 commits into
mainfrom
wen/autobahn_epoch_registry

Conversation

@wen-coding

@wen-coding wen-coding commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Introduces epoch.Registry which consolidates firstBlock and genesisTimestamp — genesis config that doesn't change across epochs — alongside the committee
  • Bakes firstBlock into Proposal so GlobalRange() returns absolute block numbers directly, removing it as a call-site parameter throughout the stack. The proto message is now self-contained on the wire, making it much easier to interpret during emergency debugging. Also fixes a reproposal bug where TimeoutQC.reproposal() was double-counting firstBlock
  • All consumers (consensus, avail, data) now route committee lookups through registry.CommitteeFor(roadIndex). The registry currently returns the genesis committee for all RoadIndexes — we wire the read side first so that any bug introduced when implementing dynamic epochs in follow-up PRs is immediately visible at the call sites rather than requiring a cascading refactor to find it

🤖 Generated with Claude Code

@cursor

cursor Bot commented Jun 24, 2026

Copy link
Copy Markdown

PR Summary

High Risk
Touches consensus validation, QC verification, and protobuf decoding with hard rejects for new required fields—core autobahn safety and wire compatibility.

Overview
Introduces types.Epoch (local context: committee, road range, firstBlock, genesis time) and epoch.Registry as the shared source for genesis firstBlock/timestamp and committee lookup. Committee no longer carries chain-genesis fields; data, avail, and consensus take a Registry instead of a bare committee.

Proposals and QCs now encode absolute global ranges on the message: Proposal.GlobalRange() and CommitQC.GlobalRange() drop the committee parameter; global_first and epoch_index are required on the wire (View, Proposal, AppProposal). ViewSpec must include an Epoch; NewProposal and FullProposal.Verify use epoch binding and stricter checks (including app QC epoch alignment). QC Verify paths for prepare/commit/timeout shift to *Epoch where proposal context matters.

Call sites across ledger/block tests, blocksim, and autobahn tests are updated to GlobalRange(), BuildCommitQC, and registry-based test setup. Minor .gitignore tweak for litt test binaries.

Reviewed by Cursor Bugbot for commit a455eaa. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions

github-actions Bot commented Jun 24, 2026

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJul 3, 2026, 1:37 AM

@github-actions

Copy link
Copy Markdown

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedJun 24, 2026, 2:30 AM

Comment thread sei-tendermint/internal/autobahn/avail/inner.go
@codecov

codecov Bot commented Jun 24, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 76.05634% with 119 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.85%. Comparing base (9f53204) to head (a455eaa).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
sei-tendermint/internal/autobahn/data/state.go 54.54% 11 Missing and 19 partials ⚠️
sei-tendermint/internal/autobahn/avail/state.go 49.05% 13 Missing and 14 partials ⚠️
...dermint/internal/autobahn/pb/autobahn.wireguard.go 31.57% 13 Missing ⚠️
sei-tendermint/internal/autobahn/epoch/registry.go 80.00% 10 Missing and 1 partial ⚠️
sei-tendermint/autobahn/types/timeout.go 47.36% 4 Missing and 6 partials ⚠️
sei-tendermint/autobahn/types/proposal.go 90.90% 3 Missing and 4 partials ⚠️
sei-tendermint/autobahn/types/commit_qc.go 54.54% 1 Missing and 4 partials ⚠️
...ei-tendermint/internal/autobahn/consensus/inner.go 66.66% 0 Missing and 5 partials ⚠️
sei-tendermint/autobahn/types/app_proposal.go 75.00% 1 Missing and 2 partials ⚠️
...ei-tendermint/internal/autobahn/consensus/state.go 78.57% 0 Missing and 3 partials ⚠️
... and 3 more
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #3632       +/-   ##
===========================================
+ Coverage   59.30%   74.85%   +15.55%     
===========================================
  Files        2273       87     -2186     
  Lines      188305     7665   -180640     
===========================================
- Hits       111675     5738   -105937     
+ Misses      66579     1472    -65107     
+ Partials    10051      455     -9596     
Flag Coverage Δ
sei-chain ?
sei-chain-pr 75.14% <76.05%> (?)
sei-db 70.41% <ø> (ø)
sei-db-state-db ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-db/ledger_db/block/blocksim/block_generator.go 80.64% <100.00%> (ø)
sei-tendermint/autobahn/types/committee.go 98.48% <100.00%> (-0.09%) ⬇️
sei-tendermint/autobahn/types/epoch.go 100.00% <100.00%> (ø)
sei-tendermint/autobahn/types/opt.go 100.00% <ø> (ø)
sei-tendermint/autobahn/types/prepare_qc.go 92.85% <100.00%> (+0.85%) ⬆️
sei-tendermint/autobahn/types/testonly.go 95.83% <100.00%> (+1.38%) ⬆️
...-tendermint/internal/autobahn/avail/block_votes.go 90.90% <100.00%> (+0.43%) ⬆️
...ternal/autobahn/consensus/persist/fullcommitqcs.go 79.01% <100.00%> (+0.26%) ⬆️
...nternal/autobahn/consensus/persist/globalblocks.go 77.96% <100.00%> (+0.18%) ⬆️
...int/internal/autobahn/consensus/persisted_inner.go 97.59% <100.00%> (-0.09%) ⬇️
... and 17 more

... and 2194 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread sei-tendermint/internal/autobahn/data/state.go
Comment thread sei-tendermint/internal/autobahn/avail/state.go Outdated
Comment thread sei-tendermint/internal/autobahn/avail/state.go Outdated
@wen-coding wen-coding changed the title feat(autobahn): introduce epoch.Registry as single source of truth for committee/stake feat(autobahn): Add epoch.Registry to maintain committee/stake (CON-358) Jun 24, 2026
Comment thread sei-tendermint/internal/autobahn/avail/state_test.go Outdated
Comment thread sei-tendermint/internal/autobahn/epoch/registry.go Outdated
@wen-coding wen-coding force-pushed the wen/autobahn_epoch_registry branch from d764443 to 705d8c9 Compare June 24, 2026 03:59
Comment thread sei-tendermint/internal/autobahn/avail/inner.go Outdated
@wen-coding wen-coding force-pushed the wen/autobahn_epoch_registry branch 2 times, most recently from ab4c67c to 9cce5d2 Compare June 24, 2026 17:32
Comment thread sei-tendermint/autobahn/types/proposal.go
@wen-coding wen-coding force-pushed the wen/autobahn_epoch_registry branch 3 times, most recently from f1f1d55 to 09e677e Compare June 25, 2026 00:46
@wen-coding wen-coding requested a review from pompon0 June 25, 2026 02:36
Comment thread sei-tendermint/autobahn/types/proposal.go Outdated
Comment thread sei-tendermint/internal/autobahn/autobahn.proto Outdated
Comment thread sei-tendermint/autobahn/types/proposal.go Outdated
Comment thread sei-tendermint/autobahn/types/proposal.go Outdated
Comment thread sei-tendermint/autobahn/types/proposal.go Outdated
Comment thread sei-tendermint/autobahn/types/testonly.go Outdated
Comment thread sei-tendermint/autobahn/types/types_test.go Outdated
Comment thread sei-tendermint/internal/autobahn/epoch/registry.go
// EpochWindow returns the epoch→committee map for message acceptance across
// epoch transitions. With a fixed genesis committee it always returns a
// single entry.
func (r *Registry) EpochWindow() map[Index]*types.Committee {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is "window"? I don't see it in the doc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly for the Epoch transition, we need to accept lane blocks for the current Epoch and possibly the Epoch before (repair blocks) or after it (others have got the commitQC to transition into new Epoch before me).

Comment thread sei-tendermint/internal/autobahn/epoch/registry.go Outdated
wen-coding and others added 2 commits July 2, 2026 15:44
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 6 total unresolved issues (including 5 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 06711dd. Configure here.

lane := proposerKey.Public()

laneQC := makeLaneQC(rng, committee, keys, lane, MaxLaneRangeInProposal, GenBlockHeaderHash(rng))
vs.Epoch = NewEpoch(0, OpenRoadRange(), time.Time{}, committee, 0)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ViewSpec nil Epoch test panic

Low Severity

TestNewProposalRejectsLaneRangeLongerThanMaxLaneRangeInProposal calls vs.View() while vs.Epoch is still nil. ViewSpec.View() reads vs.Epoch.EpochIndex(), so the test panics before it can assert on NewProposal.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 06711dd. Configure here.

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A large but well-structured refactor introducing epoch.Registry, baking firstBlock/epochIndex into proposals so GlobalRange() is self-contained on the wire, and binding epoch into signatures. Correct for the current single-epoch (genesis-only) operation; the only concerns are gaps in the not-yet-active multi-epoch transition path, which are intentionally deferred and tracked by TODOs.

Findings: 0 blocking | 6 non-blocking | 2 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Codex flagged that Registry.VerifyInWindow only checks LatestEpoch() and that pushCommitQC doesn't rotate i.ep, so still-valid previous/next-epoch messages would be rejected during a transition. Both observations are accurate but are NOT live bugs in this PR: AddEpoch is never called from production code (verified by grep), so the registry always holds exactly one epoch (index 0) and LatestEpoch() == the only epoch. Both gaps are already tracked by explicit TODOs (registry.go:113, inner.go:117/144, avail/state.go:276) and match the PR's stated 'wire the read side first' design. Recommend keeping them as tracked follow-up items rather than merge blockers, but ensure they are closed before dynamic epochs are enabled.
  • The proto/wire format change is intentionally hard-breaking: Proposal.Decode rejects messages missing global_first, and View/AppProposal now require epoch_index. This is fine given Autobahn is pre-production and the code comment documents there is no rolling-upgrade path, but worth confirming no already-persisted consensus state (inner_a/inner_b.pb) from a prior build must be readable across this upgrade.
  • cursor-review.md was empty — the Cursor pass produced no output for this synthesis.
  • Consider a test that exercises AddEpoch end-to-end (registry with 2+ epochs, then verifying a message from the older epoch) even while the transition is not wired, to lock in the intended window semantics before follow-up PRs rely on them.
  • 2 suggestion(s)/nit(s) flagged inline on specific lines.

// Returns a slice of all matching epochs so callers can skip re-verification for any
// epoch already checked here.
// TODO: expand to neighbor epochs (previous and next) once multi-epoch transitions are wired up.
func (r *Registry) VerifyInWindow(fn func(*types.Committee) error) ([]*types.Epoch, error) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] VerifyInWindow only verifies against LatestEpoch(). Once AddEpoch is actually invoked (follow-up PR), messages/blocks whose QC belongs to a still-valid previous (or upcoming) epoch will be rejected here and by the epochInSet check in data/state.go:469. Not a live bug today since AddEpoch is never called in production, but this must be expanded to neighbor epochs (as the TODO notes) before dynamic epochs are enabled.

}})
// CommitQC advances to new index; clear all state for new view.
// TODO: rotate ep when epoch transitions are wired up.
iSend.Store(inner{persistedInner: persistedInner{CommitQC: utils.Some(qc)}, ep: i.ep})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] pushCommitQC verifies against i.ep and re-stores the same i.ep, so the first CommitQC of a newly-registered epoch would fail View.Verify's epoch-index check instead of rotating. Guarded today by the fact that epochs never rotate (AddEpoch unused), and the TODO on line 144 tracks it — flagging so it isn't lost when transitions are wired.

wen-coding and others added 7 commits July 2, 2026 15:56
ep is context for the call, so it goes before prev per convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the WAL has existing entries, override the constructor's firstBlock
with the first loaded entry's value rather than trusting the parameter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
View() needs the epoch, so it belongs on inner which owns the epoch.
persistedInner.validate() inlines the ViewSpec directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Look up the epoch from the QC (which is guaranteed present before
PushBlock is called), verify outside the lock against that specific
epoch, then insert. Removes the VerifyInWindow+epochInSet cross-check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A large, well-tested refactor that consolidates genesis config into epoch.Registry and bakes firstBlock/epoch_index into proposals so GlobalRange() is self-contained and message epoch-binding is verifiable. No blocking correctness or security issues found; the epoch_index/global_first fields are covered by the proposal hash and validated on decode/verify, and attacker-supplied epoch indices are resolved via EpochByIndex (unknown indices rejected). A few non-blocking observations only.

Findings: 0 blocking | 6 non-blocking | 0 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Second-opinion passes produced no material findings: Codex reported "No material findings" (noting it could not run Go tests due to a sandbox lacking Go 1.25.6), and the Cursor review file was empty. I independently reviewed the core logic and reached the same conclusion — no blockers.
  • Wire/hash compatibility: adding epoch_index to the hashable View and global_first to the hashable Proposal (plus epoch_index on AppProposal) changes the hash of these messages, making them incompatible with any previously-encoded autobahn messages. The code comments and PR body acknowledge this is acceptable because Autobahn is pre-production with no rolling-upgrade path — flagging it so reviewers are explicitly aware there is no cross-version wire compatibility.
  • epoch/registry.go: NewRegistry returns (*Registry, error) but never returns a non-nil error, and Registry methods use the for range r.mu.RLock() { return ... } / panic("unreachable") idiom. Both are fine, but the vestigial error return on NewRegistry could be dropped to simplify call sites.
  • avail/state.go: the []*types.Epoch slice returned by Registry.VerifyInWindow is discarded at both call sites (PushBlock, PushVote via if _, err := ...). The re-verification-skipping optimization the return value is meant to enable is currently unused; acceptable while single-epoch but currently dead API surface.
  • Registry.AddEpoch has no guard against startRoad <= the highest already-committed road index (only startRoad <= latest.Roads().First is checked). This is explicitly called out with a TODO and AddEpoch is not yet wired to any caller, so it is not currently reachable — but it must be addressed before dynamic epochs go live to prevent retroactively reassigning finalized roads.
  • Several epoch-transition paths are intentionally stubbed with consistent TODOs (persisted_inner restart after a transition assumes LatestEpoch; ep rotation in pushCommitQC/PushCommitQC; per-epoch weight counting in avail block votes). The read-side-first approach is reasonable and the TODOs are clear; just ensure follow-up PRs close them before enabling multi-epoch.

wen-coding and others added 2 commits July 2, 2026 16:49
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ison

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A large but careful, well-tested refactor extracting genesis config into a new epoch.Registry/Epoch and binding epoch_index through Autobahn message types; no blocking correctness or security issues found, only minor stylistic/consistency notes and the expected TODO-gated dynamic-epoch stubs.

Findings: 0 blocking | 5 non-blocking | 1 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Second-opinion passes produced no findings: OpenAI Codex reported "No material findings" (and flagged that it could not run the targeted Go tests because Go 1.25.6 toolchain download was network-blocked), and the Cursor review file was empty (no output produced). Neither surfaced anything to merge in.
  • Dynamic-epoch support is intentionally stubbed: Registry.AddEpoch is not yet wired, LatestEpoch() is assumed everywhere, and several TODOs note that restart-after-epoch-transition (consensus newInner), per-epoch vote weighting (avail block_votes/laneQC), and epoch rotation (consensus pushCommitQC, avail PushCommitQC) will break once transitions are real. This matches the PR's stated "wire the read side first" plan, but the surface area of TODOs is worth tracking as a single follow-up so none is missed when epochs go dynamic.
  • Consistency nit: epoch/registry.go AddEpoch builds the open-ended range with math.MaxUint64 while types/epoch.go exposes OpenRoadRange()/utils.MaxRoadIndex for the same purpose; reusing the helper would keep the "open range" concept in one place.
  • data/state.go NewState relies on inner.qcs[lb.Number] being non-nil when loading blocks from the WAL to resolve the epoch (qc.QC().Proposal().EpochIndex()); this invariant was already assumed by insertBlock/globalBlockAt, but now an additional call site would nil-deref on a block-without-QC WAL state — consider an explicit guard/error since startup-panic is a poor failure mode.
  • 1 suggestion(s)/nit(s) flagged inline on specific lines.


"github.com/sei-protocol/sei-chain/sei-tendermint/autobahn/types"
"github.com/sei-protocol/sei-chain/sei-tendermint/internal/autobahn/consensus/persist"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Nit: this inserts a blank line between two github.com/sei-protocol/... imports, splitting the same-module import group in two. It won't fail the goimports linter (which preserves manual grouping) but is inconsistent with the surrounding style — consider removing it.

wen-coding and others added 4 commits July 2, 2026 17:35
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, WAL overrides

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A large but carefully-executed refactor that introduces an epoch.Registry, bakes firstBlock/epoch_index into Proposal/View so GlobalRange() is self-contained on the wire, and routes all committee lookups through the registry. No blocking correctness or security issues found; the epoch-rotation machinery is intentionally read-side-only with documented TODOs, and test coverage is extensive.

Findings: 0 blocking | 5 non-blocking | 2 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Cursor's second-opinion review (cursor-review.md) is empty — that pass produced no output.
  • Codex flags that AddEpoch (the execution-bridge activation path) exists while consensus/avail pin their Epoch at startup and never rotate it, so activating a new committee would cause epoch/road mismatches. I confirmed AddEpoch and the rotation-aware code paths are only reached from tests — no production caller invokes AddEpoch — so this is not a live bug in this PR. It is explicitly scoped as follow-up and guarded by TODOs (consensus/inner.go newInner assumes LatestEpoch on restart; avail PushCommitQC has commented-out rotation; avail PushBlock/PushVote verify only the latest epoch via VerifyInWindow). Worth tracking that these must land together with the write side before dynamic epochs are enabled, otherwise a restart after a transition or a message from a non-latest epoch will fail verification.
  • AppProposal.epoch_index and Proposal.global_first are decoded as hard-required (nil rejected) even though the proto marks them optional; this is fine given the PR states autobahn is pre-production with no rolling-upgrade path, but the divergence between proto optional and decoder required is worth a comment in the .proto.
  • 2 suggestion(s)/nit(s) flagged inline on specific lines.

newIdx := s.latest + 1
s.epochs[newIdx] = types.NewEpoch(
newIdx,
types.RoadRange{First: startRoad, Last: math.MaxUint64},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Nit: Last: math.MaxUint64 open-ends the new epoch's road range, but the genesis epoch uses types.OpenRoadRange() (which is utils.Max[RoadIndex]()). Since RoadIndex is uint64 these are equal today, but using the same OpenRoadRange()/utils.Max[RoadIndex]() helper here keeps the two construction sites consistent and robust if RoadIndex's width ever changes.

Number: ViewNumber(*m.Number),
Index: RoadIndex(*m.Index),
Number: ViewNumber(*m.Number),
EpochIndex: EpochIndex(*m.EpochIndex), //nolint:gosec // bounded by uint32 max in practice

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] EpochIndex is uint32 but decoded from a wire uint64 with //nolint:gosec. An oversized epoch_index (e.g. 2^32) silently truncates rather than being rejected — for genesis it would truncate to 0 and select the genesis epoch. It can't bypass signature verification, but consider explicitly rejecting values > math.MaxUint32 here for clearer failure semantics.

wen-coding and others added 5 commits July 2, 2026 20:52
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…open road range

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…silently truncating

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n main)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ow check and nolint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A large, well-tested refactor introducing epoch.Registry/types.Epoch and making the wire Proposal self-contained (global_first/epoch_index) so GlobalRange() no longer needs a committee; it also fixes a real reproposal double-count bug. No blocking correctness/security issues found; only latent footguns and deferred-epoch TODOs remain.

Findings: 0 blocking | 5 non-blocking | 1 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Second-opinion coverage was thin: the Cursor review file is empty, and Codex reported no material findings while noting it could not run Go tests (Go 1.25.6 unavailable, toolchain download network-blocked). No independent test execution backs this refactor beyond the PR's own tests.
  • Epoch transitions are entirely deferred: the registry always returns the genesis epoch and many push/persist paths carry TODOs (per-epoch weight counting in avail pushVote/laneQC, inner.ep rotation in avail/consensus, persistedInner.validate resolving the epoch from the persisted QC rather than LatestEpoch, AddEpoch rejecting startRoad below the commit watermark). Restart after a future epoch transition is explicitly acknowledged to fail validation. Ensure these are tracked before dynamic epochs land.
  • Registry.epochs is an append-only map with no pruning; unbounded over long-running nodes once AddEpoch is wired. Minor now, worth a follow-up.
  • data.PushBlock's new two-phase locking (blockEpoch → verify → re-lock) is safe/idempotent as written, but the correctness now depends on insertBlock's duplicate guard and the qcs[n]==nil re-check; a brief comment on why the drop-on-nil after re-lock is correct would help future readers.
  • 1 suggestion(s)/nit(s) flagged inline on specific lines.

if cQC, ok := vs.CommitQC.Get(); ok {
return cQC.GlobalRange().Next
}
return vs.Epoch.FirstBlock()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] ViewSpec.NextGlobalBlock() (and View()/NextTimestamp()) now unconditionally dereference vs.Epoch. Previously ViewSpec{} was a valid genesis spec; now any ViewSpec constructed without an Epoch will nil-panic here. All production call sites in this PR set Epoch, but this is a sharp latent footgun — consider a defensive nil check or at least documenting on ViewSpec that Epoch is mandatory.

wen-coding and others added 2 commits July 2, 2026 21:28
…_index from TimeoutVote and field 7 from Proposal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@seidroid seidroid Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A large but clean refactor that introduces epoch.Registry and self-contains firstBlock/epoch_index on the wire, routing all committee lookups through the registry. I found no blocking correctness, security, or test issues; the change is well-covered by tests and the epoch-transition gaps are explicitly deferred behind TODOs.

Findings: 0 blocking | 4 non-blocking | 0 posted inline

Blockers

  • None at the file/PR level.

Non-blocking

  • Both second-opinion passes produced no actionable findings: Codex reported "No material findings" but noted it could not run go test (environment had Go 1.24.13 vs the required 1.25.6), and Cursor's output file was empty — so neither independently exercised the touched packages.
  • Verification is inconsistent across layers: consensus (inner.go) verifies incoming CommitQC/TimeoutQC against the cached i.ep (always LatestEpoch), while avail/state.go and data/state.go resolve the epoch via EpochByIndex(qc.Proposal().EpochIndex()). Both are safe today (single genesis epoch) and are guarded by TODOs, but the two approaches should be reconciled when dynamic epochs are wired up so a restart/transition doesn't fail validation (the TODO in consensus/inner.go newInner already flags the restart-after-transition risk).
  • epoch.NewRegistry returns an error that is currently always nil; consider dropping the error return until it can actually fail, or keep it and document that it's reserved for future validation.
  • The read-side is wired ahead of dynamic-epoch support, so several Push*/persist paths now do an EpochByIndex lookup and return "unknown epoch_index" errors that are effectively unreachable in production today; ensure follow-up PRs add coverage for the multi-epoch branches before AddEpoch is actually invoked.

@wen-coding wen-coding requested a review from pompon0 July 3, 2026 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants