perf(ghost): background index hydrate + demand-walk memo + NEDB v2.6.0 #64

Open
Eth-Interchained wants to merge 4 commits into main from hyperagent/2026-07-02-ghost-background-hydrate

@Eth-Interchained

Owner

Summary

Stage 1 of the iMac slow-boot fix — improving -ghost-protocol with its own existing machinery, no new engine surface. Root cause (from Mark's 2026-07-02 boot log): the tip→genesis hydrate ran synchronously on GetAncestor callers' critical path, at 2 random NEDB reads per ancestor — ~40/s on seek-bound Fusion/HDD media (~3.5h projected for 507k), vs ~35,700/s on Nemo's NVMe. -dagfastsync was already ON — fsync exonerated by the log.
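
As a rough check, the projections above follow directly from the quoted rates and the 507k ancestor count. The 80/s line is an assumption rather than a measurement: it treats "reads halved" as "rate doubled", which only holds if the walk is purely seek-bound.

```latex
\begin{align*}
t_{\text{iMac}} &\approx \frac{507{,}000}{40\ \text{s}^{-1}} \approx 12{,}675\ \text{s} \approx 3.5\ \text{h} \\
t_{\text{iMac, memo}} &\approx \frac{507{,}000}{80\ \text{s}^{-1}} \approx 6{,}340\ \text{s} \approx 1.8\ \text{h} \\
t_{\text{NVMe}} &\approx \frac{507{,}000}{35{,}700\ \text{s}^{-1}} \approx 14\ \text{s}
\end{align*}
```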

The three pieces

  1. Demand-walk memo (validation.cpp): the walk re-read the child's record each step just for hashPrev, but that record had already been read one step earlier. One-entry memo under cs_main: 2 reads/ancestor → 1. Correctness never depends on it (miss → normal read; hashPrev is immutable per hash).
  2. Background index hydrate (validation.cpp, ghost.h, init.cpp) — GhostStartBackgroundIndexHydrate() after INSTANT BOOT: same walk, same WarmBootLoadParent chokepoint, own thread, chunked cs_main (256 + 1ms breather), cooperating with on-demand loads through the same map. Wires the dormant IndexHydrating → IndexReady states + SetHydratedThrough(tip) at genesis. Joined in Shutdown(); try/catch so it can never take the daemon down.
  3. NEDB pin v2.5.0 → v2.6.0 (nedb-ffi/Cargo.toml) — 15 releases of engine catch-up: pread handle cache on every point read (+23% measured on VPS) + the 2.5.55 correctness batch (WAL flush race under same-key churn — the UTXO shape — seq-guarded tips, fsync ordering, NQL LIMIT, batch index parity, cold-scan seq off-by-one).

Expected on the iMac (estimate — the boot log verifies)

Walk ≈ doubles (memo) + pread trim, and all of it leaves the critical path — header sync and restricted serving no longer wait. Familiar WarmBoot: demand-loaded N lines keep printing (same counter, now background), ending in [GHOST] background index hydrate COMPLETE — N in Xs (Y/s) + index-ready.

Nemo impact

None meaningful — walk completes in seconds either way; it just stops borrowing anyone's thread.

Tests run

Written blind — Mark compiles (established flow). Verification = boot log on the iMac: look for index-hydrating, the same demand-load progress lines, COMPLETE with a rate, index-ready, while headers sync concurrently.

Follow-up (Stage 2, queued — NEDB side first)

for_each_object_sequential() in nedb-engine (segment locations sorted by (segment, offset) = physically sequential reads), one FFI export, then this thread swaps its read loop: random-seek hydrate → ~30s-class sequential sweep even on HDD. This PR's thread is the permanent home it plugs into.
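
The ordering idea behind the Stage 2 sweep can be sketched as below. This is only an illustration in C++: the `Location` struct, its field names, and `SortForSequentialSweep` are hypothetical, and the real primitive is planned for nedb-engine, whose layout is not shown in this PR.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical description of where one object lives on disk.
struct Location {
    std::uint32_t segment;    // which segment file
    std::uint64_t offset;     // byte offset inside that segment
    std::uint64_t object_id;  // opaque id of the stored object
};

// Orders locations by (segment, offset) so that visiting them front to back
// reads each segment file in ascending offset order instead of seeking.
void SortForSequentialSweep(std::vector<Location>& locs) {
    std::sort(locs.begin(), locs.end(),
              [](const Location& a, const Location& b) {
                  return std::tie(a.segment, a.offset) <
                         std::tie(b.segment, b.offset);
              });
}
```

After sorting, a reader can open one segment at a time and walk forward through it, which is the access pattern seek-bound media handle best.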

Handoff: docs/HANDOFF_2026-07-02_GHOST_BACKGROUND_HYDRATE.md

© INTERCHAINED LLC × Claude Fable 5

🤖 Generated with Claude Code

Vex (Hyperagent) and others added 4 commits July 2, 2026 14:29
WarmBootLoadParent re-read the CHILD's record every step just to learn
its hashPrev — but that exact record was read one step earlier to
populate it. On seek-bound media (HDD/Fusion iMacs, ~100 IOPS) the
re-read doubled the wall-clock of the whole tip->genesis hydrate
(measured 2026-07-02 iMac log: ~40 ancestors/s => ~3.5h projected for
507k).

One-entry memo keyed by exact hash, guarded by cs_main (already
asserted). Correctness never depends on it: miss -> normal read, and
hashPrev is part of the header, immutable for a given hash.

2 NEDB point reads per ancestor -> 1.
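
A minimal sketch of the one-entry memo, under stated assumptions: `Hash`, `Record`, `Store`, `OneEntryMemo`, `LoadParent` and `WalkToGenesis` are hypothetical stand-ins, not the code in validation.cpp. Locking is omitted, so the caller is assumed to hold the equivalent of cs_main.

```cpp
#include <map>
#include <optional>
#include <string>

using Hash = std::string;          // hypothetical; the real key is a block hash
struct Record { Hash hashPrev; };  // hypothetical; only the field the walk needs

// Stand-in for the store: counts every point read it serves.
struct Store {
    std::map<Hash, Record> rows;
    int reads = 0;
    std::optional<Record> Read(const Hash& h) {
        ++reads;
        auto it = rows.find(h);
        if (it == rows.end()) return std::nullopt;
        return it->second;
    }
};

// Remembers the single most recently read record, keyed by exact hash.
struct OneEntryMemo {
    bool valid = false;
    Hash key;
    Record value;
};

// Loads the parent of `child`; returns the parent's hash, or nullopt at
// genesis (empty hashPrev) or on a missing record. `memo` may be null.
std::optional<Hash> LoadParent(Store& store, OneEntryMemo* memo, const Hash& child) {
    // Step 1: learn the child's hashPrev. An exact-key memo hit avoids
    // re-reading the record; a miss falls back to the normal read.
    std::optional<Record> child_rec;
    if (memo && memo->valid && memo->key == child) {
        child_rec = memo->value;
    } else {
        child_rec = store.Read(child);
    }
    if (!child_rec || child_rec->hashPrev.empty()) return std::nullopt;

    // Step 2: read the parent's record and remember it, because the next
    // step of the walk will ask for exactly this record as its "child".
    std::optional<Record> parent_rec = store.Read(child_rec->hashPrev);
    if (!parent_rec) return std::nullopt;
    if (memo) {
        memo->key = child_rec->hashPrev;
        memo->value = *parent_rec;
        memo->valid = true;
    }
    return child_rec->hashPrev;
}

// Walks from `tip` toward genesis; returns how many parents were loaded.
int WalkToGenesis(Store& store, OneEntryMemo* memo, Hash tip) {
    int loaded = 0;
    while (auto parent = LoadParent(store, memo, tip)) {
        tip = *parent;
        ++loaded;
    }
    return loaded;
}
```

In this toy model, each step without the memo costs two reads. With it, only the very first child lookup misses, so the walk settles at one read per ancestor.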

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The demand loader IS ghost's hydrate engine, but it only ever ran
synchronously inside whichever GetAncestor caller tripped over a deep
parent — the full tip->genesis walk executed on msghand/validation's
critical path, one ancestor at a time ('hydrate still synchronous', as
the banner honestly said). On seek-bound media that crawl was the whole
slow-boot symptom.

GhostStartBackgroundIndexHydrate(): dedicated thread walks tip->genesis
through the SAME WarmBootLoadParent chokepoint in chunked cs_main holds
(256/chunk + 1ms breather), cooperating with the on-demand path through
the same map under the same lock — whoever loads a parent first wins,
the other skips it free. Wires the until-now-unused readiness states:
IndexHydrating at start, SetHydratedThrough(tip) + IndexReady at
genesis. Spawned after INSTANT BOOT; exits within one chunk of
ShutdownRequested(); joined in Shutdown() before chainstate teardown;
try/catch so a background optimization can never take the daemon down.

No new read paths, no new storage APIs — the boot-blocking crawl
becomes a background fill. Stage 2 (queued): swap this thread's read
loop for NEDB's sequential segment sweep once that primitive lands.
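
The chunked-lock pattern described above might look roughly like this. It is a simplified sketch, not the PR's implementation: `SharedIndex`, `LoadOne` and `BackgroundHydrate` are hypothetical names, a plain `std::mutex` stands in for cs_main, and the walk counts down heights rather than following hashPrev links.

```cpp
#include <atomic>
#include <chrono>
#include <exception>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical stand-ins: `mu` plays the role of cs_main, `loaded` the shared
// block-index map, `shutdown` the ShutdownRequested() flag.
struct SharedIndex {
    std::mutex mu;
    std::set<int> loaded;               // heights linked so far, by either path
    std::atomic<bool> shutdown{false};
    std::atomic<bool> ready{false};     // analogue of the IndexReady state
};

// Shared chokepoint; the caller must hold idx.mu. Returns true if this call
// did the load, false if the other path had already loaded that height.
bool LoadOne(SharedIndex& idx, int height) {
    return idx.loaded.insert(height).second;
}

// Walks tip -> genesis in chunks, dropping the lock between chunks so the
// on-demand path and other lock users can make progress.
void BackgroundHydrate(SharedIndex& idx, int tip_height, int chunk = 256) {
    try {
        int next = tip_height;
        while (next >= 0 && !idx.shutdown.load()) {
            {
                std::lock_guard<std::mutex> lock(idx.mu);
                for (int i = 0; i < chunk && next >= 0; ++i, --next) {
                    LoadOne(idx, next);
                }
            }  // lock released here
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        if (next < 0) idx.ready.store(true);  // walked all the way to genesis
    } catch (const std::exception&) {
        // A background optimization must not take the process down; the
        // on-demand path remains available if this thread stops early.
    }
}
```

Because the shutdown flag is checked once per chunk, the thread exits at most one chunk after shutdown is requested, and the owner can join it before tearing down shared state.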

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Fifteen releases of engine behind. v2.6.0 brings cached segment read
handles + positional reads (pread) to every demand-load and chainstate
point read (+23% v3 point reads measured on VPS; larger expected where
file-open dominates), plus the 2.5.55 correctness batch: id-index WAL
flush race under same-key churn (the UTXO rewrite shape), seq-guarded
durable tips, MANIFEST fsync ordering, NQL ORDER BY+WHERE+LIMIT
truncation, put_batch sorted-index parity, cold-scan MANIFEST seq
off-by-one.
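
For readers unfamiliar with the technique, here is a generic POSIX illustration of cached read handles plus positional reads. NEDB's actual implementation is not shown in this PR, so `SegmentReader` and everything in it is hypothetical.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Keeps one open read-only descriptor per file and serves point reads with
// pread(), which takes an explicit offset and needs no seek or reopen.
class SegmentReader {
public:
    SegmentReader() = default;
    SegmentReader(const SegmentReader&) = delete;
    SegmentReader& operator=(const SegmentReader&) = delete;
    ~SegmentReader() {
        for (auto& kv : fds_) ::close(kv.second);
    }

    // Reads exactly `len` bytes at `offset` into `out`. Returns false on any
    // error or short read (a sketch: EINTR is not retried).
    bool ReadAt(const std::string& path, off_t offset, std::size_t len,
                std::vector<char>& out) {
        int fd = Handle(path);
        if (fd < 0) return false;
        out.resize(len);
        std::size_t done = 0;
        while (done < len) {
            ssize_t n = ::pread(fd, out.data() + done, len - done,
                                offset + static_cast<off_t>(done));
            if (n <= 0) return false;
            done += static_cast<std::size_t>(n);
        }
        return true;
    }

    // Number of open() calls made so far; stays flat on cache hits.
    std::size_t opens() const { return opens_; }

private:
    int Handle(const std::string& path) {
        auto it = fds_.find(path);
        if (it != fds_.end()) return it->second;  // cache hit: no open()
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd >= 0) {
            fds_.emplace(path, fd);
            ++opens_;
        }
        return fd;
    }

    std::unordered_map<std::string, int> fds_;
    std::size_t opens_ = 0;
};
```

The saving comes from skipping the open/close pair on every point read, which is consistent with the note above that larger gains are expected where file-open cost dominates.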

Handoff: docs/HANDOFF_2026-07-02_GHOST_BACKGROUND_HYDRATE.md — the
measured iMac diagnosis (40/s seek-bound walk), what shipped, expected
outcome, and the queued Stage 2 sequential sweep.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… v2.6.1

Mark's first boot of the hydrate build (2026-07-02 18:44 iMac log) caught
two bugs — one mine, one the engine's:

1. FALSE COMPLETE (mine): the hydrate walk declared genesis at the
   warm-boot window base's parent placeholder — a bare stub with
   nHeight == 0 — and reported 'COMPLETE — 0 ancestor(s)', advancing
   index-ready and setting HydratedThrough=507464 without linking
   anything. The demand path stayed armed (by design), so the node kept
   correct behavior, but the readiness signal lied. Fixes:
   - WarmBootLoadParent now populates the CALLER's own entry from its
     record when it is a stub (nStatus == 0) — every linked stub
     eventually passes through this chokepoint, so zero-field landmines
     (mid-chain nHeight==0) are healed at the single point of truth.
   - The hydrate loop's genesis check requires height 0 AND populated
     (nStatus != 0); stubs fall through to the loader instead.
   - 'Already linked' fast-path also requires populated, so linked
     stubs still get healed rather than skipped.

2. BOOT TAX (engine, fixed in NEDB v2.6.1 — pin bumped): v2.6.0 forced
   a full cold scan on pre-2.5.43 MANIFESTs 'once to upgrade'. On this
   datadir that is 5.4M objects of random reads across three stores,
   racing boot I/O on seek-bound media (warm-boot window 99s -> 284s
   measured), and re-paid every boot if the daemon exits before the
   scans finish. v2.6.1 warm-boots old MANIFESTs; tip fields heal on
   first flush. The forced scan bought itcd nothing (the FFI never
   calls tip()).
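
The stub-versus-genesis distinction in fix 1 can be sketched as a small classifier. This is an assumption-laden illustration: `IndexEntry`, `Step` and `Classify` are hypothetical, and only the two fields named above (nHeight, nStatus) are modeled.

```cpp
// Hypothetical mirror of a block-index entry. A default-constructed entry is
// a bare stub: nHeight == 0 and nStatus == 0.
struct IndexEntry {
    int nHeight = 0;
    unsigned int nStatus = 0;
};

// An entry counts as populated once its record has been loaded into it.
bool IsPopulated(const IndexEntry& e) { return e.nStatus != 0; }

// Height 0 alone is not enough, because stubs also report height 0.
bool IsGenesis(const IndexEntry& e) { return e.nHeight == 0 && IsPopulated(e); }

enum class Step { ReachedGenesis, SkipAlreadyLinked, CallLoader };

// Decides what the hydrate loop does with the current entry.
Step Classify(const IndexEntry& e, bool linked) {
    if (IsGenesis(e)) return Step::ReachedGenesis;
    if (linked && IsPopulated(e)) return Step::SkipAlreadyLinked;
    return Step::CallLoader;  // stubs, linked or not, go through the loader
}
```

With this shape, a zero-field stub can never be mistaken for genesis, and a linked stub still reaches the loader where it gets healed.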

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>