Skip to content

ci: Standardize benchmarks and input constants#467

Draft
samuelburnham wants to merge 7 commits into
mainfrom
sb/ci-benchmarks
Draft

ci: Standardize benchmarks and input constants#467
samuelburnham wants to merge 7 commits into
mainfrom
sb/ci-benchmarks

Conversation

@samuelburnham

Copy link
Copy Markdown
Member

No description provided.

Lowering the witness cap below Zisk's built-in default (10) was measured to
have a negligible effect on host RAM and prove time for the kernel typecheck
workload, so the CLI override (which defaulted to 5) is removed and the prover
uses EmbeddedOpts::default(). Drop the now-stale flag mentions and per-RAM
tuning guidance from the README, the cost-model doc, and the shard.rs model
comments.
Rebased onto main (post #411 + #459) and integrated with the renamed
bench-main.yml and reworked bencher-track interface.

- Benchmarks/Vectors.csv: single shared source of truth (71 library constants
  from Init/Std/Mathlib/Lean). Consumed by Aiur (bench-typecheck --manifest),
  the zkVM hosts, and shell.
- bench-pr.yml: `!benchmark [aiur] [zisk] [sp1|all] [execute|prove]` over the
  curated set, posting a main-vs-PR table; main results cached by base SHA.
  Hardened: comment body from env (no injection), allowlisted env parsing.
- .github/scripts/{bench.py,run.sh}: parse/manifest/compare/comment, and the
  compile-.ixe + backend driver (cycles/execute-time/throughput/peak-rss, plus
  shards/max-shard-cycles for sharded runs).
- .github/actions/install-{sp1,zisk}: shared zkVM toolchain + deps install,
  used by bench-pr.yml, bench-main.yml, and riscv-bench.yml.
- bench-main.yml: add zkvm-execute (Zisk/SP1 cycle counts + time/RAM) and
  native-check (native parallel `ix check --anon` throughput) jobs, using the
  new bencher-track workload/thresholds interface.
- bench-typecheck: add --constant / --skip-deps (align with the zkVM hosts;
  --skip-deps replaces --subject-only) and --execute-only (fast execute path).
issue_comment workflows only run from the default branch, so the `!benchmark`
path can't be exercised on a PR branch. Add a pull_request trigger (base/head
from the PR payload; empty comment → parser defaults) to test pre-merge.

Revert before merge — delete the `pull_request:` trigger; the dual base/head
resolution and `|| pull_request.number` fallbacks are harmless to keep.
The full curated set (~60 InitStd) is too slow to run on every !benchmark and on
the bencher prove/zkVM jobs. Add a `primary` column to Vectors.csv marking 11
constants spanning shape (nat/list/array/int/string/vector/multiset, defs +
proofs) and the cheap→heavy cost range (incl. 3 shard targets), and make it the
default:

- bench.py manifest --primary; parse honors BENCH_FULL (run the full set).
- bench-pr.yml: !benchmark defaults to the primary subset; BENCH_FULL=1 runs the
  whole curated set.
- bench-main.yml: prove + zkvm-execute derive constants from `manifest --primary`
  (replacing the hardcoded lists), so all backends bench the same set from the one
  source of truth. The tier filter keeps prove on the cheap primaries (heavy ones
  would OOM a single-shard prove); execute/native get the heavy ones for scale.
Bring both surfaces (!benchmark PR comment and bencher.dev) to parity across all
four backends and remove the test scaffolding.

- Native kernel on !benchmark: add a `native` backend (whole-env `ix check
  --anon`, the parallel out-of-circuit kernel) — bench.py backend/runner/metrics,
  a run.sh `native` branch, and a GNU-time install for native PR cells. `all`
  now fans out aiur/zisk/sp1/native.
- run.sh: single-source the Aiur path through a per-constant bench-typecheck loop
  (per-constant peak-rss), add a per-constant `timeout` to the zkVM execute path
  (heavy primaries can't hang a job), and accept the env arg case-insensitively
  (bencher reuses the cached InitStd.ixe; bench-pr compiles initStd.ixe).
- bench-main.yml: the prove and native-check jobs now drive run.sh too (dedup
  with the PR path); zkvm-execute gains the Mathlib env to match prove/native.
- bench-thresholds-reset.yml: register the zisk / sp1 / native-check workloads.
- bench-pr.yml: drop the temporary pull_request trigger (keep the harmless
  dual-SHA / number fallbacks).
- docs/benchmarking.md: document the two surfaces, backends, Vectors.csv, the
  !benchmark grammar, and the bencher workloads / threshold resets.
…maries too

- Aiur prove now covers the whole primary subset: bench.py exempts --primary from
  the prove cheap-tier cap, and run.sh proves each constant whose Aiur fft-cost
  fits the prover RAM ceiling (AIUR_PROVE_MAX_FFT, ~128 GB at 2.34 GB per billion
  fft) and execute-only's the rest — so heavy primaries (only Vector.extract_append
  at ~145 GB) still report execute metrics instead of being dropped. BENCH_FULL
  prove stays capped at the cheap tier so it doesn't balloon.
- native now reports two views per env: the whole env (`ix check --anon`, keyed by
  env) and a per-primary subject check (`ix check --consts`, keyed by constant) —
  apples-to-apples with the zkVM --skip-deps execute. Wired into both the
  native-check bencher job and the !benchmark native backend.
- run.sh: fix a stream-corruption bug — tool stdout and ::warning::/::notice:: now
  go to logs/stderr so only JSON reaches the per-constant `jq -s` merge.
Route every backend's peak-rss and per-phase timings through tracing-texray:

- peak-rss now comes from the fork's process-tree sampler, so Zisk's ASM
  microservices' memory (tens of GB in separate PIDs) is captured — a bare
  /proc/self/status read of the host missed it. Verified on real execute
  runs: sp1 ~9GB, zisk ~49GB peak.

- zisk/sp1 hosts and bench-typecheck self-report cycles/time/throughput/
  peak-rss via --json, retiring run.sh's grep/awk/time parsing. They also
  emit per-phase span timings via --texray-json; native check-rs reports
  peak-rss as a 6th ##check## field.

- run.sh folds span timings into a per-constant `phases` object; bench.py
  renders a collapsible per-phase drill-down in the !benchmark comparison,
  and bench-main.yml tracks phase:<span> measures on bencher.

tracing-texray is pinned to the argumentcomputer/tracing-texray json-ram
branch (rev 15ae57c), which adds the process-tree RSS sampler + JSON sink.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant