ci: Standardize benchmarks and input constants by samuelburnham · Pull Request #467 · argumentcomputer/ix

samuelburnham · 2026-07-02T01:27:25Z

No description provided.

Lowering the witness cap below Zisk's built-in default (10) was measured to have a negligible effect on host RAM and prove time for the kernel typecheck workload, so the CLI override (which defaulted to 5) is removed and the prover uses EmbeddedOpts::default(). Drop the now-stale flag mentions and per-RAM tuning guidance from the README, the cost-model doc, and the shard.rs model comments.

Rebased onto main (post #411 + #459) and integrated with the renamed bench-main.yml and reworked bencher-track interface. - Benchmarks/Vectors.csv: single shared source of truth (71 library constants from Init/Std/Mathlib/Lean). Consumed by Aiur (bench-typecheck --manifest), the zkVM hosts, and shell. - bench-pr.yml: `!benchmark [aiur] [zisk] [sp1|all] [execute|prove]` over the curated set, posting a main-vs-PR table; main results cached by base SHA. Hardened: comment body from env (no injection), allowlisted env parsing. - .github/scripts/{bench.py,run.sh}: parse/manifest/compare/comment, and the compile-.ixe + backend driver (cycles/execute-time/throughput/peak-rss, plus shards/max-shard-cycles for sharded runs). - .github/actions/install-{sp1,zisk}: shared zkVM toolchain + deps install, used by bench-pr.yml, bench-main.yml, and riscv-bench.yml. - bench-main.yml: add zkvm-execute (Zisk/SP1 cycle counts + time/RAM) and native-check (native parallel `ix check --anon` throughput) jobs, using the new bencher-track workload/thresholds interface. - bench-typecheck: add --constant / --skip-deps (align with the zkVM hosts; --skip-deps replaces --subject-only) and --execute-only (fast execute path).

issue_comment workflows only run from the default branch, so the `!benchmark` path can't be exercised on a PR branch. Add a pull_request trigger (base/head from the PR payload; empty comment → parser defaults) to test pre-merge. Revert before merge — delete the `pull_request:` trigger; the dual base/head resolution and `|| pull_request.number` fallbacks are harmless to keep.

The full curated set (~60 InitStd) is too slow to run on every !benchmark and on the bencher prove/zkVM jobs. Add a `primary` column to Vectors.csv marking 11 constants spanning shape (nat/list/array/int/string/vector/multiset, defs + proofs) and the cheap→heavy cost range (incl. 3 shard targets), and make it the default: - bench.py manifest --primary; parse honors BENCH_FULL (run the full set). - bench-pr.yml: !benchmark defaults to the primary subset; BENCH_FULL=1 runs the whole curated set. - bench-main.yml: prove + zkvm-execute derive constants from `manifest --primary` (replacing the hardcoded lists), so all backends bench the same set from the one source of truth. The tier filter keeps prove on the cheap primaries (heavy ones would OOM a single-shard prove); execute/native get the heavy ones for scale.

Bring both surfaces (!benchmark PR comment and bencher.dev) to parity across all four backends and remove the test scaffolding. - Native kernel on !benchmark: add a `native` backend (whole-env `ix check --anon`, the parallel out-of-circuit kernel) — bench.py backend/runner/metrics, a run.sh `native` branch, and a GNU-time install for native PR cells. `all` now fans out aiur/zisk/sp1/native. - run.sh: single-source the Aiur path through a per-constant bench-typecheck loop (per-constant peak-rss), add a per-constant `timeout` to the zkVM execute path (heavy primaries can't hang a job), and accept the env arg case-insensitively (bencher reuses the cached InitStd.ixe; bench-pr compiles initStd.ixe). - bench-main.yml: the prove and native-check jobs now drive run.sh too (dedup with the PR path); zkvm-execute gains the Mathlib env to match prove/native. - bench-thresholds-reset.yml: register the zisk / sp1 / native-check workloads. - bench-pr.yml: drop the temporary pull_request trigger (keep the harmless dual-SHA / number fallbacks). - docs/benchmarking.md: document the two surfaces, backends, Vectors.csv, the !benchmark grammar, and the bencher workloads / threshold resets.

…maries too - Aiur prove now covers the whole primary subset: bench.py exempts --primary from the prove cheap-tier cap, and run.sh proves each constant whose Aiur fft-cost fits the prover RAM ceiling (AIUR_PROVE_MAX_FFT, ~128 GB at 2.34 GB per billion fft) and execute-only's the rest — so heavy primaries (only Vector.extract_append at ~145 GB) still report execute metrics instead of being dropped. BENCH_FULL prove stays capped at the cheap tier so it doesn't balloon. - native now reports two views per env: the whole env (`ix check --anon`, keyed by env) and a per-primary subject check (`ix check --consts`, keyed by constant) — apples-to-apples with the zkVM --skip-deps execute. Wired into both the native-check bencher job and the !benchmark native backend. - run.sh: fix a stream-corruption bug — tool stdout and ::warning::/::notice:: now go to logs/stderr so only JSON reaches the per-constant `jq -s` merge.

Route every backend's peak-rss and per-phase timings through tracing-texray: - peak-rss now comes from the fork's process-tree sampler, so Zisk's ASM microservices' memory (tens of GB in separate PIDs) is captured — a bare /proc/self/status read of the host missed it. Verified on real execute runs: sp1 ~9GB, zisk ~49GB peak. - zisk/sp1 hosts and bench-typecheck self-report cycles/time/throughput/ peak-rss via --json, retiring run.sh's grep/awk/time parsing. They also emit per-phase span timings via --texray-json; native check-rs reports peak-rss as a 6th ##check## field. - run.sh folds span timings into a per-constant `phases` object; bench.py renders a collapsible per-phase drill-down in the !benchmark comparison, and bench-main.yml tracks phase:<span> measures on bencher. tracing-texray is pinned to the argumentcomputer/tracing-texray json-ram branch (rev 15ae57c), which adds the process-tree RSS sampler + JSON sink.

samuelburnham added 7 commits July 1, 2026 19:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci: Standardize benchmarks and input constants#467

ci: Standardize benchmarks and input constants#467
samuelburnham wants to merge 7 commits into
mainfrom
sb/ci-benchmarks

samuelburnham commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

samuelburnham commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant