31 changes: 16 additions & 15 deletions .github/workflows/riscv-bench.yml
@@ -106,13 +106,13 @@ jobs:
- uses: actions/download-artifact@v4
with:
name: minimal-ixe
- name: Install Zisk toolchain (ziskup, pinned v0.18.0)
# `--version 0.18.0` pins the toolchain to match our deps. Our host links
# the argumentcomputer/zisk `blake3-precompile` fork, which is based on
# v0.18.0 (its cargo-zisk has `check-setup`, used below to regenerate the
# key's const-trees). Without the pin, ziskup installs `releases/latest`,
# which resolves to upstream `v1.0.0-alpha` — a different circuit whose
# cargo-zisk dropped the `check-setup` subcommand, breaking the key step.
- name: Install Zisk toolchain (ziskup, pinned v1.0.0-alpha)
# `--version 1.0.0-alpha` pins the toolchain to match our deps. Our host
# links the argumentcomputer/zisk `blake3-precompile` fork, which is now
# based on upstream v1.0.0-alpha (check-setup lives in the new
# `cargo-zisk-dev` binary, used below to regenerate the key's
# const-trees). Keep the pin anyway so a future upstream release can't
# silently change the toolchain under us.
# `--cpu` picks the CPU build (no GPU on the runner) and `--nokey` skips
# ziskup's key install — both avoid its interactive /dev/tty prompts. We
# keep `--nokey` because the upstream `zisk-setup` bucket only carries the
@@ -123,29 +123,30 @@ jobs:
# otherwise relocate it).
run: |
curl -L https://raw.githubusercontent.com/0xPolygonHermez/zisk/main/ziskup/install.sh \
| bash -s -- --cpu --nokey -y --version 0.18.0 --prefix "$HOME/.zisk"
| bash -s -- --cpu --nokey -y --version 1.0.0-alpha --prefix "$HOME/.zisk"
echo "$HOME/.zisk/bin" >> "$GITHUB_PATH"
# Execute still needs a proving key present: zisk-host calls
# `client.setup()` (which the SDK runs before the execute branch), and that
# loads the circuit's const-tree files. We host the fork-matching key in a
# public S3 bucket WITHOUT the const-trees — exactly like Zisk's released
# `zisk-provingkey-*.tar.gz` on `storage.googleapis.com/zisk-setup` — and
# regenerate them here with `cargo-zisk check-setup -a`, which is how
# `ziskup` itself populates them. That keeps the artifact ~3 GB (gzip)
# instead of ~48 GB. The object name carries the fork rev so a circuit
# change can't silently reuse a stale key. Public bucket → plain curl, no
# AWS creds.
# regenerate them here with `cargo-zisk-dev check-setup -a`, which is how
# `ziskup` itself populates them. That keeps the artifact a few GB (gzip)
# instead of ~50 GB (the tarball also omits the `*_gpu` const variants the
# first GPU prove would materialize — unused on this CPU runner). The
# object name carries the fork rev so a circuit change can't silently
# reuse a stale key. Public bucket → plain curl, no AWS creds.
- name: Restore Zisk proving key (fork circuit) from S3
run: |
mkdir -p "$HOME/.zisk"
curl -fSL --retry 3 \
https://argument-zisk-setup.s3.amazonaws.com/zisk-provingkey-blake3-8f9e24d5-cpu.tar.gz \
https://argument-zisk-setup.s3.amazonaws.com/zisk-provingkey-blake3-e4057c4c-cpu.tar.gz \
-o /tmp/zisk-provingkey.tar.gz
tar -C "$HOME/.zisk" -xzf /tmp/zisk-provingkey.tar.gz
rm -f /tmp/zisk-provingkey.tar.gz
# Regenerate the const-tree files omitted from the artifact (CPU build,
# so no --gpu). This is the "may take a while" step ziskup prints.
cargo-zisk check-setup --proving-key "$HOME/.zisk/provingKey" -a
cargo-zisk-dev check-setup --proving-key "$HOME/.zisk/provingKey" -a
- name: Zisk — execute minimal.ixe (assert failures == 0)
run: |
cd zisk
27 changes: 14 additions & 13 deletions README.md
@@ -405,18 +405,19 @@ Non-Nix users: install Zisk manually per the
export CUDA_CACHE_MAXSIZE=4294967296 # 4 GB
```

**Warm-batch proving and cold-start.** The first GPU proof on a machine
pays a large one-time cold-start: the proving kernels are JIT-compiled
(PTX→SASS), almost entirely inside `GENERATING_INNER_PROOFS`. Measured on an
RTX PRO 6000, a `nataddcomm.ixe` proof takes **~126 s cold vs ~12 s warm**
(the inner-proof phase alone drops from ~123 s to ~9 s; EXECUTE and
`CALCULATING_CONTRIBUTIONS` are unchanged). Two things reuse that work:
`--ixe` is repeatable, so passing several inputs proves them in one process
and pays the cold-start once for the whole batch; and the JIT output is
cached on disk (`CUDA_CACHE_MAXSIZE` above), so even a *fresh* process stays
warm — the ~12 s figure above is a separate process from the cold run. So a
single small one-off proof looks slow (it eats the cold-start); amortize by
batching, or just disregard it. By default proving is stateful with no
**Warm-batch proving and cold-start.** On the Zisk v0.18 branch the first
GPU proof on a machine paid a large one-time JIT cold-start (PTX→SASS,
~126 s cold vs ~12 s warm for `nataddcomm.ixe` on an RTX PRO 6000). At
Zisk v1.0.0-alpha that per-process JIT penalty is gone: cold ≈ warm, both
**~17.6-17.9 s** for the same `nataddcomm.ixe` proof on the same GPU
(inner-proof phase ~14 s — note that is slower than v0.18's ~9 s warm
figure; upstream prover/recursion changes, not the blake3 port). The one
remaining true first-run cost is one-time and on-disk, not per-process:
the first GPU prove against a freshly generated proving key materializes
~40 GB of `*_gpu` const/consttree variants under `~/.zisk/provingKey`.
`--ixe` is still repeatable, so several inputs can be proved in one warm
process. Keeping `CUDA_CACHE_MAXSIZE` pinned (above) remains harmless but
is no longer load-bearing. By default proving is stateful with no
checkpointing — if a run is killed it loses in-flight shard proofs and
restarts from the first shard; use a proof store (`--store-dir`, see
*Sharding large environments* below) to make a sharded run resumable.
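
As a rough illustration of the trade-off described above, the sketch below compares total wall-clock time for a batch of proofs under the two timing profiles quoted for `nataddcomm.ixe`. It assumes a deliberately simple model that is not part of the project: on v0.18 the first proof in a process costs the cold figure and every later proof costs the warm figure, while on v1.0.0-alpha every proof costs the same. The function names are hypothetical, and 17.9 s is taken as the upper end of the quoted range.

```python
# Timings quoted above for nataddcomm.ixe on an RTX PRO 6000 (seconds).
COLD_V018_S = 126.0   # v0.18, first proof (JIT cold-start)
WARM_V018_S = 12.0    # v0.18, subsequent warm proofs
PROOF_V1_S = 17.9     # v1.0.0-alpha, cold ~= warm (upper end of the range)


def batch_seconds_v018(n: int) -> float:
    """Assumed model: one cold proof, then n - 1 warm proofs in the same process."""
    if n < 1:
        raise ValueError("batch must contain at least one proof")
    return COLD_V018_S + (n - 1) * WARM_V018_S


def batch_seconds_v1(n: int) -> float:
    """Assumed model: every proof costs the same, with no JIT penalty."""
    if n < 1:
        raise ValueError("batch must contain at least one proof")
    return n * PROOF_V1_S


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(f"n={n:>2}: v0.18 {batch_seconds_v018(n):6.1f} s, "
              f"v1.0.0-alpha {batch_seconds_v1(n):6.1f} s")
```

Under these assumptions the v1.0.0-alpha profile is faster for small batches, and the two only converge at around twenty proofs per process. Real runs will differ with input size and on-disk cache state.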
@@ -468,7 +469,7 @@ Non-Nix users: install Zisk manually per the
[installation docs](https://0xpolygonhermez.github.io/zisk/getting_started/installation.html).

**Heap cap.** The Zisk zkVM has a hard 512 MB RAM cap
([`RAM_SIZE`](https://github.com/0xPolygonHermez/zisk/blob/v0.17.0/core/src/mem.rs#L111)),
([`RAM_SIZE`](https://github.com/0xPolygonHermez/zisk/blob/v1.0.0-alpha/core/src/mem.rs#L111)),
of which ~510 MB is usable heap, and isn't configurable without
rebuilding the proving setup. Envs whose deserialized in-memory
representation exceeds that won't fit (full `TutorialDefs.lean` pulls in
6 changes: 6 additions & 0 deletions docs/zisk-cycle-cost-model.md
@@ -8,6 +8,12 @@ prove within the RAM cap while minimizing the number of pieces.
Measured on an RTX PRO 6000 (250 GiB host). Inputs, scripts, and raw data:
`~/benchdata/prof/`.

> **Version note.** All numbers below were measured on the Zisk v0.18-based
> `blake3-precompile` branch. On the v1.0.0-alpha port the cycle counts drift
> slightly from upstream zisklib/ROM changes (e.g. `nataddcomm.ixe`:
> 53,239,676 → 53,860,206 STEPS, ~+1.2%); the model's structure and
> coefficients remain a good approximation but have not been re-fit.
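
The drift quoted in the version note can be checked with a few lines of arithmetic. This is only a sanity check on the two STEPS figures given above; the helper name is hypothetical.

```python
def relative_drift(old_steps: int, new_steps: int) -> float:
    """Fractional change in step count from the old measurement to the new one."""
    return (new_steps - old_steps) / old_steps


# STEPS for nataddcomm.ixe, as quoted in the version note.
old_steps = 53_239_676   # v0.18-based blake3-precompile branch
new_steps = 53_860_206   # v1.0.0-alpha port

drift = relative_drift(old_steps, new_steps)
print(f"{new_steps - old_steps:,} extra steps ({drift:+.2%})")
```

The difference is 620,530 steps, or about 1.17%, consistent with the "~+1.2%" stated in the note.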

---

## Background