Finalize CollectiveX v1 cross-vendor EP benchmark suite #2004
Oseltamivir wants to merge 1 commit into
rsync -a --delete --delete-excluded \
  --exclude='__pycache__/' --exclude='results/' --exclude='.cx_workloads/' \
  --exclude='configs/platforms.yaml' --exclude='private-infra.md' \
  --exclude='goal.md' --exclude='notes.md' \
  "$repo_root/experimental/CollectiveX" "$stage_dir/experimental/" >/dev/null 2>&1 \
  || cx_die "staging CollectiveX failed"
🔴 The setup step writes the shard JSON to experimental/CollectiveX/results/.shard_${matrix.id}.json and sets CX_SHARD_FILE=results/.shard_${matrix.id}.json (a relative path). However, cx_stage_repo (runtime/common.sh:145-150) rsyncs the CollectiveX tree with --exclude='results/' --delete-excluded, which drops the shard file. As a result, for every staged single-tray SKU (b300 always; gb200/gb300 with EP4 via CX_NODES<=1), the [ -f "$CX_SHARD_FILE" ] guard at run_in_container.sh:458 fails and execution falls into the single-bench else branch (line 556+), silently running one wrong-config default (uniform/decode/bf16, empty case_id) instead of the shard's N scheduled cases. Downstream, make_bundle catches this via missing_identity/coverage, but only after the GPU allocation was spent on the wrong workload.
Cheap fixes: allow-list the shard file through the rsync (--include='results/' --include='results/.shard_*.json' ahead of an --exclude='results/*', because rsync matches filter patterns against paths relative to the transfer root rather than the repo root), copy the shard file into the stage dir after the rsync, or resolve CX_SHARD_FILE against the original repo root in run_in_container.sh's SHARD guard the way the rack (EP8) launchers already do (see launch_gb300-nv.sh:92-93 / launch_gb200-nv.sh cx_ep_cases).
The bug
The sweep workflow's shard-fanout step writes the resolved case list to experimental/CollectiveX/results/.shard_${matrix.id}.json:
# .github/workflows/collectivex-sweep.yml
env:
  CX_SHARD_FILE: results/.shard_${{ matrix.id }}.json # RELATIVE path
...
- name: Extract shard from matrix artifact
  working-directory: experimental/CollectiveX
  run: |
    ...
    json.dump({...,'cases':s['cases']}, open('results/.shard_${{ matrix.id }}.json','w'))
The physical file therefore lands at $REPO/experimental/CollectiveX/results/.shard_<id>.json, and CX_SHARD_FILE=results/.shard_<id>.json is interpreted relative to the container's cwd, which is /ix/experimental/CollectiveX.
For every SKU that requires CX_STAGE_DIR (b300 always; gb200/gb300 with EP4 via the CX_NODES<=1 delegate path in launch_gb200-nv.sh:57 / launch_gb300-nv.sh:47), the launcher calls:
# launch_b300.sh:34, launch_gb200-nv.sh:52, launch_gb300-nv.sh:24
MOUNT_SRC="$(cx_stage_repo "$REPO_ROOT" "$CX_STAGE_DIR")"
which rsyncs the tree with an exclude that drops results/:
# experimental/CollectiveX/runtime/common.sh:145-150
rsync -a --delete --delete-excluded \
  --exclude='__pycache__/' --exclude='results/' --exclude='.cx_workloads/' \
  --exclude='configs/platforms.yaml' --exclude='private-infra.md' \
  --exclude='goal.md' --exclude='notes.md' \
  "$repo_root/experimental/CollectiveX" "$stage_dir/experimental/"
Together, --exclude='results/' and --delete-excluded guarantee that the shard file the workflow just wrote is missing from the stage dir.
The consequence at runtime
The container mounts $MOUNT_SRC:/ix, cwd=/ix/experimental/CollectiveX. Inside run_in_container.sh, the SHARD guard resolves CX_SHARD_FILE relative to that cwd:
# runtime/run_in_container.sh:458
if [ -n "${CX_SHARD_FILE:-}" ] && [ -f "${CX_SHARD_FILE:-/nonexistent}" ]; then
  # SHARD mode: sweep every scheduled case
  ...
else
  # Single-bench (workflow_dispatch) path
  # uses ${CX_MODE:-normal}, ${CX_PHASE:-decode}, ${CX_ROUTING:-uniform},
  # ${CX_DISPATCH_DTYPE:-bf16}, empty CX_CASE_ID/CX_SUITE/CX_WORKLOAD_NAME, ...
fi
The file resolves to /ix/experimental/CollectiveX/results/.shard_<id>.json, which is missing because rsync excluded it. The test therefore fails and the else branch runs a single default case with none of the shard's identity, doing roughly 1/N of the work of the intended N-case sweep.
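The failure mode above can be reproduced without rsync or a container. The following is a minimal simulation, assuming that `shutil.copytree` with an ignore pattern is an acceptable stand-in for `rsync --exclude='results/'`; the helper names `stage_tree` and `shard_guard` are hypothetical and do not exist in the PR.

```python
import json
import shutil
import tempfile
from pathlib import Path


def stage_tree(src: Path, dst: Path) -> None:
    """Copy src to dst, skipping anything named 'results' (stand-in for --exclude='results/')."""
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("results"))


def shard_guard(cwd: Path, shard_file: str) -> bool:
    """Mirror of `[ -n "$CX_SHARD_FILE" ] && [ -f "$CX_SHARD_FILE" ]`, evaluated from cwd."""
    return bool(shard_file) and (cwd / shard_file).is_file()


with tempfile.TemporaryDirectory() as tmp:
    # Checkout side: the workflow writes the shard file under results/.
    checkout = Path(tmp) / "repo" / "experimental" / "CollectiveX"
    (checkout / "results").mkdir(parents=True)
    (checkout / "runtime").mkdir()
    (checkout / "runtime" / "run_in_container.sh").write_text("#!/bin/bash\n")
    shard = "results/.shard_b300-deepep.json"
    (checkout / shard).write_text(json.dumps({"cases": ["case-a", "case-b"]}))

    # Stage side: the tree is copied with results/ excluded.
    staged = Path(tmp) / "stage" / "experimental" / "CollectiveX"
    staged.parent.mkdir(parents=True)
    stage_tree(checkout, staged)

    guard_in_checkout = shard_guard(checkout, shard)
    guard_in_stage = shard_guard(staged, shard)
    print("checkout:", guard_in_checkout, "stage:", guard_in_stage)
```

The same relative path passes the guard in the checkout and fails it in the staged copy, which is exactly the condition that sends execution into the single-bench branch.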
Why the rack (EP8) paths escape
The rack-scale launchers iterate cases themselves in the launcher on the SUBMIT host (not inside the container). Their case-list helpers explicitly resolve the shard file against the original checkout when the relative path misses:
# launch_gb300-nv.sh cx_ep8_cases (and launch_gb200-nv.sh cx_ep_cases)
local sf="${CX_SHARD_FILE:-}"
[ -n "$sf" ] && [ ! -f "$sf" ] && [ -f "$CX_DIR/$sf" ] && sf="$CX_DIR/$sf"
The same workaround is absent from run_in_container.sh:458, so the EP4 single-tray path (which shares the b300/gb200-EP4/gb300-EP4 launchers with the staged mount) hits the missing file.
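For readers less familiar with chained shell tests, the fallback used by the rack launchers can be restated in Python. This is a sketch of the same decision logic only; `resolve_shard_file` and its parameters are hypothetical names, with `cx_dir` standing in for `$CX_DIR` and `cwd` for the launcher's working directory.

```python
import tempfile
from pathlib import Path


def resolve_shard_file(shard_file: str, cx_dir: Path, cwd: Path) -> str:
    """Return shard_file unchanged, or cx_dir/shard_file when the relative path misses from cwd."""
    if shard_file and not (cwd / shard_file).is_file() and (cx_dir / shard_file).is_file():
        return str(cx_dir / shard_file)
    return shard_file


with tempfile.TemporaryDirectory() as tmp:
    cx_dir = Path(tmp) / "checkout" / "experimental" / "CollectiveX"
    (cx_dir / "results").mkdir(parents=True)
    (cx_dir / "results" / ".shard_demo.json").write_text("{}")
    submit_cwd = Path(tmp) / "elsewhere"
    submit_cwd.mkdir()

    # Relative path misses from the submit cwd, so the checkout copy is used.
    resolved = resolve_shard_file("results/.shard_demo.json", cx_dir, submit_cwd)
    # Relative path already resolves, so it is left untouched.
    unchanged = resolve_shard_file("results/.shard_demo.json", cx_dir, cx_dir)
    # Neither location has the file, so the value is passed through as-is.
    missing = resolve_shard_file("results/.shard_other.json", cx_dir, submit_cwd)
```

The fallback only helps where the original checkout is reachable, which is true on the submit host but not inside a container that mounts only the staged tree.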
Affected sweeps
Every single-tray staged shard in the v1 promoted matrix, per sweep_matrix.py + configs/suites.yaml platforms:
- b300 (all shards; launch_b300.sh is single-node)
- gb200 EP4 (CX_NODES<=1 -> run_in_container.sh)
- gb300 EP4 (CX_NODES<=1 -> run_in_container.sh)
The h100-dgxc/h200-dgxc/b200-dgxc/mi325x/mi355x paths do not set CX_STAGE_DIR in this workflow (cx_stage_repo becomes a no-op) and are unaffected.
Concrete walk-through (b300 shard)
- Setup job resolves matrix; writes experimental/CollectiveX/results/.shard_b300-deepep.json on the checkout with e.g. 24 cases (varied phase/dtype/routing/eplb across ep-core-v1 + ep-routing-v1).
- Sweep job on the b300 runner exports CX_SHARD_FILE=results/.shard_b300-deepep.json, checks out the repo, and calls launch_b300.sh.
- launch_b300.sh:34 -> cx_stage_repo rsyncs to $CX_STAGE_DIR/job_<id>/experimental/CollectiveX/ with --exclude='results/' --delete-excluded. The shard file is not copied.
- srun --container-workdir=$MOUNT_DIR/experimental/CollectiveX ... run_in_container.sh. cwd inside container = /ix/experimental/CollectiveX.
- run_in_container.sh:458 tests [ -f "results/.shard_b300-deepep.json" ] -> that resolves to /ix/experimental/CollectiveX/results/.shard_b300-deepep.json -> missing.
- Execution falls into the else branch at line 556+. It dispatches ${CX_BENCH} once with CX_MODE=normal, CX_PHASE=decode, CX_ROUTING=uniform, CX_DISPATCH_DTYPE=bf16, empty CX_CASE_ID, empty CX_SUITE, empty CX_WORKLOAD_NAME, empty CX_REQUIRED_PUBLICATION.
- One result JSON is produced with no case_id and mismatched identity; the other 23 scheduled cases never run.
- Aggregate job's make_bundle.py validate_expected_coverage computes missing_identity + missing + identity_mismatch against matrix_full.json and raises SystemExit(...), so the whole aggregate fails after b300 GPU time was spent on the wrong workload.
Impact
For every b300/gb200-EP4/gb300-EP4 shard promoted through v1 (three of the eight SKUs in ep-core-v1 + ep-routing-v1), the sweep silently runs one wrong-config default point instead of the scheduled N-case sweep. Bundle validation catches the divergence but only post-hoc, so the failure is loud yet wasteful: GPU allocations spent, aggregate job red, invalidating the v1 dataset this PR is producing.
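The cost described here comes from the guard treating "shard file requested but missing" the same as "no shard file requested". As an illustration only, and not one of the fixes proposed in this review, the sketch below separates those two cases so that a missing file fails before any benchmark runs. `select_mode` and `ShardFileMissing` are hypothetical names.

```python
import tempfile
from pathlib import Path


class ShardFileMissing(RuntimeError):
    """Raised when CX_SHARD_FILE is set but does not resolve to a file."""


def select_mode(env: dict, cwd: Path) -> str:
    """Return 'single' when no shard is requested, 'shard' when the file exists, else raise."""
    shard_file = env.get("CX_SHARD_FILE", "")
    if not shard_file:
        return "single"  # legitimate workflow_dispatch path
    if (cwd / shard_file).is_file():
        return "shard"
    raise ShardFileMissing(f"CX_SHARD_FILE={shard_file!r} not found under {cwd}")


with tempfile.TemporaryDirectory() as tmp:
    cwd = Path(tmp)
    (cwd / "results").mkdir()
    (cwd / "results" / ".shard_ok.json").write_text("{}")

    mode_unset = select_mode({}, cwd)
    mode_present = select_mode({"CX_SHARD_FILE": "results/.shard_ok.json"}, cwd)
    try:
        select_mode({"CX_SHARD_FILE": "results/.shard_gone.json"}, cwd)
        raised = False
    except ShardFileMissing:
        raised = True
```

With this split, a dropped shard file would surface as an immediate error instead of a single default-config run.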
Fix
Any one of:
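- (1) Allow the shard file through the rsync in cx_stage_repo (runtime/common.sh:146). rsync matches filter patterns against paths relative to the transfer root (here CollectiveX/...), so the includes must not carry the experimental/CollectiveX/ prefix, and the blanket results/ exclude becomes results/* so that the directory itself is still traversed:
  rsync -a --delete --delete-excluded \
    --include='results/' --include='results/.shard_*.json' --exclude='results/*' \
    --exclude='__pycache__/' --exclude='.cx_workloads/' ...
- (2) Copy the shard file into the stage dir after the rsync completes. The results/ directory does not exist in the stage dir at that point, so it has to be created first:
  if [ -n "${CX_SHARD_FILE:-}" ] && [ -f "$repo_root/experimental/CollectiveX/$CX_SHARD_FILE" ]; then
    mkdir -p "$(dirname "$stage_dir/experimental/CollectiveX/$CX_SHARD_FILE")"
    cp -a "$repo_root/experimental/CollectiveX/$CX_SHARD_FILE" \
      "$stage_dir/experimental/CollectiveX/$CX_SHARD_FILE"
  fi
- (3) Mirror the rack (EP8) launcher workaround in run_in_container.sh:458:
  sf="${CX_SHARD_FILE:-}"
  # $CX_DIR is not set inside the container; use the fixed workdir instead.
  [ -n "$sf" ] && [ ! -f "$sf" ] && [ -f "/ix/experimental/CollectiveX/$sf" ] \
    && sf="/ix/experimental/CollectiveX/$sf"
  if [ -n "$sf" ] && [ -f "$sf" ]; then ...
  Note that /ix is the staged mount and is also the container's cwd root, so on its own this resolves to the same missing path. It only helps if the original checkout, or the shard file itself, is additionally made visible inside the container.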
Approach (1) or (2) is the smallest change with the least surface area.
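Approach (2) can be checked in isolation with a small Python model. This is a sketch under stated assumptions: `copy_shard_into_stage` is a hypothetical helper that mirrors the shell snippet, and the directory layout is the one described in this review.

```python
import shutil
import tempfile
from pathlib import Path


def copy_shard_into_stage(repo_root: Path, stage_dir: Path, shard_file: str) -> bool:
    """Copy the shard file from the checkout into the staged tree; return True if copied."""
    if not shard_file:
        return False
    rel = Path("experimental") / "CollectiveX" / shard_file
    src = repo_root / rel
    if not src.is_file():
        return False
    dst = stage_dir / rel
    # results/ was excluded from staging, so the parent directory must be created.
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return True


with tempfile.TemporaryDirectory() as tmp:
    repo_root = Path(tmp) / "repo"
    stage_dir = Path(tmp) / "stage"
    shard = "results/.shard_b300-deepep.json"
    src = repo_root / "experimental" / "CollectiveX" / shard
    src.parent.mkdir(parents=True)
    src.write_text('{"cases": []}')
    (stage_dir / "experimental" / "CollectiveX").mkdir(parents=True)  # staged tree without results/

    copied = copy_shard_into_stage(repo_root, stage_dir, shard)
    staged_exists = (stage_dir / "experimental" / "CollectiveX" / shard).is_file()
    skipped = copy_shard_into_stage(repo_root, stage_dir, "")
```

Creating the parent directory explicitly matters because a plain copy into a non-existent results/ directory would fail.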
elif _run(["ibstat", "-l"]):
    devices = [d.strip() for d in _run(["ibstat", "-l"]).splitlines() if d.strip()]
    return {
🟡 _rdma() calls _run(["ibstat", "-l"]) twice at env_capture.py:178-179 — once in the elif condition and once in the comprehension body. If the second invocation returns None (which _run does on shutil.which miss, TimeoutExpired/OSError, or nonzero exit), .splitlines() raises AttributeError and takes down env_capture.py under run_in_container.sh's set -euo pipefail. The trigger is genuinely rare (both calls are microseconds apart on a stable IB stack, and this branch runs only when ibv_devinfo is absent), so nit — but the fix is a one-line refactor mirroring the ibv_devinfo branch just above.
The defect. env_capture._rdma() has an asymmetry between its two RDMA-listing branches:
listing = _run(["ibv_devinfo", "-l"])  # assigned once, iterated once
if listing:
    for line in listing.splitlines()[1:]:
        ...
elif _run(["ibstat", "-l"]):  # called once (as a truthiness check)
    devices = [d.strip() for d in _run(["ibstat", "-l"]).splitlines() if d.strip()]  # called AGAIN
The ibv_devinfo branch just above does the right thing: assign once, reuse. The ibstat branch does not.
Why the crash is theoretical but real. _run() returns None on any of: shutil.which(cmd[0]) failing (line 51), subprocess.TimeoutExpired/OSError (line 57), or out.returncode != 0 (line 59). If the first call returns a truthy string but the second returns None — a transient OS timer glitch, an OOM-killed helper, a stray nonzero exit under load — then None.splitlines() raises AttributeError. Under run_in_container.sh's set -euo pipefail (line 33), that aborts the whole shard step before any GPU benchmark runs.
Step-by-step proof of the theoretical crash path:
- Node has ibstat in $PATH but no ibv_devinfo (a real config: MI355X-style stacks with ibstat only).
- First call: _run(["ibstat", "-l"]) succeeds → returns "mlx5_0\nmlx5_1\n" → elif condition is truthy.
- Second call: a transient nonzero exit (e.g. ibstat racing an IB-driver reload, timer wraparound, PID-namespace hiccup) → out.returncode != 0 → _run returns None.
- None.splitlines() → AttributeError: 'NoneType' object has no attribute 'splitlines' → Python exits nonzero → set -e aborts run_in_container.sh → the shard step fails before GPU work.
Why this is nit, not normal. Every verifier converged on the same practical assessment: ibstat -l is a fast local device listing with no network/filesystem dependency, so a transient failure between two back-to-back calls (microseconds apart) is extremely improbable. The elif branch itself only runs when ibv_devinfo is absent, which is uncommon on the target runners since both binaries come from the same InfiniBand userspace stack. And env_capture.py produces a diagnostic/provenance artifact — even a genuine crash here would break provenance capture, not the benchmark measurement. The defect exists but doesn't justify blocking merge.
The fix. One-line refactor to mirror the ibv_devinfo branch:
else:
    listing = _run(["ibstat", "-l"])
    if listing:
        devices = [d.strip() for d in listing.splitlines() if d.strip()]
This is the same idiom the file uses immediately above. It eliminates the wasted subprocess call and the theoretical None-deref in one change. Worth doing as a follow-up cleanup, but the PR does not need to block for it.
"required_publication": env("CX_REQUIRED_PUBLICATION") or None,
"backend": backend,
"phase": phase,
"ep": integer("CX_EP", integer("CX_NGPUS", 1)),
"gpus_per_node": integer("CX_GPUS_PER_NODE", integer("CX_NGPUS", 1)),
"scale_up_domain": integer("CX_SCALE_UP_DOMAIN", integer("CX_NGPUS", 1)),
"dispatch_dtype": env("CX_DISPATCH_DTYPE", "bf16"),
"mode": env("CX_MODE", "normal"),
"contract": env("CX_MEASUREMENT_CONTRACT", "layout-and-dispatch-v1"),
"routing": env("CX_ROUTING", "uniform"),
"eplb": enabled("CX_EPLB"),
"combine_quant_mode": env("CX_COMBINE_QUANT_MODE", "none"),
"resource_mode": env("CX_RESOURCE_MODE", "tuned"),
"activation_profile": env("CX_ACTIVATION_PROFILE", "normal"),
"placement": env("CX_PLACEMENT", "packed"),
"routing_step": env("CX_ROUTING_STEP", "0"),
"uneven_tokens": env("CX_UNEVEN_TOKENS", "none"),
"tokens_ladder": env("CX_TOKENS_LADDER"),
"canonical": enabled("CX_CANONICAL"),
"sampling_contract": "fixed-512-v1",
"samples_per_point": integer("CX_SAMPLES_PER_POINT", 512),
"iters": integer("CX_ITERS", 8),
"trials": integer("CX_TRIALS", 64),
"warmup": integer("CX_WARMUP", 32),
"warmup_semantics": env(
    "CX_WARMUP_SEMANTICS", "full-roundtrip-per-trial-point-v1"
),
🟡 cx_emit_ep_failed_case (runtime/common.sh:256-287) builds failure.case without the hidden/topk/experts/nodes keys, but every matrix case emitted by sweep_matrix.py always carries all four. On the first sweep where any case exhausts its retries (flashinfer intermittent MNNVL, HybridEP/UCCL empty-rank, any deterministic rc=5), make_bundle's _identity_differences reports the same case_id four times as hidden=None!=7168,topk=None!=8,experts=None!=256,nodes=None!=1, and validate_expected_coverage piles on by re-listing that case in missing, so the aggregate job aborts with a dual-report that hides the real signal (the case failed all retries — the intended fail-closed behavior). Fix in either place is fine: add the four fields to cx_emit_ep_failed_case from CX_HIDDEN/CX_TOPK/CX_EXPERTS (defaults 7168/8/256) and CX_NGPUS/SLURM_NNODES, or make _identity_differences skip these fields when the actual doc is a failed-case.
The observed behavior
With the PR merged and any sweep that produces a failed-case record for a scheduled case, the aggregate job will fail with a message like:
bundle: expected-matrix coverage failed (
missing_identity=0 missing=['cxv1-...'] extra=[] duplicates=[]
identity_mismatch=['cxv1-...:hidden=None!=7168,topk=None!=8,experts=None!=256,nodes=None!=1'])
The same case_id appears in both missing and identity_mismatch, and the mismatch string names four fields that have nothing to do with why the case actually failed.
Step-by-step proof
Take a concrete promoted case, say h100-dgxc/deepep/decode under ep-core-v1 (uniform, canonical, deepseek-v3-v1 defaults). sweep_matrix.py:181-186 builds the matrix entry with:
{
    ...,
    "hidden": "",    # h==7168 -> "" sentinel
    "topk": "",      # t==8 -> ""
    "experts": "",   # e==256 -> ""
    "nodes": "1",    # always str
    ...
}
When every one of the 4 flashinfer attempts wedges on the intermittent MNNVL completion-flag deadlock (documented in run_in_container.sh around line 526), the last attempt's cx_emit_ep_failed_case writes a failed_*.json whose failure.case dict is missing the four keys entirely: the emitter reads CX_DISPATCH_DTYPE/CX_MODE/etc. but has no CX_HIDDEN/CX_TOPK/CX_EXPERTS/SLURM_NNODES reads.
aggregate_results.py keeps that failed-case doc as the newest for that case_id. Then make_bundle.py runs validate_expected_coverage:
- _expected_case_identity(matrix_case): "hidden" in case is true (value ""), so identity["hidden"] = int("" or 7168) = 7168. Same for topk/experts (8/256). "nodes" in case is true, identity["nodes"] = int("1") = 1. Expected identity contains {hidden: 7168, topk: 8, experts: 256, nodes: 1, ...}.
- _actual_case_identity(failed_doc) (the failed-case branch, line 184-195) copies failure.case verbatim, then calls _expected_case_identity. None of hidden/topk/experts/nodes are in that dict, so the if field in case: guard skips all four. Actual identity contains everything except the four scheduled shape fields.
- _identity_differences iterates the expected identity's items; actual_identity.get("hidden") is None, None != 7168 -> hidden=None!=7168. Same for the other three.
- validate_expected_coverage (line 294-298) hits the differences branch, appends the case_id to identity_mismatch, and does not add it to actual{}. Then missing = set(expected) - set(actual) (line 301) also contains that case_id. Line 319 raises the dual-report SystemExit.
validate_results.py:validate_doc's failed-case schema (v5, ~lines 234-243) requires a different, smaller field set that happens to match what the emitter writes, so it stays silent about this desync. Only make_bundle notices, and only in a way that obscures the real cause.
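The four-field mismatch can be reproduced with a simplified model of the identity comparison. This is a sketch under stated assumptions: `case_identity` and `identity_differences` are hypothetical reductions of the make_bundle.py helpers named above, kept only detailed enough to show the "" sentinel handling and the `if field in case` guard.

```python
SHAPE_DEFAULTS = {"hidden": 7168, "topk": 8, "experts": 256}
PASS_THROUGH = ("backend", "phase")  # stand-ins for the fields both documents carry


def case_identity(case: dict) -> dict:
    """Build a comparable identity; shape fields are only added when the key is present."""
    identity = {field: case[field] for field in PASS_THROUGH if field in case}
    for field, default in SHAPE_DEFAULTS.items():
        if field in case:
            identity[field] = int(case[field] or default)  # "" sentinel -> default
    if "nodes" in case:
        identity["nodes"] = int(case["nodes"])
    return identity


def identity_differences(expected: dict, actual: dict) -> list:
    """List every expected field whose actual value is absent or different."""
    return [f"{k}={actual.get(k)}!={v}" for k, v in expected.items() if actual.get(k) != v]


matrix_case = {"backend": "deepep", "phase": "decode",
               "hidden": "", "topk": "", "experts": "", "nodes": "1"}
failed_case = {"backend": "deepep", "phase": "decode"}  # emitter omits the four shape keys

differences = identity_differences(case_identity(matrix_case), case_identity(failed_case))
print(",".join(differences))
# prints: hidden=None!=7168,topk=None!=8,experts=None!=256,nodes=None!=1
```

The matching fields drop out and only the four omitted shape fields remain, which is the string seen in the aggregate failure.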
Why this fires in practice
The PR explicitly builds in retry logic — CX_FLASHINFER_RETRIES defaults to 3 attempts, and both the container and rack launchers loop attempts and preserve a failed_*.json when all attempts fail. Retry-exhaustion is expected behavior for known intermittents, but the aggregate step will now report those as identity_mismatch + missing for hidden/topk/experts/nodes — the least informative signal possible.
Impact
Bundle validation still correctly rejects the incomplete run (the intended fail-closed behavior), and no incorrect data ships, so this is a diagnostic-clarity regression rather than a correctness bug. It will, however, cost real triage time in CI: an operator staring at hidden=None!=7168,topk=None!=8,experts=None!=256,nodes=None!=1 will not obviously infer "one flashinfer case exhausted its retries."
Fix
Either add the four fields to cx_emit_ep_failed_case (read CX_HIDDEN/CX_TOPK/CX_EXPERTS with defaults 7168/8/256, and CX_NGPUS/SLURM_NNODES for nodes), or teach _identity_differences/_actual_case_identity to drop these fields when the actual doc is a failed-case. Either way the two validators stay in sync.
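The emitter-side option amounts to reading four more values with the stated defaults. The sketch below models that in Python rather than in the shell emitter itself; `failed_case_shape` is a hypothetical helper, and taking `nodes` from SLURM_NNODES with a default of 1 is an assumption, since the comment leaves the exact derivation from CX_NGPUS/SLURM_NNODES open.

```python
def failed_case_shape(env: dict) -> dict:
    """Return the four shape fields a failed-case record would need to match the matrix."""

    def integer(name: str, default: int) -> int:
        raw = env.get(name, "")
        return int(raw) if raw else default  # unset or empty falls back to the default

    return {
        "hidden": integer("CX_HIDDEN", 7168),
        "topk": integer("CX_TOPK", 8),
        "experts": integer("CX_EXPERTS", 256),
        "nodes": integer("SLURM_NNODES", 1),  # assumption: node count comes from Slurm
    }


default_shape = failed_case_shape({})
custom_shape = failed_case_shape({"CX_HIDDEN": "4096", "CX_TOPK": "", "SLURM_NNODES": "2"})
print(default_shape)
print(custom_shape)
```

With these fields present in failure.case, the expected and actual identities would agree on shape, leaving the retry exhaustion as the only reported signal.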
Freeze the 38-shard cross-vendor EP matrix on one 32-warmup, 512-observation protocol. Add native correctness, closed provenance, three-allocation promotion gates, and an isolated content-addressed filesystem publisher.
Close defects exposed by rejected allocations: isolate AMD Enroot state; correct MoRI output shape and unweighted combine semantics; standardize activation-only combine across every adapter; stage pinned DeepEP sources before compute allocation; authenticate reusable build outputs; normalize Hybrid enum identity; query loaded NCCL/RCCL runtimes; and harden cleanup and failure classification.
Normalize inherited B300 source-root permissions. Keep DeepEP V2 on PR #605 while pinning the official PR #630 scale-up fix, publish a stable extension evidence label with the real binary hash, require the realized NCCL LSA team to cover the full EP world when GIN is disabled, and key the exact five-kernel JIT evidence by realized topology and device code-generation inputs.
Summary
Finalizes the isolated CollectiveX v1 expert-parallel communication benchmark under experimental/CollectiveX/. The branch is ready for three complete no-canary qualification runs; it does not claim or include promoted v1 results yet.
Benchmark contract
with the official PR #630 scale-up fix,
DeepEP Hybrid, UCCL, MoRI, and an NCCL/RCCL reference.
cases / 532 points plus 132 explicit unsupported cases / 308 points.
component/trial/point, and exactly 512 observations for every case.
oracle-checked but are not returned through the timed combine path.
Qualification fixes
source-staging diagnostics only in private logs.
deep_ep._C in public evidence while hashing the actual binary.
full EP world; a smaller realized domain fails before timing or publication.
ranks, and requires exactly one artifact for each of the five expected kernels.
Correctness and artifacts
The native oracle validates expert-specific payloads, destinations, source identity, multiplicity,
weights, receive counts, combine values, and input immutability on every rank. Provenance binds the
verified image and squash bytes, implementation/build identity, loaded collective runtime, runtime
fingerprint, and generated-kernel evidence.
GitHub result artifacts are transient delivery inputs to an isolated local content-addressed filesystem
publisher. The pinned-source artifact is execution-only, is rejected by the publisher, and expires after
three days. Promotion requires exactly three complete independent runs from one source SHA, exact
coverage, stable p50/p99 evidence, stable ordering, and complete controlled cohorts. No managed
database, managed object store, or third-party result hosting is introduced.
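The structural part of the promotion rule (exactly three complete runs, one source SHA, exact coverage) can be expressed as a small gate. This is a sketch under stated assumptions: the record fields `source_sha`, `complete`, and `case_ids` and the function `promotion_blockers` are hypothetical, and the p50/p99 stability, ordering, and cohort checks are left out because the description gives no thresholds for them.

```python
def promotion_blockers(runs: list, expected_case_ids: set) -> list:
    """Return reasons the run set cannot be promoted; an empty list means the gate passes."""
    blockers = []
    if len(runs) != 3:
        blockers.append(f"expected exactly 3 runs, got {len(runs)}")
    shas = {run["source_sha"] for run in runs}
    if len(shas) > 1:
        blockers.append(f"runs span {len(shas)} source SHAs")
    for index, run in enumerate(runs):
        if not run["complete"]:
            blockers.append(f"run {index} is incomplete")
        if set(run["case_ids"]) != expected_case_ids:
            blockers.append(f"run {index} does not cover the expected cases exactly")
    return blockers


expected = {"case-a", "case-b"}
good_runs = [{"source_sha": "abc", "complete": True, "case_ids": ["case-a", "case-b"]}
             for _ in range(3)]
bad_runs = [
    {"source_sha": "abc", "complete": True, "case_ids": ["case-a", "case-b"]},
    {"source_sha": "def", "complete": True, "case_ids": ["case-a"]},
]

print(promotion_blockers(good_runs, expected))
print(promotion_blockers(bad_runs, expected))
```

Returning a list of reasons rather than a boolean keeps each rejected condition visible to the operator.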
The tracked tree and reachable branch history contain none of the private runner endpoint literals.
experimental/CollectiveX/configs/platforms.yaml is absent from Git, ignored, and used only as a local operator note.
Validation
292e05f8faccaa4971eda527a327190a9943e99d4f71611987f7b95f57f253e8.
8:64:32, 512 samples per point, and one warmup contract.
bash -n, ShellCheck, Actionlint, and git diff --check.
Git history.
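The timing figures quoted here are consistent with the defaults in the config diff earlier in this conversation (iters 8, trials 64, warmup 32, samples_per_point 512). The sketch below checks that arithmetic, assuming that 8:64:32 reads as iters:trials:warmup and that observations per point equal iters times trials; both readings are assumptions, and `parse_timing` and `contract_problems` are hypothetical helpers.

```python
def parse_timing(spec: str) -> dict:
    """Split an 'iters:trials:warmup' string into integers (assumed field order)."""
    iters, trials, warmup = (int(part) for part in spec.split(":"))
    return {"iters": iters, "trials": trials, "warmup": warmup}


def contract_problems(timing: dict, samples_per_point: int = 512, warmup: int = 32) -> list:
    """Return a list of deviations from the fixed sampling contract."""
    problems = []
    observed = timing["iters"] * timing["trials"]
    if observed != samples_per_point:
        problems.append(f"iters*trials={observed}, expected {samples_per_point}")
    if timing["warmup"] != warmup:
        problems.append(f"warmup={timing['warmup']}, expected {warmup}")
    return problems


timing = parse_timing("8:64:32")
print(timing, contract_problems(timing))
print(contract_problems(parse_timing("8:32:16")))
```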
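Chinese description
This PR completes the isolated CollectiveX v1 expert-parallel (EP) communication benchmark under experimental/CollectiveX/. The branch is ready for three complete no-canary qualification runs; it does not yet claim or include any promoted v1 results.
Benchmark contract
and the official PR #630 scale-up fix, DeepEP Hybrid,
UCCL, MoRI, and an NCCL/RCCL reference backend.
532 points, plus 132 cases explicitly marked unsupported / 308 points.
round-trip warmup, yielding exactly 512 observations.
oracle-checked but are not returned through the timed combine path.
Qualification fixes
deep_ep._C label while hashing the actual extension binary.
If the realized domain is smaller, the run fails before timing and publication.
requires one artifact for each of the five expected kernels.
Correctness and artifacts
The native oracle validates, on every rank, expert-specific payloads, destinations, source identity, multiplicity, weights, receive counts, combine values, and input immutability. Provenance binds the verified image and squash contents, implementation/build identity, the loaded collective runtime, the runtime fingerprint, and generated-kernel evidence.
GitHub result artifacts serve only as transient delivery inputs to the isolated local content-addressed filesystem publisher. The pinned-source artifact is execution-only, is explicitly rejected by the publisher, and expires after three days. Promotion is allowed only when three complete independent runs from the same source SHA all satisfy exact coverage, p50/p99 stability, ordering stability, and complete controlled cohorts. No managed database, managed object store, or third-party result hosting is introduced.
Tracked files and reachable branch history contain no private runner endpoint literals.
experimental/CollectiveX/configs/platforms.yaml is not tracked by Git, is ignored, and is used only as a local operator note.
Validation
292e05f8faccaa4971eda527a327190a9943e99d4f71611987f7b95f57f253e8.
8:64:32 timing, 512 samples per point, and one warmup contract.
bash -n, ShellCheck, Actionlint, and git diff --check.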