[Klaud Cold] Remove disallowed --hf-overrides indexer override from DSV4 FP4 MI355X ATOM / 移除 DSV4 FP4 MI355X ATOM 中不允许的 --hf-overrides indexer 覆盖 by functionstackx · Pull Request #2037 · SemiAnalysisAI/InferenceX

functionstackx · 2026-07-04T03:28:53Z

Summary

Removes OPT_ARGS=(--hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}') from benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_atom.sh — this override reuses cached indices to skip the indexer on 3 of every 4 layers, reducing model-architecture FLOPs, which PR_REVIEW_CHECKLIST.md explicitly disallows (and the signoff verifier's Check 7 now rejects).
Appends the required perf-changelog.yaml entry for dsv4-fp4-mi355x-atom.
full-sweep-fail-fast label applied for PR validation.

中文说明

从 benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_atom.sh 中移除 OPT_ARGS=(--hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}') — 该覆盖通过复用缓存索引使每 4 层中有 3 层跳过 indexer，减少了模型架构 FLOPs，PR_REVIEW_CHECKLIST.md 明确禁止此类优化（签署验证机器人的 Check 7 现在也会拒绝）。
为 dsv4-fp4-mi355x-atom 追加所需的 perf-changelog.yaml 条目。
已添加 full-sweep-fail-fast 标签用于 PR 验证。

🤖 Generated with Claude Code

…ATOM The --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}' override skips the indexer on 3 of every 4 layers, reducing model architecture FLOPs — not allowed per PR_REVIEW_CHECKLIST.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-04T03:28:59Z

Thanks for the contribution! Please reach out to respective companies' CODEOWNER to fill in the latest PR_REVIEW_CHECKLIST.md before pinging core maintainer on Slack for review. In order for the signoff PR check bot to trigger, you must follow the PR_REVIEW_CHECKLIST.md template correctly, including the phrase As a PR reviewer and CODEOWNER, I have reviewed this and have.

For PR verification, add the full-sweep-enabled or full-sweep-fail-fast label to this PR — the benchmark sweep only runs on labeled PRs.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. See GitHub's docs on re-running failed jobs

感谢你的贡献！请联系相应公司的 CODEOWNER 填写最新的 PR_REVIEW_CHECKLIST.md，然后再在 Slack 上联系核心维护者进行审阅。为了触发 signoff PR 检查机器人，你必须正确遵循 PR_REVIEW_CHECKLIST.md 模板，包括保留英文语句 As a PR reviewer and CODEOWNER, I have reviewed this and have。

如需进行 PR 验证，请为此 PR 添加 full-sweep-enabled 或 full-sweep-fail-fast 标签 — 基准测试 sweep 仅在带有标签的 PR 上运行。

PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动（flake），重新运行失败的任务即可解决。参见 GitHub 关于重新运行失败任务的文档

github-actions · 2026-07-04T03:31:31Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28693549865
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28693549865

claude · 2026-07-04T03:38:24Z

 export ATOM_MOE_GU_ITLV=1
 MEM_FRAC_STATIC=0.9
-OPT_ARGS=(--hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}')



🟣 The same disallowed --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}' indexer-skipping override that this PR removes from the single-node ATOM recipe is still present in benchmarks/multi_node/amd_utils/server_atom.sh:155, gated on MODEL_NAME == "DeepSeek-V4-Pro" and reached via the live dsv4-fp4-mi355x-atom-disagg multi-node recipe. This is pre-existing (this PR does not touch the multi-node file) but is a directly-analogous same-class violation on the same model — worth cleaning up in a follow-up so PR_REVIEW_CHECKLIST.md Check 7 is satisfied on both paths.

Extended reasoning...

What the bug is. PR_REVIEW_CHECKLIST.md Check 7 forbids --hf-overrides that skip the indexer every N layers on models that don't natively support it, because that reduces model-architecture FLOPs. This PR correctly removes such an override from the single-node recipe benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_atom.sh. However, the identical override still lives in the multi-node path.\n\nWhere it lives. benchmarks/multi_node/amd_utils/server_atom.sh:152-156 contains:\n\nbash\nHF_OVERRIDES_ARG=""\nif [[ "$MODEL_NAME" == "DeepSeek-V4-Pro" ]]; then\n HF_OVERRIDES_ARG="--hf-overrides '{\"use_index_cache\":true,\"index_topk_freq\":4}'"\nfi\n\n\nSame technique (use_index_cache + index_topk_freq=4, i.e. reuse cached indices to skip the indexer on 3 of every 4 layers), same model (DeepSeek-V4-Pro).\n\nProof this path is live.\n1. configs/amd-master.yaml:2407 defines dsv4-fp4-mi355x-atom-disagg with framework: atom-disagg and multinode: true.\n2. benchmarks/multi_node/amd_utils/server.sh:20 routes ENGINE=atom-disagg to server_atom.sh.\n3. server_atom.sh:155 then appends HF_OVERRIDES_ARG to the server command when the model is DSV4-Pro — the exact same disallowed optimization this PR is removing.\n\nWhy existing code doesn't prevent it. The checklist rule is written in general terms ("benchmark hacks like using --hf-overrides to skipping indexer for every x layers on models that don't natively support this") and its scan target is server args in benchmarks/** — it does not exempt disagg or multi-node. Nothing in the multi-node ATOM path silently disables the override; it applies unconditionally whenever the model name matches.\n\nImpact. DSV4 multi-node ATOM benchmark numbers continue to be produced with reduced model-architecture FLOPs, which the checklist explicitly disallows. If a signoff/PR review of the multi-node recipe applies the same rule this PR is applying to single-node, the multi-node ATOM results will need to be re-generated.\n\nHow to fix. Delete the two-line if [[ "$MODEL_NAME" == "DeepSeek-V4-Pro" ]] block in benchmarks/multi_node/amd_utils/server_atom.sh (and drop the now-unused HF_OVERRIDES_ARG expansion on the server command a few lines below), and add a matching perf-changelog.yaml entry for the affected multi-node config-key (dsv4-fp4-mi355x-atom-disagg).\n\nSeverity. Pre-existing — this PR is scoped to the single-node recipe and does not touch server_atom.sh, so this should not block merging, but the maintainer should be aware there is a directly-analogous cleanup remaining on the multi-node path.

github-actions · 2026-07-04T07:58:52Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28693549865
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28693549865

functionstackx requested a review from a team July 4, 2026 03:28

github-project-automation Bot added this to InferenceMAX Board Jul 4, 2026

functionstackx added the full-sweep-fail-fast label Jul 4, 2026

functionstackx mentioned this pull request Jul 4, 2026

[Klaud Cold] Remove disallowed --hf-overrides indexer override from DSV4 ATOM disagg / 移除 DSV4 ATOM disagg 中不允许的 --hf-overrides indexer 覆盖 #2038

Open

claude Bot reviewed Jul 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Klaud Cold] Remove disallowed --hf-overrides indexer override from DSV4 FP4 MI355X ATOM / 移除 DSV4 FP4 MI355X ATOM 中不允许的 --hf-overrides indexer 覆盖#2037

[Klaud Cold] Remove disallowed --hf-overrides indexer override from DSV4 FP4 MI355X ATOM / 移除 DSV4 FP4 MI355X ATOM 中不允许的 --hf-overrides indexer 覆盖#2037
functionstackx wants to merge 1 commit into
mainfrom
klaud/dsv4-fp4-mi355x-atom-remove-hf-overrides

functionstackx commented Jul 4, 2026

Uh oh!

github-actions Bot commented Jul 4, 2026

Uh oh!

github-actions Bot commented Jul 4, 2026

Uh oh!

claude Bot Jul 4, 2026

Uh oh!

github-actions Bot commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

functionstackx commented Jul 4, 2026

Summary

中文说明

Uh oh!

github-actions Bot commented Jul 4, 2026

Uh oh!

github-actions Bot commented Jul 4, 2026

Uh oh!

claude Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant