Skip to content

[Klaud Cold] Remove disallowed --hf-overrides indexer override from DSV4 FP4 MI355X ATOM / 移除 DSV4 FP4 MI355X ATOM 中不允许的 --hf-overrides indexer 覆盖#2037

Open
functionstackx wants to merge 1 commit into
mainfrom
klaud/dsv4-fp4-mi355x-atom-remove-hf-overrides
Open

[Klaud Cold] Remove disallowed --hf-overrides indexer override from DSV4 FP4 MI355X ATOM / 移除 DSV4 FP4 MI355X ATOM 中不允许的 --hf-overrides indexer 覆盖#2037
functionstackx wants to merge 1 commit into
mainfrom
klaud/dsv4-fp4-mi355x-atom-remove-hf-overrides

Conversation

@functionstackx

Copy link
Copy Markdown
Collaborator

Summary

  • Removes OPT_ARGS=(--hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}') from benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_atom.sh — this override reuses cached indices to skip the indexer on 3 of every 4 layers, reducing model-architecture FLOPs, which PR_REVIEW_CHECKLIST.md explicitly disallows (and the signoff verifier's Check 7 now rejects).
  • Appends the required perf-changelog.yaml entry for dsv4-fp4-mi355x-atom.
  • full-sweep-fail-fast label applied for PR validation.

中文说明

  • benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_atom.sh 中移除 OPT_ARGS=(--hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}') — 该覆盖通过复用缓存索引使每 4 层中有 3 层跳过 indexer,减少了模型架构 FLOPs,PR_REVIEW_CHECKLIST.md 明确禁止此类优化(签署验证机器人的 Check 7 现在也会拒绝)。
  • dsv4-fp4-mi355x-atom 追加所需的 perf-changelog.yaml 条目。
  • 已添加 full-sweep-fail-fast 标签用于 PR 验证。

🤖 Generated with Claude Code

…ATOM

The --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}'
override skips the indexer on 3 of every 4 layers, reducing model
architecture FLOPs — not allowed per PR_REVIEW_CHECKLIST.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Thanks for the contribution! Please reach out to respective companies' CODEOWNER to fill in the latest PR_REVIEW_CHECKLIST.md before pinging core maintainer on Slack for review. In order for the signoff PR check bot to trigger, you must follow the PR_REVIEW_CHECKLIST.md template correctly, including the phrase As a PR reviewer and CODEOWNER, I have reviewed this and have.

For PR verification, add the full-sweep-enabled or full-sweep-fail-fast label to this PR — the benchmark sweep only runs on labeled PRs.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. See GitHub's docs on re-running failed jobs


感谢你的贡献!请联系相应公司的 CODEOWNER 填写最新的 PR_REVIEW_CHECKLIST.md,然后再在 Slack 上联系核心维护者进行审阅。为了触发 signoff PR 检查机器人,你必须正确遵循 PR_REVIEW_CHECKLIST.md 模板,包括保留英文语句 As a PR reviewer and CODEOWNER, I have reviewed this and have

如需进行 PR 验证,请为此 PR 添加 full-sweep-enabledfull-sweep-fail-fast 标签 — 基准测试 sweep 仅在带有标签的 PR 上运行。

PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动(flake),重新运行失败的任务即可解决。参见 GitHub 关于重新运行失败任务的文档

@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

export ATOM_MOE_GU_ITLV=1
MEM_FRAC_STATIC=0.9
OPT_ARGS=(--hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟣 The same disallowed --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}' indexer-skipping override that this PR removes from the single-node ATOM recipe is still present in benchmarks/multi_node/amd_utils/server_atom.sh:155, gated on MODEL_NAME == "DeepSeek-V4-Pro" and reached via the live dsv4-fp4-mi355x-atom-disagg multi-node recipe. This is pre-existing (this PR does not touch the multi-node file) but is a directly-analogous same-class violation on the same model — worth cleaning up in a follow-up so PR_REVIEW_CHECKLIST.md Check 7 is satisfied on both paths.

Extended reasoning...

What the bug is. PR_REVIEW_CHECKLIST.md Check 7 forbids --hf-overrides that skip the indexer every N layers on models that don't natively support it, because that reduces model-architecture FLOPs. This PR correctly removes such an override from the single-node recipe benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_atom.sh. However, the identical override still lives in the multi-node path.\n\nWhere it lives. benchmarks/multi_node/amd_utils/server_atom.sh:152-156 contains:\n\nbash\nHF_OVERRIDES_ARG=""\nif [[ "$MODEL_NAME" == "DeepSeek-V4-Pro" ]]; then\n HF_OVERRIDES_ARG="--hf-overrides '{\"use_index_cache\":true,\"index_topk_freq\":4}'"\nfi\n\n\nSame technique (use_index_cache + index_topk_freq=4, i.e. reuse cached indices to skip the indexer on 3 of every 4 layers), same model (DeepSeek-V4-Pro).\n\nProof this path is live.\n1. configs/amd-master.yaml:2407 defines dsv4-fp4-mi355x-atom-disagg with framework: atom-disagg and multinode: true.\n2. benchmarks/multi_node/amd_utils/server.sh:20 routes ENGINE=atom-disagg to server_atom.sh.\n3. server_atom.sh:155 then appends HF_OVERRIDES_ARG to the server command when the model is DSV4-Pro — the exact same disallowed optimization this PR is removing.\n\nWhy existing code doesn't prevent it. The checklist rule is written in general terms ("benchmark hacks like using --hf-overrides to skipping indexer for every x layers on models that don't natively support this") and its scan target is server args in benchmarks/** — it does not exempt disagg or multi-node. Nothing in the multi-node ATOM path silently disables the override; it applies unconditionally whenever the model name matches.\n\nImpact. DSV4 multi-node ATOM benchmark numbers continue to be produced with reduced model-architecture FLOPs, which the checklist explicitly disallows. If a signoff/PR review of the multi-node recipe applies the same rule this PR is applying to single-node, the multi-node ATOM results will need to be re-generated.\n\nHow to fix. Delete the two-line if [[ "$MODEL_NAME" == "DeepSeek-V4-Pro" ]] block in benchmarks/multi_node/amd_utils/server_atom.sh (and drop the now-unused HF_OVERRIDES_ARG expansion on the server command a few lines below), and add a matching perf-changelog.yaml entry for the affected multi-node config-key (dsv4-fp4-mi355x-atom-disagg).\n\nSeverity. Pre-existing — this PR is scoped to the single-node recipe and does not touch server_atom.sh, so this should not block merging, but the maintainer should be aware there is a directly-analogous cleanup remaining on the multi-node path.

@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant