[Klaud Cold] Remove disallowed --hf-overrides indexer override from DSV4 FP4 MI355X ATOM / 移除 DSV4 FP4 MI355X ATOM 中不允许的 --hf-overrides indexer 覆盖#2037
Conversation
…ATOM
The --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}'
override skips the indexer on 3 of every 4 layers, reducing model
architecture FLOPs — not allowed per PR_REVIEW_CHECKLIST.md.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Thanks for the contribution! Please reach out to respective companies' CODEOWNER to fill in the latest PR_REVIEW_CHECKLIST.md before pinging core maintainer on Slack for review. In order for the signoff PR check bot to trigger, you must follow the PR_REVIEW_CHECKLIST.md template correctly, including the phrase For PR verification, add the PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. See GitHub's docs on re-running failed jobs 感谢你的贡献!请联系相应公司的 CODEOWNER 填写最新的 PR_REVIEW_CHECKLIST.md,然后再在 Slack 上联系核心维护者进行审阅。为了触发 signoff PR 检查机器人,你必须正确遵循 PR_REVIEW_CHECKLIST.md 模板,包括保留英文语句 如需进行 PR 验证,请为此 PR 添加 PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动(flake),重新运行失败的任务即可解决。参见 GitHub 关于重新运行失败任务的文档 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28693549865 |
| export ATOM_MOE_GU_ITLV=1 | ||
| MEM_FRAC_STATIC=0.9 | ||
| OPT_ARGS=(--hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}') | ||
|
|
There was a problem hiding this comment.
🟣 The same disallowed --hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}' indexer-skipping override that this PR removes from the single-node ATOM recipe is still present in benchmarks/multi_node/amd_utils/server_atom.sh:155, gated on MODEL_NAME == "DeepSeek-V4-Pro" and reached via the live dsv4-fp4-mi355x-atom-disagg multi-node recipe. This is pre-existing (this PR does not touch the multi-node file) but is a directly-analogous same-class violation on the same model — worth cleaning up in a follow-up so PR_REVIEW_CHECKLIST.md Check 7 is satisfied on both paths.
Extended reasoning...
What the bug is. PR_REVIEW_CHECKLIST.md Check 7 forbids --hf-overrides that skip the indexer every N layers on models that don't natively support it, because that reduces model-architecture FLOPs. This PR correctly removes such an override from the single-node recipe benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_atom.sh. However, the identical override still lives in the multi-node path.\n\nWhere it lives. benchmarks/multi_node/amd_utils/server_atom.sh:152-156 contains:\n\nbash\nHF_OVERRIDES_ARG=""\nif [[ "$MODEL_NAME" == "DeepSeek-V4-Pro" ]]; then\n HF_OVERRIDES_ARG="--hf-overrides '{\"use_index_cache\":true,\"index_topk_freq\":4}'"\nfi\n\n\nSame technique (use_index_cache + index_topk_freq=4, i.e. reuse cached indices to skip the indexer on 3 of every 4 layers), same model (DeepSeek-V4-Pro).\n\nProof this path is live.\n1. configs/amd-master.yaml:2407 defines dsv4-fp4-mi355x-atom-disagg with framework: atom-disagg and multinode: true.\n2. benchmarks/multi_node/amd_utils/server.sh:20 routes ENGINE=atom-disagg to server_atom.sh.\n3. server_atom.sh:155 then appends HF_OVERRIDES_ARG to the server command when the model is DSV4-Pro — the exact same disallowed optimization this PR is removing.\n\nWhy existing code doesn't prevent it. The checklist rule is written in general terms ("benchmark hacks like using --hf-overrides to skipping indexer for every x layers on models that don't natively support this") and its scan target is server args in benchmarks/** — it does not exempt disagg or multi-node. Nothing in the multi-node ATOM path silently disables the override; it applies unconditionally whenever the model name matches.\n\nImpact. DSV4 multi-node ATOM benchmark numbers continue to be produced with reduced model-architecture FLOPs, which the checklist explicitly disallows. If a signoff/PR review of the multi-node recipe applies the same rule this PR is applying to single-node, the multi-node ATOM results will need to be re-generated.\n\nHow to fix. Delete the two-line if [[ "$MODEL_NAME" == "DeepSeek-V4-Pro" ]] block in benchmarks/multi_node/amd_utils/server_atom.sh (and drop the now-unused HF_OVERRIDES_ARG expansion on the server command a few lines below), and add a matching perf-changelog.yaml entry for the affected multi-node config-key (dsv4-fp4-mi355x-atom-disagg).\n\nSeverity. Pre-existing — this PR is scoped to the single-node recipe and does not touch server_atom.sh, so this should not block merging, but the maintainer should be aware there is a directly-analogous cleanup remaining on the multi-node path.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28693549865 |
Summary
OPT_ARGS=(--hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}')frombenchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_atom.sh— this override reuses cached indices to skip the indexer on 3 of every 4 layers, reducing model-architecture FLOPs, which PR_REVIEW_CHECKLIST.md explicitly disallows (and the signoff verifier's Check 7 now rejects).perf-changelog.yamlentry fordsv4-fp4-mi355x-atom.full-sweep-fail-fastlabel applied for PR validation.中文说明
benchmarks/single_node/fixed_seq_len/dsv4_fp4_mi355x_atom.sh中移除OPT_ARGS=(--hf-overrides '{"use_index_cache": true, "index_topk_freq": 4}')— 该覆盖通过复用缓存索引使每 4 层中有 3 层跳过 indexer,减少了模型架构 FLOPs,PR_REVIEW_CHECKLIST.md 明确禁止此类优化(签署验证机器人的 Check 7 现在也会拒绝)。dsv4-fp4-mi355x-atom追加所需的perf-changelog.yaml条目。full-sweep-fail-fast标签用于 PR 验证。🤖 Generated with Claude Code