[Klaud Cold] Update kimik2.5-fp4-b300-vllm vLLM image to v0.24.0 #2076
Update the vLLM image for kimik2.5-fp4-b300-vllm to v0.24.0. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
d5379c3 to df44c1c
Thanks for the contribution! Please reach out to the respective company's CODEOWNER to fill in the latest PR_REVIEW_CHECKLIST.md before pinging a core maintainer on Slack for review. In order for the signoff PR check bot to trigger, you must follow the PR_REVIEW_CHECKLIST.md template correctly, including the phrase For PR verification, add the PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. See GitHub's docs on re-running failed jobs.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28720107429
LGTM — straightforward image bump following the established pattern, with a matching perf-changelog entry.
Overview
This PR bumps the vLLM image for the kimik2.5-fp4-b300-vllm recipe from v0.22.0 to v0.24.0 in configs/nvidia-master.yaml and adds a corresponding perf-changelog.yaml entry pointing at PR #2076. The changes are two lines of substance in total.
Security risks
None. This is a container image tag update to an upstream vLLM image; no auth, crypto, permissions, or user-input paths are touched.
Level of scrutiny
Low. This is a mechanical, well-patterned dependency bump for a single benchmark recipe, gated by the full-sweep-fail-fast label so CI will validate the runtime before merge.
Other factors
The bug hunter flagged a nit about a stale "reuses B200 recipe as-is" comment now that the B200/B300 image tags have diverged. It's a documentation-drift nit, not a correctness issue, and doesn't warrant blocking the PR.
  kimik2.5-fp4-b300-vllm:
-   image: vllm/vllm-openai:v0.22.0
+   image: vllm/vllm-openai:v0.24.0
🟡 The comment above this block (lines 2900-2902) says the B300 config "reuses the existing Kimi-K2.5 FP4 B200 vLLM recipe as-is until B300-specific tuning is available," but after this bump the B200 config still pins vllm/vllm-openai:v0.22.0 while B300 is on v0.24.0 — so the image field of the recipe has diverged and "as-is" is no longer strictly accurate. Consider a one-line tweak to the comment (e.g. "derived from the B200 recipe; image has since been bumped independently") or bump the B200 image in the same PR.
What the comment says vs. what the code now does. Lines 2900-2902 in configs/nvidia-master.yaml contain an explanatory NOTE above the kimik2.5-fp4-b300-vllm block:
At the time of submission, https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html does not have a B300-specific recipe, so this config reuses the existing Kimi-K2.5 FP4 B200 vLLM recipe as-is until B300-specific tuning is available.
After this PR:
kimik2.5-fp4-b200-vllm (line 2880): image: vllm/vllm-openai:v0.22.0
kimik2.5-fp4-b300-vllm (line 2905): image: vllm/vllm-openai:v0.24.0
Step-by-step check. Diffing the two YAML blocks field-by-field after the bump:
model, model-prefix, precision, framework, runner (differs by design: b200 vs b300), scenarios, and the search-space (tp/ep/concurrency ranges) all still match B200. So the substantive "no B300-specific tuning" claim is still true.
image: B200 = v0.22.0, B300 = v0.24.0. Apart from runner, which differs by design, the two recipes now differ in exactly one field: the image tag.
Since image is part of the recipe, "reuses ... as-is" is no longer literally accurate.
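The field-by-field check above can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the review: the two dictionaries are hypothetical, abbreviated stand-ins for the recipe blocks in configs/nvidia-master.yaml, the `precision` and `framework` values are placeholders inferred from the recipe name, and `diff_fields` is a made-up helper name.

```python
# Hypothetical, abbreviated stand-ins for the two recipe blocks. Only the
# image tags are taken from the PR; the other values are placeholders.
b200 = {
    "image": "vllm/vllm-openai:v0.22.0",
    "runner": "b200",
    "precision": "fp4",
    "framework": "vllm",
}
b300 = {
    "image": "vllm/vllm-openai:v0.24.0",
    "runner": "b300",
    "precision": "fp4",
    "framework": "vllm",
}


def diff_fields(a, b, ignore=()):
    """Return {field: (a_value, b_value)} for every field whose values differ."""
    keys = sorted((set(a) | set(b)) - set(ignore))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# `runner` differs by design (b200 vs b300), so it is excluded here.
drift = diff_fields(b200, b300, ignore=("runner",))
print(drift)
```

With these placeholder inputs the only reported field is `image`, which matches the conclusion that the recipes have diverged in the image tag alone.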
Why the refutation is worth taking seriously. One verifier argued the comment's essential meaning is about B300-specific tuning — i.e., that no B300-tailored scenario/search-space work has been done — not about the image tag, which is routine dependency maintenance shared with the B200 line. That framing is reasonable, and it's why I'm flagging this as a nit rather than a blocker: the load-bearing claim of the comment (upstream lacks a B300 recipe, so tuning parameters are copied from B200) remains true. The "as-is" phrasing is just slightly loose given the image divergence.
Impact. None on benchmark execution — this is purely comment/documentation drift. The value of flagging it is small (readers of the file may be briefly confused when they see the image tags differ), and the fix is a one-line comment edit or a companion image bump on the B200 recipe.
Suggested fix. Either (a) reword the note to something like "...reuses the B200 recipe's tuning parameters until B300-specific tuning is available (the image tag has been bumped independently)," or (b) also bump the B200 image to v0.24.0 in this PR and add kimik2.5-fp4-b200-vllm to the changelog config-keys. Option (a) is lower-risk since it doesn't require re-running B200 benchmarks.
Canary blocked by cluster-side model staging, not the image bump.
Closing: blocked by NVIDIA cluster-side model staging (NVFP4 checkpoint missing/unreadable) or image-pull lock issues. Will re-cut once the cluster staging is fixed.
Summary
Update vLLM image from vllm/vllm-openai:v0.22.0 to vllm/vllm-openai:v0.24.0
Recipes touched:
kimik2.5-fp4-b300-vllm
Test plan
🤖 Generated with Claude Code