[Klaud Cold] Update kimik2.5-fp4-b300-vllm vLLM image to v0.24.0 by functionstackx · Pull Request #2076 · SemiAnalysisAI/InferenceX

functionstackx · 2026-07-04T21:24:34Z

Summary

Update vLLM image from vllm/vllm-openai:v0.22.0 to vllm/vllm-openai:v0.24.0

Recipes touched: kimik2.5-fp4-b300-vllm

Test plan

full-sweep-fail-fast sweep passes.

🤖 Generated with Claude Code

Update the kimik2.5-fp4-b300-vllm vLLM image to v0.24.0. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-04T21:24:41Z

Thanks for the contribution! Please reach out to the respective companies' CODEOWNERs to fill in the latest PR_REVIEW_CHECKLIST.md before pinging a core maintainer on Slack for review. For the signoff PR check bot to trigger, you must follow the PR_REVIEW_CHECKLIST.md template correctly, including the phrase "As a PR reviewer and CODEOWNER, I have reviewed this and have".

For PR verification, add the full-sweep-enabled or full-sweep-fail-fast label to this PR; the benchmark sweep only runs on labeled PRs.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. See GitHub's docs on re-running failed jobs
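The two gating rules in the bot comment (an exact sign-off phrase, and one of two labels) can be sketched as follows. This is a hypothetical illustration, not the repository's actual bot code: `has_signoff` and `sweep_will_run` are invented names, and the sketch assumes the sign-off check is a plain substring match and that either label alone enables the sweep.

```python
# Hypothetical sketch of the two gates described in the bot comment; not the real bot code.
SIGNOFF_PHRASE = "As a PR reviewer and CODEOWNER, I have reviewed this and have"
SWEEP_LABELS = frozenset({"full-sweep-enabled", "full-sweep-fail-fast"})


def has_signoff(comment_body: str) -> bool:
    """True if the comment contains the template's sign-off phrase verbatim (assumed substring match)."""
    return SIGNOFF_PHRASE in comment_body


def sweep_will_run(labels: list[str]) -> bool:
    """True if any label on the PR enables the benchmark sweep."""
    return any(label in SWEEP_LABELS for label in labels)


if __name__ == "__main__":
    print(sweep_will_run(["full-sweep-fail-fast"]))  # True
    print(sweep_will_run(["documentation"]))  # False
    print(has_signoff("LGTM"))  # False
```

Under this assumption, a review that paraphrases the phrase instead of copying it verbatim would not trigger the signoff check.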

github-actions · 2026-07-04T21:27:09Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28720107429
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28720107429

claude

LGTM — straightforward image bump following the established pattern, with a matching perf-changelog entry.

Extended reasoning...

Overview

This PR bumps the vLLM image for the kimik2.5-fp4-b300-vllm recipe from v0.22.0 to v0.24.0 in configs/nvidia-master.yaml and adds a corresponding perf-changelog.yaml entry pointing at PR #2076. The changes are two lines of substance in total.

Security risks

None. This is a container image tag update to an upstream vLLM image; no auth, crypto, permissions, or user-input paths are touched.

Level of scrutiny

Low. This is a mechanical, well-patterned dependency bump for a single benchmark recipe, gated by the full-sweep-fail-fast label so CI will validate the runtime before merge.

Other factors

The bug hunter flagged a nit about a stale "reuses B200 recipe as-is" comment now that the B200/B300 image tags have diverged. It's a documentation-drift nit, not a correctness issue, and doesn't warrant blocking the PR.

claude · 2026-07-04T21:28:33Z


 kimik2.5-fp4-b300-vllm:
-  image: vllm/vllm-openai:v0.22.0
+  image: vllm/vllm-openai:v0.24.0


🟡 The comment above this block (lines 2900-2902) says the B300 config "reuses the existing Kimi-K2.5 FP4 B200 vLLM recipe as-is until B300-specific tuning is available," but after this bump the B200 config still pins vllm/vllm-openai:v0.22.0 while B300 is on v0.24.0 — so the image field of the recipe has diverged and "as-is" is no longer strictly accurate. Consider a one-line tweak to the comment (e.g. "derived from the B200 recipe; image has since been bumped independently") or bump the B200 image in the same PR.

Extended reasoning...

What the comment says vs. what the code now does. Lines 2900-2902 in configs/nvidia-master.yaml contain an explanatory NOTE above the kimik2.5-fp4-b300-vllm block:

At the time of submission, https://docs.vllm.ai/projects/recipes/en/latest/moonshotai/Kimi-K2.5.html does not have a B300-specific recipe, so this config reuses the existing Kimi-K2.5 FP4 B200 vLLM recipe as-is until B300-specific tuning is available.

After this PR:

kimik2.5-fp4-b200-vllm (line 2880): image: vllm/vllm-openai:v0.22.0

kimik2.5-fp4-b300-vllm (line 2905): image: vllm/vllm-openai:v0.24.0

Step-by-step check. Diffing the two YAML blocks field-by-field after the bump:

model, model-prefix, precision, framework, scenarios, and the search-space (tp/ep/concurrency ranges) all still match B200; only runner differs, by design (b200 vs b300). So the substantive "no B300-specific tuning" claim is still true.

image: B200 = v0.22.0, B300 = v0.24.0. Aside from the intended runner difference, the two recipes now differ in exactly one field: the image tag.

Since image is part of the recipe, "reuses ... as-is" is no longer literally accurate.
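The field-by-field check above amounts to a small dictionary diff. A minimal sketch, assuming each recipe is loaded as a flat dict: only the two image tags come from the PR; the runner strings and the shared placeholder values are hypothetical stand-ins for the real YAML contents, and `differing_fields` is an illustrative helper name.

```python
# Placeholder values (hypothetical) for the fields the review says still match.
SHARED = {
    "model": "<model>",
    "model-prefix": "<model-prefix>",
    "precision": "<precision>",
    "framework": "<framework>",
    "scenarios": "<scenarios>",
    "search-space": "<tp/ep/concurrency ranges>",
}

# Image tags are from the PR; runner values are hypothetical stand-ins for b200 vs b300.
b200 = {**SHARED, "runner": "b200", "image": "vllm/vllm-openai:v0.22.0"}
b300 = {**SHARED, "runner": "b300", "image": "vllm/vllm-openai:v0.24.0"}


def differing_fields(a: dict, b: dict, expected: frozenset = frozenset({"runner"})) -> set:
    """Keys whose values differ between two recipes, excluding keys expected to differ."""
    all_keys = a.keys() | b.keys()
    return {k for k in all_keys if a.get(k) != b.get(k)} - expected


if __name__ == "__main__":
    print(differing_fields(b200, b300))  # {'image'}
```

An empty result would mean the B300 recipe still reuses the B200 recipe "as-is" apart from the runner; a non-empty one is the signal that the NOTE comment needs updating.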

Why the refutation is worth taking seriously. One verifier argued the comment's essential meaning is about B300-specific tuning — i.e., that no B300-tailored scenario/search-space work has been done — not about the image tag, which is routine dependency maintenance shared with the B200 line. That framing is reasonable, and it's why I'm flagging this as a nit rather than a blocker: the load-bearing claim of the comment (upstream lacks a B300 recipe, so tuning parameters are copied from B200) remains true. The "as-is" phrasing is just slightly loose given the image divergence.

Impact. None on benchmark execution — this is purely comment/documentation drift. The value of flagging it is small (readers of the file may be briefly confused when they see the image tags differ), and the fix is a one-line comment edit or a companion image bump on the B200 recipe.

Suggested fix. Either (a) reword the note to something like "...reuses the B200 recipe's tuning parameters until B300-specific tuning is available (the image tag has been bumped independently)," or (b) also bump the B200 image to v0.24.0 in this PR and add kimik2.5-fp4-b200-vllm to the changelog config-keys. Option (a) is lower-risk since it doesn't require re-running B200 benchmarks.

github-actions · 2026-07-04T21:32:00Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28720107429
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28720107429

functionstackx · 2026-07-04T21:33:09Z

Canary blocked by cluster-side model staging, not the image bump: /scratch/models/Kimi-K2.5-NVFP4 is permission-denied for the job user on multiple B300 nodes (reproduced on b300-017 and b300-002; vLLM then misparses the unreadable path as an HF repo id). The int4 sibling configs using moonshotai/Kimi-K2.5 load fine on the same cluster with the same v0.24.0 image. Needs a chown/chmod of the NVFP4 model dir on the B300 nodes.
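One way to surface this failure earlier is a pre-flight check that fails fast when the local checkpoint directory is missing or unreadable, instead of letting vLLM fall back to treating the path as a Hugging Face repo id. This is a hedged sketch: `model_dir_readable` and `require_local_model` are hypothetical helpers, not part of vLLM or the InferenceX harness. `os.access` checks the real UID, and a root user will typically pass it regardless of the directory's mode bits.

```python
import os


def model_dir_readable(path: str) -> bool:
    """True if `path` is an existing directory the current (real) user can list and traverse."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)


def require_local_model(path: str) -> str:
    """Return `path` unchanged, or raise a clear error before any model server is launched."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"model directory not found: {path}")
    if not model_dir_readable(path):
        raise PermissionError(
            f"model directory is not a readable directory for this user: {path} "
            "(fix ownership/permissions on the node)"
        )
    return path


if __name__ == "__main__":
    # Path taken from the comment above; on an affected node this reports a PermissionError.
    try:
        require_local_model("/scratch/models/Kimi-K2.5-NVFP4")
        print("ok")
    except OSError as exc:
        print(f"pre-flight failed: {exc}")
```

Running a check like this on each node before the canary would distinguish a staging problem (missing or unreadable checkpoint) from a genuine regression in the new image.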

functionstackx · 2026-07-04T23:10:18Z

Closing: blocked by NVIDIA cluster-side model staging (NVFP4 checkpoint missing/unreadable) or image-pull lock issues. Will re-cut once the cluster staging is fixed.

functionstackx requested a review from a team July 4, 2026 21:24

functionstackx added the full-sweep-fail-fast label Jul 4, 2026

functionstackx requested review from Ankur-singh, jgangani and kedarpotdar-nv as code owners July 4, 2026 21:24

github-project-automation Bot added this to InferenceMAX Board Jul 4, 2026

[Klaud Cold] Update kimik2.5-fp4-b300-vllm vLLM image to v0.24.0

df44c1c

Update the kimik2.5-fp4-b300-vllm vLLM image to v0.24.0. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx force-pushed the klaud-cold/kimik2.5-fp4-b300-vllm-v0.24.0 branch from d5379c3 to df44c1c July 4, 2026 21:24

claude Bot reviewed Jul 4, 2026

functionstackx closed this Jul 4, 2026

functionstackx deleted the klaud-cold/kimik2.5-fp4-b300-vllm-v0.24.0 branch July 4, 2026 23:10

github-project-automation Bot moved this to Done in InferenceMAX Board Jul 4, 2026

functionstackx wants to merge 1 commit into main from klaud-cold/kimik2.5-fp4-b300-vllm-v0.24.0