
[Klaud Cold] Update kimik2.5-fp4-b200-vllm vLLM image to v0.24.0 #2075

Closed
functionstackx wants to merge 1 commit into main from klaud-cold/kimik2.5-fp4-b200-vllm-v0.24.0

@functionstackx

Collaborator

Summary

Update vLLM image from vllm/vllm-openai:v0.22.0 to vllm/vllm-openai:v0.24.0

Recipes touched: kimik2.5-fp4-b200-vllm

Test plan

  • full-sweep-fail-fast sweep passes.

🤖 Generated with Claude Code

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@functionstackx functionstackx force-pushed the klaud-cold/kimik2.5-fp4-b200-vllm-v0.24.0 branch from 65a6dcf to 2d9ba77 on July 4, 2026, 21:24
@github-actions

github-actions Bot commented Jul 4, 2026

Contributor

Thanks for the contribution! Please reach out to the respective company's CODEOWNER to fill in the latest PR_REVIEW_CHECKLIST.md before pinging a core maintainer on Slack for review. For the signoff PR check bot to trigger, you must follow the PR_REVIEW_CHECKLIST.md template correctly, including the phrase "As a PR reviewer and CODEOWNER, I have reviewed this and have".

For PR verification, add the full-sweep-enabled or full-sweep-fail-fast label to this PR — the benchmark sweep only runs on labeled PRs.

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs usually fixes them. See GitHub's docs on re-running failed jobs.
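
The two gates described in this comment (a label that enables the benchmark sweep, and a verbatim phrase that triggers the signoff check) can be sketched as below. This is only an illustration under assumptions: the function names `sweep_should_run` and `has_signoff` are hypothetical, and the real bot's implementation is not shown in this PR.

```python
# Hypothetical sketch of the two PR gates described above.
# The label names and the phrase come from the bot comment; the
# function names and matching logic are assumptions for illustration.

SWEEP_LABELS = {"full-sweep-enabled", "full-sweep-fail-fast"}
REQUIRED_PHRASE = "As a PR reviewer and CODEOWNER, I have reviewed this and have"


def sweep_should_run(pr_labels):
    """Return True when at least one sweep label is present on the PR."""
    return bool(SWEEP_LABELS & set(pr_labels))


def has_signoff(comment_body):
    """Return True when the comment contains the required phrase verbatim."""
    return REQUIRED_PHRASE in comment_body
```

For example, `sweep_should_run(["full-sweep-fail-fast"])` is true, while a PR with no sweep label would skip the benchmark sweep entirely.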




@claude claude Bot left a comment

Contributor


LGTM — mechanical vLLM image bump for kimik2.5-fp4-b200-vllm from v0.22.0 to v0.24.0 with matching perf-changelog entry.

Extended reasoning...

Overview

This PR bumps the vLLM image for the single recipe kimik2.5-fp4-b200-vllm from vllm/vllm-openai:v0.22.0 to vllm/vllm-openai:v0.24.0 in configs/nvidia-master.yaml, and appends a matching entry to perf-changelog.yaml linking back to this PR. Two lines of substantive change across two files.
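
What "mechanical bump" means here can be made concrete with a small check: the image repository stays the same and only the version tag moves forward. The sketch below assumes tags of the form `vMAJOR.MINOR.PATCH`, as in this PR; the helper names are hypothetical and are not part of the repository's tooling.

```python
import re


def split_image(ref):
    """Split 'repo:tag' into (repo, tag), using the last colon as separator."""
    repo, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        # No tag present (a colon followed by '/' is a registry port, not a tag).
        raise ValueError(f"image reference has no tag: {ref!r}")
    return repo, tag


def parse_version(tag):
    """Parse a 'vX.Y.Z' tag into a tuple of ints so versions compare numerically."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unsupported tag format: {tag!r}")
    return tuple(int(part) for part in match.groups())


def is_mechanical_bump(old_ref, new_ref):
    """True when the repository is unchanged and the version strictly increases."""
    old_repo, old_tag = split_image(old_ref)
    new_repo, new_tag = split_image(new_ref)
    return old_repo == new_repo and parse_version(new_tag) > parse_version(old_tag)
```

With the two references from this PR, `is_mechanical_bump("vllm/vllm-openai:v0.22.0", "vllm/vllm-openai:v0.24.0")` holds. Comparing integer tuples rather than strings avoids ordering mistakes such as treating `v0.9.0` as newer than `v0.24.0`.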

Security risks

None. This is a container image tag update in a benchmark recipe config — no auth, crypto, permissions, or user-facing surface touched.

Level of scrutiny

Low. This is a mechanical Klaud Cold image bump that matches the pattern of recently merged commits (e.g. 2d9ba77, which has the same PR title and shape). The full-sweep-fail-fast label is applied, so the benchmark sweep will empirically validate whether v0.24.0 works before merge.

Other factors

The perf-changelog entry is well-formed and correctly references PR #2075. No outstanding reviewer comments beyond the standard recipe-reminder bot messages. Bug hunting system found nothing.


@functionstackx

Collaborator Author

Canary blocked by cluster-side model staging, not the image bump: enroot cannot mount /lustre/fsw/models/Kimi-K2.5-NVFP4 (No such file or directory) on the B200 cluster — the NVFP4 checkpoint appears to be missing/unstaged at that path. The int4 sibling configs using moonshotai/Kimi-K2.5 run fine with the same v0.24.0 image. Needs the model re-staged on lustre (or an auto-download fallback like #1864).
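
A pre-launch check along the lines suggested above might look like the following. It is a minimal sketch, not the fallback from #1864 (which is not shown here): `resolve_model_source` and `require_staged_model` are hypothetical helpers, and the hub repository ID for the NVFP4 checkpoint is left as a caller-supplied parameter because this thread does not state it.

```python
import os


def resolve_model_source(staged_path, hub_id):
    """Prefer the staged checkpoint directory; otherwise fall back to a hub ID.

    Both arguments are supplied by the caller; nothing is downloaded here.
    """
    if os.path.isdir(staged_path):
        return staged_path
    return hub_id


def require_staged_model(staged_path):
    """Fail early with a clear message instead of a container mount error."""
    if not os.path.isdir(staged_path):
        raise FileNotFoundError(
            f"model checkpoint is not staged at {staged_path}; "
            "re-stage it or configure a download fallback"
        )
    return staged_path
```

Running `require_staged_model("/lustre/fsw/models/Kimi-K2.5-NVFP4")` before launching the container would surface the missing checkpoint as a staging problem rather than as an enroot mount failure.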

@functionstackx

Collaborator Author

Closing: blocked by NVIDIA cluster-side model staging (NVFP4 checkpoint missing/unreadable) or image-pull lock issues. Will re-cut once the cluster staging is fixed.

@functionstackx functionstackx deleted the klaud-cold/kimik2.5-fp4-b200-vllm-v0.24.0 branch on July 4, 2026, 23:10