
[Klaud Cold] [AMD] gpt-oss-fp4-mi355x (vllm): W4A8 moe optimizations and vllm image bump / gpt-oss-fp4-mi355x (vLLM): W4A8 MoE optimizations and vLLM image upgrade #2051

Open: xiaohuguo2023 wants to merge 7 commits into main from gptoss-mi355x-w4a8-opt


@xiaohuguo2023 (Collaborator)
  • Image & model

    • Bump image to vllm/vllm-openai-rocm:nightly-68ee8300a047db78fb52bac477daaaac7be11216 (vLLM 0.23.1rc1, aiter 0.1.16.post2, triton 3.6.0).
    • Run amd/gpt-oss-120b-w-mxfp4-a-fp8 (W4A8), exercising the activation-quantized a8w4 Triton MoE path.
  • a8w4 optimizations picked up via the image

    • aiter: updated A8W4 MoE GEMM tuning configs for better pipelining, plus an output-allocation correctness fix.
    • vLLM: hybrid CDNA4 MX-scale swizzle gate (swizzle ON for TP≤2, OFF for TP4/8).
  • Extend the concurrency (concs) sweep coverage.
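The swizzle gate described above (ON for TP≤2, OFF for TP4/8) can be sketched as a simple threshold on the tensor-parallel size. This is an illustrative sketch only, not vLLM's actual code: the function name `use_mx_scale_swizzle` and the constant `SWIZZLE_MAX_TP` are hypothetical, and treating the rule as a threshold at 2 is an assumption, since the PR description only states the behavior for TP≤2 and TP4/8.

```python
# Hypothetical sketch of the TP-based swizzle gate from the PR description.
# Names are illustrative and do not come from vLLM or aiter.

SWIZZLE_MAX_TP = 2  # assumption: swizzle stays ON up to and including TP=2


def use_mx_scale_swizzle(tp_size: int) -> bool:
    """Return True if MX-scale swizzle should be enabled for this TP size."""
    if tp_size < 1:
        raise ValueError(f"tensor-parallel size must be >= 1, got {tp_size}")
    # ON for TP<=2, OFF for larger TP sizes such as 4 and 8.
    return tp_size <= SWIZZLE_MAX_TP


if __name__ == "__main__":
    for tp in (1, 2, 4, 8):
        state = "ON" if use_mx_scale_swizzle(tp) else "OFF"
        print(f"TP={tp}: swizzle {state}")
```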

@github-actions (bot, Contributor) commented Jul 4, 2026

Thanks for the contribution! Please reach out to respective companies' CODEOWNER to fill in the latest PR_REVIEW_CHECKLIST.md before pinging core maintainer on Slack for review. In order for the signoff PR check bot to trigger, you must follow the PR_REVIEW_CHECKLIST.md template correctly, including the phrase As a PR reviewer and CODEOWNER, I have reviewed this and have.

For PR verification, add the full-sweep-enabled or full-sweep-fail-fast label to this PR — the benchmark sweep only runs on labeled PRs.

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs usually fixes them. See GitHub's docs on re-running failed jobs.


@functionstackx changed the title from "[AMD] gpt-oss-fp4-mi355x (vllm): W4A8 moe optimizations and vllm image bump" to "[Klaud Cold] [AMD] gpt-oss-fp4-mi355x (vllm): W4A8 moe optimizations and vllm image bump / gpt-oss-fp4-mi355x (vLLM): W4A8 MoE optimizations and vLLM image upgrade" on Jul 4, 2026
Merge origin/main into this branch. perf-changelog.yaml was resolved per convention by taking main's entries and re-appending
this PR's gptoss-fp4-mi355x-vllm entry at the tail.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
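
The changelog resolution described in the commit message (take main's entries, then re-append this PR's entry at the tail) can be sketched as below. This is a minimal sketch under stated assumptions: entries are modeled as plain dicts, the `config` and `description` field names are hypothetical rather than the real perf-changelog.yaml schema, and the function name `resolve_changelog` is illustrative.

```python
from typing import Any

Entry = dict[str, Any]  # hypothetical shape; the real YAML schema is not shown in the PR


def resolve_changelog(main_entries: list[Entry], pr_entry: Entry) -> list[Entry]:
    """Keep main's entries in order and put the PR's entry at the tail."""
    resolved = list(main_entries)  # copy so the caller's list is not mutated
    if pr_entry not in resolved:   # avoid appending an identical entry twice
        resolved.append(pr_entry)
    return resolved


if __name__ == "__main__":
    main = [
        {"config": "example-a", "description": "entry already on main"},
        {"config": "example-b", "description": "another entry on main"},
    ]
    pr = {"config": "gptoss-fp4-mi355x-vllm", "description": "this PR's entry"}
    for entry in resolve_changelog(main, pr):
        print(entry["config"])
```

Main's history is left untouched, so the only difference the branch introduces is one new entry at the end of the file.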