
[Klaud Cold] CONTRIBUTING: warn against root-owned files on AMD runners / Contributing guide: warn against leaving root-owned files in AMD runner workspaces #2043

Merged

functionstackx merged 1 commit into main from klaud-cold/contributing-no-root-owned-files on Jul 4, 2026

@functionstackx (Collaborator)

Summary

Adds a new section to CONTRIBUTING.md (and its CONTRIBUTING_zh.md counterpart) warning contributors not to leave root-owned files in GitHub Actions runner workspaces on the AMD MI355X TW cluster.

Problem: Multi-node Slurm containers run as root and write logs into the runner workspace. When a job is cancelled before teardown, the root-owned directories it created are left behind. The runner user cannot delete them, so actions/checkout fails with EACCES: permission denied and the runner is bricked for every subsequent job. Because all AMD MI355X sweeps share the same runner pool, one stranded directory blocks the entire queue for everyone.

New rules documented:

  1. Never write as root into the runner workspace; use /tmp or a dedicated staging path instead
  2. If root writes are unavoidable, add a cleanup trap (trap cleanup EXIT) that removes root-owned files before exit
  3. Test the teardown path by cancelling mid-flight and verifying no root files remain
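Rules 1 and 2 can be sketched as a job script. This is a minimal, hypothetical bash sketch rather than the repository's actual job script. It assumes the script itself runs as root (for example inside the Slurm container), so its trap has the privileges needed to delete what it wrote. The `job-staging` prefix and the `run.log` file name are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of rules 1 and 2. Assumption: this script runs as root
# (e.g. inside the Slurm container), so cleanup has the rights to delete
# everything it created.
set -euo pipefail

# Rule 1: write outside the runner workspace. "job-staging" is a made-up
# prefix; any dedicated path outside $GITHUB_WORKSPACE works the same way.
STAGING_DIR="$(mktemp -d "${TMPDIR:-/tmp}/job-staging.XXXXXX")"

# Rule 2: remove everything this job created, however the script ends.
cleanup() {
  rm -rf -- "$STAGING_DIR"
}
trap cleanup EXIT

# Turn cancellation signals into a normal exit so the EXIT trap still runs.
trap 'exit 130' INT
trap 'exit 143' TERM

# Placeholder for the real workload (hypothetical log file name).
echo "benchmark output" > "$STAGING_DIR/run.log"
```

SIGKILL cannot be trapped, so a hard kill still skips the cleanup; that is why rule 3 matters. One way to check after cancelling a run mid-flight is `find "$GITHUB_WORKSPACE" -uid 0`, which should print nothing if no root-owned files were left behind.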

Includes a pointer to the recovery procedure in .claude/commands/clean-amd-mi355-runner-root-files.md.

Motivated by recurring incidents on gharunner06, most recently #2003 (comment).

Test plan

  • Both CONTRIBUTING.md and CONTRIBUTING_zh.md updated in sync
  • No code, config, or workflow changes

🤖 Generated with Claude Code

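Chinese summary

A new section has been added to CONTRIBUTING.md and its Chinese version CONTRIBUTING_zh.md, warning contributors not to leave root-owned files in GitHub Actions runner workspaces on the AMD MI355X TW cluster. If log files written by Slurm containers running as root are not cleaned up after a job is cancelled, every subsequent job fails with EACCES: permission denied, blocking the entire runner queue. Three new rules are added: do not write to the workspace as root, always add a cleanup trap, and test the teardown path.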

…rs / Contributing guide: warn against leaving root-owned files in AMD runner workspaces

Add a section to CONTRIBUTING.md (and _zh counterpart) documenting
that Slurm containers must not strand root-owned files in GitHub
Actions runner workspaces on the AMD MI355X TW cluster. A single
stranded directory bricks the entire runner queue for everyone.
Includes rules for avoiding the problem and a pointer to the
recovery procedure.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@functionstackx requested a review from a team July 4, 2026 04:57
@functionstackx merged commit 2465e2b into main Jul 4, 2026
3 checks passed
@functionstackx deleted the klaud-cold/contributing-no-root-owned-files branch July 4, 2026 04:57

@claude (bot, Contributor) left a comment

LGTM: documentation-only addition with EN/zh in sync.

Extended reasoning...

Overview

This PR adds a new warning section to CONTRIBUTING.md and its Chinese counterpart CONTRIBUTING_zh.md about not leaving root-owned files in GitHub Actions runner workspaces on the AMD MI355X TW cluster. It documents three rules (no root writes to workspace, cleanup trap if unavoidable, test the teardown path) and links to the existing recovery procedure at .claude/commands/clean-amd-mi355-runner-root-files.md (verified to exist).

Security risks

None. This is purely documentation guidance, with no executable code, config, or workflow changes.

Level of scrutiny

Low. This is a pure Markdown edit with no functional impact on the build, CI, or runtime. Both language versions are updated in sync per repo convention, and the referenced command file exists.

Other factors

No bugs were flagged by the bug-hunting system, there are no prior reviews on the timeline, and the change follows the same bilingual pattern as recent merged PRs in the tree. The motivation (recurring incidents of bricked runners) is well established, and the guidance matches the operational reality reflected in AGENTS.md and the existing command files.
