fix(supervisor): drop sandbox child capability bounding set by alangou · Pull Request #2001 · NVIDIA/OpenShell

alangou · 2026-06-25T13:17:01Z

Summary

Drop the Linux capability bounding set in the common sandbox child privilege-drop path so workloads and openshell connect shells cannot regain container-granted capabilities after exec.

Related Issue

Closes #1452

Changes

Added Linux-only capability bounding set reduction in drop_privileges() after initgroups/setgid and before setuid.
Drops capabilities from 0..=cap_last_cap, with CAP_SETPCAP dropped last and a graceful fallback when CAP_SETPCAP is unavailable.
Added focused Linux-only regression coverage and documented the child-process invariant in architecture/sandbox.md.

Testing

mise run pre-commit passes
Unit tests added/updated
E2E not applicable; no e2e files changed

Additional checks run:

mise run fmt && mise run e2e && mise run e2e:kubernetes && mise run e2e:docker
cargo test -p openshell-supervisor-process --lib process::tests -- --nocapture
cargo test -p openshell-supervisor-process --lib ssh::tests::pre_exec_always_calls_drop_privileges -- --nocapture

Checklist

Follows Conventional Commits
Commits are signed off (DCO)

Reduce the Linux capability bounding set in the common privilege-drop path before executing sandbox workloads or connect shells. Signed-off-by: Adrien Langou <alangou@nvidia.com>

elezar · 2026-06-25T15:25:41Z

+
+#[cfg(target_os = "linux")]
+fn kernel_cap_last_cap() -> u32 {
+    std::fs::read_to_string(PROC_CAP_LAST_CAP_PATH)


My first thought here was whether there's an existing crate that would provide the functionality we need. Codex suggests that https://crates.io/crates/capctl may be applicable:

- Best fit: capctl (https://crates.io/crates/capctl) / GitHub (https://github.com/cptpcrd/capctl) Pure Rust, direct Linux prctl() capability API, and it has the exact bounding-set surface we need: bounding::drop, bounding::ensure_dropped, bounding::clear, bounding::clear_unknown, bounding::read/probe, plus effective-set inspection through CapState. The important bit: it explicitly handles newer kernel capabilities via clear_unknown(), which matches the PR’s current /proc/ sys/kernel/cap_last_cap intent better than most wrappers.

I agree that using a dedicated crate would make the parsing cleaner.

For this patch, I intentionally reused the same low-level mechanism that the codebase already uses for Linux process hardening. The goal was to keep the change narrow, avoid adding a new dependency, and avoid rewriting working capability-related code as part of a defense-in-depth fix.

My concern with adding a capabilities crate only for parsing is that it would be a bit inconsistent: if we introduce a crate that can also wrap the prctl capability operations, then we should probably use it consistently for the whole Linux capabilities path instead of mixing manual prctl calls with crate-based parsing.

So I’d prefer to keep this patch minimal, then consider a separate cleanup/refactor later if we want to move all Linux capability handling to a dedicated crate.

If we are ok to delay this fix for the NeMoClaw team I'm happy to add the crate and rewrite the capability handling with that crate. If the fix can't wait more then I can tackle this in a separate PR as well

I'm ok to do this as a follow-up and apply it consistently.

elezar

A codex-assisted review:

I think this still fails closed incorrectly for the Podman path. drop_capability_bounding_set() returns Ok(()) when CAP_SETPCAP is
not effective, and also when the first PR_CAPBSET_DROP returns EPERM. But the Podman driver currently drops SETPCAP from the
supervisor container, so that path leaves the child bounding set unchanged while still spawning the workload/connect shell.

Could we either keep SETPCAP available to the supervisor until child setup, or fail the spawn when the bounding set cannot be cleared
and is still nonempty? This may also be a good place to use capctl rather than custom /proc parsing and raw prctl;
capctl::caps::bounding::clear() plus an explicit “EPERM is only OK if the bounding set is already empty” check would make the invariant
clearer.

The current regression test skips when CAP_SETPCAP is unavailable, so it would not catch the Podman-relevant failure mode.

johntmyers · 2026-06-25T16:26:55Z

A codex-assisted review:

I think this still fails closed incorrectly for the Podman path....

+1. Seems like different drivers can do different drops on their own which would impact this common code that runs driver-agnostically.

fix(supervisor): drop sandbox child capability bounding set

22e54a6

Reduce the Linux capability bounding set in the common privilege-drop path before executing sandbox workloads or connect shells. Signed-off-by: Adrien Langou <alangou@nvidia.com>

alangou requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners June 25, 2026 13:17

elezar reviewed Jun 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(supervisor): drop sandbox child capability bounding set#2001

fix(supervisor): drop sandbox child capability bounding set#2001
alangou wants to merge 1 commit into
mainfrom
alangou/os-173-high-docker-sandboxes-run-as-root-with-sys_admin-and

alangou commented Jun 25, 2026 •

edited

Loading

Uh oh!

elezar Jun 25, 2026

Uh oh!

alangou Jun 25, 2026

Uh oh!

elezar Jun 25, 2026

Uh oh!

elezar left a comment

Uh oh!

johntmyers commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

alangou commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Testing

Checklist

Uh oh!

elezar Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

alangou Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

elezar Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

elezar left a comment

Choose a reason for hiding this comment

Uh oh!

johntmyers commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

alangou commented Jun 25, 2026 •

edited

Loading