
Add switch --kill and persistent auto-kill to stop Codex before switching #149

Open
bjspi wants to merge 2 commits into Loongphy:main from bjspi:codex/switch-kill

bjspi commented Jul 1, 2026

Motivation

When you switch the active account, codex-auth rewrites ~/.codex/auth.json. But a Codex instance that is already running (the CLI or the desktop GUI) has the previous credentials loaded in memory, so the switch does not actually take effect until Codex is restarted manually. In practice this means: switch account → nothing changes → remember to quit and relaunch Codex → try again.

This PR lets codex-auth optionally stop all running Codex processes as part of the switch, and only proceed with the switch once none remain — so the freshly activated account is the one Codex picks up on its next start. This is especially useful for fully automated account rotation, where you don't want to babysit which instance is still holding an old session.

What this adds

  • codex-auth switch --kill / --no-kill — per-invocation override.
  • Persistent auto_kill setting in registry.json, toggled via codex-auth config kill on|off. When enabled, every switch stops Codex first; --kill / --no-kill override it for a single run.
    • Resolution: effective = opts.kill orelse reg.auto_kill.
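A minimal Zig sketch of that resolution rule, assuming trimmed-down stand-ins for the option and registry types (here SwitchOptions carries only kill and Registry only auto_kill; the real types have more fields). A null kill means neither flag was passed, so the persistent setting decides.

```zig
const std = @import("std");

// Hypothetical, trimmed-down stand-ins for the PR's types.
const SwitchOptions = struct {
    // null: no flag given; true: --kill; false: --no-kill
    kill: ?bool = null,
};

const Registry = struct {
    // Persistent setting toggled by `codex-auth config kill on|off`.
    auto_kill: bool = false,
};

/// The per-run flag wins; otherwise fall back to the persistent setting.
fn effectiveKill(opts: SwitchOptions, reg: Registry) bool {
    return opts.kill orelse reg.auto_kill;
}
```

Using an optional for the flag is what makes `orelse` express "override if present" without a separate "was the flag given" boolean.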

Behaviour

When kill is in effect, handleSwitch runs the following steps before dispatching to any target (picker, query, previous, or --live):

  1. Detect whether any Codex process is running; if none, do nothing.
  2. Graceful first: ask Codex to quit (macOS osascript ... to quit, SIGTERM on Unix, windowed close on Windows).
  3. Short wait, then a hard kill for anything that survived.
  4. Re-check. If a Codex process is still alive, the switch is aborted with error.CodexStillRunning and a clear message — the account is not changed.
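The four steps can be sketched as one small generic function. To keep it testable without touching real processes, it is written against a hypothetical ctx handle exposing isRunning, gracefulKill, forceKill, and sleepMs; the 700 ms and 400 ms waits are borrowed from the review flowchart further down and should be read as assumptions, not the PR's exact constants.

```zig
const std = @import("std");

/// Stops Codex before a switch, following the four steps above.
/// `ctx` is a hypothetical handle (typically a pointer) providing:
///   isRunning() bool, gracefulKill() void, forceKill() void, sleepMs(u64) void
fn ensureCodexStopped(ctx: anytype) error{CodexStillRunning}!void {
    // 1. Nothing to do if no Codex process is running.
    if (!ctx.isRunning()) return;

    // 2. Graceful first: quit / SIGTERM / windowed close.
    ctx.gracefulKill();
    ctx.sleepMs(700); // assumed grace period
    if (!ctx.isRunning()) return;

    // 3. Hard kill for anything that survived.
    ctx.forceKill();
    ctx.sleepMs(400); // assumed settle time

    // 4. Re-check; abort instead of switching under a live Codex.
    if (ctx.isRunning()) return error.CodexStillRunning;
}
```

Returning an error rather than logging and continuing is what guarantees the account is never changed while a process still holds the old credentials.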

Both the CLI (codex / codex.exe) and the GUI (macOS bundle com.openai.codex, Windows Codex.exe) are targeted.

Safety

  • Exact process-name matching everywhere (pkill -x codex, taskkill /IM codex.exe, tasklist exact image filter) so codex-auth itself is never terminated.
  • Graceful-before-force reduces the risk of losing unsaved work in the GUI.
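To illustrate why exact matching matters, here is a small comparison of exact versus substring matching over Unix process names. It only models the name comparison (the PR delegates matching to pkill -x, pgrep -x, and the tasklist filter), and both function names are hypothetical.

```zig
const std = @import("std");

// Unix process names the PR targets: the CLI and the GUI.
const codex_names = [_][]const u8{ "codex", "Codex" };

/// Exact comparison, analogous to `pkill -x` / `pgrep -x`.
fn isCodexProcessExact(name: []const u8) bool {
    for (codex_names) |target| {
        if (std.mem.eql(u8, name, target)) return true;
    }
    return false;
}

/// Roughly what pattern matching without -x does: it also matches
/// "codex-auth", i.e. the tool would kill itself.
fn isCodexProcessSubstring(name: []const u8) bool {
    for (codex_names) |target| {
        if (std.mem.indexOf(u8, name, target) != null) return true;
    }
    return false;
}
```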

Cross-platform

New src/workflows/process_kill.zig, dispatching on builtin.os.tag, reusing the codebase's existing process helpers (std.process.run with the shared app_runtime.io()):

  • Windows: taskkill /IM codex.exe /T then /F; detection via tasklist exact filter.
  • macOS: osascript quit for the GUI + pkill -TERM/-KILL -x codex|Codex; detection via pgrep -x.
  • Linux: pkill -TERM/-KILL -x codex|Codex; detection via pgrep -x (best-effort).
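A sketch of how the per-OS stop commands listed above could be assembled as argv slices, dispatching on std.Target.Os.Tag (the type of builtin.os.tag). The function and enum names are hypothetical, issuing one command per target name (codex, Codex, codex.exe) is an assumption, and the macOS osascript quit for the GUI is left out because its exact script is not given here.

```zig
const std = @import("std");

const Strength = enum { graceful, force };

/// Fills `buf` with one stop command for a single process name and
/// returns the used prefix as an argv slice.
fn stopArgv(
    buf: *[5][]const u8,
    os: std.Target.Os.Tag,
    strength: Strength,
    name: []const u8,
) []const []const u8 {
    switch (os) {
        .windows => {
            // /T also ends child processes; /F is added only for the hard kill.
            buf[0] = "taskkill";
            buf[1] = "/IM";
            buf[2] = name;
            buf[3] = "/T";
            if (strength == .force) {
                buf[4] = "/F";
                return buf[0..5];
            }
            return buf[0..4];
        },
        else => {
            // macOS and Linux: -x requires an exact process-name match.
            buf[0] = "pkill";
            buf[1] = if (strength == .force) "-KILL" else "-TERM";
            buf[2] = "-x";
            buf[3] = name;
            return buf[0..4];
        },
    }
}
```

In the PR, slices like these would presumably be handed to the existing std.process.run helper; the sketch stops at building them so it can be checked without spawning anything.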

Registry / schema

auto_kill is an additive optional field — no schema-version bump (consistent with how interval_seconds was added). It is parsed tolerantly (absent → default false) and materialized into pre-existing files through currentLayoutNeedsRewrite, so old and new binaries interoperate within schema v4.
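A minimal sketch of the tolerant read using std.json and a hypothetical, trimmed-down view of registry.json. Declaring auto_kill as an optional with a null default lets files written by older binaries parse, and the null case doubles as the signal that the file should be rewritten. The real storage.zig parser and currentLayoutNeedsRewrite may be structured differently.

```zig
const std = @import("std");

// Hypothetical subset of registry.json; other fields are ignored.
const RegistryFile = struct {
    schema_version: u32,
    // Optional with a default, so a missing field is not a parse error.
    auto_kill: ?bool = null,
};

const Loaded = struct {
    auto_kill: bool,
    // True when the field was absent and should be materialized on save.
    needs_rewrite: bool,
};

fn loadAutoKill(allocator: std.mem.Allocator, text: []const u8) !Loaded {
    const parsed = try std.json.parseFromSlice(RegistryFile, allocator, text, .{
        .ignore_unknown_fields = true,
    });
    defer parsed.deinit();
    return .{
        .auto_kill = parsed.value.auto_kill orelse false,
        .needs_rewrite = parsed.value.auto_kill == null,
    };
}
```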

Tests

  • Parser: --kill, --no-kill, duplicate, --kill + --no-kill conflict, switch <target> --kill.
  • config kill on|off parsing incl. invalid value.
  • Registry round-trip: auto_kill persists; missing field defaults to false and is rewritten.
  • Help text updated (usage / options / examples for both switch and config).
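The flag rules these tests exercise can be sketched as follows. The function and error names are hypothetical rather than the PR's parser API; the sketch only shows the intended semantics: no flag yields null, duplicate and conflicting flags are rejected, positional targets are skipped, and on/off are the only accepted config values.

```zig
const std = @import("std");

const KillFlagError = error{ DuplicateKillFlag, ConflictingKillFlags };

/// Scans `switch` arguments for --kill / --no-kill.
fn parseKillFlag(args: []const []const u8) KillFlagError!?bool {
    var result: ?bool = null;
    for (args) |arg| {
        const value: bool = if (std.mem.eql(u8, arg, "--kill"))
            true
        else if (std.mem.eql(u8, arg, "--no-kill"))
            false
        else
            continue; // positional target or unrelated option
        if (result) |prev| {
            if (prev == value) return error.DuplicateKillFlag;
            return error.ConflictingKillFlags;
        }
        result = value;
    }
    return result;
}

/// Parses the value of `config kill on|off`.
fn parseOnOff(value: []const u8) error{InvalidKillValue}!bool {
    if (std.mem.eql(u8, value, "on")) return true;
    if (std.mem.eql(u8, value, "off")) return false;
    return error.InvalidKillValue;
}
```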

The real process-killing paths invoke OS commands and are not unit-tested in CI (argument construction / dispatch is); manual verification steps were used for the end-to-end flow.

Notes / limitations

  • There is a small race window: a user could relaunch Codex between the kill and activation (kill happens up-front, before the picker).
  • In --live mode the kill runs once at start; in-session re-switches do not re-trigger it.
  • Linux had no prior Codex detect/kill precedent, so that path is new and best-effort.

🤖 Generated with Claude Code

Stop all running Codex processes (CLI and GUI) before switching accounts,
so the new auth.json takes effect without a manual Codex restart.

- `codex-auth switch --kill` / `--no-kill` override per run; the persistent
  `auto_kill` registry setting (toggle via `codex-auth config kill on|off`)
  applies otherwise. Resolution: opts.kill orelse reg.auto_kill.
- Kill is graceful-first (quit / SIGTERM / windowed close), then a hard kill
  for survivors; the switch is aborted with `error.CodexStillRunning` if any
  Codex process remains. Exact name matching (pkill -x codex, taskkill /IM
  codex.exe) never targets codex-auth itself.
- Cross-platform (Windows taskkill/tasklist, macOS osascript+pkill, Linux
  pkill/pgrep) in new src/workflows/process_kill.zig, hooked once at the top
  of handleSwitch (covers picker/auto/query/previous/live).
- auto_kill is an additive optional registry field (no schema bump); parsed
  tolerantly and materialized into old files via currentLayoutNeedsRewrite.
- Adds parser, config, and registry round-trip tests; updates help text.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
greptile-apps Bot commented Jul 1, 2026

Greptile Summary

This PR adds optional Codex-process termination before an account switch, surfaced as switch --kill / --no-kill per-run flags and a persistent auto_kill registry setting toggled with config kill on|off. The implementation is cross-platform (Windows tasklist/taskkill, macOS osascript+pkill, Linux pkill) and aborts the switch if any Codex process survives the kill sequence.

  • New src/workflows/process_kill.zig implements graceful-then-force termination with exact process-name matching (-x / /FI IMAGENAME eq) to avoid self-termination, and returns error.CodexStillRunning to abort the switch when a process won't die.
  • Registry schema gains an additive auto_kill: bool field (default false) with a tolerant parser and automatic rewrite on first load, consistent with how interval_seconds was introduced.
  • Parser and help text updated for switch --kill|--no-kill and config kill on|off, with full test coverage for all flag combinations and registry round-trips.

Confidence Score: 4/5

Safe to merge after fixing the misleading recovery hint in printCodexStillRunningError.

The printCodexStillRunningError hint tells users to "switch without --kill", which is incorrect advice when the kill was triggered by auto_kill=true rather than an explicit flag. A user with auto_kill enabled who hits this error will follow the hint, re-run codex-auth switch, and get the same error in a loop — the actual escape is --no-kill or config kill off. Everything else (parser, registry schema, process detection, kill sequencing) looks correct.

src/cli/output.zig — the printCodexStillRunningError hint needs updating.
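One way to pick a hint that cannot loop, sketched with hypothetical names and wording: key it off the persistent setting rather than off whether --kill was typed. If auto_kill is on, dropping --kill changes nothing (the setting triggers the same kill on the next run, even for a user who also passed --kill), so the hint has to point at --no-kill or config kill off; only when auto_kill is off is "retry without --kill" a real way out.

```zig
const std = @import("std");

// Hypothetical hint texts; not the PR's actual wording.
const hint_drop_flag =
    "Quit Codex manually and retry, or run the switch again without --kill.";
const hint_no_kill =
    "Quit Codex manually and retry, or pass --no-kill " ++
    "(or turn auto-kill off with `codex-auth config kill off`).";

/// Chooses a recovery hint for error.CodexStillRunning.
fn stillRunningHint(auto_kill: bool) []const u8 {
    return if (auto_kill) hint_no_kill else hint_drop_flag;
}
```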

Important Files Changed

| Filename | Overview |
|---|---|
| src/workflows/process_kill.zig | New cross-platform kill module; logic is sound. Minor issue: runIgnoringFailure buffers 1 MB of output per call despite discarding it — should match the 64 KB cap used by the detection functions. |
| src/cli/output.zig | New printCodexStillRunningError has a misleading recovery hint: "switch without --kill" gives incorrect advice to users who triggered the kill via auto_kill=true rather than an explicit --kill flag. |
| src/workflows/switch.zig | Adds shouldKillBeforeSwitch and wires it into handleSwitch; logic is correct. Kill fires before account selection (picker/live), meaning Codex is terminated even if the user cancels — already noted in a previous review thread. |
| src/registry/storage.zig | Adds auto_kill parsing and triggers a layout rewrite when the field is absent; consistent with how interval_seconds was handled previously. |
| src/registry/storage_write.zig | Serializes auto_kill into RegistryOut; straightforward additive change. |
| src/cli/commands/switch.zig | Correctly handles --kill / --no-kill parsing, including duplicate and conflict detection for all four flag combinations. |
| src/cli/commands/config.zig | Adds `config kill on\|off` … |
| src/workflows/config.zig | Adds handleKillCommand that mutates auto_kill in the registry and prints confirmation; correct load/modify/save pattern. |
| tests/cli_behavior_test.zig | Comprehensive parser tests for all --kill/--no-kill combinations and `config kill on\|off` … |
| tests/registry_test.zig | Adds round-trip and default-rewrite tests for auto_kill; both cases are covered correctly. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[codex-auth switch] --> B{opts.kill set?}
    B -- yes --> C{opts.kill == true?}
    B -- no --> D[loadRegistry\nauto_kill field]
    D --> E{auto_kill == true?}
    C -- false --> F[Skip kill\nproceed to switch]
    C -- true --> G[ensureCodexStoppedForSwitch]
    E -- false --> F
    E -- true --> G
    G --> H{Any Codex\nrunning?}
    H -- no --> I[Switch proceeds]
    H -- yes --> J[gracefulKill\nSIGTERM / osascript / taskkill /T]
    J --> K[sleep 700ms]
    K --> L{Still running?}
    L -- no --> I
    L -- yes --> M[forceKill\nSIGKILL / taskkill /F]
    M --> N[sleep 400ms]
    N --> O{Still running?}
    O -- no --> I
    O -- yes --> P[printCodexStillRunningError\nreturn error.CodexStillRunning]
    P --> Q[Switch aborted]
    I --> R[handleSwitchQuery\nor picker\nor previous\nor live]

Reviews (2): Last reviewed commit: "docs(help): surface --kill/--no-kill and..."

Comment thread src/workflows/switch.zig
Comment on lines +21 to +23
if (try shouldKillBeforeSwitch(allocator, codex_home, opts)) {
    try process_kill.ensureCodexStoppedForSwitch(allocator);
}


P1 Kill fires before interactive selection — Codex dies even on cancel

ensureCodexStoppedForSwitch is called before the picker or live UI is shown. If the user runs codex-auth switch --kill (no explicit target) and then presses Escape in the picker, Codex is already terminated but no account was switched. For the --live mode, once a session begins the same applies for subsequent in-session re-switches. The user would need to manually restart Codex after a cancellation, potentially losing any unsaved in-flight work that SIGKILL didn't give the process time to flush.

A safer flow would be to perform the kill only after a target account is confirmed (i.e., move ensureCodexStoppedForSwitch into each target handler just before activateAccountByKey / saveRegistry).
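A sketch of the reordering the review proposes, under assumed names: ctx is a hypothetical handle whose pick returns the chosen account key or null on cancel, ensureStopped performs the kill sequence, and activate writes the new account. The point is only the order: nothing is killed until a target exists.

```zig
const std = @import("std");

/// Runs selection first and stops Codex only once a target is confirmed.
/// Returns false when the user cancelled; Codex is then left untouched.
fn switchWithDeferredKill(ctx: anytype, kill: bool) !bool {
    // Interactive selection (picker / live UI); null means cancelled.
    const target = ctx.pick() orelse return false;
    // Kill only now, immediately before the account is changed.
    if (kill) try ctx.ensureStopped();
    try ctx.activate(target);
    return true;
}
```

This ordering would likely also narrow the race window listed under Notes / limitations, because the kill then runs right before activation instead of before the picker.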


Comment thread src/workflows/switch.zig
Comment on lines +115 to +124
fn shouldKillBeforeSwitch(
    allocator: std.mem.Allocator,
    codex_home: []const u8,
    opts: cli.types.SwitchOptions,
) !bool {
    if (opts.kill) |kill| return kill;
    var reg = try registry.loadRegistry(allocator, codex_home);
    defer reg.deinit(allocator);
    return reg.auto_kill;
}


P2 Registry loaded twice on the hot path when auto_kill is true

Whenever opts.kill is null, shouldKillBeforeSwitch loads the registry only to read auto_kill and then discards it (lines 121–123); on the automated-rotation path, where no flag is passed and auto_kill is true, this happens on every run. Every switch target (handleSwitchQuery, handleSwitchPrevious, the picker path) then loads the registry again. Passing the already-loaded registry into handleSwitch or threading auto_kill through SwitchOptions would avoid the redundant I/O on every automated-rotation invocation.
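A sketch of the first option (load once, pass it down), with a hypothetical in-memory Store that counts loads so the saving is visible. The real loadRegistry reads registry.json from codex_home and takes an allocator; both are omitted here.

```zig
const std = @import("std");

// Hypothetical, trimmed-down registry and a store that counts reads.
const Registry = struct { auto_kill: bool };

const Store = struct {
    on_disk: Registry,
    loads: usize = 0,

    fn load(self: *Store) Registry {
        self.loads += 1;
        return self.on_disk;
    }
};

/// Same rule as the original, but reads from an already-loaded registry.
fn shouldKillBeforeSwitch(opts_kill: ?bool, reg: *const Registry) bool {
    return opts_kill orelse reg.auto_kill;
}

/// Loads the registry once per invocation; the kill decision and the
/// target handler (elided) would both receive the same `reg`.
fn handleSwitch(store: *Store, opts_kill: ?bool) bool {
    const reg = store.load();
    const kill = shouldKillBeforeSwitch(opts_kill, &reg);
    // ... stop Codex if `kill`, then dispatch to the target with `&reg` ...
    return kill;
}
```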


Comment thread src/workflows/process_kill.zig
Comment on lines +115 to +117
fn sleepMs(ms: i64) void {
    app_runtime.io().sleep(std.Io.Duration.fromMilliseconds(ms), .awake) catch {};
}


P2 sleepMs parameter is signed but only meaningful as unsigned

ms: i64 is passed to std.Io.Duration.fromMilliseconds, which likely expects a non-negative value. A negative argument would either wrap to a huge duration or cause a runtime assertion failure. Since the function is only called with literal positive constants today, this is fine in practice, but changing the parameter to u64 makes the contract explicit and prevents a future caller from accidentally passing a negative value.

Suggested change (only the parameter type changes; the body is unchanged):
fn sleepMs(ms: u64) void {
    app_runtime.io().sleep(std.Io.Duration.fromMilliseconds(ms), .awake) catch {};
}

Add the kill flags and `config kill on|off` to the top-level `--help`
overview so the feature is discoverable there, not only in the per-command
help.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>