Add switch --kill and persistent auto-kill to stop Codex before switching#149
Add switch --kill and persistent auto-kill to stop Codex before switching#149bjspi wants to merge 2 commits into
switch --kill and persistent auto-kill to stop Codex before switching#149Conversation
Stop all running Codex processes (CLI and GUI) before switching accounts, so the new auth.json takes effect without a manual Codex restart. - `codex-auth switch --kill` / `--no-kill` override per run; the persistent `auto_kill` registry setting (toggle via `codex-auth config kill on|off`) applies otherwise. Resolution: opts.kill orelse reg.auto_kill. - Kill is graceful-first (quit / SIGTERM / windowed close), then a hard kill for survivors; the switch is aborted with `error.CodexStillRunning` if any Codex process remains. Exact name matching (pkill -x codex, taskkill /IM codex.exe) never targets codex-auth itself. - Cross-platform (Windows taskkill/tasklist, macOS osascript+pkill, Linux pkill/pgrep) in new src/workflows/process_kill.zig, hooked once at the top of handleSwitch (covers picker/auto/query/previous/live). - auto_kill is an additive optional registry field (no schema bump); parsed tolerantly and materialized into old files via currentLayoutNeedsRewrite. - Adds parser, config, and registry round-trip tests; updates help text. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Greptile SummaryThis PR adds optional Codex-process termination before an account switch, surfaced as
Confidence Score: 4/5Safe to merge after fixing the misleading recovery hint in printCodexStillRunningError. The src/cli/output.zig — the Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[codex-auth switch] --> B{opts.kill set?}
B -- yes --> C{opts.kill == true?}
B -- no --> D[loadRegistry\nauto_kill field]
D --> E{auto_kill == true?}
C -- false --> F[Skip kill\nproceed to switch]
C -- true --> G[ensureCodexStoppedForSwitch]
E -- false --> F
E -- true --> G
G --> H{Any Codex\nrunning?}
H -- no --> I[Switch proceeds]
H -- yes --> J[gracefulKill\nSIGTERM / osascript / taskkill /T]
J --> K[sleep 700ms]
K --> L{Still running?}
L -- no --> I
L -- yes --> M[forceKill\nSIGKILL / taskkill /F]
M --> N[sleep 400ms]
N --> O{Still running?}
O -- no --> I
O -- yes --> P[printCodexStillRunningError\nreturn error.CodexStillRunning]
P --> Q[Switch aborted]
I --> R[handleSwitchQuery\nor picker\nor previous\nor live]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[codex-auth switch] --> B{opts.kill set?}
B -- yes --> C{opts.kill == true?}
B -- no --> D[loadRegistry\nauto_kill field]
D --> E{auto_kill == true?}
C -- false --> F[Skip kill\nproceed to switch]
C -- true --> G[ensureCodexStoppedForSwitch]
E -- false --> F
E -- true --> G
G --> H{Any Codex\nrunning?}
H -- no --> I[Switch proceeds]
H -- yes --> J[gracefulKill\nSIGTERM / osascript / taskkill /T]
J --> K[sleep 700ms]
K --> L{Still running?}
L -- no --> I
L -- yes --> M[forceKill\nSIGKILL / taskkill /F]
M --> N[sleep 400ms]
N --> O{Still running?}
O -- no --> I
O -- yes --> P[printCodexStillRunningError\nreturn error.CodexStillRunning]
P --> Q[Switch aborted]
I --> R[handleSwitchQuery\nor picker\nor previous\nor live]
Reviews (2): Last reviewed commit: "docs(help): surface --kill/--no-kill and..." | Re-trigger Greptile |
| if (try shouldKillBeforeSwitch(allocator, codex_home, opts)) { | ||
| try process_kill.ensureCodexStoppedForSwitch(allocator); | ||
| } |
There was a problem hiding this comment.
Kill fires before interactive selection — Codex dies even on cancel
ensureCodexStoppedForSwitch is called before the picker or live UI is shown. If the user runs codex-auth switch --kill (no explicit target) and then presses Escape in the picker, Codex is already terminated but no account was switched. For the --live mode, once a session begins the same applies for subsequent in-session re-switches. The user would need to manually restart Codex after a cancellation, potentially losing any unsaved in-flight work that SIGKILL didn't give the process time to flush.
A safer flow would be to perform the kill only after a target account is confirmed (i.e., move ensureCodexStoppedForSwitch into each target handler just before activateAccountByKey / saveRegistry).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/workflows/switch.zig
Line: 21-23
Comment:
**Kill fires before interactive selection — Codex dies even on cancel**
`ensureCodexStoppedForSwitch` is called before the picker or live UI is shown. If the user runs `codex-auth switch --kill` (no explicit target) and then presses Escape in the picker, Codex is already terminated but no account was switched. For the `--live` mode, once a session begins the same applies for subsequent in-session re-switches. The user would need to manually restart Codex after a cancellation, potentially losing any unsaved in-flight work that SIGKILL didn't give the process time to flush.
A safer flow would be to perform the kill only after a target account is confirmed (i.e., move `ensureCodexStoppedForSwitch` into each target handler just before `activateAccountByKey` / `saveRegistry`).
How can I resolve this? If you propose a fix, please make it concise.| fn shouldKillBeforeSwitch( | ||
| allocator: std.mem.Allocator, | ||
| codex_home: []const u8, | ||
| opts: cli.types.SwitchOptions, | ||
| ) !bool { | ||
| if (opts.kill) |kill| return kill; | ||
| var reg = try registry.loadRegistry(allocator, codex_home); | ||
| defer reg.deinit(allocator); | ||
| return reg.auto_kill; | ||
| } |
There was a problem hiding this comment.
Registry loaded twice on the hot path when
auto_kill is true
When opts.kill is null and the registry's auto_kill is true, shouldKillBeforeSwitch loads and immediately discards the registry (line 121–123). Every switch target (handleSwitchQuery, handleSwitchPrevious, the picker path) then loads the registry again. Passing the already-loaded registry into handleSwitch or threading auto_kill through SwitchOptions would avoid the redundant I/O on every automated-rotation invocation.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/workflows/switch.zig
Line: 115-124
Comment:
**Registry loaded twice on the hot path when `auto_kill` is true**
When `opts.kill` is `null` and the registry's `auto_kill` is `true`, `shouldKillBeforeSwitch` loads and immediately discards the registry (line 121–123). Every switch target (`handleSwitchQuery`, `handleSwitchPrevious`, the picker path) then loads the registry again. Passing the already-loaded registry into `handleSwitch` or threading `auto_kill` through `SwitchOptions` would avoid the redundant I/O on every automated-rotation invocation.
How can I resolve this? If you propose a fix, please make it concise.| fn sleepMs(ms: i64) void { | ||
| app_runtime.io().sleep(std.Io.Duration.fromMilliseconds(ms), .awake) catch {}; | ||
| } |
There was a problem hiding this comment.
sleepMs parameter is signed but only meaningful as unsigned
ms: i64 is passed to std.Io.Duration.fromMilliseconds, which likely expects a non-negative value. A negative argument would either wrap to a huge duration or cause a runtime assertion failure. Since the function is only called with literal positive constants today, this is fine in practice, but changing the parameter to u64 makes the contract explicit and prevents a future caller from accidentally passing a negative value.
| fn sleepMs(ms: i64) void { | |
| app_runtime.io().sleep(std.Io.Duration.fromMilliseconds(ms), .awake) catch {}; | |
| } | |
| fn sleepMs(ms: u64) void { | |
| app_runtime.io().sleep(std.Io.Duration.fromMilliseconds(ms), .awake) catch {}; | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/workflows/process_kill.zig
Line: 115-117
Comment:
**`sleepMs` parameter is signed but only meaningful as unsigned**
`ms: i64` is passed to `std.Io.Duration.fromMilliseconds`, which likely expects a non-negative value. A negative argument would either wrap to a huge duration or cause a runtime assertion failure. Since the function is only called with literal positive constants today, this is fine in practice, but changing the parameter to `u64` makes the contract explicit and prevents a future caller from accidentally passing a negative value.
```suggestion
fn sleepMs(ms: u64) void {
app_runtime.io().sleep(std.Io.Duration.fromMilliseconds(ms), .awake) catch {};
}
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Add the kill flags and `config kill on|off` to the top-level `--help` overview so the feature is discoverable there, not only in the per-command help. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Motivation
When you switch the active account,
codex-authrewrites~/.codex/auth.json. But a Codex instance that is already running (the CLI or the desktop GUI) has the previous credentials loaded in memory, so the switch does not actually take effect until Codex is restarted manually. In practice this means: switch account → nothing changes → remember to quit and relaunch Codex → try again.This PR lets
codex-authoptionally stop all running Codex processes as part of the switch, and only proceed with the switch once none remain — so the freshly activated account is the one Codex picks up on its next start. This is especially useful for fully automated account rotation, where you don't want to babysit which instance is still holding an old session.What this adds
codex-auth switch --kill/--no-kill— per-invocation override.auto_killsetting inregistry.json, toggled viacodex-auth config kill on|off. When enabled, everyswitchstops Codex first;--kill/--no-killoverride it for a single run.effective = opts.kill orelse reg.auto_kill.Behaviour
When kill is in effect,
handleSwitch(before any target dispatch — covers picker / query / previous /--live):osascript ... to quit,SIGTERMon Unix, windowed close on Windows).error.CodexStillRunningand a clear message — the account is not changed.Both the CLI (
codex/codex.exe) and the GUI (macOS bundlecom.openai.codex, WindowsCodex.exe) are targeted.Safety
pkill -x codex,taskkill /IM codex.exe,tasklistexact image filter) socodex-authitself is never terminated.Cross-platform
New
src/workflows/process_kill.zig, dispatching onbuiltin.os.tag, reusing the codebase's existing process helpers (std.process.runwith the sharedapp_runtime.io()):taskkill /IM codex.exe /Tthen/F; detection viatasklistexact filter.osascriptquit for the GUI +pkill -TERM/-KILL -x codex|Codex; detection viapgrep -x.pkill -TERM/-KILL -x codex|Codex; detection viapgrep -x(best-effort).Registry / schema
auto_killis an additive optional field — no schema-version bump (consistent with howinterval_secondswas added). It is parsed tolerantly (absent → defaultfalse) and materialized into pre-existing files throughcurrentLayoutNeedsRewrite, so old and new binaries interoperate within schema v4.Tests
--kill,--no-kill, duplicate,--kill+--no-killconflict,switch <target> --kill.config kill on|offparsing incl. invalid value.auto_killpersists; missing field defaults tofalseand is rewritten.switchandconfig).The real process-killing paths invoke OS commands and are not unit-tested in CI (argument construction / dispatch is); manual verification steps were used for the end-to-end flow.
Notes / limitations
--livemode the kill runs once at start; in-session re-switches do not re-trigger it.🤖 Generated with Claude Code