ESB support: machine status API + status-change webhook#169
Conversation
Implements Milestone 1 of the ESB support feature: the read-only status
API and the shared status representation that the status-change webhook
(Milestone 2) will reuse.
- Add Machine.status (derived: locked_out/oops/in_use/idle/unknown) and
Machine.status_dict (name, display_name, status, relay, oops, locked_out,
current_user, last_checkin, last_update) as the single source of truth for
both the list API and the upcoming webhook payload.
- Add GET /api/machines returning {"machines": [status_dict, ...]} sorted by
name, with MachineStatus / MachineStatusUser / MachinesListResponse pydantic
schemas documented via quart_schema.
- Unit tests for status/status_dict across all states (with and without a
logged-in user) and for the endpoint reflecting live machine state.
Requirement #3 (oops/lockout/clear control endpoints) already exists in
views/machine.py and is well covered, so no code change was needed there.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ents Implements Milestone 2: a fire-and-forget webhook that notifies an external consumer (the Equipment Status Board) of meaningful machine status changes, never on ordinary MCU heartbeats. - Add src/dm_mac/webhook.py with WebhookNotifier: enabled by STATUS_WEBHOOK_URL (disabled/None when unset), builds a payload from Machine.status_dict plus event/timestamp/user, and delivers it via a fire-and-forget asyncio task that POSTs with aiohttp using a per-attempt timeout and bounded exponential-backoff retry. The acting user is carried in a distinct `user` field (the event actor, e.g. who logged out), separate from status_dict's `current_user`. - Wire the notifier into create_app as WEBHOOK_NOTIFIER, mirroring SLACK_HANDLER. - Fire notify() from the meaningful status-change sites in models/machine.py: login, logout, unauthorized, unknown_fob, override_login, oops, unoops, lockout, unlock, reboot, plus the always-enabled RFID-tracking path. - Resolve the notifier via current_app in request-context (MCU/API) paths and via the Slack handler's app (slack.quart) in the Slack-command path, which runs without a request context. - Tests for the notifier (payload shape, retry/backoff, HTTP-error retry, give-up-and-log, from_env) and for the wiring (each event fires, heartbeats do not, Slack path fires, disabled notifier is a no-op). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements Milestone 3 (Acceptance Criteria) and completes the ESB support feature. - Document the new surface: STATUS_WEBHOOK_URL in the configuration env-var table; a "Machine Status API" section (GET /api/machines) and a "Status-Change Webhook" section (payload, events, delivery semantics) in docs/source/http-api.rst; and app-config, API endpoint, env-var, and Status-Change Webhook notes in CLAUDE.md. GET /api/machines is picked up by the generated OpenAPI spec and dm_mac.webhook by sphinx-apidoc. - Pin msgpack ^1.2.1 in pyproject.toml's existing "secure versions of transitive dependencies" block to clear a pre-existing GHSA-6v7p-g79w-8964 advisory (msgpack 1.2.0) that was failing `nox -s safety` on main, unrelated to this feature, so the "all nox sessions passing" acceptance criterion is met. - Move docs/features/esb-support.md to docs/features/completed/. All nox sessions pass (tests, mypy, pre-commit, typeguard, docs, safety); webhook.py at 100% coverage, 98% overall. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@claude please conduct an adversarial review of this PR |
|
Claude finished @jantman's task in 4m 42s —— View job Adversarial Review
Overall this is a clean, well-scoped PR. I found one real correctness bug that I'd want fixed before merge, plus two lower-severity reliability/efficiency nits. 1. (High confidence) Webhook payload's
|
Coverage Report
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull request overview
This PR adds ESB (Equipment Status Board) integration support by introducing a machine-status listing API and a best-effort status-change webhook, built on a shared Machine.status / Machine.status_dict representation. It also updates documentation and pins msgpack to address a pre-existing safety advisory so the full nox suite passes.
Changes:
- Add
GET /api/machines(OpenAPI-documented) returning all machines’ current status sorted by name. - Add
WebhookNotifier(enabled viaSTATUS_WEBHOOK_URL) and wire webhook events to meaningful machine state transitions (login/logout/oops/lockout/etc). - Update docs and dependency pinning (
msgpack) to satisfy CI/safety checks.
Reviewed changes
Copilot reviewed 16 out of 18 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/dm_mac/webhook.py |
New webhook notifier (fire-and-forget aiohttp POST with retry/backoff). |
src/dm_mac/models/machine.py |
Adds Machine.status/status_dict; wires webhook events to state-change chokepoints. |
src/dm_mac/views/api.py |
Adds GET /api/machines endpoint returning status for all machines. |
src/dm_mac/models/api_schemas.py |
Adds Pydantic schemas for machine status list responses. |
src/dm_mac/__init__.py |
Initializes WEBHOOK_NOTIFIER from env during app creation. |
tests/test_webhook.py |
Unit tests for WebhookNotifier behavior (payload, retries, disabled). |
tests/views/test_webhook_wiring.py |
Integration tests ensuring webhook fires only on meaningful events (not heartbeats), including Slack path. |
tests/views/test_api.py |
Tests for GET /api/machines ordering and state reflection. |
tests/models/test_machine.py |
Tests for Machine.status and Machine.status_dict. |
docs/source/http-api.rst |
Documents the new status API and webhook payload/events. |
docs/source/configuration.rst |
Documents STATUS_WEBHOOK_URL env var. |
docs/source/dm_mac.webhook.rst |
Adds Sphinx automodule page for webhook module. |
docs/source/dm_mac.rst |
Includes dm_mac.webhook in Sphinx module index. |
CLAUDE.md |
Updates architectural/config notes and documents new API/webhook. |
docs/features/completed/esb-support.md |
Feature writeup moved/recorded as completed. |
docs/features/completed/mcu-heartbeat-resilience.md |
Adds/updates completed feature doc content (non-code). |
pyproject.toml |
Pins msgpack to a secure version. |
poetry.lock |
Lockfile update for msgpack bump and dependency groups. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
Adversarial review (claude bot) and Copilot review findings: - (High) Webhook payload last_update was stale for RFID/oops events: MachineState.update() assigned self.last_update = time() *after* the handler ran, but the handler fires the webhook (reading status_dict) before returning, so the payload's last_update lagged one event behind and did not match its own timestamp (contradicting the documented example). Move the last_update assignment before the handler call in the oops and RFID-change branches. Add a regression test asserting payload["last_update"] == payload["timestamp"] for login/logout/oops fired through update(). - (Medium) Fire-and-forget delivery tasks had no strong reference and could be garbage-collected mid-flight (create_task returns a weakly-held task, and _deliver can run ~27s). Track in-flight tasks in a set and discard via a done-callback (the asyncio-docs pattern). - (Low) _deliver opened a new aiohttp.ClientSession per retry attempt; reuse a single session across attempts so retries benefit from connection pooling. - (Copilot) from_env logged the full STATUS_WEBHOOK_URL, which could leak secrets embedded in userinfo/query. Log only scheme+host via a new _safe_url helper; add tests proving credentials are not logged. - (Copilot) test _machine() helper did not return a fresh status_dict per access like the real property; replace the Mock with a small _FakeMachine whose status_dict property returns a fresh dict. All nox sessions pass; webhook.py at 100% coverage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks for the thorough adversarial review — all three findings are addressed in ad4c871: 1. (High) Note on 2. (Medium) Fire-and-forget task GC risk — Fixed. 3. (Low) New Also addressed the two Copilot inline nits (redact the webhook URL in logs; make the test |
|
@claude please re-review with the latest changes |
|
Claude finished @jantman's task in 3m 2s —— View job Re-review of ad4c871 (latest changes)
All previously reported findings are correctly fixed in ad4c871, and the fix commit is tightly scoped (only 1. 2. Fire-and-forget task GC risk — fixed correctly. 3. Copilot nit — URL redaction — fixed correctly. Copilot nit — test mutation leak — fixed correctly. No new issues found. Checked for: line-length/lint compliance in the changed lines (none exceed 88 chars), whether the now-unused parts of the old The already-posted coverage report on this thread (commit Nothing further to flag — this is ready to merge from my perspective. |
…hook Addresses a second round of PR review (Copilot inline comments on the control-path/reboot webhooks) — the same last_update staleness class as the earlier adversarial finding, but for the paths the first fix did not cover. The API/Slack-triggered control methods (MachineState.lockout/unlock/oops/ unoops) and _handle_reboot() did not update last_update before firing the status webhook, so lockout/unlock/oops/unoops/reboot events emitted a stale last_update (contradicting the documented "last meaningful state change" semantics and the status_dict schema). Fix by bumping self.last_update = time() at each mutation site: - MachineState.lockout(), unlock(), oops(), unoops() now set last_update inside their locked block (this also covers the MCU oops-button path via _handle_oops -> oops(), so the redundant pre-set previously added in update()'s oops branch is removed). - _handle_reboot() sets last_update (a reboot resets the machine — a meaningful change). - The RFID branch in update() keeps setting last_update before its inline handlers (they have no dedicated mutation method); comment clarified. This makes last_update consistent across all readers (Prometheus, GET /api/machines, and the webhook payload). Updated the three existing tests that asserted the old stale value (lockout/reboot now bump it to "now"), and extended the webhook regression test to assert last_update == timestamp for API oops/unoops/lockout/unlock and reboot. All nox sessions pass; webhook.py at 100% coverage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@claude please re-review with the latest changes |
|
Claude finished @jantman's task in 2m 2s —— View job Re-review of 7d04a81 (latest changes)
This commit correctly addresses the second round of Copilot findings (the control-path/reboot Verified correct, call-site by call-site. I enumerated all 13
No call site was missed, and no new correctness issue (double-locking, deadlock, or skipped bump) was introduced by consolidating the bump into the mutation methods. Test coverage is solid. The new Note: I wasn't able to execute Nothing further to flag — all five Copilot findings from the second round, plus the original high-confidence adversarial finding, are now correctly fixed. This looks ready to merge from my perspective. |
…t_update) Two Copilot inline findings on machine.py: - status_dict read state.current_user with a check-then-use pattern. Capture the user object once into a local before the guard and the field reads, so the serialization can never trip over current_user being reassigned to None. (No real race exists today — status_dict is synchronous with no await, and no thread mutates current_user — but the local is clearer and defensive.) - The always-enabled branch of MachineState.update() bumped last_update on every check-in, so idle always-on machines showed a perpetually-"now" last_update in GET /api/machines and the machine_last_update_timestamp Prometheus gauge (and it contradicted the "last meaningful state change" semantics this PR standardized). Now bump last_update only on an actual change: the transition into the always-on state (relay was off) or a tracked RFID change (login/logout/unknown). Steady-state always-on heartbeats no longer touch last_update. The RFID-change bump still happens before _handle_rfid_tracking_always_enabled fires its webhook, so always-enabled login/logout webhook payloads keep a fresh last_update. Add a regression test asserting the always-on last_update is bumped on first contact and on RFID login, but not on a steady-state heartbeat. All nox sessions pass; 98% coverage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
This is part of the fix for DecaturMakers/equipment-status-board#10 |
Two Copilot documentation findings: - The completed feature doc (docs/features/completed/esb-support.md) still led with the "# Feature Template" heading and the in-progress template instruction line, which is confusing for a finished feature. Retitle it "# ESB Support" (matching the sibling completed docs) with a short note that it captures the original request plus the implementation plan/progress. - The GET /api/machines docs implied last_update is only null when the machine has never checked in, but last_update starts null and is bumped only on a meaningful state change, independent of last_checkin. Clarify in docs/source/http-api.rst that a machine which has only sent idle heartbeats has a last_checkin while last_update is still null, and tighten the MachineStatus.last_update schema description to match. Docs-only; docs build (-W) and all nox sessions pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot review: msgpack is only pulled in transitively via cachecontrol, which is a dev-group dependency, and it is not used at runtime by the application. Pinning the secure floor under main [dependencies] wrongly expanded the production dependency set (msgpack appeared in the no-dev poetry export). Move `msgpack ^1.2.1` from main dependencies to group.dev.dependencies and regenerate poetry.lock. The dev export (which `nox -s safety` audits via `poetry export --with=dev`) still gets msgpack>=1.2.1, so the GHSA-6v7p-g79w-8964 advisory stays resolved, while the production export no longer contains msgpack. Updated the feature-doc note accordingly. Verified: production export has no msgpack; dev export has msgpack==1.2.1; `nox -s safety` reports no known vulnerabilities; all tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the minimal changes needed in this application to support the Equipment Status Board integration. The approach keeps changes here small and lets ESB do the heavy lifting: we expose a status API and send it webhook events; ESB stores its own activity history.
Feature doc:
docs/features/completed/esb-support.md.What's included
Machine status API (
ESB Support - 1)Machine.status(derived:locked_out/oops/in_use/idle/unknown) andMachine.status_dictas a single shared representation.GET /api/machinesreturning all machines' status, sorted by name (in the OpenAPI spec).Status-change webhook (
ESB Support - 2)src/dm_mac/webhook.pyWebhookNotifier, enabled bySTATUS_WEBHOOK_URL(disabled when unset).aiohttpPOST with a per-attempt timeout and bounded exponential-backoff retry, so the MCU response is never blocked.login,logout,unauthorized,unknown_fob,override_login,oops,unoops,lockout,unlock,reboot— never on ordinary heartbeats. Wired at the same chokepoints as the Slack notifications.status_dict+event+timestamp+ a distinctuserfield (the event actor, e.g. who logged out). Resolves the notifier viacurrent_appin request-context paths and viaslack.quartin the context-less Slack-command path.Acceptance criteria (
ESB Support - 3)configuration.rst(env var),http-api.rst(status API + webhook),CLAUDE.md.docs/features/completed/.Config
Set
STATUS_WEBHOOK_URLto enable the webhook; leave unset to disable. No auth header is sent (per design discussion).Note on the msgpack pin
nox -s safetywas already failing onmaindue to a pre-existing transitivemsgpack==1.2.0advisory (GHSA-6v7p-g79w-8964), unrelated to this feature. To meet the "all nox sessions passing" acceptance criterion,msgpack ^1.2.1is pinned inpyproject.toml's existing "secure versions of transitive dependencies" block (same mechanism already used for urllib3/werkzeug/marshmallow). Happy to split this out if preferred.Testing
All 7 nox sessions pass (
tests,mypy,pre-commit,typeguard,docs,safety,coverage).webhook.pyat 100% coverage, 98% overall. New unit tests cover the notifier (payload/retry/backoff/disabled) and the wiring (each event fires, heartbeats do not, Slack path, disabled no-op).🤖 Generated with Claude Code