Skip to content

ESB support: machine status API + status-change webhook#169

Open
jantman wants to merge 9 commits into
mainfrom
esb-integration
Open

ESB support: machine status API + status-change webhook#169
jantman wants to merge 9 commits into
mainfrom
esb-integration

Conversation

@jantman

@jantman jantman commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Implements the minimal changes needed in this application to support the Equipment Status Board integration. The approach keeps changes here small and lets ESB do the heavy lifting: we expose a status API and send it webhook events; ESB stores its own activity history.

Feature doc: docs/features/completed/esb-support.md.

What's included

Machine status API (ESB Support - 1)

  • Machine.status (derived: locked_out/oops/in_use/idle/unknown) and Machine.status_dict as a single shared representation.
  • New read-only GET /api/machines returning all machines' status, sorted by name (in the OpenAPI spec).
  • Requirement Bump codecov/codecov-action from 3.1.0 to 4.4.1 #3 (oops/lockout/clear control endpoints) already existed — no change needed.

Status-change webhook (ESB Support - 2)

  • New src/dm_mac/webhook.py WebhookNotifier, enabled by STATUS_WEBHOOK_URL (disabled when unset).
  • Fire-and-forget aiohttp POST with a per-attempt timeout and bounded exponential-backoff retry, so the MCU response is never blocked.
  • Fires on meaningful events onlylogin, logout, unauthorized, unknown_fob, override_login, oops, unoops, lockout, unlock, reboot — never on ordinary heartbeats. Wired at the same chokepoints as the Slack notifications.
  • Payload = status_dict + event + timestamp + a distinct user field (the event actor, e.g. who logged out). Resolves the notifier via current_app in request-context paths and via slack.quart in the context-less Slack-command path.

Acceptance criteria (ESB Support - 3)

  • Docs updated: configuration.rst (env var), http-api.rst (status API + webhook), CLAUDE.md.
  • Feature file moved to docs/features/completed/.

Config

Set STATUS_WEBHOOK_URL to enable the webhook; leave unset to disable. No auth header is sent (per design discussion).

Note on the msgpack pin

nox -s safety was already failing on main due to a pre-existing transitive msgpack==1.2.0 advisory (GHSA-6v7p-g79w-8964), unrelated to this feature. To meet the "all nox sessions passing" acceptance criterion, msgpack ^1.2.1 is pinned in pyproject.toml's existing "secure versions of transitive dependencies" block (same mechanism already used for urllib3/werkzeug/marshmallow). Happy to split this out if preferred.

Testing

All 7 nox sessions pass (tests, mypy, pre-commit, typeguard, docs, safety, coverage). webhook.py at 100% coverage, 98% overall. New unit tests cover the notifier (payload/retry/backoff/disabled) and the wiring (each event fires, heartbeats do not, Slack path, disabled no-op).

🤖 Generated with Claude Code

jantman and others added 4 commits July 4, 2026 07:49
Implements Milestone 1 of the ESB support feature: the read-only status
API and the shared status representation that the status-change webhook
(Milestone 2) will reuse.

- Add Machine.status (derived: locked_out/oops/in_use/idle/unknown) and
  Machine.status_dict (name, display_name, status, relay, oops, locked_out,
  current_user, last_checkin, last_update) as the single source of truth for
  both the list API and the upcoming webhook payload.
- Add GET /api/machines returning {"machines": [status_dict, ...]} sorted by
  name, with MachineStatus / MachineStatusUser / MachinesListResponse pydantic
  schemas documented via quart_schema.
- Unit tests for status/status_dict across all states (with and without a
  logged-in user) and for the endpoint reflecting live machine state.

Requirement #3 (oops/lockout/clear control endpoints) already exists in
views/machine.py and is well covered, so no code change was needed there.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ents

Implements Milestone 2: a fire-and-forget webhook that notifies an external
consumer (the Equipment Status Board) of meaningful machine status changes,
never on ordinary MCU heartbeats.

- Add src/dm_mac/webhook.py with WebhookNotifier: enabled by STATUS_WEBHOOK_URL
  (disabled/None when unset), builds a payload from Machine.status_dict plus
  event/timestamp/user, and delivers it via a fire-and-forget asyncio task that
  POSTs with aiohttp using a per-attempt timeout and bounded exponential-backoff
  retry. The acting user is carried in a distinct `user` field (the event
  actor, e.g. who logged out), separate from status_dict's `current_user`.
- Wire the notifier into create_app as WEBHOOK_NOTIFIER, mirroring SLACK_HANDLER.
- Fire notify() from the meaningful status-change sites in models/machine.py:
  login, logout, unauthorized, unknown_fob, override_login, oops, unoops,
  lockout, unlock, reboot, plus the always-enabled RFID-tracking path.
- Resolve the notifier via current_app in request-context (MCU/API) paths and
  via the Slack handler's app (slack.quart) in the Slack-command path, which
  runs without a request context.
- Tests for the notifier (payload shape, retry/backoff, HTTP-error retry,
  give-up-and-log, from_env) and for the wiring (each event fires, heartbeats
  do not, Slack path fires, disabled notifier is a no-op).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements Milestone 3 (Acceptance Criteria) and completes the ESB support
feature.

- Document the new surface: STATUS_WEBHOOK_URL in the configuration env-var
  table; a "Machine Status API" section (GET /api/machines) and a
  "Status-Change Webhook" section (payload, events, delivery semantics) in
  docs/source/http-api.rst; and app-config, API endpoint, env-var, and
  Status-Change Webhook notes in CLAUDE.md. GET /api/machines is picked up by
  the generated OpenAPI spec and dm_mac.webhook by sphinx-apidoc.
- Pin msgpack ^1.2.1 in pyproject.toml's existing "secure versions of
  transitive dependencies" block to clear a pre-existing GHSA-6v7p-g79w-8964
  advisory (msgpack 1.2.0) that was failing `nox -s safety` on main,
  unrelated to this feature, so the "all nox sessions passing" acceptance
  criterion is met.
- Move docs/features/esb-support.md to docs/features/completed/.

All nox sessions pass (tests, mypy, pre-commit, typeguard, docs, safety);
webhook.py at 100% coverage, 98% overall.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jantman

jantman commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

@claude please conduct an adversarial review of this PR

@claude

claude Bot commented Jul 4, 2026

Copy link
Copy Markdown

Claude finished @jantman's task in 4m 42s —— View job


Adversarial Review

  • Gather context (diff, changed files)
  • Review src/dm_mac/webhook.py (new WebhookNotifier)
  • Review src/dm_mac/models/machine.py (status/status_dict)
  • Review wiring in views/routes that fire the webhook
  • Review src/dm_mac/views/api.py new GET /api/machines
  • Review src/dm_mac/models/api_schemas.py
  • Review tests for gaps
  • Review docs / config changes
  • Post final adversarial review

Overall this is a clean, well-scoped PR. I found one real correctness bug that I'd want fixed before merge, plus two lower-severity reliability/efficiency nits.


1. (High confidence) Webhook payload's last_update is stale for RFID-driven and oops-button events

docs/source/http-api.rst's example shows last_update matching the event timestamp for a logout event. That's not what the code actually does for most event types.

In src/dm_mac/models/machine.py::MachineState.update(), self.last_update = time() is assigned by the caller after the handler returns:

elif rfid_value != self.rfid_value:
    if rfid_value is None:
        await self._handle_rfid_remove()
    else:
        await self._handle_rfid_insert(users, rfid_value)
    self.last_update = time()   # <-- line 990, runs AFTER the handler
if oops:
    await self._handle_oops(users)
    self.last_update = time()   # <-- line 970, runs AFTER the handler

But _notify_status_webhook(...) — which builds the payload from machine.status_dict synchronously — is called from inside those handlers, before they return: _handle_rfid_remove (line 1091, logout), _handle_rfid_insert (lines 1122/1143/1189/1219, unknown_fob/override_login/login/unauthorized), and _handle_oops (line 1032, oops). At that point self.last_update still holds the previous event's timestamp, not the one just being fired.

Net effect: for login, logout, unauthorized, unknown_fob, override_login, and heartbeat-triggered oops, the webhook's last_update field is one event behind — it does not equal the event's own timestamp, contradicting the documented example. (Note: GET /api/machines is unaffected since it reads state fresh at request time; and always-enabled machines happen to get this right, since last_update is bumped before _handle_rfid_tracking_always_enabled runs — the bug is specific to the non-always-enabled RFID path and the oops path.)

No existing test catches this — tests/views/test_webhook_wiring.py only asserts event ordering/actor, and tests/models/test_machine.py::TestMachineStatus builds status_dict from a hand-set mock state rather than exercising it through update().

Suggested fix: set self.last_update = time() before invoking the handler (or inside each handler, before calling _notify_status_webhook), and add a regression test asserting payload["last_update"] == payload["timestamp"] for a login/logout/oops event fired through update().

Fix this →

2. (Medium) Fire-and-forget delivery task has no strong reference — GC risk

src/dm_mac/webhook.py:93:

def notify(self, machine, event, user=None) -> None:
    payload = self.build_payload(machine, event, user=user)
    create_task(self._deliver(payload))

The returned Task isn't stored anywhere. Per the asyncio.create_task docs, the event loop only holds a weak reference to tasks — a task with no other strong reference can be garbage-collected mid-flight at any time. This is a pre-existing pattern elsewhere in the codebase for one-shot Slack chat_postMessage calls, but _deliver here can run much longer (up to ~7s of backoff sleeps + up to 4×5s request timeouts ≈ 27s worst case), which meaningfully widens the exposure window compared to the existing single-await Slack calls.

Suggested fix: keep a module/instance-level set of in-flight tasks and discard via a done-callback (the standard pattern from the asyncio docs), e.g.:

self._tasks: set = set()
...
task = create_task(self._deliver(payload))
self._tasks.add(task)
task.add_done_callback(self._tasks.discard)

3. (Low) New aiohttp.ClientSession per retry attempt

_deliver (webhook.py:111) opens a fresh ClientSession (and thus a fresh TCP/TLS handshake) on every attempt inside the retry loop, rather than reusing one session across attempts. aiohttp's own docs discourage creating a session per request. For a low-frequency status webhook this is unlikely to matter in practice, but it does mean the exponential backoff isn't buying connection reuse — each retry pays full connection setup cost again. Not blocking, just a minor efficiency nit.


Everything else — the status/status_dict derivation, the GET /api/machines endpoint, the Slack-path vs. request-context notifier resolution (current_app vs slack.quart), the unauthorized_warn_only / always_enabled interactions, and the disabled-by-default behavior when STATUS_WEBHOOK_URL is unset — all looked correct and were well covered by tests. The msgpack pin bump is reasonable and well-justified in the PR description.

@github-actions

github-actions Bot commented Jul 4, 2026

Copy link
Copy Markdown

Coverage

Coverage Report
FileStmtsMissBranchBrPartCoverMissing
src/dm_mac
   __init__.py75060100% 
   cli_utils.py15000100% 
   neon_fob_adder.py2321560593%79, 116–117, 124, 270, 333–334, 341, 364–367, 454–456
   neongetter.py211154399%309
   slack_handler.py2200560100% 
   utils.py25040100% 
   webhook.py620100100% 
src/dm_mac/models
   __init__.py0000100% 
   api_schemas.py49000100% 
   machine.py648152161697%671, 751, 1111–1113, 1229–1238
   users.py1030320100% 
src/dm_mac/views
   __init__.py0000100% 
   api.py41000100% 
   machine.py1030120100% 
   prometheus.py1320121100% 
TOTAL1916314622598% 

Tests Skipped Failures Errors Time
348 0 💤 0 ❌ 0 🔥 27.923s ⏱️

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds ESB (Equipment Status Board) integration support by introducing a machine-status listing API and a best-effort status-change webhook, built on a shared Machine.status / Machine.status_dict representation. It also updates documentation and pins msgpack to address a pre-existing safety advisory so the full nox suite passes.

Changes:

  • Add GET /api/machines (OpenAPI-documented) returning all machines’ current status sorted by name.
  • Add WebhookNotifier (enabled via STATUS_WEBHOOK_URL) and wire webhook events to meaningful machine state transitions (login/logout/oops/lockout/etc).
  • Update docs and dependency pinning (msgpack) to satisfy CI/safety checks.

Reviewed changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/dm_mac/webhook.py New webhook notifier (fire-and-forget aiohttp POST with retry/backoff).
src/dm_mac/models/machine.py Adds Machine.status/status_dict; wires webhook events to state-change chokepoints.
src/dm_mac/views/api.py Adds GET /api/machines endpoint returning status for all machines.
src/dm_mac/models/api_schemas.py Adds Pydantic schemas for machine status list responses.
src/dm_mac/__init__.py Initializes WEBHOOK_NOTIFIER from env during app creation.
tests/test_webhook.py Unit tests for WebhookNotifier behavior (payload, retries, disabled).
tests/views/test_webhook_wiring.py Integration tests ensuring webhook fires only on meaningful events (not heartbeats), including Slack path.
tests/views/test_api.py Tests for GET /api/machines ordering and state reflection.
tests/models/test_machine.py Tests for Machine.status and Machine.status_dict.
docs/source/http-api.rst Documents the new status API and webhook payload/events.
docs/source/configuration.rst Documents STATUS_WEBHOOK_URL env var.
docs/source/dm_mac.webhook.rst Adds Sphinx automodule page for webhook module.
docs/source/dm_mac.rst Includes dm_mac.webhook in Sphinx module index.
CLAUDE.md Updates architectural/config notes and documents new API/webhook.
docs/features/completed/esb-support.md Feature writeup moved/recorded as completed.
docs/features/completed/mcu-heartbeat-resilience.md Adds/updates completed feature doc content (non-code).
pyproject.toml Pins msgpack to a secure version.
poetry.lock Lockfile update for msgpack bump and dependency groups.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/dm_mac/webhook.py
Comment thread tests/test_webhook.py Outdated
@claude

claude Bot commented Jul 4, 2026

Copy link
Copy Markdown

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Adversarial review (claude bot) and Copilot review findings:

- (High) Webhook payload last_update was stale for RFID/oops events:
  MachineState.update() assigned self.last_update = time() *after* the
  handler ran, but the handler fires the webhook (reading status_dict)
  before returning, so the payload's last_update lagged one event behind
  and did not match its own timestamp (contradicting the documented
  example). Move the last_update assignment before the handler call in the
  oops and RFID-change branches. Add a regression test asserting
  payload["last_update"] == payload["timestamp"] for login/logout/oops
  fired through update().
- (Medium) Fire-and-forget delivery tasks had no strong reference and
  could be garbage-collected mid-flight (create_task returns a weakly-held
  task, and _deliver can run ~27s). Track in-flight tasks in a set and
  discard via a done-callback (the asyncio-docs pattern).
- (Low) _deliver opened a new aiohttp.ClientSession per retry attempt;
  reuse a single session across attempts so retries benefit from
  connection pooling.
- (Copilot) from_env logged the full STATUS_WEBHOOK_URL, which could leak
  secrets embedded in userinfo/query. Log only scheme+host via a new
  _safe_url helper; add tests proving credentials are not logged.
- (Copilot) test _machine() helper did not return a fresh status_dict per
  access like the real property; replace the Mock with a small _FakeMachine
  whose status_dict property returns a fresh dict.

All nox sessions pass; webhook.py at 100% coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jantman

jantman commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough adversarial review — all three findings are addressed in ad4c871:

1. (High) last_update stale for RFID/oops webhook events — Fixed. MachineState.update() now assigns self.last_update = time() before invoking _handle_oops / _handle_rfid_remove / _handle_rfid_insert (the always-enabled path was already correct), so the webhook payload built inside those handlers reflects this event. Added a regression test (test_payload_last_update_matches_timestamp) under frozen time asserting payload["last_update"] == payload["timestamp"] for login/logout/oops fired through update() — it fails without the fix.

Note on reboot: I intentionally left the reboot path as-is. update() never set last_update on reboot even before this PR, and changing that would also shift the pre-existing machine_last_update_timestamp Prometheus metric / Slack status semantics, which is out of scope here. A reboot webhook therefore reports the last meaningful state change before the reboot, which is defensible; happy to revisit separately if you'd prefer reboot to bump last_update too.

2. (Medium) Fire-and-forget task GC risk — Fixed. WebhookNotifier now keeps a self._tasks: set of in-flight delivery tasks and discards each via a done-callback (task.add_done_callback(self._tasks.discard)), the standard asyncio-docs pattern, so a long-running _deliver can't be collected mid-flight.

3. (Low) New ClientSession per retry attempt — Fixed. _deliver now opens a single aiohttp.ClientSession outside the retry loop and reuses it across attempts, so retries benefit from connection pooling.

Also addressed the two Copilot inline nits (redact the webhook URL in logs; make the test _machine() helper return a fresh status_dict per access). All nox sessions pass; webhook.py remains at 100% coverage.

@jantman jantman requested a review from Copilot July 4, 2026 13:58
@jantman

jantman commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

@claude please re-review with the latest changes

@claude

claude Bot commented Jul 4, 2026

Copy link
Copy Markdown

Claude finished @jantman's task in 3m 2s —— View job


Re-review of ad4c871 (latest changes)


All previously reported findings are correctly fixed in ad4c871, and the fix commit is tightly scoped (only src/dm_mac/models/machine.py, src/dm_mac/webhook.py, and their tests — no unrelated changes).

1. last_update staleness — fixed correctly. MachineState.update() (src/dm_mac/models/machine.py:968-997) now sets self.last_update = time() immediately before calling _handle_oops() / _handle_rfid_remove() / _handle_rfid_insert(), so the webhook payload built synchronously inside those handlers reads the current event's timestamp. The new regression test (tests/views/test_webhook_wiring.py::test_payload_last_update_matches_timestamp) drives login/logout/oops through the real /api/machine/update endpoint under freeze_time and asserts payload["last_update"] == payload["timestamp"] for each — this genuinely exercises the previously-buggy code path rather than a hand-built mock. The reboot exclusion is called out explicitly and is a reasonable, well-justified scope decision (avoids touching pre-existing Prometheus/Slack semantics).

2. Fire-and-forget task GC risk — fixed correctly. WebhookNotifier (src/dm_mac/webhook.py:61) now keeps a self._tasks: Set["Task[None]"], adds each task on creation, and discards it via task.add_done_callback(self._tasks.discard) — exactly the standard asyncio-docs pattern. No test explicitly exercises this (hard to test directly), but the implementation is textbook-correct.

3. ClientSession per retry attempt — fixed correctly. _deliver (src/dm_mac/webhook.py:137) now opens a single session outside the retry loop and reuses it across attempts. Verified that aiohttp.ClientTimeout(total=...) set at the session level still applies per-request (not cumulative across the session's lifetime), so each retry attempt still gets its own full REQUEST_TIMEOUT_SEC budget — behavior is unchanged aside from now benefiting from connection reuse.

Copilot nit — URL redaction — fixed correctly. _safe_url() (src/dm_mac/webhook.py:63-74) strips userinfo/path/query via urlsplit, keeping only scheme://host[:port]. from_env() now logs only the redacted form while self.url (used for delivery) retains the full URL with credentials. New tests (test_does_not_log_secrets, TestSafeUrl) directly assert the userinfo/query are absent from the logged string.

Copilot nit — test mutation leak — fixed correctly. _FakeMachine (tests/test_webhook.py) now exposes status_dict as a @property returning dict(self._status_dict) fresh on each access, matching the real Machine.status_dict property, so build_payload()'s in-place mutation can no longer leak between calls/assertions.

No new issues found. Checked for: line-length/lint compliance in the changed lines (none exceed 88 chars), whether the now-unused parts of the old Mock-based helper left dead imports (no — Mock is still used for the user fixture elsewhere in the file), and whether the reordering of self.last_update assignment could affect any other logic in the oops/RFID branches (it doesn't — nothing else in update() reads last_update before the handler call in a way that would change behavior).

The already-posted coverage report on this thread (commit ad4c871) confirms 346 tests passing, webhook.py at 100% coverage, 98% overall — consistent with what I'd expect from these changes. I wasn't able to execute nox/pytest myself in this session (no poetry/python execution permissions available), so I relied on that CI-generated report plus static analysis of the diff for verification.

Nothing further to flag — this is ready to merge from my perspective.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 18 changed files in this pull request and generated 5 comments.

Comment thread src/dm_mac/models/machine.py
Comment thread src/dm_mac/models/machine.py
Comment thread src/dm_mac/models/machine.py
Comment thread src/dm_mac/models/machine.py
Comment thread src/dm_mac/models/machine.py
…hook

Addresses a second round of PR review (Copilot inline comments on the
control-path/reboot webhooks) — the same last_update staleness class as the
earlier adversarial finding, but for the paths the first fix did not cover.

The API/Slack-triggered control methods (MachineState.lockout/unlock/oops/
unoops) and _handle_reboot() did not update last_update before firing the
status webhook, so lockout/unlock/oops/unoops/reboot events emitted a stale
last_update (contradicting the documented "last meaningful state change"
semantics and the status_dict schema).

Fix by bumping self.last_update = time() at each mutation site:
- MachineState.lockout(), unlock(), oops(), unoops() now set last_update
  inside their locked block (this also covers the MCU oops-button path via
  _handle_oops -> oops(), so the redundant pre-set previously added in
  update()'s oops branch is removed).
- _handle_reboot() sets last_update (a reboot resets the machine — a
  meaningful change).
- The RFID branch in update() keeps setting last_update before its inline
  handlers (they have no dedicated mutation method); comment clarified.

This makes last_update consistent across all readers (Prometheus,
GET /api/machines, and the webhook payload). Updated the three existing
tests that asserted the old stale value (lockout/reboot now bump it to
"now"), and extended the webhook regression test to assert
last_update == timestamp for API oops/unoops/lockout/unlock and reboot.

All nox sessions pass; webhook.py at 100% coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jantman

jantman commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

@claude please re-review with the latest changes

@claude

claude Bot commented Jul 4, 2026

Copy link
Copy Markdown

Claude finished @jantman's task in 2m 2s —— View job


Re-review of 7d04a81 (latest changes)

  • Gather context (diff since previous review, latest commit 7d04a81)
  • Verify fix for lockout() last_update
  • Verify fix for unlock() last_update
  • Verify fix for oops() last_update (dedup with _handle_oops)
  • Verify fix for unoops() last_update
  • Verify fix for _handle_reboot() last_update
  • Check regression test coverage for control-path + reboot events
  • Look for any new issues introduced by these fixes
  • Post final review

This commit correctly addresses the second round of Copilot findings (the control-path/reboot last_update staleness), and the fix is well-scoped — only src/dm_mac/models/machine.py plus the two test files, 30/9/33 lines respectively.

Verified correct, call-site by call-site. I enumerated all 13 _notify_status_webhook(...) call sites in machine.py and traced last_update ordering for each:

  • MachineState.lockout() / unlock() / oops() / unoops() (machine.py:857-940) — each now sets self.last_update = time() as the first statement inside its with self._lock (or nullcontext() when do_locking=False) block, before any other mutation. The API/Slack-triggered async wrappers (Machine.lockout/unlock/oops/unoops, lines 271-321) call these sync methods before calling _notify_status_webhook, so the payload is always fresh.
  • _handle_oops()oops(do_locking=False) (line 1052-1053) — correctly reuses the same last_update bump inside oops() (the lock is already held by the caller's update(), hence do_locking=False), so the redundant pre-set removed from update()'s oops branch (line 984-987) is genuinely redundant, not a regression. Confirmed no double-set/skip bug.
  • _handle_reboot() (line 819-849) — now sets last_update immediately after the log line and before any state mutation/webhook, consistent with the "locking handled in update()" comment (reboot runs inside update()'s lock).
  • RFID paths (_handle_rfid_remove, _handle_rfid_insert, _handle_rfid_tracking_always_enabled — lines 1066+, 1247+) — unchanged from the first fix; update() still bumps last_update at lines 998 (always-enabled branch) and 1007 (non-always-enabled branch) before invoking these handlers, so all login/logout/unknown_fob/override_login/unauthorized events remain correctly ordered.

No call site was missed, and no new correctness issue (double-locking, deadlock, or skipped bump) was introduced by consolidating the bump into the mutation methods.

Test coverage is solid. The new test_control_and_reboot_last_update_matches_timestamp in tests/views/test_webhook_wiring.py drives oops/unoops/lockout/unlock (via the real API endpoints) and a reboot (via a lower-uptime heartbeat) through the actual app/client under freeze_time, and asserts payload["last_update"] == payload["timestamp"] for every event — genuinely exercising the previously-buggy paths rather than mocks. The three pre-existing tests in tests/views/test_machine.py that asserted the old stale value were correctly updated to the new "bumped to now" expectation.

Note: I wasn't able to execute nox/pytest myself in this sandboxed session (no network/package-install access for poetry), so this review is based on static analysis of the diff plus the previously-posted CI coverage report. Nothing in the diff suggests the previously-reported 100%/98% coverage numbers would regress.

Nothing further to flag — all five Copilot findings from the second round, plus the original high-confidence adversarial finding, are now correctly fixed. This looks ready to merge from my perspective.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 19 changed files in this pull request and generated 2 comments.

Comment thread src/dm_mac/models/machine.py
Comment thread src/dm_mac/models/machine.py Outdated
…t_update)

Two Copilot inline findings on machine.py:

- status_dict read state.current_user with a check-then-use pattern. Capture
  the user object once into a local before the guard and the field reads, so
  the serialization can never trip over current_user being reassigned to None.
  (No real race exists today — status_dict is synchronous with no await, and
  no thread mutates current_user — but the local is clearer and defensive.)

- The always-enabled branch of MachineState.update() bumped last_update on
  every check-in, so idle always-on machines showed a perpetually-"now"
  last_update in GET /api/machines and the machine_last_update_timestamp
  Prometheus gauge (and it contradicted the "last meaningful state change"
  semantics this PR standardized). Now bump last_update only on an actual
  change: the transition into the always-on state (relay was off) or a tracked
  RFID change (login/logout/unknown). Steady-state always-on heartbeats no
  longer touch last_update. The RFID-change bump still happens before
  _handle_rfid_tracking_always_enabled fires its webhook, so always-enabled
  login/logout webhook payloads keep a fresh last_update.

Add a regression test asserting the always-on last_update is bumped on first
contact and on RFID login, but not on a steady-state heartbeat.

All nox sessions pass; 98% coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 2 comments.

Comment thread docs/features/completed/esb-support.md Outdated
Comment thread docs/source/http-api.rst Outdated
@jantman

jantman commented Jul 4, 2026

Copy link
Copy Markdown
Contributor Author

This is part of the fix for DecaturMakers/equipment-status-board#10

Two Copilot documentation findings:

- The completed feature doc (docs/features/completed/esb-support.md) still led
  with the "# Feature Template" heading and the in-progress template
  instruction line, which is confusing for a finished feature. Retitle it
  "# ESB Support" (matching the sibling completed docs) with a short note that
  it captures the original request plus the implementation plan/progress.

- The GET /api/machines docs implied last_update is only null when the machine
  has never checked in, but last_update starts null and is bumped only on a
  meaningful state change, independent of last_checkin. Clarify in
  docs/source/http-api.rst that a machine which has only sent idle heartbeats
  has a last_checkin while last_update is still null, and tighten the
  MachineStatus.last_update schema description to match.

Docs-only; docs build (-W) and all nox sessions pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 1 comment.

Comment thread pyproject.toml
Copilot review: msgpack is only pulled in transitively via cachecontrol, which
is a dev-group dependency, and it is not used at runtime by the application.
Pinning the secure floor under main [dependencies] wrongly expanded the
production dependency set (msgpack appeared in the no-dev poetry export).

Move `msgpack ^1.2.1` from main dependencies to group.dev.dependencies and
regenerate poetry.lock. The dev export (which `nox -s safety` audits via
`poetry export --with=dev`) still gets msgpack>=1.2.1, so the
GHSA-6v7p-g79w-8964 advisory stays resolved, while the production export no
longer contains msgpack. Updated the feature-doc note accordingly.

Verified: production export has no msgpack; dev export has msgpack==1.2.1;
`nox -s safety` reports no known vulnerabilities; all tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants