feat(dronecan): on-demand GetNodeInfo, GetSet, ExecuteOpcode, RestartNode via async slot#11683
Draft
daijoubu wants to merge 95 commits into
Draft
feat(dronecan): on-demand GetNodeInfo, GetSet, ExecuteOpcode, RestartNode via async slot#11683daijoubu wants to merge 95 commits into
daijoubu wants to merge 95 commits into
Conversation
SJW=8 was overly conservative (80% of bit time at 1Mbps with 10 quanta). SJW=3 is the standard value also used by the F7 driver. Tested with 6037 arm/disarm cycles at 500kbps: TEC=0, REC=0, zero errors.
…DCAN PLL2Q was 3 (266 MHz, invalid for FDCAN ≤ 80 MHz). Fix to 10 (80 MHz). Extend PLL2 guard from USE_SDCARD_SDIO to USE_SDCARD_SDIO || USE_DRONECAN so H7 boards with CAN but no SD card get PLL2 configured. Adopt upstream PLL2M/N formula (VCI=1.6 MHz, VCO=800 MHz) and error check on HAL_RCCEx_PeriphCLKConfig.
…pport Remove redundant PeriphClkInitStruct clock config from canardSTM32CAN1_Init. system_stm32h7xx.c already configures FDCAN to use PLL2Q (80 MHz) when USE_DRONECAN is defined; duplicating it in the driver overwrites with PLL1. Also add CAN1 pin definitions and USE_DRONECAN to KAKUTEH7WING target (PD0/PD1, CAN1_STANDBY PD3 disabled by default).
Use HAL_RCCEx_GetPeriphCLKFreq(RCC_PERIPHCLK_FDCAN) instead of HAL_RCC_GetPCLK1Freq() for bit timing calculation. FDCAN is clocked from PLL2Q (80 MHz) configured in system_stm32h7xx.c; using PCLK1 (100 MHz) produced a ~25% baud rate error causing immediate bus-off. Restore SJW to 3 for better synchronisation tolerance.
Remove high-frequency LOG_DEBUG messages from GNSS Fix/Fix2/Auxiliary handlers, onTransferReceived, dronecanInit, and gps_dronecan HDOP path that fired at 25 Hz and flooded the log. Fix PLL2 VCO input to target 1.6 MHz (PLL2M = HSE/1600000, PLL2N = 500) rather than 2.0 MHz, keeping the operating point clearly within VCIRANGE_0 (1-2 MHz) as the original SDCARD-only code did with PLL2M=5. VCO output remains 800 MHz; FDCAN (80 MHz via PLL2Q=10) and SDMMC (200 MHz via PLL2R=4) outputs are unchanged.
Drop high-frequency and verbose-but-low-value LOG_DEBUG(CAN messages: - dronecan.c: Battery Info (x2), GetNodeInfo, NodeStatus, TX success, RX loop, commented-out debug blocks - canard_stm32h7xx_driver.c: timing computation intermediates (Baudrate, Max Quanta, Prescaler BS, Prescaler, Timings summary) - canard_stm32f7xx_driver.c: same timing intermediates, TX success, In CAN Init, commented-out clock and RX blocks Retain error-path messages (decode failed, TX/RX error, init failures) and the single-line Prescaler/SJW/BS summary logged at init.
… operation Tested at 1 Mbps on KAKUTEH7WING hardware and confirmed bus operational.
…ivers Remove CubeMX boilerplate markers, commented-out dead code, and development-time question comments from both drivers.
Fix: DroneCAN GNSS messages were being applied to gpsSolDRV regardless of the configured GPS provider. Guard added in gps_dronecan.c where it belongs, keeping CAN transport layer unaware of GPS config.
…g difference F7 bxCAN HAL writes SJW directly to BTR register where hardware adds 1, so stored value 3 gives 4 tq. This wider SJW is needed for reliable bus operation on F7 targets and is different from H7 where SJW=1 is actual tq.
Prevents state machine from continuing in INIT state when the CAN peripheral fails to initialize.
Prevents out-of-bounds access when STATE_DRONECAN_FAILED is active.
Prevents stale pre-bus-off frames from storming the bus on recovery.
With AutoRetransmission=ENABLE, frames that fail on a degraded bus occupy FIFO slots indefinitely. All 32 slots fill, HAL_FDCAN_AddMessage returns HAL_ERROR, and all outgoing traffic stalls permanently with no indication until full bus-off. DroneCAN reliability is handled at the application layer via periodic republishing.
Matches the H7 driver pattern. Previously the return value was silently discarded; if timing computation failed, uninitialized stack bytes were passed to HAL_CAN_Init.
The H7 FDCAN 128x11 recessive-bit recovery sequence takes up to 11.264ms at 125kbps. The 1ms delay was restarting the counter before it could complete, preventing the node from ever exiting bus-off. 20ms gives safe margin above worst-case and allows time to detect immediate re-entry.
Guard against non-DroneCAN GPS provider at the transport boundary (handle_GNSS* functions) rather than in each leaf function in gps_dronecan.c. Also adds the guard to handle_GNSSRCTMStream which had none. Removes stale UNUSED(pgnssAux) and placeholder comment from dronecanGPSReceiveGNSSAuxiliary.
canardSTM32GetProtocolStatus() was called on every dronecanUpdate() invocation (~500Hz) to detect bus-off. Moved into the existing 1Hz task block — bus-off detection latency of up to 1s is acceptable. Adds LOG_DEBUG to report BusOff and ErrorPassive flags each second for bench diagnostics.
AutoBusOff=ENABLE handles the 128x11 recovery sequence automatically, but ESR.BOFF is a sticky read-only flag that is NOT cleared when hardware recovery completes. GetProtocolStatus() reads this flag, so the state machine was permanently stuck in STATE_DRONECAN_BUS_OFF after any bus-off event on F7 targets. Stop/Start re-enters init mode which clears ESR.BOFF, allowing recovery detection to work correctly.
Comment incorrectly stated '25MHz' as a supported HSE value — 25MHz fails the assert. CMake always provides HSE_VALUE per-target via -DHSE_VALUE=<n> so the stm32h7xx_hal_conf.h fallback of 25MHz is never used. Current targets use 8MHz (default) or 16MHz (KAKUTEH7WING).
…adence GetProtocolStatus() was called every dronecanUpdate() cycle (~500Hz) in BUS_OFF state. Moved inside the 20ms recovery timer block so it runs at the same cadence as RecoverFromBusOff() — still detects recovery within 20ms but reduces MMIO reads from ~500/sec to ~50/sec.
HAL_CAN_Stop/Start called from the scheduler context with CAN interrupts active caused a full FC lockup. Reverted to empty stub pending investigation of a safe mechanism to clear the sticky ESR.BOFF flag on F7.
Unconditional 1Hz LOG_DEBUG was flooding the bootlog with healthy status messages. Now only logs when an error condition is actually present.
HAL_CAN_AddTxMessage returns non-OK when all mailboxes are busy — a normal transient condition at startup. The log was noise. Matches the H7 driver which already handles this path silently.
DroneCAN float16 optional fields encode NaN when unpopulated. Without a guard, NaN * 100 converts to 0 on Cortex-M (ARM VCVT saturation), permanently blocking the HDOP fallback path. Also passes values through gpsConstrainHDOP() to prevent uint16_t overflow for extreme DOP values.
…ompatible gpsSolDRV has hdop but no vdop field. VDOP and EPV are not interchangeable (different units, conversion requires receiver UERE). lastVDOP was a dead store with no valid consumer.
…ames size Guards the CLI state name array against future enum additions — if a new state is added without updating the array, the build fails immediately.
The buffer size check for MSP2_INAV_DRONECAN_NODE_INFO now uses an explicit per-field sum matching the writes that follow, so the check stays in sync when fields are added or removed.
…FAILED Previously busoffTimeUs was reset unconditionally, causing endless 20ms recovery attempts on a permanently-faulted bus with no exit path. After 50 attempts (~1 second at 20ms cadence) without clearing BusOff, transition to STATE_DRONECAN_FAILED and log the failure. The retry counter resets on successful recovery so a transient fault is handled correctly.
… lookups Add a private findNodeByID() helper in dronecan.c and a public dronecanGetNodeByID() accessor in the driver API. Use them to consolidate three independent linear-search loops that all performed the same nodeID lookup: - handle_NodeStatus: replace loop with findNodeByID() - handle_GetNodeInfoResponse: replace loop with findNodeByID() - MSP2_INAV_DRONECAN_NODE_INFO handler: replace loop+found flag with dronecanGetNodeByID(), simplifying to a flat early-exit structure
…SIZE constant Define the constant near the DroneCAN include with a breakdown comment showing each field's contribution. The check site now reads clearly and the single definition is the one place to update when the wire format changes.
…nav.h The size constant belongs alongside the message code definition, not inside the implementation file. Removed from fc_msp.c, added with field breakdown comment next to MSP2_INAV_DRONECAN_NODE_INFO.
Replace eight individual zero-assignments with memset() on the whole dronecanNodeInfo_t slot before setting non-zero fields. Future struct additions are zeroed automatically. Update coverage comment in dronecan_application_unittest.cc — GetNodeInfo response acceptance has been implemented since Phase 3; remove the stale 'Phase 3 will change this' note.
- Extend dronecanNodeInfo_t name field from 32 to 80 bytes to match the full UAVCAN GetNodeInfo name length (spec allows up to 80 bytes) - Use sizeof(node->name) in truncation logic so it tracks the struct - Update MSP2_INAV_DRONECAN_NODE_INFO wire format: 71 -> 119 bytes - Log when node table is full and a new node is silently dropped
…ase artifact Orphaned blob (old non-static TX for-loop + stray closing brace + duplicate non-static process1HzTasks) slipped through the 27-commit getnodeinfo rebase. The correct versions already exist: processCanardTxQueue() for ISR-driven TX and static process1HzTasks() at line 426 with NVIC masking.
…t symbols Two fixes found by running unit tests after the getnodeinfo rebase: 1. shouldAcceptTransfer response block was empty — GetNodeInfo responses were silently dropped by canard before handle_GetNodeInfoResponse could run. Restores the UAVCAN_PROTOCOL_GETNODEINFO_RESPONSE_SIGNATURE case that was present on the pre-rebase branch but clobbered during conflict resolution. 2. UNIT_TEST build: expose handle_NodeStatus, shouldAcceptTransfer, and onTransferReceived as non-static so dronecan_application_unittest.cc can call them directly. Add txErrCount/busOffCount to UNIT_TEST block. Fix stub typo canardSTM32Recieve→canardSTM32Receive and add missing canardSTM32GetAndClearRxDropCount stub. All 13 dronecan_application_unittest + 13 dronecan_getnodeinfo_unittest pass.
- Add transfer ID guard in handle_GetNodeInfoResponse to reject stale or replayed frames (uses per-node getNodeInfo_transfer_id counter) - Use sizeof(node->hw_unique_id) instead of magic number 16 in memcpy - Fix comment label: "Canard Senders" -> "Canard Handlers and Senders" - Remove extra blank line before #endif at end of file - Rename MSP field last_seen_ms -> elapsed_ms to match notes wording - Fix TwoDistinctNodesStoredSeparately test: interleave encode+dispatch so node 10 is not inserted with node 20's payload data
Replace auto-fetch-on-discovery with a single shared async slot that serves any DroneCAN service request (GetNodeInfo, param GetSet) via two new MSP commands: MSP2_INAV_DRONECAN_ASYNC_REQUEST (0x2043) — fires any service request MSP2_INAV_DRONECAN_ASYNC_RESULT (0x2044) — polls result with seq guard Design changes: - Strip node table to runtime fields only (nodeID, health, mode, uptime_sec, vendor_status_code, last_seen_ms); saves ~3.5 KB RAM - Remove automatic GetNodeInfo fetch on node discovery - Add dronecanAsyncRequest() as single entry point for all service calls - Add handle_AsyncServiceResponse() dispatching on service_id - Add 2-second timeout with seq generation counter to detect stale polls - Add UAVCAN_PROTOCOL_PARAM_GETSET_ID to shouldAcceptTransfer response block - Extend MSP2_INAV_DRONECAN_NODES wire format with uptime_sec and vendor_status_code (13 bytes/record, was 7)
…expiry - Add ExecuteOpcode (service 10) and RestartNode (service 5) to async slot framework: request builder, response decoder, shouldAccept filter, MSP ASYNC_REQUEST/RESULT handlers - Fix GetSet write: zero-initialise encode buffer so reserved bits are not contaminated by stack garbage - Fix GetSet write: include parameter name in request; ArduPilot nodes require the name field to accept writes - Prune node table entries silent for >10s so stale nodes (e.g. after a node ID change) are removed automatically
- Change service_id to uint8_t throughout (slot, function sig, fc_msp local) - Replace obvious buffer zero-init comment with explanation of why (UAVCAN reserved bits) - Fix extra indentation on timeout expiry assignment - MSP reads service_id as u16 for protocol compat; casts to u8 on use
… comments - GetNodeInfo name buffer: clamp to sizeof-1, add null terminator (matches param result pattern); grow name[80] -> name[81] to hold the terminator - Add DRONECAN_STATE_NOT_READY 0xFF constant to header; use it in fc_msp.c instead of a bare magic number - Update accepted/busy comment to note 1 also covers unrecognised service_id
…t, logging - Add UAVCAN-spec transfer_id match in handle_AsyncServiceResponse to reject stale frames after bus-off recovery (in-flight id is (transfer_id-1) & 0x1F) - Add comment clarifying buf_ptr=NULL is intentional for zero-length payloads - Add LOG_DEBUG for ExecuteOpcodeResponse decode failure (matches GetNodeInfo and ParamGetSet pattern)
- PARAM_GETSET minimum-size guard: < 2 → < 3 (index+is_write required); remove fallback ternary for is_write - handle_AsyncServiceResponse: update stale comment to name service_id specifically rather than vague "parameter"
Decode NumericValue min_value and max_value from UAVCAN param.GetSet response and thread them to the configurator. The FC now serializes min/max type and value bytes after the param value in the async result MSP payload.
…e field zeroing, named constants, struct alignment - dronecan.c: map NUMERICVALUE_* → DRONECAN_PARAM_TYPE_* explicitly in min/max decode switch; zero inactive fields before each switch to prevent stale values across consecutive GetSet calls - dronecan.h: reorder min/max struct fields (int64_t first) to eliminate 7-byte ARM alignment padding per min/max group - fc_msp.c: replace bare 1/2 with DRONECAN_PARAM_TYPE_INT/FLOAT; add braces to else-if blocks in min/max serializer
…mResult_t Reorder min/max fields to match existing value field convention (type, int64, float). Fix comment that incorrectly claimed the prior ordering eliminated ARM alignment padding — both orderings produce identical struct size.
Replace the removed MSP2_INAV_DRONECAN_NODE_INFO (0x2043) entry with documentation for the new async slot pattern introduced in the dronecan-param-getset feature: - MSP2_INAV_DRONECAN_NODES (0x2042): fix per-node record size 7→13 bytes (added uptime_sec u32 + vendor_status_code u16) - MSP2_INAV_DRONECAN_ASYNC_REQUEST (0x2043): document all four services (GetNodeInfo, RestartNode, ExecuteOpcode, ParamGetSet read/write) including param type encoding and sequence number correlation - MSP2_INAV_DRONECAN_ASYNC_RESULT (0x2044): document state machine, all service-specific result payloads, and slot-reset behaviour
Replace stale MSP2_INAV_DRONECAN_NODE_INFO entry and fix MSP2_INAV_DRONECAN_NODES record size to match the current dronecan-param-getset implementation: - MSP2_INAV_DRONECAN_NODES (8258): expand per-node record from 7 to 13 bytes — add uptime_sec(u32) and vendor_status_code(u16) fields - MSP2_INAV_DRONECAN_ASYNC_REQUEST (8259): replace removed NODE_INFO entry; documents common header + service-specific request fields for all four services (GetNodeInfo, RestartNode, ExecuteOpcode, ParamGetSet) - MSP2_INAV_DRONECAN_ASYNC_RESULT (8260): new entry documenting the 5-byte common header and all service-specific result payloads
Re-run gen_msp_md.py after updating msp_messages.json to keep the generated Markdown in sync with the JSON source of truth.
…rRespond dronecanAsyncRequest() is called from the main loop (via MSP handler). canardRequestOrRespond() mutates the libcanard TX queue, which is also accessed by the ISR-driven processCanardTxQueue(). Without masking, a TX-complete ISR could corrupt the queue mid-enqueue.
The struct dronecanNodeInfo_t was stripped of GetNodeInfo fields in 96f8a4b (shared async slot) to save ~3.5 KB RAM. GetNodeInfo results now go to dronecanAsyncSlot.result.node_info (DRONECAN_ASYNC_READY) instead of the node table. Update the test to: - Prime the async slot (PENDING + service_id + node_id) before dispatch, since handle_AsyncServiceResponse guards on all three - Assert slot state transitions to DRONECAN_ASYNC_READY - Assert result fields via dronecanAsyncSlot.result.node_info - Reset dronecanAsyncSlot in SetUp() for test isolation
- Replace hardcoded 2000 with DRONECAN_ASYNC_TIMEOUT_MS in timeout check and use >= to align with dronecanAsyncRequest guard - Add DRONECAN_NODE_STALE_TIMEOUT_MS constant; replace magic 10000 - Fix double-indented body in timeout check - Fix brace alignment in shouldAcceptTransfer response cases - Clear async slot to IDLE after reading ERROR state in MSP result handler - Add comment to dronecanParamResult_t.name[93] explaining DSDL capacity - Document 0xFF (bus not ready) sentinel in MSP async request JSON/README
…ART_NODE response decode Extends dronecan_application_unittest.cc with 13 new tests covering: - handle_AsyncServiceResponse guard rejections (IDLE state, wrong node_id, wrong transfer_id, wrong service_id) - PARAM_GETSET response decode for all five Value types (INT, FLOAT, BOOL, STRING, EMPTY) including integer and float min/max range fields - EXECUTE_OPCODE response decode (ok=true and ok=false) - RESTART_NODE response decode (ok=true) - dronecanAsyncRequest re-entry guard (slot PENDING → returns false) 26/26 tests pass.
- dronecan.c: clamp name and string-value lengths before memcpy in dronecanAsyncRequest() — safe today because MSP pre-clamps, but the function is part of a public API and must not rely on caller preconditions - dronecan.c: update stale block comment on handle_AsyncServiceResponse - fc_msp.c: replace manual 32-bit split/join of int64 wire values with memcpy through uint64_t, eliminating signed left-shift and implementation-defined right-shift on negative values - fc_msp.c: remove trailing whitespace on USE_DRONECAN #endif - unittest: narrow GAP-S7 re-entry guard test to RESTART_NODE so the null-payload check cannot mask a missing re-entry guard
- fc_msp.c: add missing sbufAdvance after sbufReadData in STRING write arm of PARAM_GETSET decoder — without it, req_name_len read the first byte of the string value, silently targeting the wrong param on the node - fc_msp.c: add sbufAdvance after name read for consistency with INT path - fc_msp.c: replace signed right-shift with memcpy/uint64_t for min_int and max_int range encoding, consistent with the value_int fix in pass 2 - dronecan.c, fc_msp.c: remove trailing whitespace on blank lines in new code
- dronecan.c: expand handle_AsyncServiceResponse comment to explain the single-slot design rationale - dronecan.c: normalise shouldAcceptTransfer to consistent 4-space indentation (was mixed tabs/spaces from prior edits) - unittest: add shouldAcceptTransfer tests for PARAM_GETSET, EXECUTE_OPCODE, and RESTART_NODE response acceptance (29 tests total)
|
Test firmware build ready — commit Download firmware for PR #11683 240 targets built. Find your board's
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
dronecanAsyncSlot) that serialises GetNodeInfo, param GetSet, ExecuteOpcode, and RestartNode requests through a single IDLE→PENDING→READY/ERROR state machine, avoiding per-service response queuesMSP2_INAV_DRONECAN_ASYNC_REQUEST/MSP2_INAV_DRONECAN_ASYNC_RESULT) so the configurator can read/write node parameters and trigger node restarts without custom DroneCAN toolingelapsed_ms(time since last NodeStatus)shouldAcceptTransferfilter, all four guard-rejection paths, all fiveparam.Valuetypes (INT/FLOAT/BOOL/STRING/EMPTY) with min/max range fields, ExecuteOpcode ok/fail, RestartNode, and the re-entry guardDependencies
42dabdc1a.Configurator branch:
daijoubu:feature/dronecan-configurator-tabcovers the UI for GetNodeInfo and param GetSet.Test plan