OTA-over-LoRa: signed full/delta firmware updates (.mota) for ESP32 & nRF52 🤖🤖 #2864

Open
vk496 wants to merge 15 commits into meshcore-dev:dev from vk496:feature/ota-lora

vk496 commented Jun 29, 2026

TL;DR

Eventually upgradable. Low priority.

Firmware OTA over LoRa with DHT/BitTorrent-style propagation. Supports ESP32 and nRF52; nRF52 requires a special bootloader to apply OTA. A folder with multiple mOTA files can be served over WiFi/serial (motatool).

Tested with Heltec V3, RAK4631, T114

Test firmware: https://github.com/vk496/MeshCore/releases/tag/dev-latest
User Quick Start: docs/ota_user_guide.md.
Dev specs: docs/ota_protocol.md

What this adds

Over-the-air firmware updates that travel over the existing LoRa mesh — no
internet, BLE, or USB needed at the target node. A node discovers update sources
among its neighbours, fetches a signed image block-by-block, verifies it, and
applies it. Images can be full or delta against a known base (deltas are
far smaller — essential on LoRa's tiny bandwidth).

Motivation: repeaters/sensors are often deployed somewhere physically awkward to
reach. Today updating them means going to the device. This lets you push a signed
update through the mesh itself.

Discussion/issue:

How it works

  • .mota container — a signed update package: manifest + payload + a merkle
    tree over fixed-size blocks, plus a 56-byte EndF self-identity trailer baked
    into every firmware (target id, version, hw id, image hash). Spec in
    docs/ota_protocol.md.
  • Discovery (two-tier, anti-storm) — a tiny periodic OTA_ADV beacon
    (seeder + count + set-digest); interested peers send OTA_QUERY and build a
    catalog from OTA_HAVE. Jitter + overhear-suppression keep it quiet.
  • Transfer — 1 KB logical blocks split into ~160 B DATA fragments + a merkle
    PROOF, reassembled and verified per block; resumable across reboots.
  • Apply — ESP32 stages into the inactive A/B slot (stock partition scheme,
    no bootloader change); nRF52 applies the delta in place via a companion
    bootloader (see Dependencies). Refuses on hw-id mismatch (brick-safety).
  • Trust — Ed25519 signatures + a per-node signer allowlist; autofetch and
    autoinstall policies are opt-in and conservative by default.
  • CLI: ota status | ls | get <#> | install | cancel | announce | self | ….

What's in the PR (layered for review)

The 13 commits build up in dependency order — container format → vendored decoder
→ transfer protocol → platform apply → folder relay → CLI/node integration →
build enablement + tooling → host packager → tests → docs → WiFi serve →
discovery tuning → nRF52 variant enablement. Each is independently reviewable.

Also included:

  • motatool — a portable C++17 host tool to build/verify/inspect/serve/keygen
    .mota (cross-checked byte-for-byte against the Python reference).
  • Folder relay — a node (or motatool) can serve a folder of .mota to peers
    over USB serial or WiFi/TCP (ESP32 companion: dedicated seeder port 5001 +
    an OTA text console on 5002, both coexisting with the phone-app port).
  • Tests — native unit tests (pio test -e native), motatool ctest, and a
    Python reference suite; generated tables (OtaTargets.h, mota_vectors.h) are
    committed.
  • Docs — protocol spec + a user guide.

Platform support

  • ESP32 (A/B OTA): works with the stock partition scheme, no bootloader change.
  • nRF52 (in-place delta): enabled on the 14 existing variants the companion
    bootloader supports (RAK4631, Heltec T114, LilyGo T-Echo family, ThinkNode,
    Wio-Tracker, Xiao nRF52, ProMicro, T1000-E, …). No new variants are introduced.

Dependencies

nRF52 in-place apply requires a companion bootloader change
(Adafruit_nRF52_Bootloader_OTAFIX, separate repo/PR — link TBD). ESP32 needs
nothing extra. Full/verify/serve paths need no bootloader on any platform.

Testing

  • ✅ Native unit tests, motatool ctest, Python reference suite all pass.
  • ✅ Builds: ESP32 (Heltec V3) and nRF52 (RAK4631, ThinkNode M1) with EndF injection.
  • HW-validated end-to-end on RAK4631 (nRF52840), Heltec T114 (nRF52840), and
    Heltec V3 (ESP32-S3): mesh discovery, full + delta fetch, signature/hash verify,
    and apply (ESP32 A/B + nRF52 in-place).

Notes / scope

  • detools 0.53.0 decoder is vendored (isolated, decoder-only, 3rd-party) — the
    delta codec is never reimplemented.
  • No public-API breakage; additive NodePrefs fields are versioned/back-compatible.

Checklist

  • Examples updated (companion/repeater hooks)
  • Code commented; protocol + user docs added
  • Builds + native tests pass; HW-validated
  • Linked feature issue / maintainer 👍 (per CONTRIBUTING for larger features)

vk496 added 13 commits June 29, 2026 20:17
… tooling (.mota reference, EndF/target/vector generators)
…eeder/console)

Extends the USB-serial folder relay to WiFi so an ESP32 companion can both
serve .mota and be operated headlessly:

- motatool `serve --tcp <host[:port]>`: a TcpTransport sibling of the serial
  transport (default port 5001). SeederCore/Folder are reused unchanged — the
  COUNT/DESCRIBE/READ protocol is transport-agnostic.
- ESP32 companion: a dedicated OTA seeder port (5001) for `serve --tcp`, plus
  an OTA text console on 5002 (`nc <ip> 5002` -> `ota status|ls|announce|...`,
  the same handle_ota_command CLI serial nodes have). Both run alongside the
  phone-app port (5000); all three coexist.
- WiFi.setSleep(false): ESP32 STA mode's modem power-save periodically sleeps
  the modem/CPU and stalls the SX1262 SPI+DIO servicing, leaving LoRa deaf
  while WiFi is associated. Disabling it restores the radio (HW-validated:
  a V3 WiFi companion is then discovered over LoRa and discovers its peers).
- docs: serving .mota over WiFi (protocol §10.2 + user guide).

Discovery was hard to use: a node only advertised at boot, so a peer that
ran `ota ls` minutes later saw "no neighbours".

- First self-advert ~8s after boot, then a short burst (~1 min), then
  re-announce at a random 3-10 min interval so a long-running node stays
  discoverable without all nodes beaconing in lockstep. The beacon is tiny,
  lowest-priority and duty-gated, so a few-minute cadence is cheap.
- `ota ls` now shows the raw target id (`hw XXXXXXXX`) when the env name is
  not in this build's OtaTargets.h table, instead of a blank "[other hw]".

Switch these existing variants from `nrf52_base` to `rak4631_hw` (which adds
ENABLE_OTA + the in-place flash store + the EndF post-build hook), so they get
OTA-over-LoRa. Scope is limited to variants already covered by the
Adafruit_nRF52_Bootloader_OTAFIX in-place apply — no new variants are added.
OtaTargets.h is regenerated to include their target ids.

dreirund commented Jun 29, 2026

Contributor

Nice :-).

I see -- not knowing programming or the protocol in depth -- some issues:

OTA_ADV beacon interval should be configurable.

Does it really go multi-hop, or only zero-hop? (If multi-hop, careful thought is needed on how to keep the network burden down and how to stop any remote malicious person from flooding the whole network with evil updates. I opt for zero-hop.)

I would further opt to advertise only manually by default.

I tried to follow your links to docs/ota_user_guide.md and docs/ota_protocol.md, but they lead me only to some technical GitHub page saying "There isn’t anything to compare." I do not understand.

vk496 commented Jun 29, 2026

Author

Thanks, links fixed.

By default I set a maximum of 3 hops, but I agree it should be adjustable dynamically (that will land in the following commits).

Advertising manually would prevent propagation of the firmware. I don't know what the default config should be, but beacons are intended to be small and cheap. Advertising one beacon (20 bytes IIRC) at a random 3-10 min interval should not have a big impact on the mesh.
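
As a rough sanity check of that estimate, the per-node beacon volume can be worked out directly. The 20-byte figure is the "IIRC" value from the comment, not a measurement, and the sketch counts payload bytes only: LoRa preamble, packet headers, and airtime (which depends on radio settings not given here) are left out.

```python
BEACON_BYTES = 20        # "20 bytes IIRC" from the comment; not measured
MIN_INTERVAL_MIN = 3     # shortest re-announce interval
MAX_INTERVAL_MIN = 10    # longest re-announce interval
MINUTES_PER_DAY = 24 * 60


def beacons_per_day(interval_min: float) -> float:
    """Long-run beacon count per day for a given mean interval."""
    return MINUTES_PER_DAY / interval_min


def bytes_per_day(interval_min: float, beacon_bytes: int = BEACON_BYTES) -> float:
    """Beacon payload bytes sent per day by one node."""
    return beacons_per_day(interval_min) * beacon_bytes


# A uniformly random interval in [3, 10] minutes averages 6.5 minutes.
mean_interval = (MIN_INTERVAL_MIN + MAX_INTERVAL_MIN) / 2

worst_case = bytes_per_day(MIN_INTERVAL_MIN)   # every 3 min: 480 beacons
best_case = bytes_per_day(MAX_INTERVAL_MIN)    # every 10 min: 144 beacons
typical = bytes_per_day(mean_interval)         # about 221.5 beacons
```

Under these assumptions a node sends between 2,880 and 9,600 beacon payload bytes per day, and roughly 4.4 KB per day on average.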

dreirund commented Jun 29, 2026 via email

Contributor

vk496 commented Jun 29, 2026

Author

I don't mind following whatever the community agrees on. You can always change the default behaviour. The only downside of advertising every week is that you may have to wait a week to learn which firmwares are available nearby (ota ls).

Yes, the default behaviour is supported. Your target node must discover the firmware from your relay node and start pulling it.

The problem is that pulling it takes days (or even weeks in heavy-traffic scenarios). With just 1 device that is fine, but with 7 nodes it becomes a headache. Hence the idea of DHT/P2P firmware sharing: every node can share the full firmware installed in its own flash, plus its mOTA (if available). Special nodes such as an ESP32 with WiFi can relay a folder full of mOTA from a host, so they don't need to store the files in order to share them.

The goal is: if you have N identical nodes, push once and let the firmware spread.

@akiraysh

Hey vk496, noticed you built the .mota OTA-over-LoRa protocol with merkle tree verification and delta updates across ESP32 and nRF52, that's rlly rlly cool, especially the DHT style propagation for nodes that are hard to physically reach.

We're building an open core OTA platform with a similar goal, pushing updates to fleets you can't easily access, though our approach goes through WiFi/cellular rather than mesh.

Would you be down for a quick call to talk through how you implemented the delta updates and merkle verification? Always good to compare notes with someone solving the same class of problem from a different angle.

…ved-set change

The discovery beacon previously re-announced at a random 3-10 min interval.
Replace that with a fixed, user-configurable cadence:

- OtaManager::advert_mins() — re-advertise every N minutes after the boot
  burst; 0 disables periodic re-advertise (boot burst only). Default 24h.
- Persisted in NodePrefs (CommonCLI) and runtime-tunable: `ota config advert
  <minutes>` (0..10080; 0 = off), and shown in `ota config`.
- When periodic advert is disabled, the scheduler still re-checks the config
  on a slow timer, so a later `ota config advert <mins>` takes effect live.

Also advertise immediately whenever the served set changes — when a motatool
folder is attached to / detached from the ESP32 WiFi seeder — so peers learn
about newly-available firmware without waiting for the next interval (the
`ota folder` serial path already announced on attach).

Docs: protocol beacon-cadence note + user-guide `ota config advert`.
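
The cadence rules in this commit message can be sketched as below. This is a hypothetical model, not the firmware code: the function names and the 60-second re-check period are invented for illustration, and rejecting out-of-range values (rather than clamping them) is an assumption.

```python
ADVERT_MINS_MAX = 10080          # upper bound from `ota config advert` (one week)
ADVERT_MINS_DEFAULT = 24 * 60    # "Default 24h"
RECHECK_SECS = 60                # hypothetical slow-timer period when adverts are off


def parse_advert_mins(text: str) -> int:
    """Validate the argument of `ota config advert <minutes>`."""
    value = int(text)
    if not 0 <= value <= ADVERT_MINS_MAX:
        raise ValueError(f"advert minutes must be 0..{ADVERT_MINS_MAX}")
    return value


def next_wakeup(now_secs: int, advert_mins: int) -> tuple:
    """Return (wake_time_secs, should_advertise) for the scheduler.

    With advert_mins == 0 the node never re-advertises, but it still wakes
    on a slow timer so a later config change takes effect without a reboot.
    """
    if advert_mins == 0:
        return now_secs + RECHECK_SECS, False
    return now_secs + advert_mins * 60, True


# Default cadence: the next advert is due 24 hours after the current one.
default_wake = next_wakeup(0, ADVERT_MINS_DEFAULT)

# Disabled cadence: wake soon, only to re-read the config.
disabled_wake = next_wakeup(0, parse_advert_mins("0"))
```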

vk496 commented Jun 30, 2026

Author

Hi. If you want quicker interaction, feel free to join the Discord thread:

https://discord.com/channels/1495203904898728149/1518163443750797332

@akiraysh

Hey, thanks for sharing, the link doesn't seem to be working for me, getting an invalid invite error. Mind sending a fresh one?

vk496 commented Jun 30, 2026

Author

https://discord.gg/9sRhx5wvJ (OTA over LoRa in the development)

…rd RAM guard

Bound OTA-over-LoRa duty cycle across repeaters with one runtime-tunable,
persisted limit (OtaManager::max_hops, `ota config hops <0..8>`, default 3):

- Accept-gate: a node ignores OTA that arrived from more than max_hops hops
  away (neither processes nor relays it). 0 = direct only.
- Forward-cap: relay a flood only while still under max_hops, appending this
  node's path-hash (hop count increments like the mesh flood routing).
- RAM guard: relay an OTA flood only while more than OTA_FWD_MIN_FREE packet-
  pool slots stay free, so heavy OTA (best-effort, lowest-priority) can never
  monopolise the shared pool and starve real traffic — a dropped relay is
  re-requested by the source.

Persisted in NodePrefs (CommonCLI) and shown in `ota config`. Docs updated.
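
The three checks described in this commit message (accept-gate, forward-cap, RAM guard) might combine as in the sketch below. It is a model under stated assumptions, not the firmware logic: the packet type, the function names, and the value of OTA_FWD_MIN_FREE are hypothetical, and the hop count is assumed to equal the number of path hashes, with a direct packet carrying none.

```python
from dataclasses import dataclass, field

OTA_FWD_MIN_FREE = 4   # hypothetical value; the commit message does not state it
DEFAULT_MAX_HOPS = 3   # "default 3" from the commit message


@dataclass
class OtaPacket:
    # One path hash per relay that forwarded the packet (assumed layout).
    path_hashes: list = field(default_factory=list)


def hop_count(pkt: OtaPacket) -> int:
    return len(pkt.path_hashes)


def accept(pkt: OtaPacket, max_hops: int) -> bool:
    """Accept-gate: ignore OTA that arrived from more than max_hops away."""
    return hop_count(pkt) <= max_hops


def should_relay(pkt: OtaPacket, max_hops: int, free_slots: int) -> bool:
    """Relay only if accepted, still under the hop cap, and the pool has room."""
    if not accept(pkt, max_hops):
        return False
    if hop_count(pkt) >= max_hops:          # forward-cap
        return False
    return free_slots > OTA_FWD_MIN_FREE    # RAM guard


def relay(pkt: OtaPacket, my_hash: int) -> OtaPacket:
    """Append this node's path hash, which increments the hop count."""
    return OtaPacket(pkt.path_hashes + [my_hash])


direct = OtaPacket()                    # heard directly from the source
two_hops = relay(relay(direct, 0xA1), 0xB2)
three_hops = relay(two_hops, 0xC3)
```

With max_hops set to 0, this model accepts only direct packets and never relays, which matches the "0 = direct only" note above.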