Skip to content

Fix peer storage size cap to account for encryption overhead#4746

Open
shivv23 wants to merge 3 commits into
lightningdevkit:mainfrom
shivv23:fix/peer-storage-size-limit
Open

Fix peer storage size cap to account for encryption overhead#4746
shivv23 wants to merge 3 commits into
lightningdevkit:mainfrom
shivv23:fix/peer-storage-size-limit

Conversation

@shivv23

@shivv23 shivv23 commented Jun 24, 2026

Copy link
Copy Markdown

Description

Fixes #4698: send_peer_storage() does not account for encryption/wire overhead when capping the peer storage blob size, causing it to emit PeerStorage messages that can exceed BOLT #1's 65531-byte limit and panic BOLT #8's transport layer.

Changes

1. lightning/src/chain/chainmonitor.rssend_peer_storage() size cap

Root cause: The function caps the sum of monitors' serialized_length() at MAX_PEER_STORAGE_SIZE (65531), but the message that goes on the wire is ~50 bytes larger due to:

  • 2 bytes: Vec<PeerStorageMonitorHolder> CollectionLength prefix added by encode()
  • 16 bytes: ChaCha20Poly1305 AEAD authentication tag
  • 32 bytes: random_bytes nonce seed appended during encryption

Fix: Introduced PEER_STORAGE_OVERHEAD = 2 + 16 + 32 and changed the size check from > MAX_PEER_STORAGE_SIZE to > MAX_PEER_STORAGE_SIZE - PEER_STORAGE_OVERHEAD.

The constant is deliberately decomposed (2, 16, 32) rather than hardcoded as 50, making the overhead sources auditable.

2. lightning/src/chain/channelmonitor.rsdummy_monitor() fix

dummy_monitor() creates a ChannelMonitor via ChannelMonitor::new(), which sets current_counterparty_commitment_number = 1 << 48 as a sentinel (INITIAL_COMMITMENT_NUMBER + 1). This 49-bit value overflows be48_to_array() on serialization, causing a panic in tests that serialize dummy monitors (e.g., the peer storage regression test).

Fix: After construction, override the field to 0 — a valid 48-bit value. The production constructor's sentinel is preserved unchanged.

3. Regression test

Added test_peer_storage_blob_within_size_limit (#[cfg(peer_storage)]) which:

  • Registers 600 dummy monitors (total plaintext ~180K, forcing selection to hit the cap)
  • Triggers send_peer_storage() and asserts msg.data.len() <= MAX_PEER_STORAGE_SIZE
  • Asserts msg.data.len() > MAX_PEER_STORAGE_SIZE - 500 to prove the boundary was reached

send_peer_storage() caps the sum of serialized_length() at
MAX_PEER_STORAGE_SIZE (65531), but the final wire message
includes a 2-byte CollectionLength prefix, 16-byte AEAD tag,
and 32-byte nonce seed. Without headroom, the encrypted blob
can exceed BOLT lightningdevkit#1's 65531-byte limit and BOLT lightningdevkit#8's 65535-byte
transport limit, causing a panic.

Also fixes a latent bug in dummy_monitor where
current_counterparty_commitment_number was set to 1 << 48
(2^48), which overflows the 48-bit field in be48_to_array.
@ldk-reviews-bot

ldk-reviews-bot commented Jun 24, 2026

Copy link
Copy Markdown

I've assigned @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Comment thread lightning/src/chain/channelmonitor.rs Outdated
counterparty_fulfilled_htlcs: new_hash_map(),

current_counterparty_commitment_number: 1 << 48,
current_counterparty_commitment_number: (1 << 48) - 1,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is in the production ChannelMonitor::new() constructor (line 1872), not in dummy_monitor() as the PR description claims. dummy_monitor() (line 6973) merely calls ChannelMonitor::new(), so this alters the initial value for every monitor created in production.

The original 1 << 48 is a deliberate sentinel (INITIAL_COMMITMENT_NUMBER + 1) meaning "the initial counterparty commitment has not been provided yet", distinct from INITIAL_COMMITMENT_NUMBER = (1 << 48) - 1 which means "at the initial counterparty commitment". This distinction is observable in is_closed_without_updates() (line 4435, == INITIAL_COMMITMENT_NUMBER).

In normal operation the value is always overwritten by provide_initial_counterparty_commitment_tx() before the monitor is ever serialized (see the invariant documented at channel.rs:6531-6535), so 1 << 48 is never actually written via be48_to_array in production — the panic only occurs because dummy_monitor() skips provide_initial_counterparty_commitment_tx().

Recommend fixing this in dummy_monitor() (e.g. by providing an initial counterparty commitment, or setting the field there) rather than weakening the documented sentinel/invariant in the production constructor. At minimum the PR description should be corrected, since this is a behavioral change to production code, not a test helper.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ldk-claude-review-bot thanks for the thorough review. Point by point:

1. current_counterparty_commitment_number sentinel in production constructor — Correct, this should not have been changed in ChannelMonitor::new(). Reverted back to 1 << 48 and moved the fix into dummy_monitor() instead, where it sets the field to 0 after construction. The production sentinel is now preserved.

2. Tightness of the cap — The math is intentional: data.len() = sum(serialized_length) + 2 (CollectionLength) + 16 (tag) + 32 (nonce) = sum + 50. With the cap at 65531 - 50 = 65481, the wire message is exactly 2 + 2 + 65531 = 65535 — at, not over, LN_MAX_MSG_LEN.

3. #[cfg(peer_storage)] gating — This is required because send_peer_storage() itself is #[cfg(peer_storage)], so the test must be gated identically to compile. It runs in CI when RUSTFLAGS="--cfg peer_storage" is set.

@ldk-claude-review-bot

ldk-claude-review-bot commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

The current state matches my memory exactly. The fix is correctly placed in dummy_monitor() (line 7039), the production constructor sentinel is preserved, and the overhead constant (50 = 2+16+32) is correct and tight.

No new issues found. My previously-flagged concern (the fix being in the production constructor) has been resolved in commit 4fed9891b — the fix is now correctly isolated to the dummy_monitor() test helper, preserving the documented production sentinel invariant.

Summary of review:

  • lightning/src/chain/chainmonitor.rs:994,1051 — The PEER_STORAGE_OVERHEAD = 2 + 16 + 32 constant correctly accounts for the CollectionLength prefix, ChaCha20Poly1305 AEAD tag, and nonce seed. Subtracting it from MAX_PEER_STORAGE_SIZE in the size check is correct and tight.
  • lightning/src/chain/channelmonitor.rs:7039 — The current_counterparty_commitment_number = 0 override is now correctly localized to dummy_monitor(), avoiding the be48_to_array panic on serialization without weakening the production 1 << 48 sentinel.

Minor (non-blocking, previously noted):

  • The new test test_peer_storage_blob_within_size_limit is gated behind #[cfg(peer_storage)], so it won't run in default CI configurations.
  • The PR description (now empty) previously claimed the value was set to (1 << 48) - 1, but the code sets 0. Not a code bug.

No new issues found.

…er instead of production constructor

The production constructor's 1 << 48 is a sentinel meaning 'counterparty
commitment not yet received'. Changing it to (1 << 48) - 1 aligns it with
INITIAL_COMMITMENT_NUMBER, which is observable in is_closed_without_updates().
Instead, fix the value in dummy_monitor() after construction so the sentinel
is preserved in production code.
@codecov

codecov Bot commented Jun 24, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.95%. Comparing base (e225350) to head (ff9c956).
⚠️ Report is 3224 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4746      +/-   ##
==========================================
+ Coverage   84.51%   86.95%   +2.43%     
==========================================
  Files         137      161      +24     
  Lines       77446   111636   +34190     
  Branches    77446   111636   +34190     
==========================================
+ Hits        65456    97071   +31615     
- Misses       9949    12060    +2111     
- Partials     2041     2505     +464     
Flag Coverage Δ
fuzzing-fake-hashes 8.43% <ø> (?)
fuzzing-real-hashes 32.40% <ø> (?)
tests 86.27% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

/// in `peer_channel_encryptor`).
#[test]
#[cfg(peer_storage)]
fn test_peer_storage_blob_within_size_limit() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make sure this test tests right on the boundary. Currently it also passes if I manually set overhead to 0 again.

@shivv23 shivv23 Jun 25, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 200 monitors the total serialized_length() was insufficient to reach the cap, so the test passed regardless of the overhead value. I've increased num_monitors to 600 and added a lower-bound assertion (msg.data.len() > MAX_PEER_STORAGE_SIZE - 500) to verify the boundary is actually exercised. Fixed in ff9c956.

// sentinel meaning "counterparty commitment not yet received". This value (2^48) exceeds the
// 48-bit field in be48_to_array and causes a panic on serialization. Since dummy monitors
// may be serialized in tests, reset it to a valid value.
monitor.inner.lock().unwrap().current_counterparty_commitment_number = 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok, even though in reality commitment numbers start high and go down. So 0 isn't really the default for a fresh channel. But it is consistent with the dummy holder commitment.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that 0 isn't the real-world default — commitment numbers start high and decrement. It is consistent with the other dummy values (e.g., the dummy holder commitment uses commitment number 0), and being a valid 48-bit value that serializes without panicking is the main objective here. Happy to change it to INITIAL_COMMITMENT_NUMBER (= (1 << 48) - 1) if you'd prefer greater realism.

Increase num_monitors from 200 to 600 and add a lower-bound assertion
to ensure the test actually reaches the MAX_PEER_STORAGE_SIZE boundary.
This prevents the test from silently passing when PEER_STORAGE_OVERHEAD
is not properly accounted for (e.g., set to 0).

Addresses joostjager's review feedback on PR lightningdevkit#4746.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Peer storage can send more than wire limit size

4 participants