Skip to content

Exit quiescence when splice_init and tx_init_rbf are rejected#4495

Open
jkczyz wants to merge 7 commits intolightningdevkit:mainfrom
jkczyz:2026-03-splicing-rbf-follow-ups
Open

Exit quiescence when splice_init and tx_init_rbf are rejected#4495
jkczyz wants to merge 7 commits intolightningdevkit:mainfrom
jkczyz:2026-03-splicing-rbf-follow-ups

Conversation

@jkczyz
Copy link
Copy Markdown
Contributor

@jkczyz jkczyz commented Mar 18, 2026

Summary

  • Fix quiescence not being exited when tx_init_rbf or splice_init is rejected with Abort (e.g., insufficient RBF feerate, negotiation in progress), which left the channel stuck in a quiescent state
  • Refactor both handlers to return InteractiveTxMsgError, reusing the same pattern already used by the interactive TX message handlers, with a shared handle_interactive_tx_msg_err helper in channelmanager

Test plan

  • Existing test_splice_rbf_insufficient_feerate updated to verify quiescence is properly exited after tx_abort
  • Existing test_splice_feerate_too_high updated to verify quiescence is properly exited after splice_init rejection
  • All splicing tests pass

🤖 Generated with Claude Code

Based on #4494.

@ldk-reviews-bot
Copy link
Copy Markdown

ldk-reviews-bot commented Mar 18, 2026

👋 Thanks for assigning @wpaulino as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@jkczyz jkczyz changed the title Exit quiescence when splice_init and tx_init_rbf are rejected Exit quiescence when splice_init and tx_init_rbf are rejected Mar 18, 2026
@jkczyz jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch from a5d670c to 57aad34 Compare March 18, 2026 18:57
@jkczyz jkczyz self-assigned this Mar 19, 2026
@jkczyz jkczyz mentioned this pull request Mar 19, 2026
36 tasks
@jkczyz jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch 3 times, most recently from 8cdc480 to 959b553 Compare March 31, 2026 00:16
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 84.72222% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.04%. Comparing base (2adb690) to head (bdc17dc).
⚠️ Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/ln/channelmanager.rs 78.78% 12 Missing and 2 partials ⚠️
lightning/src/ln/channel.rs 89.74% 7 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4495      +/-   ##
==========================================
- Coverage   87.09%   87.04%   -0.05%     
==========================================
  Files         163      163              
  Lines      108856   108649     -207     
  Branches   108856   108649     -207     
==========================================
- Hits        94808    94575     -233     
- Misses      11563    11606      +43     
+ Partials     2485     2468      -17     
Flag Coverage Δ
fuzzing 40.17% <55.55%> (-0.03%) ⬇️
tests 86.13% <84.72%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jkczyz jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch from 959b553 to 69a1a1c Compare March 31, 2026 22:28
@jkczyz jkczyz marked this pull request as ready for review April 2, 2026 17:03
@ldk-reviews-bot ldk-reviews-bot requested a review from wpaulino April 2, 2026 17:04
Comment on lines +13970 to +13974
fn quiescent_negotiation_err(&mut self, err: ChannelError) -> InteractiveTxMsgError {
let exited_quiescence =
if matches!(err, ChannelError::Abort(_)) { self.exit_quiescence() } else { false };
InteractiveTxMsgError { err, splice_funding_failed: None, exited_quiescence }
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In splice_init, resolve_queued_contribution (which can return Abort(FeeRateTooHigh)) is called before validate_splice_init checks is_quiescent(). If a misbehaving peer sends splice_init during the STFU handshake (before quiescence is established), and the holder has a queued contribution with a low max_feerate, the Abort error would flow into quiescent_negotiation_errexit_quiescence(), which has debug_assert!(!is_local_stfu_sent()) / debug_assert!(!is_remote_stfu_sent()).

Those asserts would fire because the STFU flags are still set (handshake incomplete). In release builds, exit_quiescence would harmlessly return false, but in debug/test builds this would panic.

Same issue applies to tx_init_rbf at line 12762 (and line 12780).

Consider guarding with an is_quiescent() check before calling exit_quiescence:

Suggested change
fn quiescent_negotiation_err(&mut self, err: ChannelError) -> InteractiveTxMsgError {
let exited_quiescence =
if matches!(err, ChannelError::Abort(_)) { self.exit_quiescence() } else { false };
InteractiveTxMsgError { err, splice_funding_failed: None, exited_quiescence }
}
fn quiescent_negotiation_err(&mut self, err: ChannelError) -> InteractiveTxMsgError {
let exited_quiescence = if matches!(err, ChannelError::Abort(_))
&& self.context.channel_state.is_quiescent()
{
self.exit_quiescence()
} else {
false
};
InteractiveTxMsgError { err, splice_funding_failed: None, exited_quiescence }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, though I took the opportunity to have validate_splice_init and validate_ tx_init_rbf no longer call validate_splice_contributions and FundingScope::for_splice. Rather they are called at those call sites. This let's us check if we are quiescent before checking fee rates when calling resolve_queued_contribution.

@ldk-claude-review-bot
Copy link
Copy Markdown
Collaborator

ldk-claude-review-bot commented Apr 2, 2026

No new issues found beyond the prior review comments.

Prior comment status:

  • lightning/src/ln/channel.rs:14248 (quiescent_negotiation_err debug_assert) — On deeper analysis, this appears to be a false positive. Both validate_splice_init (line 12687) and validate_tx_init_rbf (line 12957) check is_quiescent() early and return WarnAndDisconnect if not quiescent. All Abort-producing code paths are gated behind these checks, so debug_assert!(is_quiescent()) in quiescent_negotiation_err should always hold. The new tests test_splice_init_before_quiescence_sends_warning and test_tx_init_rbf_before_quiescence_sends_warning confirm the correct behavior.

  • lightning/src/ln/channel.rs:9358 (on_tx_signatures_exchange else branch) — On re-reading the diff, exit_quiescence() at line 9360 is positioned after the inner if let / else block but still inside the outer if let Some(pending_splice) block. It is called for both the success and debug_assert!(false) branches. My prior comment was incorrect.

  • Minor observations about internal_tx_complete duplication and needs_holding_cell_release being broader than the old flag remain valid but harmless, as confirmed by the can_generate_new_commitment() guard (line 8671) and assert!(!is_quiescent()) (line 8686) in the holding cell release path.

Comment on lines 12688 to 12690
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we'll now abort an in-progress RBF if the counterparty misbehaves by sending us tx_init_rbf. The spec isn't really clear on what to do here. There's a similar case in validate_splice_init where we'll disconnect if we already have a pending splice (even if negotiation is still in progress). The spec does indicate this should be done. So maybe both of these are fine?

When tx_init_rbf is rejected with ChannelError::Abort (e.g.,
insufficient RBF feerate, negotiation in progress, feerate too high),
the error is converted to a tx_abort message but quiescence is never
exited and holding cells are never freed. This leaves the channel stuck
in a quiescent state.

Fix this by intercepting ChannelError::Abort before try_channel_entry!
in internal_tx_init_rbf, calling exit_quiescence on the channel, and
returning the error with exited_quiescence set so that handle_error
frees holding cells. Also make exit_quiescence available in non-test
builds by removing its cfg gate.

Update tests to use the proper RBF initiation flow (with tampered
feerates) so that handle_tx_abort correctly echoes the abort and exits
quiescence, rather than manually crafting tx_init_rbf messages that
leave node 0 without proper negotiation state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jkczyz jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch from d275551 to 3c55d3c Compare April 2, 2026 23:16
@jkczyz
Copy link
Copy Markdown
Contributor Author

jkczyz commented Apr 2, 2026

Rebased

&self.logger,
);
if let Err(ChannelError::Abort(_)) = &init_res {
funded_channel.exit_quiescence();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I'm not sure we want to use this since the debug assertions there are reachable, but the mark_response_received call makes me realize we may need to do this elsewhere to avoid an unintentional disconnect (maybe write a test to confirm?).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I'm not sure we want to use this since the debug assertions there are reachable,

The fixup commit makes sure that we don't return ChannelError::Abort until we know we are quiescent, so I think those asserts are unreachable from here?

but the mark_response_received call makes me realize we may need to do this elsewhere to avoid an unintentional disconnect (maybe write a test to confirm?).

Like when processing splice_init, splice_ack, tx_init_rbf, tx_ack_rbf, or interactive-tx messages?

One thing I noticed when exploring this as since we don't have a FundedChannel for v2 establishment, we use UNFUNDED_CHANNEL_AGE_LIMIT_TICKS to timeout instead. So updating sent_message_awaiting_response on the ChannelContext has no effect there. We'd effectively have two different mechanisms for timing out. Are we ok with that? Or should we try to use the UNFUNDED_CHANNEL_AGE_LIMIT_TICKS for splice funding, too?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like when processing splice_init, splice_ack, tx_init_rbf, tx_ack_rbf, or interactive-tx messages?

It should be when processing a message that triggers the end of quiescence (see FundedChannel::should_disconnect_peer_awaiting_response), so either tx_abort or tx_signatures.

One thing I noticed when exploring this as since we don't have a FundedChannel for v2 establishment, we use UNFUNDED_CHANNEL_AGE_LIMIT_TICKS to timeout instead. So updating sent_message_awaiting_response on the ChannelContext has no effect there. We'd effectively have two different mechanisms for timing out. Are we ok with that? Or should we try to use the UNFUNDED_CHANNEL_AGE_LIMIT_TICKS for splice funding, too?

The quiescence timeout needs to be much shorter because we have an active open channel that can handle payments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Updated and added a few tests. PTAL.

);

// Node 1 handles the echo (no-op since it already aborted).
nodes[1].node.handle_tx_abort(node_id_0, &tx_abort_echo);
Copy link
Copy Markdown
Contributor

@wpaulino wpaulino Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should include a payment such that the outbound HTLC gets sent out after the abort as a result of freeing the holding cell.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jkczyz jkczyz requested a review from wpaulino April 3, 2026 21:23
jkczyz and others added 4 commits April 3, 2026 16:34
Add a payment during quiescence in test_splice_rbf_insufficient_feerate
to verify that outbound HTLCs queued in the holding cell are freed when
quiescence is exited by the tx_abort exchange.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The same bug fixed in the prior commit for tx_init_rbf also exists in
internal_splice_init: when splice_init triggers FeeRateTooHigh in
resolve_queued_contribution, the ChannelError::Abort goes through
try_channel_entry! without exiting quiescence.

Apply the same fix: intercept ChannelError::Abort before
try_channel_entry!, call exit_quiescence, and return the error with
exited_quiescence set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The prior two commits manually intercepted ChannelError::Abort in the
channelmanager handlers for splice_init and tx_init_rbf to exit
quiescence before returning, since the channel methods didn't signal
this themselves. The interactive TX message handlers already solved this
by returning InteractiveTxMsgError which bundles exited_quiescence into
the error type.

Apply the same pattern: change splice_init and tx_init_rbf to return
InteractiveTxMsgError, adding a quiescent_negotiation_err helper on
FundedChannel that exits quiescence for Abort errors and passes through
other variants unchanged. Extract handle_interactive_tx_msg_err in
channelmanager to deduplicate the error handling across internal_tx_msg,
internal_splice_init, and internal_tx_init_rbf.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
validate_splice_init and validate_tx_init_rbf check preconditions
including quiescence. Move them before resolve_queued_contribution
so that a misbehaving peer sending splice_init or tx_init_rbf before
quiescence is rejected early. This also moves validate_splice_contributions
and FundingScope construction into the callers since they depend on the
resolved contribution.

Have validate_tx_init_rbf return the last candidate's funding pubkeys
so the caller can construct FundingScope without re-accessing
pending_splice.

Add a debug_assert in quiescent_negotiation_err that Abort errors only
occur when the channel is quiescent, since exit_quiescence requires it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jkczyz jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch from 3c55d3c to e4b1c6b Compare April 3, 2026 21:51
@ldk-reviews-bot
Copy link
Copy Markdown

🔔 1st Reminder

Hey @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

debug_assert!(!err.exited_quiescence || matches!(err.err, ChannelError::Abort(_)));

MsgHandleErrInternal::from_chan_no_close(err.err, channel_id)
.with_exited_quiescence(err.exited_quiescence)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does feel a bit like a leaky abstraction - can we move exited_quiescence into the ChannelError and just call it needs_holding_cell_release? Even moreso because the variable doesn't make sense if we disconnect, close, or send an error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think we can infer it from tx_abort. At least I managed to get Clause to convince me that. Also, renamed the variables to needs_holding_cell_release and include similar inference elsewhere.

Let me know if this is looks good. I can make it a fixup if you prefer. May need a careful look, though.

@ldk-reviews-bot
Copy link
Copy Markdown

🔔 2nd Reminder

Hey @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

The `exited_quiescence` field on `MsgHandleErrInternal` and
`InteractiveTxMsgError` is a leaky abstraction -- the channelmanager
error handling shouldn't know about quiescence, only whether the holding
cell needs to be released.

Infer this from the presence of a `tx_abort` instead, since exiting
quiescence via an error always produces one. Remove `exited_quiescence`
from `InteractiveTxMsgError`, `MsgHandleErrInternal`, and the return
type of `Channel::tx_abort`, along with the `with_exited_quiescence`
builder.

For unfunded v2 channels, `tx_abort` may be present without quiescence
having been exited, but the holding cell release is a no-op since an
unfunded channel won't have any HTLCs. Similarly, the unreachable
`debug_assert!(false)` branch in `fail_interactive_tx_negotiation` for
funded channels produces a `tx_abort` without exiting quiescence, but
the holding cell release is a no-op since the channel is still quiescent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines 9358 to 9360
} else {
debug_assert!(false);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: regression in defensive coding — channel can get stuck in quiescence in release builds.

The old code called clear_quiescent() before the if let, so it always ran regardless of which branch was taken. The new code only calls exit_quiescence() in the success branch. If this debug_assert!(false) branch is somehow reached in a release build (e.g., due to a bug elsewhere), the channel would be stuck in quiescence indefinitely — unable to generate new commitments — until a disconnect/reconnect cycle recovers it.

Consider moving exit_quiescence() after the if/else, or adding it to the else branch:

Suggested change
} else {
debug_assert!(false);
}
} else {
debug_assert!(false);
self.exit_quiescence();
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. BTW, the reason it is moved after rather than left before is we can't take two mutable references to self.

Several code paths exit quiescence by calling `clear_quiescent()` directly
without also clearing the disconnect timer via `mark_response_received()`.
This causes the timer to fire after the splice completes or is aborted,
spuriously disconnecting the peer.

Replace `clear_quiescent()` with `exit_quiescence()` in
`on_tx_signatures_exchange`, `reset_pending_splice_state`, and
`peer_connected_get_handshake`, which clears both the quiescent state and
the disconnect timer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jkczyz jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch from b861c68 to bdc17dc Compare April 8, 2026 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

5 participants