Exit quiescence when `splice_init` and `tx_init_rbf` are rejected by jkczyz · Pull Request #4495 · lightningdevkit/rust-lightning

jkczyz · 2026-03-18T18:48:42Z

Summary

Fix quiescence not being exited when tx_init_rbf or splice_init is rejected with Abort (e.g., insufficient RBF feerate, negotiation in progress), which left the channel stuck in a quiescent state
Refactor both handlers to return InteractiveTxMsgError, reusing the same pattern already used by the interactive TX message handlers, with a shared handle_interactive_tx_msg_err helper in channelmanager

Test plan

Existing test_splice_rbf_insufficient_feerate updated to verify quiescence is properly exited after tx_abort
Existing test_splice_feerate_too_high updated to verify quiescence is properly exited after splice_init rejection
All splicing tests pass

🤖 Generated with Claude Code

Based on #4494.

ldk-reviews-bot · 2026-03-18T18:48:47Z

👋 Thanks for assigning @wpaulino as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

codecov · 2026-03-31T01:21:31Z

Codecov Report

❌ Patch coverage is 84.72222% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.04%. Comparing base (2adb690) to head (bdc17dc).
⚠️ Report is 15 commits behind head on main.

Files with missing lines	Patch %	Lines
lightning/src/ln/channelmanager.rs	78.78%	12 Missing and 2 partials ⚠️
lightning/src/ln/channel.rs	89.74%	7 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4495      +/-   ##
==========================================
- Coverage   87.09%   87.04%   -0.05%     
==========================================
  Files         163      163              
  Lines      108856   108649     -207     
  Branches   108856   108649     -207     
==========================================
- Hits        94808    94575     -233     
- Misses      11563    11606      +43     
+ Partials     2485     2468      -17

Flag	Coverage Δ
fuzzing	`40.17% <55.55%> (-0.03%)`	⬇️
tests	`86.13% <84.72%> (-0.07%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ldk-claude-review-bot · 2026-04-02T17:12:22Z

lightning/src/ln/channel.rs

+	fn quiescent_negotiation_err(&mut self, err: ChannelError) -> InteractiveTxMsgError {
+		let exited_quiescence =
+			if matches!(err, ChannelError::Abort(_)) { self.exit_quiescence() } else { false };
+		InteractiveTxMsgError { err, splice_funding_failed: None, exited_quiescence }
+	}


In splice_init, resolve_queued_contribution (which can return Abort(FeeRateTooHigh)) is called before validate_splice_init checks is_quiescent(). If a misbehaving peer sends splice_init during the STFU handshake (before quiescence is established), and the holder has a queued contribution with a low max_feerate, the Abort error would flow into quiescent_negotiation_err → exit_quiescence(), which has debug_assert!(!is_local_stfu_sent()) / debug_assert!(!is_remote_stfu_sent()).

Those asserts would fire because the STFU flags are still set (handshake incomplete). In release builds, exit_quiescence would harmlessly return false, but in debug/test builds this would panic.

Same issue applies to tx_init_rbf at line 12762 (and line 12780).

Consider guarding with an is_quiescent() check before calling exit_quiescence:

Suggested change

fn quiescent_negotiation_err(&mut self, err: ChannelError) -> InteractiveTxMsgError {

let exited_quiescence =

if matches!(err, ChannelError::Abort(_)) { self.exit_quiescence() } else { false };

InteractiveTxMsgError { err, splice_funding_failed: None, exited_quiescence }

}

fn quiescent_negotiation_err(&mut self, err: ChannelError) -> InteractiveTxMsgError {

let exited_quiescence = if matches!(err, ChannelError::Abort(_))

&& self.context.channel_state.is_quiescent()

{

self.exit_quiescence()

} else {

false

};

InteractiveTxMsgError { err, splice_funding_failed: None, exited_quiescence }

}

Fixed, though I took the opportunity to have validate_splice_init and validate_ tx_init_rbf no longer call validate_splice_contributions and FundingScope::for_splice. Rather they are called at those call sites. This let's us check if we are quiescent before checking fee rates when calling resolve_queued_contribution.

ldk-claude-review-bot · 2026-04-02T17:12:33Z

No new issues found beyond the prior review comments.

Prior comment status:

lightning/src/ln/channel.rs:14248 (quiescent_negotiation_err debug_assert) — On deeper analysis, this appears to be a false positive. Both validate_splice_init (line 12687) and validate_tx_init_rbf (line 12957) check is_quiescent() early and return WarnAndDisconnect if not quiescent. All Abort-producing code paths are gated behind these checks, so debug_assert!(is_quiescent()) in quiescent_negotiation_err should always hold. The new tests test_splice_init_before_quiescence_sends_warning and test_tx_init_rbf_before_quiescence_sends_warning confirm the correct behavior.
lightning/src/ln/channel.rs:9358 (on_tx_signatures_exchange else branch) — On re-reading the diff, exit_quiescence() at line 9360 is positioned after the inner if let / else block but still inside the outer if let Some(pending_splice) block. It is called for both the success and debug_assert!(false) branches. My prior comment was incorrect.
Minor observations about internal_tx_complete duplication and needs_holding_cell_release being broader than the old flag remain valid but harmless, as confirmed by the can_generate_new_commitment() guard (line 8671) and assert!(!is_quiescent()) (line 8686) in the holding cell release path.

jkczyz · 2026-04-02T23:09:37Z

lightning/src/ln/channel.rs

Note that we'll now abort an in-progress RBF if the counterparty misbehaves by sending us tx_init_rbf. The spec isn't really clear on what to do here. There's a similar case in validate_splice_init where we'll disconnect if we already have a pending splice (even if negotiation is still in progress). The spec does indicate this should be done. So maybe both of these are fine?

When tx_init_rbf is rejected with ChannelError::Abort (e.g., insufficient RBF feerate, negotiation in progress, feerate too high), the error is converted to a tx_abort message but quiescence is never exited and holding cells are never freed. This leaves the channel stuck in a quiescent state. Fix this by intercepting ChannelError::Abort before try_channel_entry! in internal_tx_init_rbf, calling exit_quiescence on the channel, and returning the error with exited_quiescence set so that handle_error frees holding cells. Also make exit_quiescence available in non-test builds by removing its cfg gate. Update tests to use the proper RBF initiation flow (with tampered feerates) so that handle_tx_abort correctly echoes the abort and exits quiescence, rather than manually crafting tx_init_rbf messages that leave node 0 without proper negotiation state. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jkczyz · 2026-04-02T23:19:28Z

Rebased

wpaulino · 2026-04-03T17:21:56Z

lightning/src/ln/channelmanager.rs

 						&self.logger,
 					);
+					if let Err(ChannelError::Abort(_)) = &init_res {
+						funded_channel.exit_quiescence();


Hm I'm not sure we want to use this since the debug assertions there are reachable, but the mark_response_received call makes me realize we may need to do this elsewhere to avoid an unintentional disconnect (maybe write a test to confirm?).

Hm I'm not sure we want to use this since the debug assertions there are reachable,

The fixup commit makes sure that we don't return ChannelError::Abort until we know we are quiescent, so I think those asserts are unreachable from here?

but the mark_response_received call makes me realize we may need to do this elsewhere to avoid an unintentional disconnect (maybe write a test to confirm?).

Like when processing splice_init, splice_ack, tx_init_rbf, tx_ack_rbf, or interactive-tx messages?

One thing I noticed when exploring this as since we don't have a FundedChannel for v2 establishment, we use UNFUNDED_CHANNEL_AGE_LIMIT_TICKS to timeout instead. So updating sent_message_awaiting_response on the ChannelContext has no effect there. We'd effectively have two different mechanisms for timing out. Are we ok with that? Or should we try to use the UNFUNDED_CHANNEL_AGE_LIMIT_TICKS for splice funding, too?

Like when processing splice_init, splice_ack, tx_init_rbf, tx_ack_rbf, or interactive-tx messages?

It should be when processing a message that triggers the end of quiescence (see FundedChannel::should_disconnect_peer_awaiting_response), so either tx_abort or tx_signatures.

One thing I noticed when exploring this as since we don't have a FundedChannel for v2 establishment, we use UNFUNDED_CHANNEL_AGE_LIMIT_TICKS to timeout instead. So updating sent_message_awaiting_response on the ChannelContext has no effect there. We'd effectively have two different mechanisms for timing out. Are we ok with that? Or should we try to use the UNFUNDED_CHANNEL_AGE_LIMIT_TICKS for splice funding, too?

The quiescence timeout needs to be much shorter because we have an active open channel that can handle payments.

Gotcha. Updated and added a few tests. PTAL.

wpaulino · 2026-04-03T17:24:35Z

lightning/src/ln/splicing_tests.rs

+	);
+
+	// Node 1 handles the echo (no-op since it already aborted).
+	nodes[1].node.handle_tx_abort(node_id_0, &tx_abort_echo);


We should include a payment such that the outbound HTLC gets sent out after the abort as a result of freeing the holding cell.

Add a payment during quiescence in test_splice_rbf_insufficient_feerate to verify that outbound HTLCs queued in the holding cell are freed when quiescence is exited by the tx_abort exchange. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The same bug fixed in the prior commit for tx_init_rbf also exists in internal_splice_init: when splice_init triggers FeeRateTooHigh in resolve_queued_contribution, the ChannelError::Abort goes through try_channel_entry! without exiting quiescence. Apply the same fix: intercept ChannelError::Abort before try_channel_entry!, call exit_quiescence, and return the error with exited_quiescence set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The prior two commits manually intercepted ChannelError::Abort in the channelmanager handlers for splice_init and tx_init_rbf to exit quiescence before returning, since the channel methods didn't signal this themselves. The interactive TX message handlers already solved this by returning InteractiveTxMsgError which bundles exited_quiescence into the error type. Apply the same pattern: change splice_init and tx_init_rbf to return InteractiveTxMsgError, adding a quiescent_negotiation_err helper on FundedChannel that exits quiescence for Abort errors and passes through other variants unchanged. Extract handle_interactive_tx_msg_err in channelmanager to deduplicate the error handling across internal_tx_msg, internal_splice_init, and internal_tx_init_rbf. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

validate_splice_init and validate_tx_init_rbf check preconditions including quiescence. Move them before resolve_queued_contribution so that a misbehaving peer sending splice_init or tx_init_rbf before quiescence is rejected early. This also moves validate_splice_contributions and FundingScope construction into the callers since they depend on the resolved contribution. Have validate_tx_init_rbf return the last candidate's funding pubkeys so the caller can construct FundingScope without re-accessing pending_splice. Add a debug_assert in quiescent_negotiation_err that Abort errors only occur when the channel is quiescent, since exit_quiescence requires it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ldk-reviews-bot · 2026-04-06T00:00:18Z

🔔 1st Reminder

Hey @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

TheBlueMatt · 2026-04-06T16:12:08Z

lightning/src/ln/channelmanager.rs

+		debug_assert!(!err.exited_quiescence || matches!(err.err, ChannelError::Abort(_)));
+
+		MsgHandleErrInternal::from_chan_no_close(err.err, channel_id)
+			.with_exited_quiescence(err.exited_quiescence)


This does feel a bit like a leaky abstraction - can we move exited_quiescence into the ChannelError and just call it needs_holding_cell_release? Even moreso because the variable doesn't make sense if we disconnect, close, or send an error.

Ok, I think we can infer it from tx_abort. At least I managed to get Clause to convince me that. Also, renamed the variables to needs_holding_cell_release and include similar inference elsewhere.

Let me know if this is looks good. I can make it a fixup if you prefer. May need a careful look, though.

ldk-reviews-bot · 2026-04-08T00:01:01Z

🔔 2nd Reminder

Hey @wpaulino! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

The `exited_quiescence` field on `MsgHandleErrInternal` and `InteractiveTxMsgError` is a leaky abstraction -- the channelmanager error handling shouldn't know about quiescence, only whether the holding cell needs to be released. Infer this from the presence of a `tx_abort` instead, since exiting quiescence via an error always produces one. Remove `exited_quiescence` from `InteractiveTxMsgError`, `MsgHandleErrInternal`, and the return type of `Channel::tx_abort`, along with the `with_exited_quiescence` builder. For unfunded v2 channels, `tx_abort` may be present without quiescence having been exited, but the holding cell release is a no-op since an unfunded channel won't have any HTLCs. Similarly, the unreachable `debug_assert!(false)` branch in `fail_interactive_tx_negotiation` for funded channels produces a `tx_abort` without exiting quiescence, but the holding cell release is a no-op since the channel is still quiescent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ldk-claude-review-bot · 2026-04-08T01:11:33Z

lightning/src/ln/channel.rs

 			} else {
 				debug_assert!(false);
 			}


Bug: regression in defensive coding — channel can get stuck in quiescence in release builds.

The old code called clear_quiescent() before the if let, so it always ran regardless of which branch was taken. The new code only calls exit_quiescence() in the success branch. If this debug_assert!(false) branch is somehow reached in a release build (e.g., due to a bug elsewhere), the channel would be stuck in quiescence indefinitely — unable to generate new commitments — until a disconnect/reconnect cycle recovers it.

Consider moving exit_quiescence() after the if/else, or adding it to the else branch:

Suggested change

} else {

debug_assert!(false);

}

} else {

debug_assert!(false);

self.exit_quiescence();

}

Done. BTW, the reason it is moved after rather than left before is we can't take two mutable references to self.

Several code paths exit quiescence by calling `clear_quiescent()` directly without also clearing the disconnect timer via `mark_response_received()`. This causes the timer to fire after the splice completes or is aborted, spuriously disconnecting the peer. Replace `clear_quiescent()` with `exit_quiescence()` in `on_tx_signatures_exchange`, `reset_pending_splice_state`, and `peer_connected_get_handshake`, which clears both the quiescent state and the disconnect timer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jkczyz changed the title ~~Exit quiescence when splice_init and tx_init_rbf are rejected~~ Exit quiescence when splice_init and tx_init_rbf are rejected Mar 18, 2026

jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch from a5d670c to 57aad34 Compare March 18, 2026 18:57

jkczyz added this to Weekly Goals Mar 19, 2026

jkczyz self-assigned this Mar 19, 2026

jkczyz mentioned this pull request Mar 19, 2026

Splicing / Dual Funding #4284

Open

36 tasks

jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch 3 times, most recently from 8cdc480 to 959b553 Compare March 31, 2026 00:16

jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch from 959b553 to 69a1a1c Compare March 31, 2026 22:28

jkczyz marked this pull request as ready for review April 2, 2026 17:03

ldk-reviews-bot requested a review from wpaulino April 2, 2026 17:04

ldk-claude-review-bot reviewed Apr 2, 2026

View reviewed changes

jkczyz commented Apr 2, 2026

View reviewed changes

jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch from d275551 to 3c55d3c Compare April 2, 2026 23:16

wpaulino reviewed Apr 3, 2026

View reviewed changes

jkczyz requested a review from wpaulino April 3, 2026 21:23

jkczyz and others added 4 commits April 3, 2026 16:34

jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch from 3c55d3c to e4b1c6b Compare April 3, 2026 21:51

TheBlueMatt reviewed Apr 6, 2026

View reviewed changes

ldk-claude-review-bot reviewed Apr 8, 2026

View reviewed changes

jkczyz force-pushed the 2026-03-splicing-rbf-follow-ups branch from b861c68 to bdc17dc Compare April 8, 2026 14:04

Conversation

jkczyz commented Mar 18, 2026

Summary

Test plan

Uh oh!

ldk-reviews-bot commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ldk-claude-review-bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jkczyz commented Apr 2, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wpaulino Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ldk-reviews-bot commented Apr 6, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ldk-reviews-bot commented Apr 8, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ldk-reviews-bot commented Mar 18, 2026 •

edited

Loading

codecov bot commented Mar 31, 2026 •

edited

Loading

ldk-claude-review-bot commented Apr 2, 2026 •

edited

Loading

wpaulino Apr 3, 2026 •

edited

Loading