
Misc updates to tee up async ChannelMonitorUpdate persist for claims against closed channels #3413

Conversation

TheBlueMatt (Collaborator):

#3355 did a lot of the most complex work towards being able to do async ChannelMonitorUpdate persistence for updates writing a preimage for a closed channel, and I'd intended to get the rest of it done in one PR. Sadly, things kept coming up, so there's a laundry list of small-ish changes which need to land first. This PR tees up those small changes (plus one relatively straightforward refactor that touches a lot of lines), with the final PR coming separately.

jkczyz (Contributor) left a comment:

Probably need to stare more at e9dbd83. There's a lot going on there, so any tips on reviewing it would be appreciated.

Comment on lines 7652 to 7654:

```rust
if remaining_in_flight != 0 {
    return;
}
```
jkczyz (Contributor):

Can this be pulled out and done unconditionally prior to the channel assignment? We're checking it again later, which would be unnecessary if done earlier. IIUC, we'd only skip the logging below.

TheBlueMatt (Collaborator, Author):

Yea, I was trying to retain the log, which I think is pretty important. I cleaned the flow up and added more logging though.
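
For illustration, a minimal sketch of keeping the early return while still logging the pending-update count; the actual cleaned-up flow in the PR may differ, and `maybe_resume_channel` is a hypothetical name:

```rust
// Hypothetical sketch, not the PR's actual code.
fn maybe_resume_channel(remaining_in_flight: usize) {
    if remaining_in_flight != 0 {
        // Retaining a log here was the point of keeping the conditional.
        eprintln!("{} monitor update(s) still in flight, not resuming", remaining_in_flight);
        return;
    }
    // ...resume normal channel operation...
}
```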

Makes `test_durable_preimages_on_closed_channel` more robust
against changes to the order in which transactions are broadcast.
TheBlueMatt force-pushed the 2024-11-async-persist-claiming-from-closed-chan-1 branch from 7ec1631 to 0578332 on November 24, 2024 at 14:55.
TheBlueMatt (Collaborator, Author):

> Probably need to stare more at e9dbd83. There's a lot going on there, so any tips on reviewing it would be appreciated.

It's somewhat mechanical - basically just taking the ShutdownResult and passing it into the macro with locks held rather than letting it sit until finish_shutdown. The macro changes themselves are basically shifting the monitor update application to the macro, but leaving the post-apply updates for finish_shutdown (unlike the existing monitor update handling macro, which does both, dropping the lock in the middle).

TheBlueMatt (Collaborator, Author):

Also rebased.

codecov bot commented Nov 24, 2024:

Codecov Report

Attention: Patch coverage is 93.21267% with 15 lines in your changes missing coverage. Please review.

Project coverage is 90.63%. Comparing base (2d6720e) to head (db85fbc).
Report is 32 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lightning/src/ln/channelmanager.rs | 91.36% | 7 Missing and 5 partials ⚠️ |
| lightning/src/ln/monitor_tests.rs | 95.08% | 0 Missing and 3 partials ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main    #3413      +/-   ##
==========================================
+ Coverage   89.24%   90.63%   +1.39%     
==========================================
  Files         130      130              
  Lines      106959   112982    +6023     
  Branches   106959   112982    +6023     
==========================================
+ Hits        95452   102401    +6949     
+ Misses       8718     8175     -543     
+ Partials     2789     2406     -383     
```


TheBlueMatt force-pushed the 2024-11-async-persist-claiming-from-closed-chan-1 branch from 0578332 to 55b712e on November 25, 2024 at 20:01.
valentinewallace (Contributor) left a comment:

Still a few occurrences show up on `rg update_maps_on_chan_removal`.

```rust
if !in_flight_updates.contains(&update) {
    in_flight_updates.push(update.clone());
}
let event = BackgroundEvent::MonitorUpdateRegeneratedOnStartup {
```
valentinewallace (Contributor):

Should we also check for duplicates before pushing this background event, like how we do for in_flight_updates just above?

TheBlueMatt (Collaborator, Author):

The reason we check for duplicates is that we replay the background events by running them back through the normal process, which leads to duplicates. We shouldn't have duplicates in the background events themselves, I believe.

```rust
@@ -3241,18 +3264,17 @@ macro_rules! handle_monitor_update_completion {
}

macro_rules! handle_new_monitor_update {
    ($self: ident, $update_res: expr, $chan: expr, _internal, $completed: expr) => { {
```
valentinewallace (Contributor):

I feel like this macro is losing readability in this PR and the follow-up due to more cases being jammed into it. Thoughts on splitting this into a few different macros, like handle_monitor_update_res_internal for the first block, handle_initial_monitor_res for the INITIAL_MONITOR case, etc?

TheBlueMatt (Collaborator, Author):

Hmm, I mean sure, I can move the _internal case out into a different macro, but is that all that more readable? All the macro variants have an explicit string in them so it's easy to see which one is being called. It's less clear that a freestanding macro is "inner" to the handle_new_monitor_update macro vs the file or whatever.

```rust
mem::drop(peer_state);
mem::drop(per_peer_state);

self.handle_monitor_update_completion_actions(update_actions);
```
valentinewallace (Contributor):

Missing test coverage for the `!update_actions.is_empty()` case.

TheBlueMatt (Collaborator, Author) commented Nov 30, 2024:

I think this is actually currently unreachable :/. In order to hit it, we'd have to have a channel-closing update with attached post-update actions, which never happens. Otherwise, post-update actions always complete with their update (either immediately or when the update completion comes in).

Comment on lines +3997 to +4018:

```rust
if let Some(peer_state_mtx) = per_peer_state.get(&shutdown_res.counterparty_node_id) {
    let mut peer_state = peer_state_mtx.lock().unwrap();
    if peer_state.in_flight_monitor_updates.get(&funding_txo).map(|l| l.is_empty()).unwrap_or(true) {
```
valentinewallace (Contributor):

I wonder if it would make sense to put this handling in locked_close_channel so we don't need to acquire the lock again here? Probably doesn't matter too much though.

TheBlueMatt (Collaborator, Author):

Yea, we probably should. I was trying to avoid changing the `ShutdownResult` type in this PR; if it's alright, I'm gonna leave it for a follow-up with a TODO here.

```rust
    log_trace!(logger, "Channel is open and awaiting update, resuming it");
    handle_monitor_update_completion!(self, peer_state_lock, peer_state, per_peer_state, chan);
} else {
    log_trace!(logger, "Channel is open but not awaiting update");
```
valentinewallace (Contributor):

Is this reachable? Seems like it might not be, since we already checked that there are no in-flight updates. Missing test coverage if so.

TheBlueMatt (Collaborator, Author):

In practice I believe it's unreachable, but technically it's reachable from the public API - someone can return repeated `MonitorEvent::Completed`s.

TheBlueMatt force-pushed the 2024-11-async-persist-claiming-from-closed-chan-1 branch 4 times, most recently from ad59652 to e75fbd8, on November 30, 2024 at 17:27.
TheBlueMatt (Collaborator, Author):

Okay I believe I've addressed all feedback here.

valentinewallace (Contributor) left a comment:

I'm happy with this after Jeff takes another look.

Comment on lines +2959 to +2973:

```rust
let in_flight_updates = $peer_state.in_flight_monitor_updates.entry(funding_txo)
    .or_insert_with(Vec::new);
if !in_flight_updates.contains(&update) {
    in_flight_updates.push(update.clone());
}
```
valentinewallace (Contributor):

Thinking more on this race condition to make sure I'm following. Is one possible consequence of running the post-completion actions early that we may free an upstream channel before a preimage is persisted on the downstream channel? If there's a money-losing condition like that, it does seem worth testing; currently, commenting out the in_flight_updates push lines still passes tests. Not gonna hold up the PR on it though.

TheBlueMatt (Collaborator, Author):

In theory, yes, but this specific race condition on this line is entirely theoretical - to trigger this race we'd have to have a post-completion action tied to the close event, which we never do. Sadly that makes it untestable.

TheBlueMatt force-pushed the 2024-11-async-persist-claiming-from-closed-chan-1 branch 2 times, most recently from 3add30b to 068d2ce, on December 5, 2024 at 01:27.
jkczyz (Contributor) commented Dec 5, 2024:

> It's somewhat mechanical - basically just taking the ShutdownResult and passing it into the macro with locks held rather than letting it sit until finish_shutdown. The macro changes themselves are basically shifting the monitor update application to the macro, but leaving the post-apply updates for finish_shutdown (unlike the existing monitor update handling macro, which does both, dropping the lock in the middle).

Thanks, I went through this all again. LGTM. Feel free to squash.

TheBlueMatt force-pushed the 2024-11-async-persist-claiming-from-closed-chan-1 branch from 068d2ce to 8cae621 on December 5, 2024 at 17:10.
TheBlueMatt (Collaborator, Author):

Squashed without further updates.

jkczyz previously approved these changes on December 5, 2024.
TheBlueMatt dismissed stale reviews from valentinewallace and jkczyz via 2a38a70 on December 5, 2024 at 21:11.
TheBlueMatt force-pushed the 2024-11-async-persist-claiming-from-closed-chan-1 branch from 8cae621 to 2a38a70 on December 5, 2024 at 21:11.
TheBlueMatt (Collaborator, Author) commented Dec 5, 2024:

Oops, sorry, fixed a no-std issue and an intermediary-commit issue:

```
$ git diff-tree -U1 8cae62161 db85fbc8f
diff --git a/lightning/src/sync/nostd_sync.rs b/lightning/src/sync/nostd_sync.rs
index 56fb5e954..19faa1b5e 100644
--- a/lightning/src/sync/nostd_sync.rs
+++ b/lightning/src/sync/nostd_sync.rs
@@ -12,3 +12,3 @@ pub struct Mutex<T: ?Sized> {
 #[cfg(test)]
-unsafe impl<T: ?Sized> RefUnwindSafe for Mutex<T> {}
+impl<T: ?Sized> core::panic::RefUnwindSafe for Mutex<T> {}
```

valentinewallace (Contributor):

CI is still sad

TheBlueMatt force-pushed the 2024-11-async-persist-claiming-from-closed-chan-1 branch from 2a38a70 to db85fbc on December 5, 2024 at 21:52.
TheBlueMatt (Collaborator, Author):

Ugh, sorry, fixed.

jkczyz previously approved these changes on December 5, 2024.
When deciding if we should remove a `PeerState` entry, we want to
ensure we don't remove one if there are pending updates in
`in_flight_monitor_updates`. Previously this was done with a simple
`in_flight_monitor_updates.is_empty()` check; however, that can
prevent removal of `PeerState` entries if a channel had an update at
some point (leaving an entry in the map) but the update ultimately
completed.

Instead, we need to iterate over the entries in
`in_flight_monitor_updates` and decline to remove `PeerState`s only
if there is an entry for a pending update still in-flight.
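
A minimal sketch of the resulting check, with stand-in types for the real map (assumed shapes; the actual `ChannelManager` code differs in detail):

```rust
use std::collections::BTreeMap;

// Stand-ins for the real key/value types (OutPoint, ChannelMonitorUpdate).
type FundingTxo = [u8; 36];
struct InFlightUpdate;

// A `PeerState` entry is only removable when every per-channel Vec is
// empty. An entry can linger in the map with an empty Vec after all of a
// channel's updates complete, so `is_empty()` on the map is insufficient.
fn peer_has_updates_in_flight(
    in_flight_monitor_updates: &BTreeMap<FundingTxo, Vec<InFlightUpdate>>,
) -> bool {
    in_flight_monitor_updates.values().any(|updates| !updates.is_empty())
}
```
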
On startup, if we have a channel which was closed immediately
before shutdown such that the `ChannelMonitorUpdate` marking the
channel as closed is still in-flight, it doesn't make sense to
generate a fresh `ChannelMonitorUpdate` marking the channel as
closed immediately after the existing in-flight one.

Here we detect this case and drop the extra update. Though it's not
all that harmful, dropping it avoids some test changes in the coming
commits.
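
A hedged sketch of that detection; the `closes_channel` flag is a stand-in for inspecting the contents of the in-flight `ChannelMonitorUpdate`s:

```rust
// Stand-in for an in-flight ChannelMonitorUpdate.
struct PendingUpdate {
    closes_channel: bool,
}

// On startup, skip generating a fresh closing update if an in-flight
// update already marks this channel as closed.
fn close_already_in_flight(in_flight: &[PendingUpdate]) -> bool {
    in_flight.iter().any(|upd| upd.closes_channel)
}
```
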
During block connection, we cannot apply `ChannelMonitorUpdate`s if
we're running during the startup sequence (i.e. before the user has
called any methods outside of block connection). We previously
handled this by simply always pushing any `ChannelMonitorUpdate`s
generated during block connection into the
`pending_background_events` queue.

However, this results in `ChannelMonitorUpdate`s going through the
queue when we could just push them immediately. Here we explicitly
check `background_events_processed_since_startup` and use that to
decide whether to push updates through the background queue
instead.
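
A compilable sketch of that branch, with stand-in types (the real code lives in `ChannelManager` and routes through its monitor-update macros):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

struct MonitorUpdate;
enum BackgroundEvent {
    MonitorUpdateRegeneratedOnStartup(MonitorUpdate),
}

struct Manager {
    background_events_processed_since_startup: AtomicBool,
    pending_background_events: Mutex<Vec<BackgroundEvent>>,
}

impl Manager {
    // During block connection: apply the update immediately once startup
    // background processing has run, otherwise defer it via the queue.
    fn handle_block_connection_update(&self, update: MonitorUpdate) {
        if self.background_events_processed_since_startup.load(Ordering::Acquire) {
            self.apply_monitor_update(update);
        } else {
            self.pending_background_events.lock().unwrap().push(
                BackgroundEvent::MonitorUpdateRegeneratedOnStartup(update),
            );
        }
    }

    fn apply_monitor_update(&self, _update: MonitorUpdate) {
        // Elided: hand the update to the chain monitor.
    }
}
```
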
In the coming commits we'll start handling `ChannelMonitorUpdate`s
during channel closure in-line rather than after dropping locks via
`finish_close_channel`. In order to make that easy, here we add a
new `REMAIN_LOCKED_UPDATE_ACTIONS_PROCESSED_LATER` variant to
`handle_new_monitor_update!` which can attempt to apply an update
without dropping the locks and processing
`MonitorUpdateCompletionAction`s immediately.
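
A toy sketch of the variant's contract; the real macro takes more arguments, and `UpdateStatus` stands in for `ChannelMonitorUpdateStatus`. The point is only the split: apply and report completion now, process completion actions later:

```rust
// Stand-in for chain::ChannelMonitorUpdateStatus.
#[derive(PartialEq)]
enum UpdateStatus {
    Completed,
    InProgress,
}

// The variant applies the update while the per-peer lock is held and
// reports completion, but deliberately does NOT drop locks or process
// MonitorUpdateCompletionActions; the caller does that later.
macro_rules! handle_new_monitor_update {
    ($update_res: expr, REMAIN_LOCKED_UPDATE_ACTIONS_PROCESSED_LATER) => {{
        $update_res == UpdateStatus::Completed
    }};
}

fn example(status: UpdateStatus) -> bool {
    handle_new_monitor_update!(status, REMAIN_LOCKED_UPDATE_ACTIONS_PROCESSED_LATER)
}
```
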
Closing channels requires a two step process - first
`update_maps_on_chan_removal` is called while holding the same
per-peer lock under which the channel reached the terminal state,
then after dropping the same lock(s), `finish_close_channel` is
called.

Because the channel is closed and thus no further
`ChannelMonitorUpdate`s are generated for the off-chain state, we'd
previously applied the `ChannelMonitorUpdate` in
`finish_close_channel`. This was tweaked somewhat in
c99d3d7 when we stopped using
`u64::MAX` for any updates after closure. However, we worked around
the races that this implied by setting the `update_id` only when we
go to apply the `ChannelMonitorUpdate`, rather than when we create it.

In a coming commit, we'll need to have an `update_id` immediately
upon creation (to track in-flight updates that haven't reached
application yet). This implies that we can no longer apply closure
`ChannelMonitorUpdate`s after dropping the per-peer lock(s), as the
updates must be well-ordered with any later updates to the same
channel, even after it has been closed.

Thus, here, we add `ChannelMonitorUpdate` handling to
`update_maps_on_chan_removal`, renaming it `locked_close_channel`
to better capture its new purpose.
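
A simplified sketch of the two-step flow after this change (types and bodies elided; the lock ordering is the point):

```rust
use std::sync::Mutex;

struct PeerState;
struct ShutdownResult;

fn locked_close_channel(_peer_state: &mut PeerState, _res: &ShutdownResult) {
    // Now also applies the channel-closing ChannelMonitorUpdate under the
    // same per-peer lock, keeping it well-ordered with any later updates
    // for the same channel.
}

fn finish_close_channel(_res: ShutdownResult) {
    // Post-apply work only; runs after the locks are dropped.
}

fn close_channel(peer_state_mutex: &Mutex<PeerState>, shutdown_res: ShutdownResult) {
    {
        let mut peer_state = peer_state_mutex.lock().unwrap();
        // Step 1: the channel reached a terminal state under this lock.
        locked_close_channel(&mut peer_state, &shutdown_res);
    } // per-peer lock dropped here
    // Step 2:
    finish_close_channel(shutdown_res);
}
```
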
c99d3d7 updated
`ChannelMonitorUpdate::update_id` to continue counting up even
after the channel is closed. It, however, accidentally updated the
`ChannelMonitorUpdate` application logic to skip testing that
`ChannelMonitorUpdate`s are well-ordered after the channel has been
closed (in an attempt to ensure other checks in the same
conditional block were applied).

This fixes that oversight.
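
A hedged sketch of the restored invariant (real enforcement sits in the monitor-update application path):

```rust
// Updates must arrive with increasing update_ids, and with this fix the
// check applies even after the channel has been closed, where it was
// previously (accidentally) skipped.
fn check_update_well_ordered(latest_update_id: u64, new_update_id: u64) -> Result<(), &'static str> {
    if new_update_id <= latest_update_id {
        return Err("ChannelMonitorUpdates applied out of order");
    }
    Ok(())
}
```
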
When we handle a `ChannelMonitorUpdate` completion we always
complete everything that was waiting on any updates to the same
channel all at once. Thus, we need to skip all updates if there are
pending updates besides the one that was just completed.

We handled this correctly for open channels, but the shortcut for
closed channels ignored any other pending updates entirely.

Here we fix this, which is ultimately required for tests which are
added in a few commits to pass.
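
A small sketch mirroring the `remaining_in_flight` check quoted earlier in this thread (tracking bare ids as a stand-in for full update objects):

```rust
// Completion work for a channel may only run once NO updates remain in
// flight; with this fix that holds on the closed-channel shortcut path
// just as it does for open channels.
fn run_completion_actions_now(in_flight_update_ids: &mut Vec<u64>, completed_id: u64) -> bool {
    in_flight_update_ids.retain(|id| *id != completed_id);
    in_flight_update_ids.is_empty()
}
```
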
TheBlueMatt dismissed stale reviews from valentinewallace and jkczyz via b50354d on December 6, 2024 at 01:04.
TheBlueMatt force-pushed the 2024-11-async-persist-claiming-from-closed-chan-1 branch from db85fbc to b50354d on December 6, 2024 at 01:04.
TheBlueMatt (Collaborator, Author):

Grrrr

```
$ git diff-tree -U1 db85fbc8f b50354dc4
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index c33e25462..9ca99056e 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -9646,3 +9646,3 @@ where
 				};
-				if let Some(shutdown_result) = shutdown_result {
+				if let Some(mut shutdown_result) = shutdown_result {
 					let context = &chan.context();
```

TheBlueMatt merged commit b96b19a into lightningdevkit:main on December 6, 2024. 18 of 19 checks passed.