1507 followups #1966

Conversation

ViktorTigerstrom
Contributor

Closes #1945.

This PR addresses the remaining feedback raised in #1507 by @TheBlueMatt and @valentinewallace. In general, it refactors several aspects of #1507 that could have been done in a more suitable way, and also fixes a memory leak.

Let me know if there are more aspects of #1507 that need fixing :).

/// limit the negative effects on parallelism as much as possible.
///
/// See `ChannelManager` struct-level documentation for lock order requirements.
pending_peers_awaiting_removal: Mutex<HashSet<PublicKey>>,
Contributor Author

Let me know if we're better off just taking the write lock of the per_peer_state after force-closing a channel when we're already disconnected with no channels left :).

Collaborator

I don't think we need to do either - you call self.remove_peers_awaiting_removal in timer_tick_occurred immediately after we iterate over the per_peer_state set. Instead of storing the set of nodes to remove explicitly in the ChannelManager, we can just calculate it on the fly in that same iteration and then do the removal with the calculated set.
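
A minimal standalone sketch of the on-the-fly approach suggested above (field names and types are illustrative stand-ins, not ChannelManager's actual ones): removal candidates are collected during the per-peer iteration that timer_tick_occurred already performs, and the write lock is only taken afterwards if anything needs to be dropped.

use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

struct PeerState { channel_count: usize, is_connected: bool }

struct Manager { per_peer_state: RwLock<HashMap<[u8; 33], Mutex<PeerState>>> }

impl Manager {
    fn timer_tick_occurred(&self) {
        // Collect candidates during the existing read-lock iteration instead of
        // tracking them in a separate `pending_peers_awaiting_removal` set.
        let mut peers_to_remove = Vec::new();
        {
            let per_peer_state = self.per_peer_state.read().unwrap();
            for (node_id, peer_mutex) in per_peer_state.iter() {
                let peer = peer_mutex.lock().unwrap();
                // ...existing per-peer timer work would run here...
                if peer.channel_count == 0 && !peer.is_connected {
                    peers_to_remove.push(*node_id);
                }
            }
        }
        if peers_to_remove.is_empty() { return; }
        // Take the write lock once, re-checking each candidate under it.
        let mut per_peer_state = self.per_peer_state.write().unwrap();
        for node_id in peers_to_remove {
            let still_removable = per_peer_state.get(&node_id).map_or(true, |peer_mutex| {
                let peer = peer_mutex.lock().unwrap();
                peer.channel_count == 0 && !peer.is_connected
            });
            if still_removable { per_peer_state.remove(&node_id); }
        }
    }
}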


nodes[0].node.handle_accept_channel(&unkown_public_key, &accept_channel_msg);
Contributor Author

Ideally I'd like to keep these message-handling function calls with an unknown peer. I'm looking into whether there's a good way of expecting a panic with a given message here, without having to write a separate test for every message-handling function call.
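
For reference, one generic way to check that several calls panic with an expected message from within a single test is std::panic::catch_unwind plus downcasting the panic payload. A minimal sketch (the helper is hypothetical and not part of this PR; non-UnwindSafe test state would need the AssertUnwindSafe wrapper shown):

use std::panic::{self, AssertUnwindSafe};

/// Hypothetical helper: run `f`, which is expected to panic, and assert that
/// the panic message contains `expected`, so several panicking calls can be
/// checked from one test instead of one #[should_panic] test per call.
fn assert_panics_with<F: FnOnce()>(f: F, expected: &str) {
    let payload = panic::catch_unwind(AssertUnwindSafe(f))
        .expect_err("closure was expected to panic");
    let msg = payload.downcast_ref::<String>().map(String::as_str)
        .or_else(|| payload.downcast_ref::<&str>().copied())
        .unwrap_or("<non-string panic payload>");
    assert!(msg.contains(expected), "unexpected panic message: {}", msg);
}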

Collaborator

Oh, hmm, actually yeah, those panics are going to be annoying. Let's wrap them in #[cfg(all(not(test), not(fuzzing)))], as the fuzzer will also get mad about it.

@ViktorTigerstrom
Contributor Author

Ah snap, will fix the CI check failures.

@TheBlueMatt TheBlueMatt added this to the 0.0.114 milestone Jan 19, 2023
@ViktorTigerstrom ViktorTigerstrom force-pushed the 2023-01-store-channels-per-peer-followups branch from fa84f75 to 834ca24 Compare January 20, 2023 00:24
@ViktorTigerstrom
Contributor Author

Rebased on main to fix the merge conflict, plus the CI error fix.

Collaborator

@TheBlueMatt TheBlueMatt left a comment

Basically LGTM, one real comment but feel free to squash.

@@ -2233,7 +2228,8 @@ where
/// public, and thus should be called whenever the result is going to be passed out in a
/// [`MessageSendEvent::BroadcastChannelUpdate`] event.
///
/// May be called with peer_state already locked!
/// Note that this function may be called without the `peer_state` corresponding to the passed
Collaborator

I'm not sure how that's possible if we have a reference to the Channel.

Contributor Author

In internal_closing_signed this is called after the channel has been removed from the peer_state and the lock has been dropped, i.e. the call site has full ownership of the channel.
Though I agree that this comment is probably just confusing and not very helpful, so I'll just drop it altogether 😁!

Collaborator

Ah, okay, I'd be happy to see the comment remain, too, just with an explanation of what it means :)
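
For readers following along, a condensed standalone sketch of the situation being described (types and the broadcast helper are simplified stand-ins for the real ChannelManager code): the channel is removed while the peer's lock is held, both locks are dropped, and only then is the update helper called on the now fully owned channel.

use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

struct Channel;
struct PeerState { channel_by_id: HashMap<[u8; 32], Channel> }

// Stand-in for the helper discussed above (the real one builds a
// MessageSendEvent::BroadcastChannelUpdate).
fn get_channel_update_for_broadcast(_chan: &Channel) {}

fn close_and_broadcast(
    per_peer_state: &RwLock<HashMap<[u8; 33], Mutex<PeerState>>>,
    counterparty: &[u8; 33],
    chan_id: [u8; 32],
) {
    let chan = {
        let peers = per_peer_state.read().unwrap();
        let mut peer_state = peers.get(counterparty).unwrap().lock().unwrap();
        // Take the channel out of the map while the peer's lock is held...
        peer_state.channel_by_id.remove(&chan_id)
    }; // ...both locks are dropped here.
    if let Some(chan) = chan {
        // No `peer_state` lock is held at this point - this is the case the
        // doc comment above was trying (confusingly) to describe.
        get_channel_update_for_broadcast(&chan);
    }
}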

@valentinewallace
Contributor

Looks good after Matt's feedback!

@TheBlueMatt
Collaborator

Any update here, @ViktorTigerstrom?

@ViktorTigerstrom
Contributor Author

Thanks for the reviews @TheBlueMatt & @valentinewallace! I've had a really hectic week, but will address the review as soon as I can later tonight or tomorrow. Sorry for the delay.

@TheBlueMatt
Collaborator

No huge rush, just making sure it didn't slip off your radar :)

@codecov-commenter

codecov-commenter commented Feb 2, 2023

Codecov Report

Base: 90.89% // Head: 87.24% // Decreases project coverage by -3.65% ⚠️

Coverage data is based on head (2f21d56) compared to base (760ab65).
Patch coverage: 92.85% of modified lines in pull request are covered.

❗ Current head 2f21d56 differs from pull request most recent head 7f6f90d. Consider uploading reports for the commit 7f6f90d to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1966      +/-   ##
==========================================
- Coverage   90.89%   87.24%   -3.65%     
==========================================
  Files          99      100       +1     
  Lines       53026    44110    -8916     
  Branches    53026    44110    -8916     
==========================================
- Hits        48197    38484    -9713     
- Misses       4829     5626     +797     
Impacted Files Coverage Δ
lightning/src/ln/channelmanager.rs 85.85% <91.78%> (-1.41%) ⬇️
lightning/src/ln/chanmon_update_fail_tests.rs 97.31% <100.00%> (-0.29%) ⬇️
lightning/src/ln/functional_test_utils.rs 88.53% <100.00%> (-2.95%) ⬇️
lightning/src/ln/functional_tests.rs 97.20% <100.00%> (+0.13%) ⬆️
lightning/src/ln/payment_tests.rs 96.51% <100.00%> (-0.74%) ⬇️
lightning/src/ln/reload_tests.rs 93.35% <100.00%> (-2.22%) ⬇️
lightning/src/ln/reorg_tests.rs 96.35% <100.00%> (-3.65%) ⬇️
lightning/src/chain/chaininterface.rs 41.17% <0.00%> (-54.48%) ⬇️
lightning/src/offers/payer.rs 50.00% <0.00%> (-50.00%) ⬇️
lightning/src/util/errors.rs 32.14% <0.00%> (-49.34%) ⬇️
... and 98 more

☔ View full report at Codecov.

@ViktorTigerstrom
Contributor Author

ViktorTigerstrom commented Feb 2, 2023

Updated the PR with the latest feedback and squashed the previous fixups.

A general question about removing the disconnected peers with no channels through timer_tick_occurred though.
If we have a connected peer with no channels, and we shut down the node and then restart it:
If timer_tick_occurred is called on startup before we connect to the node again, timer_tick_occurred will drop the peer, as we initialize the peer_state as disconnected in ChannelManager::read. Is that an issue?

Collaborator

@TheBlueMatt TheBlueMatt left a comment

A general question about removing the disconnected peers with no channels through timer_tick_occurred though.
If we have a connected peer with no channels, and we shut down the node and then restart it:
If timer_tick_occurred is called on startup before we connect to the node again, timer_tick_occurred will drop the peer, as we initialize the peer_state as disconnected in ChannelManager::read. Is that an issue?

No, there's no reason to keep those peers. In fact, there's no reason to write those peers out to the manager to begin with - the manager should be written as if all peers had disconnected immediately prior to the write. That doesn't have to happen now, but we should consider doing it. If you do do it here, let's go ahead and make the "does this peer merit disconnect" criteria a utility function, as it will get further changes in #1897.
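
A tiny sketch of the suggested utility function (the name and fields are hypothetical; per the note above, the real criteria will get further changes in #1897): centralizing the "no channels and not connected" check so that both the timer-based pruning and, eventually, serialization can share it.

struct PeerState { channel_count: usize, is_connected: bool }

/// Hypothetical helper for the "does this peer merit being kept around" check:
/// a disconnected peer with no channels has no reason to exist. Serialization
/// goes one step further and treats every peer as already disconnected, so it
/// would only keep peers that still have channels.
fn peer_should_be_retained(peer: &PeerState) -> bool {
    peer.channel_count != 0 || peer.is_connected
}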

@TheBlueMatt
Collaborator

Needs rebase, sadly.

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2023-01-store-channels-per-peer-followups branch from 07078a5 to 05eca8a Compare February 4, 2023 18:47
@ViktorTigerstrom ViktorTigerstrom force-pushed the 2023-01-store-channels-per-peer-followups branch from 05eca8a to 7cd5798 Compare February 4, 2023 18:53
@ViktorTigerstrom
Contributor Author

Addressed the feedback, rebased on main, and cherry-picked df8c1cf.

Collaborator

@TheBlueMatt TheBlueMatt left a comment

Feel free to squash as you go, I think.

)
).unwrap_or(None);

if let Some(hash_map::Entry::Occupied(mut chan)) = peer_state_opt.as_mut().map(|peer_state| peer_state.channel_by_id.entry(chan_id))
Collaborator

nit: this doesn't need entry, get should suffice.

Contributor Author

I opted not to change this to get and kept it as an entry, as we're calling chan.remove_entry(); when drop is set to true further down. If it's preferred, though, I'll try refactoring this function a bit to use get instead. Let me know if you'd like me to do so :)!
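
The trade-off being discussed, in miniature (map and value types simplified): the Entry handle lets the drop path call remove_entry without a second lookup, whereas switching to get would require a separate keyed remove later.

use std::collections::hash_map::{Entry, HashMap};

fn handle(channel_by_id: &mut HashMap<u64, String>, chan_id: u64, drop_chan: bool) {
    // `entry` keeps a handle into the map, so the later removal needs no re-lookup:
    if let Entry::Occupied(mut chan) = channel_by_id.entry(chan_id) {
        chan.get_mut().push_str(" (updated)");
        if drop_chan {
            chan.remove_entry(); // no second hashmap lookup
        }
    }
    // With `get_mut` alone, the drop path would need `channel_by_id.remove(&chan_id)`,
    // i.e. a second lookup - which is why the entry form was kept here.
}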

@@ -5683,7 +5768,8 @@ where
/// Note that that while `MessageSendEvent`s are strictly ordered per-peer, the peer order for
/// the chunks of `MessageSendEvent`s for different peers is random. I.e. if the array contains
/// `MessageSendEvent`s for both `node_a` and `node_b`, the `MessageSendEvent`s for `node_a`
/// will randomly be placed first or last in the returned array.
/// will randomly be placed first or last in the returned array. Note that order will not be
/// randomized with the `no-std` feature enabled.
Collaborator

I mean this is...kinda true? We end up using ahash's compile-time randomness, I think, which is still random, just not randomized across calls? I'm not sure we need to bother noting this.

Contributor Author

@ViktorTigerstrom ViktorTigerstrom Feb 8, 2023

OK, I see your point. Dropped the comment!
The reason I included this in the first place, though, is that I'm a bit worried that people will write tests that validate MessageSendEvents in specific orders without using the remove_first_msg_event_to_node function, and then be really confused about why the tests pass with the no-std feature set but randomly fail with the std feature set. Maybe this isn't really the correct place to address that worry, though.
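
For anyone hitting the ordering concern raised here, a minimal standalone model of the remove_first_msg_event_to_node idea (the event type below is a toy, not LDK's MessageSendEvent): tests should select events by destination node rather than assert on a fixed index, so they keep passing regardless of how the per-peer chunks happen to be ordered.

#[derive(Debug, PartialEq)]
struct MsgEvent { node_id: u8, payload: &'static str }

// Toy version of LDK's test utility: pull out the first event destined for the
// given node, removing it from the vector.
fn remove_first_msg_event_to_node(node_id: u8, events: &mut Vec<MsgEvent>) -> MsgEvent {
    let pos = events.iter().position(|e| e.node_id == node_id)
        .expect("no event for the requested node");
    events.remove(pos)
}

fn main() {
    // The chunk order for different peers may differ between runs (or between
    // std and no-std builds)...
    let mut events = vec![
        MsgEvent { node_id: 2, payload: "commitment_signed" },
        MsgEvent { node_id: 0, payload: "update_add_htlc" },
    ];
    // ...but selecting by destination is order-independent.
    let to_node_0 = remove_first_msg_event_to_node(0, &mut events);
    assert_eq!(to_node_0.payload, "update_add_htlc");
    assert_eq!(events.len(), 1);
}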

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2023-01-store-channels-per-peer-followups branch from 7cd5798 to 3067120 Compare February 8, 2023 00:52
@TheBlueMatt
Collaborator

Please squash the fixup commit.

.ok_or_else(|| {
#[cfg(all(not(test), not(fuzzing)))]
{
debug_assert!(false);
Contributor

why are we introducing this debug_assert?

Collaborator

It's implicitly testing that the PeerManager correctly serializes connect/message-handle/disconnect events.

let mut peer_state_lock = peer_state_mutex_opt.unwrap().lock().unwrap();
let peer_state_mutex = per_peer_state.get(counterparty_node_id)
.ok_or_else(|| {
#[cfg(all(not(test), not(fuzzing)))]
Collaborator

Honestly, though, let's remove the cfg flag on it and just unconditionally debug_assert!(false). We can just remove the test_api_calls_with_unkown_counterparty_node test.

Contributor Author

Removed with the latest push! I kept the API function tests in test_api_calls_with_unkown_counterparty_node, but let me know if you'd prefer me to drop the entire test :)!
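
A minimal standalone sketch of the unconditional form requested above (types simplified; the real code returns an LDK error type rather than a String): any debug build, tests included, will now catch a message being handled for an unknown peer, while release builds just surface the error.

use std::collections::HashMap;

fn peer_state_for<'a>(
    per_peer_state: &'a HashMap<String, u32>,
    counterparty_node_id: &str,
) -> Result<&'a u32, String> {
    per_peer_state.get(counterparty_node_id).ok_or_else(|| {
        // Unconditional, no cfg gate: reaching this for a message handler is a
        // programming error, so fail loudly in any build with debug assertions.
        debug_assert!(false, "No peer state for {}", counterparty_node_id);
        format!("Can't find a peer matching the passed counterparty node_id {}", counterparty_node_id)
    })
}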

@@ -926,7 +926,8 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
expect_pending_htlcs_forwardable_and_htlc_handling_failed!(nodes[1], vec![HTLCDestination::NextHopChannel { node_id: Some(nodes[2].node.get_our_node_id()), channel_id: chan_2.2 }]);
check_added_monitors!(nodes[1], 1);

let mut events_3 = nodes[1].node.get_and_clear_pending_msg_events();
let mut events_3_vec = nodes[1].node.get_and_clear_pending_msg_events();
let events_3 = &mut events_3_vec;
Contributor

rather than making a mutable reference here and using two variables, could you explicitly insert the &mut prefix at the individual call sites?

Contributor Author

Unfortunately, our Rust 1.41.1 and 1.45.2 builds consider that to also take a mutable borrow of the node itself, and not only of the list returned by get_and_clear_pending_msg_events, and therefore fail the build. I found that super confusing as well 😅. That was the CI failure I got earlier in this PR: #1966 (comment)

Contributor

hm, that is quite curious. But if that's how it is, I guess it'll have to be that way. In which case, could you add a comment and use slightly more descriptive variable names? Perhaps the first can be events, and the second one mutable_events_reference or something like that?

Collaborator

I'm confused, the following patch builds just fine for me on 1.41.1:

diff --git a/lightning/src/ln/chanmon_update_fail_tests.rs b/lightning/src/ln/chanmon_update_fail_tests.rs
index 1f7643fd6..e1ea256a6 100644
--- a/lightning/src/ln/chanmon_update_fail_tests.rs
+++ b/lightning/src/ln/chanmon_update_fail_tests.rs
@@ -926,8 +926,7 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
        expect_pending_htlcs_forwardable_and_htlc_handling_failed!(nodes[1], vec![HTLCDestination::NextHopChannel { node_id: Some(nodes[2].node.get_our_node_id()), channel_id: chan_2.2 }]);
        check_added_monitors!(nodes[1], 1);
 
-       let mut events_3_vec = nodes[1].node.get_and_clear_pending_msg_events();
-       let events_3 = &mut events_3_vec;
+       let mut events_3 = nodes[1].node.get_and_clear_pending_msg_events();
        if test_ignore_second_cs {
                assert_eq!(events_3.len(), 3);
        } else {
@@ -936,7 +935,7 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
 
        // Note that the ordering of the events for different nodes is non-prescriptive, though the
        // ordering of the two events that both go to nodes[2] have to stay in the same order.
-       let nodes_0_event = remove_first_msg_event_to_node(&nodes[0].node.get_our_node_id(), events_3);
+       let nodes_0_event = remove_first_msg_event_to_node(&nodes[0].node.get_our_node_id(), &mut events_3);
        let messages_a = match nodes_0_event {
                MessageSendEvent::UpdateHTLCs { node_id, mut updates } => {
                        assert_eq!(node_id, nodes[0].node.get_our_node_id());
@@ -950,12 +949,12 @@ fn do_test_monitor_update_fail_raa(test_ignore_second_cs: bool) {
                _ => panic!("Unexpected event type!"),
        };
 
-       let nodes_2_event = remove_first_msg_event_to_node(&nodes[2].node.get_our_node_id(), events_3);
+       let nodes_2_event = remove_first_msg_event_to_node(&nodes[2].node.get_our_node_id(), &mut events_3);
        let send_event_b = SendEvent::from_event(nodes_2_event);
        assert_eq!(send_event_b.node_id, nodes[2].node.get_our_node_id());
 
        let raa = if test_ignore_second_cs {
-               let nodes_2_event = remove_first_msg_event_to_node(&nodes[2].node.get_our_node_id(), events_3);
+               let nodes_2_event = remove_first_msg_event_to_node(&nodes[2].node.get_our_node_id(), &mut events_3);
                match nodes_2_event {
                        MessageSendEvent::SendRevokeAndACK { node_id, msg } => {
                                assert_eq!(node_id, nodes[2].node.get_our_node_id());

Contributor Author

@ViktorTigerstrom ViktorTigerstrom Feb 14, 2023

Ah, that should work, thanks! I was doing it the other way around, which caused the issue for 1.41.1 and 1.45.2 but worked for later builds, i.e.:

let events_3 = &mut nodes[1].node.get_and_clear_pending_msg_events();
let nodes_0_event = remove_first_msg_event_to_node(&nodes[0].node.get_our_node_id(), events_3);

My bad, sorry! Updated this with the latest push :)

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2023-01-store-channels-per-peer-followups branch from 3067120 to e1f1c45 Compare February 13, 2023 09:42
@ViktorTigerstrom
Contributor Author

Thanks a lot for the review @ariard & @arik-so :)!
Addressed the latest feedback and squashed all fixups with the latest commit!

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2023-01-store-channels-per-peer-followups branch from e1f1c45 to 2f21d56 Compare February 14, 2023 00:01
@ViktorTigerstrom
Contributor Author

ViktorTigerstrom commented Feb 14, 2023

Addressed the feedback about mutable reference borrows :)! #1966 (comment)

@arik-so
Contributor

arik-so commented Feb 14, 2023

Nit: in the commit message, it should be spelled existence and not existance, but ¯\_(ツ)_/¯

arik-so previously approved these changes Feb 14, 2023
Contributor

@arik-so arik-so left a comment

lgtm, thanks very much for addressing the comments!

Avoid doing the same hashmap lookups twice in a row when it's not
needed. Refactor `claim_funds_from_hop` in order to enable this change.

Updates multiple instances of the `ChannelManager` docs related to the
previous change that moved the storage of the channels to the
`per_peer_state`. This docs update corrects some grammar errors and
incorrect information, as well as clarifies documentation that was
confusing.

When a peer disconnects but still has channels, the peer's `peer_state`
entry in the `per_peer_state` is not removed by the `peer_disconnected`
function. If the channels with that peer are later closed while it is
still disconnected (i.e. force closed), we therefore need to remove the
peer from the `per_peer_state` separately.

To remove the peers separately, we push such peers to a separate HashSet
that holds peers awaiting removal, and remove the peers on a timer to
limit the negative effects on parallelism as much as possible.

@ViktorTigerstrom
Contributor Author

Nit: in the commit message, it should be spelled existence and not existance, but ¯\_(ツ)_/¯

Thanks! Corrected some spelling/grammar mistakes in the commit messages.

@TheBlueMatt TheBlueMatt merged commit 33720b0 into lightningdevkit:main Feb 14, 2023