connectd: fix accidental handling of old reconnections. #5256
Conversation
ACK 3fab422 Reviewed w/ @endothermicdev
Looks OK to merge, just a style nit: the checks in our repo look for …
	pr = peer_reconnected_htable_get(&daemon->reconnected, id);
	if (pr) {
		status_broken("Reconnected AGAIN");
		peer_reconnected_htable_del(&daemon->reconnected, pr);
If the previous reconnection already had the destructor installed, would it not delete this twice? Or is deleting from the htable idempotent?
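As an aside, here is a minimal, self-contained sketch of the pattern this question is about. It is not the actual ccan htable or connectd code, and all names are hypothetical stand-ins: it only shows why, if removal does nothing when the entry is already gone, an explicit delete followed by the destructor's delete is harmless.

```c
/* Hypothetical sketch -- NOT the real ccan htable or connectd code.
 * The entry's "destructor" also removes it from the table, so an explicit
 * removal followed by freeing the entry means the removal runs twice.
 * If delete is a no-op for an absent entry, the double removal is safe. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct reconnected {               /* stand-in for peer_reconnected */
	int id;
	struct reconnected *next;
};

static struct reconnected *table;  /* stand-in for the htable */

/* Idempotent delete: returns false (and changes nothing) if not present. */
static bool table_del(struct reconnected *r)
{
	for (struct reconnected **p = &table; *p; p = &(*p)->next) {
		if (*p == r) {
			*p = r->next;
			return true;
		}
	}
	return false;
}

static void table_add(struct reconnected *r)
{
	r->next = table;
	table = r;
}

/* Plays the role of the tal destructor that removes the entry on free. */
static void destroy_reconnected(struct reconnected *r)
{
	table_del(r);              /* second delete: a no-op if already gone */
	free(r);
}

int main(void)
{
	struct reconnected *pr = malloc(sizeof(*pr));
	pr->id = 1;
	table_add(pr);

	/* Mirrors the quoted snippet: explicit delete before replacing... */
	assert(table_del(pr) == true);
	/* ...then freeing triggers the destructor's delete, which finds
	 * nothing left to remove. */
	destroy_reconnected(pr);

	printf("double removal was harmless\n");
	return 0;
}
```

Whether the generated htable delete in the real code actually behaves this way is exactly what the question asks; the sketch only shows why idempotent deletion would make the explicit del plus destructor combination safe.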
We had multiple reports of channels being unilaterally closed because it seemed like the peer was sending old revocation numbers. Turns out, it was actually old reestablish messages!

When we have a reconnection, we would put the new connection aside, and tell lightningd to close the current connection: when it did, we would restart processing of the initial reconnection.

However, we could end up with *multiple* "reconnecting" connections, while waiting for an existing connection to close. Though the connections were long gone, there could still be messages queued (particularly the channel_reestablish message, which comes early on).

Eventually, a normal reconnection would cause us to process one of these reconnecting connections, and channeld would see the (perhaps very old!) messages, and get confused.

(I have a test which triggers this, but it also hangs the connect command, due to other issues we will fix in the next release...)

Fixes: ElementsProject#5240
Fixes: ElementsProject#5235
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
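To illustrate the behaviour described in that commit message, here is a deliberately simplified, hypothetical sketch (not the real connectd code): keep at most one pending reconnection per peer, and let a newer reconnection evict the stale one, so its queued channel_reestablish messages are never replayed later.

```c
/* Simplified, hypothetical sketch of the idea above -- not the actual
 * connectd code.  A new reconnection evicts any stale pending one,
 * discarding its queued messages instead of replaying them later. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct pending_reconnect {
	char peer_id[67];        /* hex node id; size is illustrative only */
	char **queued_msgs;      /* messages received before the handover */
	size_t n_msgs;
};

static void free_pending(struct pending_reconnect *pr)
{
	if (!pr)
		return;
	for (size_t i = 0; i < pr->n_msgs; i++)
		free(pr->queued_msgs[i]);
	free(pr->queued_msgs);
	free(pr);
}

/* One slot here for brevity; the point is that a newer reconnection
 * replaces the stale one rather than leaving it queued. */
static struct pending_reconnect *pending;

static void note_reconnection(const char *peer_id)
{
	if (pending) {
		/* A reconnection is already parked: it is now stale, so
		 * drop it, queued messages and all. */
		free_pending(pending);
	}
	pending = calloc(1, sizeof(*pending));
	snprintf(pending->peer_id, sizeof(pending->peer_id), "%s", peer_id);
}

int main(void)
{
	note_reconnection("02aabb...");   /* first reconnection is parked */
	note_reconnection("02aabb...");   /* second one evicts the stale entry */
	printf("only the newest reconnection is kept\n");
	free_pending(pending);
	return 0;
}
```

In the real daemon these pending entries live in an htable keyed by peer id (as the snippet quoted earlier in the review shows); the sketch collapses that to a single slot purely to keep the eviction idea visible.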
I have a test which reproduces this, too, and it's been seen in the wild. It seems we can add a subd as we're closing, which causes this assert to trigger.

Fixes: ElementsProject#5254
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Force-pushed from 3fab422 to b817e0a (Compare)
Removed the accidentally-left BROKEN log, and added another assert removal to fix another report.
ACK b817e0a
### Fixed

- connectd: make sure we don't keep stale reconnections around. ([#5256])
- connectd: fix assert which we could trigger. ([#5256])
Suggested change: `- connectd: fix assert which we could trigger. ([#5256])` → `- connectd: fix assert which we could trigger. ([#5254])`

This should be #5254?
No, these refer to the PR, not the bug itself...
Ah, dammit! Sorry, I just assumed the wrong thing.