Do not unnecessarily retransmit `commitment_signed` in dual funding #1214
base: master
@@ -2520,7 +2520,8 @@ A receiving node:
   - if `next_funding_txid` is set:
     - if `next_funding_txid` matches the latest interactive funding transaction:
       - if it has not received `tx_signatures` for that funding transaction:
-        - MUST retransmit its `commitment_signed` for that funding transaction.
+        - if `next_commitment_number` is zero:
+          - MUST retransmit its `commitment_signed` for that funding transaction.
         - if it has already received `commitment_signed` and it should sign first,
           as specified in the [`tx_signatures` requirements](#the-tx_signatures-message):
           - MUST send its `tx_signatures` for that funding transaction.
@@ -2529,6 +2530,9 @@ A receiving node:
     - otherwise:
       - MUST send `tx_abort` to let the sending node know that they can forget
         this funding transaction.
+  - if `next_funding_txid` is not set, and `next_commitment_number` is zero:
+    - MUST immediately fail the channel and broadcast any relevant latest commitment
+      transaction.
Comment on lines +2533 to +2535
They were definitely still doing it post-934, we started doing it (in addition to an [...]

@ziggie1984 this is the discussion we had about [...]

LND will respond to a normal establishment msg with an error so there is no need for the other implementation to immediately force close when receiving the establishment-msg with a [...]

That wasn't quite the question, though - will lnd force-close a channel when it receives an [...]

Triggering a force-close with a channel-reestablishment msg (setting next_commitment_number = 0) seems more robust than always force-closing when an error is received, WDYT?

I don't think this is the case, as we haven't seen such [...]

Good, thanks for clarifying that. Since the SCB-restoring node sends [...]

Just one last question: the error sent by [...]

I disagree, and a good counter-example to that is that dual-funding has valid reasons to set [...]

It's a much better spec to force-close on [...]

Good point, haven't looked into dual-funding, that then needs to change on our side and we need to make it more strict.

We sent the following error string: [...]

There seem to be no error codes for these error msgs: [...]

Perfect, we'll definitely force-close on that one!

Because there shouldn't need to be any error code, since the behavior should be to: [...]

But implementations have had to deviate from the spec because [...]
 A node:
   - MUST NOT assume that previously-transmitted messages were lost,
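To make the changed requirements easier to check, here is a minimal Python sketch of the receiver-side decision for the two hunks above. It is only an illustration, not any implementation's code: `Action`, `Reestablish`, `LocalState` and `on_channel_reestablish` are hypothetical names, the nesting of the conditions follows my reading of the diff, and the branch that sits between the two hunks (not shown in this diff) is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Action(Enum):
    RETRANSMIT_COMMITMENT_SIGNED = auto()
    SEND_TX_SIGNATURES = auto()
    SEND_TX_ABORT = auto()
    FAIL_CHANNEL = auto()


@dataclass
class Reestablish:
    # Fields of the received channel_reestablish that matter here.
    next_commitment_number: int
    next_funding_txid: Optional[str] = None


@dataclass
class LocalState:
    # Hypothetical view of our own dual-funding state.
    latest_funding_txid: Optional[str]
    received_tx_signatures: bool
    received_commitment_signed: bool
    must_sign_first: bool


def on_channel_reestablish(msg: Reestablish, state: LocalState) -> List[Action]:
    actions: List[Action] = []
    if msg.next_funding_txid is not None:
        if msg.next_funding_txid == state.latest_funding_txid:
            if not state.received_tx_signatures:
                # New in this PR: only retransmit when the peer asks for it.
                if msg.next_commitment_number == 0:
                    actions.append(Action.RETRANSMIT_COMMITMENT_SIGNED)
                if state.received_commitment_signed and state.must_sign_first:
                    actions.append(Action.SEND_TX_SIGNATURES)
            # The case where tx_signatures was already received is outside
            # the hunks shown in this diff, so it is not modelled here.
        else:
            actions.append(Action.SEND_TX_ABORT)
    elif msg.next_commitment_number == 0:
        # New in this PR: zero without next_funding_txid fails the channel.
        actions.append(Action.FAIL_CHANNEL)
    return actions
```

With this sketch, a peer that sends `next_commitment_number = 1` together with a matching `next_funding_txid` no longer triggers a retransmission, which is the behavior change discussed in the comments below.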
I notice that you didn't add a requirement that we MUST NOT retransmit `commitment_signed` if `next_commitment_number = 1`. That's a good thing, because this wouldn't be backwards-compatible, since `eclair` and `cln` currently always send `next_commitment_number = 1` and will always retransmit `commit_sig` if `next_funding_txid` is included.

We previously discussed whether it was worth avoiding unnecessarily retransmitting `commit_sig` in that case. We decided that since it was simple to ignore a spurious retransmission, it wasn't worth the extra effort. But if you feel that it's important to be cleaner here and avoid that retransmission, we can definitely add it, WDYT @ddustin @niftynei?

In order to introduce this without breaking too many nodes, I think we should deploy that change using the following steps:

1. [...] `channel_reestablish`, accept `next_commitment_number = 0` if `next_funding_txid` is set
2. [...] `channel_reestablish`, set `next_commitment_number = 0` when asking to retransmit `commit_sig`
3. [...] `channel_reestablish`, don't retransmit `commit_sig` if `next_commitment_number = 1`

If we decide to go down that road, we must add the same mechanism for splicing: retransmitting `commit_sig` when `next_funding_txid` is set should be gated on `next_commitment_number` being equal to the current `commitment_number`. I'm implementing this in eclair to verify that this doesn't have any unintended side effects.
Actually, that was a bit of an oversight. I don't really see why you'd want to spuriously retransmit when it's so trivial to avoid. I'm not sure about the current dual-funding implementation, but in LDK more generally, if you retransmit a commitment_signed post-open we'll definitely think it's invalid and FC the channel.
In terms of upgrade, I don't see why we wouldn't just stop sending the spurious retransmit. Most nodes that support dual-funding will upgrade quickly but generally nodes should in practice handle the spurious retransmit for a bit, but I don't really think it needs to be around that long :)
We need to handle backwards-compatibility, because there are two implementations that already shipped with the current retransmit behavior and it's actively used on mainnet. It doesn't have to take a year, but we must coordinate releases with `cln` if we make this change and allow a grace period where we retransmit.
It's even more trivial to ignore the spurious retransmit and not have to deal with thinking about whether to retransmit or not (and always send `commit_sig` before `tx_signatures`), which is what we chose to currently implement... To be honest, it's unclear to me why it's a real problem for you (apart from "we can avoid it")?
Right, okay, so then we're on the same page. One remaining question - what does CLN/eclair do today if they get a next commit number of 0? Can we skip the first step of your upgrade process above?
It's better than I thought for the dual-funding case: `eclair` will accept `next_commitment_number = 0`, so we can indeed skip one step of the upgrade.

It's for splicing that this is an issue, because that `next_commitment_number` will currently be interpreted as our peer being late, but that's only a problem for our own backwards-compat between `eclair` and `phoenix` since splicing isn't officially supported yet, so we'll bite the bullet and deal with it.
Presumably you're not splicing with a next commit number of 0, though, that would imply the first version never even got signed :)
What I meant is that for splicing this is slightly more complex:

- [...] `local_commitment_number = N`
- [...] `next_commitment_number = N+1`
- [...] `commit_sig` but before receiving the remote `commit_sig`
- [...] `next_funding_txid` and `next_commitment_number = N`, right? Because we're missing our peer's `commit_sig` for commitment N that spends the new splice transaction
- [...] `channel_reestablish` logic and would be sending `next_commitment_number = N + 1`, and if we sent `next_commitment_number = N` our peer would currently think we're late (because we're not looking at whether `next_funding_txid` is set in that case, which would remove the confusion)

So we need to more carefully manage our deployment to ensure we're not force-closing channels in the process 😅
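
The splice case described above can be sketched roughly as follows. This is a hedged illustration under assumptions, since parts of the comment are missing from this page: `SpliceState`, `reestablish_fields` and `peer_should_retransmit_commit_sig` are hypothetical names, and the rule shown is the gating proposed earlier in this thread (retransmit `commit_sig` only when `next_funding_txid` is set and `next_commitment_number` equals the current commitment number), not shipped behavior.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SpliceState:
    # Hypothetical local view while a splice is being signed.
    local_commitment_number: int         # N in the comment above
    pending_funding_txid: Optional[str]  # splice transaction, if any
    received_remote_commit_sig: bool     # peer's commit_sig for the splice


def reestablish_fields(state: SpliceState) -> Tuple[int, Optional[str]]:
    """Return (next_commitment_number, next_funding_txid) to send on reconnect."""
    if state.pending_funding_txid is None:
        # Normal channel_reestablish logic: ask for commitment N + 1.
        return state.local_commitment_number + 1, None
    if state.received_remote_commit_sig:
        # Nothing to retransmit for commitment N.
        return state.local_commitment_number + 1, state.pending_funding_txid
    # Still missing the peer's commit_sig for commitment N on the splice tx.
    return state.local_commitment_number, state.pending_funding_txid


def peer_should_retransmit_commit_sig(
    next_commitment_number: int,
    next_funding_txid: Optional[str],
    current_commitment_number: int,
) -> bool:
    # Proposed gating: checking next_funding_txid first avoids treating
    # next_commitment_number = N as "our peer is late".
    return (
        next_funding_txid is not None
        and next_commitment_number == current_commitment_number
    )
```

A peer that ignores `next_funding_txid` here would see `next_commitment_number = N` and conclude that the sender is behind, which is the force-close risk the comment warns about.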
As was discussed during this week's spec meeting, since this only matters if peers disconnect in the middle of the signing flow, which should be extremely rare for server nodes, we probably don't need to care that much about phasing deployments between `eclair` and `cln`, and can directly implement this PR. We need a sign-off from `cln` though, @ddustin @niftynei does that sound good to you?