BOLT 2: quiescence protocol (feature 34/35) option_quiesce #869

rustyrussell · 2021-05-10T05:39:22Z

A simple protocol for upgrades, splices and other things which are easier when no updates are in-flight.

02-peer-protocol.md

Changelog-EXPERIMENTAL: Protocol: we support the quiescence protocol from lightning/bolts#869 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

rustyrussell · 2021-05-24T05:26:26Z

Added the concept of "initiator" which makes splicing far simpler.

Changelog-EXPERIMENTAL: Protocol: we support the quiescence protocol from lightning/bolts#869 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

rustyrussell · 2021-07-19T05:10:02Z

Trivial rebase on top of #880

Crypt-iQ · 2021-09-21T14:36:39Z

If stfu is type 2 (even), then it needs feature-bits according to bolt-01:

"MUST NOT send an evenly-typed message not listed here without prior negotiation"

rustyrussell · 2021-09-30T05:11:51Z

If stfu is type 2 (even), then it needs feature-bits according to bolt-01:
* "MUST NOT send an evenly-typed message not listed here without prior negotiation"

Excellent point! The purpose of STFU is to support other things, but it's worth having its own feature bit IMHO anyway.

lightning-developer

Concept ACK.

I think the wording around pending can be confusing and the cases if peers ignore the protocol could be elaborated a bit.

Also as a feedback: I initially thought that this could be generalized to the case of closing the channel but there we actually do want to send more updates to resolve / settle outstanding htlcs. I am writing this to demonstrate how easy it was to get confused by the notation.

lightning-developer · 2021-12-10T01:24:48Z

02-peer-protocol.md

+The sender of `stfu`:
+  - MUST NOT send `stfu` unless `option_quiesce` is negotiated.
+  - MUST NOT send `stfu` if any of the sender's htlc additions, htlc removals
+    or fee updates are pending for either peer.


The way how I initially understood the word pending was that there are HTLCs in flight which have not been settled or removed. Yet after going through the text I think pending means that the statemachine is not at a point where revoke_and_ack has been sent.

Maybe we could elaborate this a bit and make it more concrete?

Good point. This refers to the state machine, where changes are referred to as pending.

Concretely, this is not allowed if you have sent update_* messages, without a commitment_signed. Is that clearer?

Yes that would have been much clearer. But maybe not necessary at that point of the document if it is clear within BOLT2 that the word pending always refers to the state machine and not to in flight HTLCs. If not I suggest to introduce such a clarification the first time the word pending is being used in BOLT 2. Also as said I eventually understood the meaning of pending so the issue seems to be not that big.

I am having trouble understanding what "pending" means here.

In this flow, is the add considered pending?:

--update_add_htlc--> --commit_sig------->

If so, is it only considered non-pending at the end of this flow?:

--update_add_htlc--> --commit_sig-------> <--revoke_and_ack-- <--commit_sig------ --revoke_and_ack-->

Agree that "pending" is confusing here. I think it would be more clear to write:

MUST NOT send stfu if any of the sender's htlc additions, htlc removals
or fee updates have not been counter signed or failed by our peer.

Agree that "pending" is confusing here. I think it would be more clear to write:

* MUST NOT send `stfu` if any of the sender's htlc additions, htlc removals or fee updates have not been counter signed or failed by our peer.

I agree. I think it's still vague.

lightning-developer · 2021-12-10T01:29:11Z

02-peer-protocol.md

+    - MUST reply with `stfu` once it can do so.
+
+Upon disconnection:
+  - the channel is no longer considered quiescent.


Is that the only condition under which a channel can get out of being considered quiescent?

I guess future protocols that need a quiescent channel end with stopping the channel to be considered quiescent. But maybe this could also be done with a separate message or an additional explicit field in the stfu that resets the request to be quiescent.

Indeed. For now we only use this for channel upgrade: other changes may have different needs.

I probably missed something but why is a disconnect (of the connection) necessary for a channel upgrade? Do the peers after upgrade need to execute the routines in connection reestablishment?

Whether we add a new message that explicitly ends quiescence as suggested by @ProofOfKeags or leave it to be described in each downstream protocol, I think we need to say you can exit quiescence without sending a disconnect.

I do not think all downstream protocols should be forced to trigger a disconnect to leave quiescence.

I believe this is addressed.

lightning-developer · 2021-12-10T01:37:11Z

02-peer-protocol.md

+  - MUST NOT send `stfu` unless `option_quiesce` is negotiated.
+  - MUST NOT send `stfu` if any of the sender's htlc additions, htlc removals
+    or fee updates are pending for either peer.
+  - MUST NOT send `stfu` twice.


We did not specify what happens if a receiver receives two stfu messages? I guess the second one SHOULD be ignored or should there be more drastic measures?

It's undefined; since the sender shouldn't do that, I'd probably reply with a warning and disconnect.

lightning-developer · 2021-12-10T01:37:20Z

02-peer-protocol.md

+    - MUST set `initiator` to 1
+  - MUST set `channel_id` to the id of the channel to quiesce.
+  - MUST now consider the channel to be quiescing.
+  - MUST NOT send an update message after `stfu`.


again what happens if the sender does?

Again, undefined. I would warning and disconnect, but that's not a requirement.

t-bast · 2024-03-01T09:22:48Z

You have to anyway...
So then how do you handle disconnects then? I'd imagine that the volatile state gets rolled back, no?

You're right! We would probably use the same behavior as what happens if the other peer disconnects while we're quiescent and haven't completed the inner protocol. In the splicing case, it means that if we've reached a state where we cannot abort anymore, we finalize the signatures exchange when reconnecting or we force-close (if our peer is buggy/malicious).

That means that this explicit quiescence termination step is still some additional code, but it reuses existing behavior, so it may not be as bad as I thought.

I agree with you that it is conceptually cleaner to have a way to explicitly end quiescence. I'm less strongly against this than I was in my previous comment, it's probably at least worth prototyping an implementation to see how much code it would add before deciding whether we want this or not.

Would it be best if I made a PR that targeted @rustyrussell 's branch?

Yes, that sounds good 👍. I'd also be very interested in hearing @rustyrussell's thoughts on this unquiesce message, that would help tilt the balance in either direction.

remyers

My primary suggestion is to replacing pending and acknowledge with language that is more precise.

I think we also need to add something about ending quiescence when a downstream protocol succeeds and does not disconnect.

remyers · 2024-03-01T15:07:22Z

02-peer-protocol.md

+easiest on channels where both commitment transactions match, and no
+pending updates are in flight.  We define a protocol to quiesce the


To reduce the ambiguity with 'pending' meaning that an htlc is waiting for a pre-image vs. 'pending' meaning the update has not been mutually committed to by both sides, how about this instead:

"and all proposed updates are fully committed to by both parties."

I think we should do this

remyers · 2024-03-01T15:13:03Z

02-peer-protocol.md

+The sender of `stfu`:
+  - MUST NOT send `stfu` unless `option_quiesce` is negotiated.
+  - MUST NOT send `stfu` if any of the sender's htlc additions, htlc removals
+    or fee updates are pending for either peer.


Agree that "pending" is confusing here. I think it would be more clear to write:

MUST NOT send stfu if any of the sender's htlc additions, htlc removals
or fee updates have not been counter signed or failed by our peer.

remyers · 2024-03-01T15:23:43Z

02-peer-protocol.md

+
+If both sides send `stfu` simultaneously, they will both set
+`initiator` to `1`, in which case the "initiator" is arbitrarily
+considered to be the channel funder (the sender of `open_channel`).


nit: should we also mention open_channel2 here for completeness after rebasing?

Think we should still add this.

remyers · 2024-03-01T15:45:47Z

02-peer-protocol.md

+### Rationale
+
+The normal use would be to cease sending updates, then wait for all
+the current updates to be acknowledged by both peers, then start


I propose we replace "acknowledged" with "counter signed or rejected", or with whatever term we use in the "MUST NOT send stfu" section above to replace the ambiguous use of "pending".

remyers · 2024-03-01T16:04:54Z

02-peer-protocol.md

+    - MUST reply with `stfu` once it can do so.
+
+Upon disconnection:
+  - the channel is no longer considered quiescent.


Whether we add a new message that explicitly ends quiescence as suggested by @ProofOfKeags or leave it to be described in each downstream protocol, I think we need to say you can exit quiescence without sending a disconnect.

I do not think all downstream protocols should be forced to trigger a disconnect to leave quiescence.

ProofOfKeags · 2024-04-10T22:26:41Z

I have now come to believe that if we want this to be a standalone protocol gadget, we actually need an explicit resume message if we want to be able to test these gadgets in isolation.

At the moment I have been able to complete testing that we defer any update inducing state transitions while we are in an stfu state. To complete the testing we have to ensure that the updates that would have normally taken place while we were quiesced are run. However, there is a question where if we have no downstream protocol that can signify the termination of quiescence, we will stay in that state indefinitely. We can, of course, disconnect and test that things start up that way, however we would feasibly want to also do some live testing to ensure that once a channel is no longer quiesced, we can resume traffic and never find ourselves in an inconsistent state.

As it stands right now, without a go_on message or equivalent, we cannot complete testing, which tells me that the protocol is incomplete without it. So the time has come. I will be temporarily suspending implementation/testing work to put together a complete spec for the new message like I said I would do a few weeks back.

EDIT: spec update proposal

remyers · 2024-04-11T07:34:36Z

I have now come to believe that if we want this to be a standalone protocol gadget, we actually need an explicit resume message if we want to be able to test these gadgets in isolation.

From our experience, tests for quiescence ended by disconnect are very valuable. A go_on message would not capture the edge cases of a disconnect. When you combine quiescence with splicing, then you would want to end quiescence when splicing succeeds or disconnect on failure. Both situations may have subtle edge cases to handle that go_on would not capture.

I think for the purpose of treating quiescence as a stand alone gadget for testing, disconnect is sufficient. However, there could be a place for go_on as an internal place-holder for testing in lieu of testing with a specific downstream protocol.

ProofOfKeags · 2024-04-11T17:21:52Z

From our experience, tests for quiescence ended by disconnect are very valuable.

I am not suggesting this as a substitution. I think we need to test both. However, live resume has some different subtleties than resume via reconnect, which is why I was hoping to be able to test both situations. At the moment I'm limited to disconnect only in lieu of a complete implementation of splicing or dyncomms.

Chinwendu20 · 2024-04-12T15:50:06Z

02-peer-protocol.md

+  - otherwise:
+    - SHOULD NOT send any more update messages.
+    - MUST reply with `stfu` once it can do so.
+


Maybe you should add, "must now consider the channel as quiescent here as well" ?

You can't do that because quiescence is only achieved once both sides have sent it.

Yeah both sides have sent it, if a peer receives stfu and has sent one before, the peer must now consider the channel as quiescent (This is already stated in the spec, line 527)

The otherwise branch(L528-530), where the peer receives but has not sent one and so replies, I think it should also be stated that peer must now consider the channel as quiescent as well, just as in the other scenario.

It's redundant as that case is already covered by the requirements on the sender

I do not think it is. I could not find anywhere in the sender perspective where it is stated that the channel should now be considered as quiescent when stfu is sent. Rather, the channel is considered as quiescing in that case, line 522. I think when stfu is sent and received that is when the channel is considered quiescent?

Good catch. I will add this to the sender requirements.

I think we need to add it when the sender has received a reply so it might be more appropriate here, but maybe there is a better way to structure it.

I don't understand. Is it quiescent only when you've send AND received the message? Or only when you send it?

ProofOfKeags · 2024-04-30T18:44:58Z

The effort to add an explicit resume message is being abandoned after the discussion taking place here.

The core observation is that it increases the number of situations where we can result in "half-committed" state. Protocol state is considered half-committed when it can no longer be aborted (usually due to the transmission of a signature or secret) but has not yet reached a state where both parties have done so.

Currently we resolve all half-committed state issues through the channel_reestablish mechanism which is performed only at reconnection and we don't find a compelling reason at the moment to do it at any other time. Any explicit resume message will result in having to solve this problem. Further, while aesthetically displeasing, the layer violation is entirely manageable by making it explicit that a quiescence session is usable once only and that any downstream protocol that assumes quiescence must define its terminal states at which point quiescence terminates as well.

I will open up a new PR that clarifies these more subtle requirements and link it back here.

carlaKC · 2024-05-08T19:09:29Z

Weighing in late here, but is it worth adding an advisory note about disconnecting to exit quiescence if you need to resolve a HTLC to prevent a force close? Given that we don't settle our in-flight HTLCs, seems to me that a badly timed quiescence with some almost-expired HTLCs could be unfortunate unless you're paying attention to what's in your quiesced channel.

ProofOfKeags · 2024-05-08T19:30:25Z

Weighing in late here, but is it worth adding an advisory note about disconnecting to exit quiescence if you need to resolve a HTLC to prevent a force close? Given that we don't settle our in-flight HTLCs, seems to me that a badly timed quiescence with some almost-expired HTLCs could be unfortunate unless you're paying attention to what's in your quiesced channel.

Yes this is an important observation.

I think we do need to add an advisory note recommending that nodes disconnect if quiescence lasts "too long" at their discretion.

Nominally a quiescence session should not take more than a handful of seconds, as any negotiation that takes place on a quiesced channel should be able to complete imminently. Given that all of the HTLCs are block bound this should be improbable.

One way to combat the edge case though would be that if a channel is quiesced and a new block comes in (which would be what usually triggers any HTLC timeout action), then we can use a new block as a cue to cycle the connection on quiesced channels. I'll need to think a bit more about the consequences of this but I think it would be simpler to implement than a quiescence controller that was fully "htlc aware"

t-bast · 2024-05-13T07:35:31Z

Agreed, on our side we have a plain duration-based timeout: once quiescence is initiated, we start a 90 seconds timer (that value can be overridden by the node operator) that is tied to the quiescence session (and include the quiescent action, e.g. a splice). If that timer ticks before we get out of quiescence, we disconnect.

remyers · 2024-06-03T15:24:53Z

Using a version of Eclair modified for testing quiescence at commits 87b141f and 2fcdc4a I performed interop testing of quiescence with the LND quiescence branch ProofOfKeags:feature/stfu at commit 9225d29.

I confirmed the following basic scenarios played out as expected:

LND sends stfu and Eclair sends add_update_htlc before responding with stfu
Both nodes cross-sign the update
Eclair times out waiting for splice_init, sends a warning and disconnects
After reconnect the nodes settle the pending payment
same as 1, but Eclair also sends update_fee after responding with stfu
LND sends a warning and disconnects
After reconnect the nodes settle the pending payment
LND sends add_update_htlc and Eclair sends stfu before sending revoke_and_ack + commit_sig
LND sends revoke_and_ack before sending stfu

From these tests I feel comfortable about the interoperability between LND and Eclair. I think we should be good to merge this PR after addressing clean-up comments.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Header from folded patch 'bolt_2__set_an_initiator_in_quiescence.patch': BOLT #2: Set an initiator in quiescence. This is especially useful for protocols such as splicing; for simplified commitment transactions, there is already an implied initiator at each point, so having the negotiation at splicing time would be redundant. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Header from folded patch 'option_quiesce__feature_to_support_stfu_method.patch': option_quiesce: feature to support stfu method. In practice, sftu is useless unless you have something (e.g. channel_upgrade) which uses it, but adding a feature is best practice IMHO. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

rustyrussell · 2024-06-11T07:49:28Z

Rebased, merged rustyrussell#15

t-bast

ACK b474755, I believe this is ready to go 🚀

We may want to add a paragraph to clarify #869 (comment), or that can be a follow-up PR.

…ith HTLCs pending. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

rustyrussell · 2024-06-17T21:38:07Z

Added timeout requirement, as agreed.

ProofOfKeags · 2024-06-24T18:36:27Z

02-peer-protocol.md

+Both nodes:
+  - MUST disconnect after 60 seconds of quiescence if the HTLCs are pending.


I know this has already been merged but I do not believe this is a MUST. This is up to node policy and can be initiated by either side. Especially considering that time is both relative and difficult in distributed systems this ought to be a SHOULD

What is "pending" here? I believe it is a different definition than the one above.

rustyrussell mentioned this pull request May 10, 2021

EXPERIMENTAL: quiesence protocol ElementsProject/lightning#4520

Merged

t-bast mentioned this pull request May 10, 2021

Lightning Specification Meeting 2021/05/10 #870

Closed

9 tasks

t-bast reviewed May 10, 2021

View reviewed changes

02-peer-protocol.md Show resolved Hide resolved

02-peer-protocol.md Show resolved Hide resolved

rustyrussell force-pushed the quiescence-protocol branch 2 times, most recently from 99c0499 to b96218b Compare May 24, 2021 05:21

rustyrussell mentioned this pull request May 24, 2021

Splice draft (feature 62/63) #863

Closed

rustyrussell force-pushed the quiescence-protocol branch from b96218b to 65f98ff Compare July 19, 2021 05:10

t-bast mentioned this pull request Aug 9, 2021

Should the htlc_maximum_msat and htlc_minimum_msat fields in the channel_update be dynamic? #540

Closed

rustyrussell force-pushed the quiescence-protocol branch from 65f98ff to 9832e17 Compare September 30, 2021 05:08

rustyrussell changed the title ~~BOLT 2: quiescence protocol.~~ BOLT 2: quiescence protocol (feature 34/35) option_quiesce Sep 30, 2021

rustyrussell mentioned this pull request Sep 30, 2021

BOLT 2: upgrade protocol on reestablish #868

Closed

whitslack mentioned this pull request Oct 8, 2021

Feature request: Graceful shutdown ElementsProject/lightning#4842

Open

lightning-developer suggested changes Dec 10, 2021

View reviewed changes

carlaKC mentioned this pull request Sep 26, 2022

zzz-extension-bolt: dynamic commitments #1026

Closed

t-bast mentioned this pull request Oct 6, 2022

Lightning Specification Meeting 2022/10/10 #1030

Closed

26 tasks

t-bast mentioned this pull request Oct 19, 2022

Lightning Specification Meeting 2022/10/24 #1033

Closed

26 tasks

t-bast mentioned this pull request Nov 2, 2022

Lightning Specification Meeting 2022/11/07 #1038

Closed

30 tasks

t-bast mentioned this pull request Nov 17, 2022

Lightning Specification Meeting 2022/11/21 #1041

Closed

28 tasks

t-bast mentioned this pull request Dec 5, 2022

Lightning Specification Meeting 2022/12/05 #1046

Closed

28 tasks

t-bast mentioned this pull request Dec 15, 2022

Lightning Specification Meeting 2022/12/19 #1047

Closed

27 tasks

remyers suggested changes Mar 1, 2024

View reviewed changes

t-bast mentioned this pull request Mar 7, 2024

Lightning Specification Meeting 2024/03/11 #1146

Closed

21 tasks

t-bast mentioned this pull request Mar 20, 2024

Lightning Specification Meeting 2024/03/25 #1150

Closed

21 tasks

Roasbeef mentioned this pull request Apr 8, 2024

Lightning Specification Meeting 2024/04/08 #1152

Closed

21 tasks

ProofOfKeags mentioned this pull request Apr 10, 2024

bolt2: add go_on to quiescence spec rustyrussell/bolts#14

Closed

Chinwendu20 reviewed Apr 12, 2024

View reviewed changes

t-bast mentioned this pull request Apr 18, 2024

Lightning Specification Meeting 2024/04/22 #1155

Closed

22 tasks

This was referenced May 2, 2024

Channel Splicing (feature 62/63) #1160

Open

Lightning Specification Meeting 2024/05/06 #1161

Closed

t-bast mentioned this pull request May 14, 2024

Lightning Specification Meeting 2024/05/20 #1164

Closed

23 tasks

t-bast mentioned this pull request Jun 3, 2024

Lightning Specification Meeting 2024/06/03 #1167

Closed

23 tasks

rustyrussell and others added 2 commits June 11, 2024 17:14

require quiescence sessions to be single-use

b474755

rustyrussell force-pushed the quiescence-protocol branch from a68d7b0 to b474755 Compare June 11, 2024 07:49

t-bast approved these changes Jun 11, 2024

View reviewed changes

t-bast mentioned this pull request Jun 17, 2024

Lightning Specification Meeting 2024/06/17 #1172

Closed

22 tasks

BOLT 2: add requirement for disconnect if quiescence takes too long w…

ccaa4e6

…ith HTLCs pending. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

rustyrussell merged commit c562d91 into lightning:master Jun 17, 2024

ProofOfKeags reviewed Jun 24, 2024

View reviewed changes

		easiest on channels where both commitment transactions match, and no
		pending updates are in flight. We define a protocol to quiesce the

		Both nodes:
		- MUST disconnect after 60 seconds of quiescence if the HTLCs are pending.

BOLT 2: quiescence protocol (feature 34/35) option_quiesce #869

BOLT 2: quiescence protocol (feature 34/35) option_quiesce #869

Uh oh!

Conversation

rustyrussell commented May 10, 2021

Uh oh!

Uh oh!

Uh oh!

rustyrussell commented May 24, 2021

Uh oh!

rustyrussell commented Jul 19, 2021

Uh oh!

Crypt-iQ commented Sep 21, 2021

Uh oh!

rustyrussell commented Sep 30, 2021

Uh oh!

lightning-developer left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

t-bast commented Mar 1, 2024

Uh oh!

remyers left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ProofOfKeags commented Apr 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

remyers commented Apr 11, 2024

Uh oh!

ProofOfKeags commented Apr 11, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

ProofOfKeags commented Apr 10, 2024 •

edited

Loading

ProofOfKeags commented May 8, 2024 •

edited

Loading

t-bast left a comment •

edited

Loading