
multi: implement spec-compliant coop close #6760

Closed · wants to merge 14 commits

Conversation


@Crypt-iQ Crypt-iQ commented Jul 22, 2022

Implements lightning/bolts#970

The main changes are that:

  • ChannelLink can receive a Shutdown message from the peer
    • If it receives it from the remote peer or from the rpcserver, it will enter a wind-down mode where new HTLCs from the peer won't be processed (there is a minor exception if we're mid state-transition)
  • Once ChannelLink is totally clean via IsChannelClean(), it signals to peer.Brontide that fee negotiation can begin and the link shuts down.
  • If a Shutdown was sent, it will always be retransmitted.
  • The ChanCloser is reworked a bit so that closing_signed is no longer immediately sent after receiving and sending Shutdown
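
To make the gating concrete, here is a minimal, self-contained sketch of that wind-down check; the struct and helper are stand-ins, with only the field names shutdownInit and shutdownReceived taken from the PR:

```go
package main

import "fmt"

// Stand-in for the PR's channelLink; only the two flag names come from the
// actual diff.
type channelLink struct {
	shutdownInit     bool // we initiated the coop close (e.g. via rpcserver)
	shutdownReceived bool // the remote peer sent us Shutdown
}

// canForwardHTLC reports whether a new HTLC from the switch may still be
// forwarded: once either side has entered the shutdown phase, new HTLCs are
// failed back instead.
func (l *channelLink) canForwardHTLC() bool {
	return !l.shutdownInit && !l.shutdownReceived
}

func main() {
	l := &channelLink{shutdownReceived: true}
	fmt.Println(l.canForwardHTLC()) // false: cancel back instead of forwarding
}
```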

I included an integration test and have tested this in a local regtest environment, where my tests passed. The integration test can probably be removed since it relies on time.Sleep(...) and is likely to be flaky in CI, but it's included to show that the flow works.

Fixes #6039

@Crypt-iQ Crypt-iQ added labels p2p (code related to the peer-to-peer behaviour), channel closing (related to the closing of channels cooperatively and uncooperatively), and spec on Jul 22, 2022
@Crypt-iQ Crypt-iQ force-pushed the coop_close_970 branch 5 times, most recently from 0766b7a to 9ae20e6 on July 27, 2022 21:00
@Roasbeef Roasbeef added this to the v0.16.0 milestone Aug 9, 2022
@saubyk saubyk removed this from the v0.16.0 milestone Aug 27, 2022
@Crypt-iQ Crypt-iQ force-pushed the coop_close_970 branch 2 times, most recently from 2998182 to 75e94d0 on October 13, 2022 23:57
@Crypt-iQ Crypt-iQ marked this pull request as ready for review October 13, 2022 23:57
@yyforyongyu yyforyongyu self-requested a review October 14, 2022 03:56
@Neil-LL Neil-LL requested a review from Roasbeef November 8, 2022 18:13
@yyforyongyu (Member) left a comment:

Cool work!

Still reviewing the PR, meanwhile the third commit is a bit large tbh😂

I want to split up the large htlcswitch+chancloser+peer commit

I think maybe we could split it by taking hints from the commit message? Something like making each package ready for the new state, and then connecting them using the new state.

This is accomplished by:
* passing Shutdown messages to the link when appropriate
* having the link tell peer.Brontide when closing_signed can be sent
* handling retransmission of Shutdown
* persisting the delivery scripts for retransmission
* reworking the ChanCloser such that closing_signed isn't immediately
sent after sending and receiving Shutdown

channeldb/channel.go (resolved)
channeldb/channel.go (resolved)
htlcswitch/link.go (resolved, outdated)
l.channel.LocalUpfrontShutdownScript(),
)

// Flip the shutdownInit and shutdownSent bools.
Member:

are we allowed to init and send the msg multiple times?

Collaborator Author:

It's undefined in the spec, meaning anything can happen (peer disconnects or something else), but the code here shouldn't send shutdown multiple times.

htlcswitch/link.go (four resolved threads, outdated)
@carlaKC (Collaborator) left a comment:

First pass, agree that the third commit needs some breaking up. Lots of moving pieces makes it difficult to review all in one batch.

peer/brontide.go (resolved, outdated)
Comment on lines +315 to +323
// DeliveryAddr is the address to use in Shutdown if an upfront script
// has not been set.
DeliveryAddr lnwire.DeliveryAddress
Collaborator:

Link config seems like a weird place for this?

Collaborator Author:

I put this here because if there is no upfront shutdown script, the link needs to be able to send an address in Shutdown. An alternative would be to just pass a closure from peer/brontide.go instead of having the peer/brontide.go code generate it and then store it in the link config

Member:

Agree with above comment, the link has already been created well before we even know what the upfront shutdown addr is. Generally I don't think we should add more logic to the link (which already has a lot of critical operations), and instead we can use the existing goroutine in the peer to manage the shutdown state.

htlcswitch/link.go (two resolved threads, outdated)
// upfront shutdown script if it exists, and the
// provided delivery address otherwise. This is used in
// case we did not initiate the coop close.
upfrontAddr := l.channel.LocalUpfrontShutdownScript()
Collaborator:

Couldn't we use chooseDeliveryScript here rather? Simplifies needing to re-check the upfront shutdown again.

Collaborator Author:

not sure what you mean here?

Collaborator Author:

I don't think we can do that: if we were to call chooseDeliveryScript(l.channel.LocalUpfrontShutdownScript(), l.cfg.DeliveryAddr), it would return a mismatch error whenever the upfront script exists, since l.cfg.DeliveryAddr will always be set.
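
For reference, a self-contained sketch of the chooseDeliveryScript semantics being discussed (simplified to plain byte slices; the real helper operates on lnwire.DeliveryAddress):

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

var errUpfrontShutdownScriptMismatch = errors.New(
	"requested script does not match upfront shutdown script")

// chooseDeliveryScript mirrors the helper discussed above: prefer the upfront
// script, fall back to the requested one, and error on a mismatch. This is
// why calling it with an always-set DeliveryAddr errors whenever an upfront
// script exists and differs.
func chooseDeliveryScript(upfront, requested []byte) ([]byte, error) {
	if len(upfront) == 0 {
		return requested, nil
	}
	if len(requested) == 0 {
		return upfront, nil
	}
	if !bytes.Equal(upfront, requested) {
		return nil, errUpfrontShutdownScriptMismatch
	}
	return upfront, nil
}

func main() {
	_, err := chooseDeliveryScript([]byte{0x00}, []byte{0x01})
	fmt.Println(err) // mismatch, as described in the comment above
}
```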

// shutdownInit bool and wait until there are no more
// updates to send. We should not forward HTLC's to our
// peer. Only cancels and update_fee messages are
// allowed. The peer.Brontide
Collaborator:

Comment incomplete?

@yyforyongyu (Member) left a comment:

Tried my best to review the commits; got a bit lost in the end, so some of the comments may not make sense🤦🏻. Will review it again once the third commit is split into smaller parts.

Basically I tried to review the third commit in five parts,

  1. retransmit shutdown message
  2. allow updates after shutdown is sent
  3. a new phase to process closing signed, particularly changes around coopCloseReady
  4. shutdownRequest related refactor, added NotifyShouldShutdown
  5. newly added shutdown-related states for link

Maybe it could be broken up along those lines?

@@ -1364,6 +1371,51 @@ func (c *OpenChannel) SecondCommitmentPoint() (*btcec.PublicKey, error) {
return input.ComputeCommitmentPoint(revocation[:]), nil
}

// MarkShutdownSent is used to mark that we've sent a Shutdown message for this
// channel. This is so we can retransmit Shutdown.
Member:

This is so?

// The mutex is only accessed if:
// - a write is occurring
// - a read is occurring from hasReceivedShutdown
// all reads from the htlcManager goroutine should be race-free.
Member:

Can now use atomic.Bool to avoid making the mutex.

Collaborator Author:

The mutex is only accessed in production code if a write occurs, which happens in handleUpstreamMsg. The hasReceivedShutdown function is only called in test code. If we make this an atomic.Bool, then all of the read-only code that previously didn't need the mutex (since the reads are in the same goroutine as the write, except for the test-only hasReceivedShutdown) will now need to use atomics. Is there that much of a noticeable difference in performance? If so, I'll change it.

Member:

Also curious, so I created a bench test; it turns out atomic is 60x faster for writes and 1000x faster for reads.

I wasn't thinking about performance though, but more about code health: avoiding mutexes means we worry less about deadlocks and their side effects.
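
A minimal sketch of the atomic.Bool variant being suggested (Go 1.19+; the link type is a stand-in, and hasReceivedShutdown is the test-only accessor named above):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Stand-in link type: the mutex-guarded bool becomes an atomic.Bool, making
// every read and write race-free without a lock.
type link struct {
	shutdownReceived atomic.Bool
}

// Store in the upstream-message handler...
func (l *link) markShutdownReceived() { l.shutdownReceived.Store(true) }

// ...and Load from any goroutine, including the test-only accessor.
func (l *link) hasReceivedShutdown() bool { return l.shutdownReceived.Load() }

func main() {
	var l link
	l.markShutdownReceived()
	fmt.Println(l.hasReceivedShutdown()) // true
}
```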

htlcswitch/link.go (resolved)
htlcswitch/link.go (resolved, outdated)

// If we need to retransmit Shutdown, create a ChanCloser and give it
// the Shutdown message.
retransmitShutdown := lnChan.State().HasSentShutdown()
Member:

why do we need to retransmit shutdown when the channel has sent shutdown?

Collaborator Author:

This will retransmit shutdown if we've ever sent shutdown in the past

Member:

Does it mean we are sending multiple shutdowns? Suppose trySendingShutdown succeeds, the node restarts, then it will send another shutdown.

Collaborator Author:

This is OK as long as it's not in the same p2p connection.
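
A small sketch of that reconnect behaviour (storage and wiring are stand-ins; only HasSentShutdown's role comes from the PR):

```go
package main

import "fmt"

// Stand-in for the persisted channel state; in the PR this is
// lnChan.State().HasSentShutdown().
type channelState struct{ sentShutdown bool }

func (c *channelState) hasSentShutdown() bool { return c.sentShutdown }

// onReconnect retransmits Shutdown on the new connection. Sending it once per
// p2p connection is fine; the spec's "MUST NOT send multiple shutdown
// messages" applies within a single connection.
func onReconnect(c *channelState, send func(msg string)) {
	if c.hasSentShutdown() {
		send("shutdown")
	}
}

func main() {
	c := &channelState{sentShutdown: true}
	onReconnect(c, func(m string) { fmt.Println("sending", m) })
}
```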


// The peer.Brontide that has access to this link will
// call RemoveLink which will stop this link. Returning
// here ensures no more channel updates can occur.
Member:

I think an info log is needed here.

htlcswitch/link.go (two resolved threads, outdated)
// We've received a Shutdown message from the remote peer.
// We'll set the shutdownReceived bool and cancel back any new
// HTLC's received after this point instead of forwarding them.
if l.shutdownReceived {
Member:

I think the specs only say the sender MUST NOT send multiple shutdown messages, but the receiver side is not defined.

Collaborator Author:

This defines it for LND here. If an implementation is sending multiple shutdown messages, it has no guarantees on what can happen. So to keep things simple, we just fail here.


// shutdownInit is a bool that is set when we've initiated a coop
// close.
shutdownInit bool
Member:

Have we thought about giving the link a state and advancing its states? That probably could make things a bit easier and more abstracted.

Collaborator Author:

I think we should do that in a different PR

@Crypt-iQ Crypt-iQ force-pushed the coop_close_970 branch 3 times, most recently from 4c3c164 to d94c709 Compare January 17, 2023 19:09
@Crypt-iQ (Collaborator Author):

I think I addressed most of the comments and I split up the commits.

@yyforyongyu (Member) left a comment:

This is indeed a lot of change! Halfway there; sending out my comments now in case they get lost.

rpcserver.go Outdated
@@ -4233,9 +4233,10 @@ func createRPCOpenChannel(r *rpcServer, dbChannel *channeldb.OpenChannel,
channel.PushAmountSat = uint64(amt)
}

if len(dbChannel.LocalShutdownScript) > 0 {
if len(dbChannel.GetLocalShutdownScript()) > 0 {
Member:

nit: could use a variable here to avoid calling GetLocalShutdownScript twice.

}

// Process the peer's ClosingSigned.
msgs, closeFin, err := c.ProcessCloseMsg(c.peerClosingSigned, true)
Member:

can just return c.ProcessCloseMsg here.

lnwallet/chancloser/chancloser.go (three resolved threads)
peer/brontide.go Outdated
chanCloser, err := p.createChanCloser(
lnChan, deliveryScript, feePerKw, nil, locallyInitiated,
lnChan, deliveryScript, feePerKw, nil, locallyInitiated, true,
Member:

maybe we could call chanCloser.ChannelClean() here to avoid having yet another variable cleanOnRecv?

Collaborator Author:

ChannelClean has to be called at a specific point - when we have processed our local Shutdown and right after we receive the peer's Shutdown. This is only used if restarting the coop close flow and ChannelClean was called in a previous iteration.

Member:

but this is the only place where cleanOnRecv is true?

Collaborator Author:

yeah it's the only place. I'll have to think if we can remove it by instead checking the channel's internal states


// channelClean is true if the peer.Brontide has signaled that the
// channel is no longer operational and is in a clean state.
channelClean bool
Member:

I haven't fully comprehended the meaning of this channelClean state. To me it seems it's closely tied to ChanStatusCoopBroadcasted since it will only be set when MarkCoopBroadcasted is called, and I guess we can use that instead.

Collaborator Author:

There is one scenario in which ChanStatusCoopBroadcasted is true but channelClean is not true - on a restart. We can get around this, but I'd rather keep the bool for now

Member:

I assume that scenario exists because we have the field channelClean in the first place, and it defaults to the empty value false...

Collaborator Author:

yeah that's why

peer/brontide.go (resolved)
return
}

// Otherwise, set the shutdownReceived bool.
Member:

This "don't allow a second shutdown message" logic has already been implemented in chancloser and I think we should avoid duplicating it.

Collaborator Author:

This is done to stop the link rather than have the chancloser stop the link

htlcswitch/linktestscoop (resolved, outdated)
@carlaKC (Collaborator) left a comment:

Done my best to review, but I don't think I'm familiar enough with this part of the codebase to approve a ~3k LOC change.

lnwallet/channel.go (resolved)
channeldb/channel.go (two resolved threads, outdated)
lnwallet/chancloser/chancloser.go (resolved)
server.go Outdated
@@ -644,7 +644,8 @@ func newServer(cfg *Config, listenAddrs []net.Addr,
return
}

peer.HandleLocalCloseChanReqs(request)
serverQuit := make(chan struct{})
Collaborator:

Why do we need to pass in a quit channel if we never close or use it?

if err != nil {
l.fail(LinkFailureError{
code: ErrInternalError,
}, format, err)
Collaborator:

This format/error is just used by l.fail to log the error down the line. Seems more typical to just return an error from trySendingShutdown (can wrap the error if desired) then add some generic formatting string here if we want it.
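
A sketch of that shape (all helpers are stand-ins for the PR's code):

```go
package main

import (
	"errors"
	"fmt"
)

// persistDeliveryScript is a stand-in for whatever step can fail inside the
// PR's trySendingShutdown.
func persistDeliveryScript() error {
	return errors.New("disk full") // stand-in failure
}

// trySendingShutdown returns a wrapped error instead of taking a format
// string; the caller adds one generic message when failing the link.
func trySendingShutdown() error {
	if err := persistDeliveryScript(); err != nil {
		return fmt.Errorf("unable to persist delivery script: %w", err)
	}
	return nil
}

func main() {
	if err := trySendingShutdown(); err != nil {
		// In the link this would be:
		// l.fail(LinkFailureError{code: ErrInternalError},
		//     "unable to send shutdown: %v", err)
		fmt.Printf("failing link: %v\n", err)
	}
}
```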


// Sleep again to ensure the ChannelArbitrator can advance to
// StateFullyResolved.
time.Sleep(10 * time.Second)
Collaborator:

Use a wait / predicate until the closed channel shows up in Alice + Bob's close channel lists?

Collaborator Author:

I added a predicate here, but for the other sleep I'm not sure it's possible. We need to start the coop close flow without settling too early because we want to test the behavior of trying to coop close while the HTLC is in-flight. I'm not aware of any RPC signals we can use to determine when Shutdown has been sent. If there were, we'd wait to send and receive Shutdown, and then call SettleInvoice.
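
For the predicate that was added, something along these lines is the usual itest pattern (a sketch using lnd's lntest/wait package; listClosed is a hypothetical stand-in for the RPC the test would poll):

```go
package itest

import (
	"errors"
	"time"

	"github.com/lightningnetwork/lnd/lntest/wait"
)

// waitForCoopClose polls until the closed channel shows up in the node's
// closed-channels list instead of sleeping for a fixed duration.
func waitForCoopClose(listClosed func() (int, error)) error {
	return wait.NoError(func() error {
		numClosed, err := listClosed()
		if err != nil {
			return err
		}
		if numClosed == 0 {
			return errors.New("channel not yet in closed list")
		}
		return nil
	}, 30*time.Second)
}
```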

@yyforyongyu (Member) left a comment:

Finished my first round. Left some questions and will do another round once answered!

if l.isCoopReady() {
l.cfg.NotifyCoopReady(l.ChannelPoint(), l.quit)

// The peer.Brontide that has access to this link will
Member:

This is no longer true, as the link is now shut down here.

Collaborator Author:

The link isn't actually stopped here, only the htlcManager goroutine is closed. We could instead set a boolean to make it spin in a for loop. The peer.Brontide will call beginClosingSigned which will call RemoveLink which will call link.Stop and remove it from the Switch's maps

htlcswitch/link.go (resolved)
// Mark that we've sent the Shutdown message before actually sending
// it. This is also used for retransmission to know if we need to send
// a Shutdown on restart.
err := l.channel.State().MarkShutdownSent()
Member:

If the purpose is to survive from a restart, I think we need to mark it after we've successfully sent it?

Collaborator Author:

If we did this, we could crash after we've sent, but before we've marked it. When we come back up, we wouldn't retransmit it and we would be in violation of the spec

Member:

hmm...but since we haven't marked it we'd resend it again when we come back up?
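
To spell out the ordering argument, a minimal sketch (the state type and sender are stand-ins; MarkShutdownSent/HasSentShutdown are the PR's names):

```go
package main

import "fmt"

// Stand-in for the persisted channel state; in lnd markShutdownSent is a
// database write.
type channelState struct{ sentShutdown bool }

func (c *channelState) markShutdownSent() error {
	c.sentShutdown = true
	return nil
}

// sendShutdown persists the flag *before* sending. A crash between the two
// steps then causes at worst a retransmission on the next connection (which
// the spec permits) rather than a missed retransmission (which it forbids).
func sendShutdown(c *channelState, send func(msg string) error) error {
	if err := c.markShutdownSent(); err != nil {
		return err
	}
	// If we crash here, HasSentShutdown is already true on disk and the
	// message is simply retransmitted on restart.
	return send("shutdown")
}

func main() {
	c := &channelState{}
	err := sendShutdown(c, func(m string) error {
		fmt.Println("send", m)
		return nil
	})
	fmt.Println(c.sentShutdown, err) // true <nil>
}
```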

htlcswitch/link.go (two resolved threads, outdated)
c.cfg.ChainParams,
); err != nil {
return nil, false, err
// Check that we're not receiving two Shutdowns.
Member:

Will there be a case where the node sends the Shutdown, stops, and resends it again? A broader question: duplicate msgs seem to be unavoidable; in gossip we just ignore them, so can we ignore the shutdown too?

// Set the fee baseline now that we know the remote's delivery script.
c.initFeeBaseline()

c.channelClean = true
Member:

I mean if the node restarts, these ephemeral states are lost.

peer/brontide.go (resolved)

Currently, if upfront shutdown is not used, the delivery script
isn't persisted. This leads to a spec violation upon retransmission
of Shutdown because LND will send a different delivery script. This
change paves the way for persisting the delivery script so that LND
is not in violation of the spec here.

This commit changes the ChanCloser logic in several ways:
* ChanCloser no longer returns a Shutdown that the caller can send
  * This is so that a caller that manages ChanCloser can determine
    when the Shutdown can be sent. The caller will then pass the
    Shutdown to the ChanCloser before they send it out.
* ProcessCloseMsg now takes a boolean called remote that denotes
  whether the passed message is from the peer or from us.
  * This is used in the aforementioned case where the caller passes
    the Shutdown message directly to the ChanCloser instead of the
    ChanCloser generating the Shutdown message and returning it.
* A new method called ChannelClean is introduced.
  * This can be used by a caller to tell the ChanCloser when fee
    negotiation can begin. At this point, the ChanCloser may return
    a ClosingSigned message to send.
  * This is mostly to change the lifecycle of the ChanCloser so that
    the ChanCloser doesn't immediately advance to the
    closeFeeNegotiation step until it's appropriate.
* Various sanity checks are added to ensure that proper state
  transitions are occurring.

When cleanOnRecv is set, the ChanCloser will immediately call
ChannelClean upon receiving Shutdown from the peer. This is only
used when we are restarting the coop close flow and the channel is
already in a "clean" state. Specifically, this occurs when the
channel has ChanStatusCoopBroadcasted but hasn't completed fee
negotiation. We can call ChannelClean because the channel MUST be
clean at this point.

This commit contains two changes:
* HandleLocalCloseChanReqs now takes a quit channel as a precautionary
  measure to avoid deadlock and listens on it.
* A function called HandleCoopReady is introduced that is used by a
  caller to notify the peer code when a channel's state is considered
  "clean" and fee negotiation can begin. Internally, this will call
  ChannelClean on the associated ChanCloser and fee negotiation will
  begin. This function also takes a quit channel as a precautionary
  measure to avoid deadlock.
* Test code in brontide_test.go is also modified to use the exported
  functions instead of accessing the Brontide struct's local variables.
  One test is not modified because it will be removed in a later commit.

This commit passes any Shutdown messages to the associated ChannelLink.
Note that the Shutdown message is still passed along the chanCloseMsgs
channel and will be given to the ChanCloser for processing. If the
call to ProcessCloseMsg in peer/brontide.go fails due to validation,
the coop close process won't happen, but the link will continue to
behave as if it will happen. This will become more clear in a later
commit, but this is fine because the link will gracefully shut down and
be removed from the HtlcSwitch.

Note also that the test-only hasReceivedShutdown function is not used
until a later commit which adds link tests.

This commit changes several things in the htlcswitch, chancloser, and
peer packages.
* The ChanCloser's ShutdownChan method has been removed.
* The first change is that the ShutdownIfChannelClean function has
  been removed from the ChannelLink interface. Its functionality has
  largely been replaced by the NotifyShouldShutdown function. This
  function will be used to notify the link when it should begin to
  wind down the link and attempt to send Shutdown after there are no
  more pending updates.
* The second change is that some local channelLink variables are added.
  These are populated when the NotifyShouldShutdown function is called.
  The localCloseReq variable stores the passed *ChanClose so that we can
  refer to its fields later when we actually send the Shutdown message.
  The shutdownInit bool is so that the channelLink knows to enter the
  Shutdown phase when there are no pending updates.
* The third change is that chooseDeliveryScript has been moved from
  peer/brontide.go to htlcswitch/link.go. This means that the link now
  checks whether the local user has violated upfront shutdown by putting
  an invalid shutdown script in the close request. This change also
  necessitated the move of ErrUpfrontShutdownScriptMismatch from
  lnwallet/chancloser/chancloser.go to htlcswitch/link.go.
* The fourth change is that the ChannelLinkConfig now contains a DeliveryAddr
  variable that is populated by the link-creation function in
  peer/brontide.go. This is used in a future commit so that when
  upfront shutdown isn't used, the link still has a script to send in
  the Shutdown message.
* The fifth change is that the tryLinkShutdown function has been removed
  from peer/brontide.go. Since the link will still be up during the
  shutdown "phase" of coop close, it doesn't make sense to attempt to
  tear down the link. A future change will ensure that the link is only
  stopped when the link is in a "clean" state.
* The sixth change is that the handleLocalCloseReq function in
  peer/brontide.go is modified to account for future logic. If the
  ChanCloser already exists in the activeChanCloses map, we will call
  the newly-introduced ChanCloser.SetLocalScript. This is needed because
  if the peer sent Shutdown first, the Brontide code will create a new
  ChanCloser with a freshly generated script via genDeliveryScript.
  However, this script may not match what either 1) the user requested
  or 2) the upfront shutdown script. In this case, it's necessary to
  "override" the generated script with the script from ChanClose
  parameter.
* TestChooseDeliveryScript has been moved to htlcswitch/link_test.go.
* TestCustomShutdownScript has been temporarily deleted and will be
  added to the htlcswitch test suite in a later commit.

This commit allows the channelLink to send out a Shutdown message
when there are no pending local updates and either the local user
or remote peer has initiated the coop close flow.
* The ChannelLinkConfig has a new notification function called
  NotifySendingShutdown which will notify the peer/brontide.go code
  that the channelLink is about to send the Shutdown message to the
  remote peer. This is useful because it lets us advance the
  associated ChanCloser's state machine first. If we instead called
  this function *after* sending Shutdown to the remote peer, it
  would be possible that the remote peer sends us a ClosingSigned
  message when the ChanCloser for this channel wasn't ready for it.
  This ordering ensures that no races like this can occur.
* The ChannelLinkConfig has another new notification function called
  NotifyCoopReady. This is called when the link has both sent and
  received Shutdown and when the commitment state is completely
  "clean" of any updates. This function tells the peer/brontide.go
  code to tear down the link and begin fee negotiation.
* The main change is that the htlcManager goroutine now includes
  checks to see whether it can send Shutdown and, if it has already
  sent Shutdown, if it can start to wind down the link and clean up
  the htlcManager goroutine.
  * If the channelLink is 1) in the "shutdown phase", 2) hasn't yet
    sent Shutdown, and 3) has no pending updates, then Shutdown is
    sent.
  * If a local upfront shutdown script exists, that will be used in
    the Shutdown message. If one doesn't exist, the ChannelLinkConfig's
    DeliveryAddr variable will be persisted to the database and used
    in the Shutdown message. It is persisted before sending Shutdown
    so that if a restart occurs, the same script will be sent upon
    retransmission. This is a requirement of BOLT#02. The retransmission
    logic is in a later commit.
  * After the script has been chosen, MarkShutdownSent is called.
    This sets a boolean in the database and ensures that we will
    send a Shutdown on reconnect. This is a requirement of BOLT#02.
  * A *ChanClose is populated and passed to NotifySendingShutdown, which
    makes sure the related ChanCloser advances state before the
    Shutdown is actually sent.
  * After the above steps are complete, the Shutdown message is
    finally sent to the remote peer.
* Various tests are added which exercise different Shutdown-related
  scenarios.

This commit allows the link-construction function addLink in
peer/brontide.go to set a boolean RetransmitShutdown to true. When
this is true, the link will retransmit Shutdown using the persisted
delivery script. This happens right after the ChannelReestablish
dance. When the peer/brontide.go sets this boolean to true, it creates
a ChanCloser and feeds an identical Shutdown message to the ChanCloser
so that it can advance state. A link-level test is included that
exercises the retransmission logic.

This commit modifies the link so that if 1) the user has initiated
Shutdown (shutdownInit is set), or 2) we have received Shutdown from the
remote peer, the link will fail back any HTLCs from the HtlcSwitch. This
is so that the link can come to a stopping point and wind down. A test
is included that exercises this logic.

This commit makes the link fail and send an Error message if the peer
sends an HTLC after sending a Shutdown message. Doing so is a spec
violation, so no spec-compliant peer should do this. A unit test is
added that exercises this logic.

This changes the HtlcSwitch to use NotifyShouldShutdown instead of the
now-deleted LocalChannelClose function.

This check is no longer necessary since we are able to initiate the
cooperative close flow while we have HTLCs on the commitment
transaction.
@yyforyongyu (Member) left a comment:

Getting close! Posting a few comments first in case they get lost while I'm still reviewing.

@@ -705,6 +705,15 @@ func (c *ChanCloser) ProcessCloseMsg(msg lnwire.Message, remote bool) (
chancloserLog.Infof("ChannelPoint(%v): entering fee "+
"negotiation", c.chanPoint)

if remote && c.isChannelClean() {
Member:

commit message doesn't match the change.

Member:

IIUC, we should have already called this (in the peer) since we receive a shutdown message that prompts the chan closer to be created in the first place.

// We've received a Shutdown message from the remote peer.
// We'll set the shutdownReceived bool and cancel back any new
// HTLC's received after this point instead of forwarding them.
if l.shutdownReceived.Load() {
Member:

Could use l.hasReceivedShutdown()

}
err = fmt.Errorf("unable to process close msg: %w",
err)
peerLog.Error(err)
Member:

I think we want to use p.log so we know which peer failed from the log.

// provided, generate a fresh script.
if len(deliveryScript) == 0 {
deliveryScript, err = p.genDeliveryScript()
chanCloser, ok := p.activeChanCloses[chanID]
Member:

Ideally we'd have unit tests for this, but I don't think we have unit tests for handleLocalCloseReq in the first place, so nvm in this PR...

// generates the script
// - the Shutdown that the link sends will use a
// different script.
chanCloser.SetLocalScript(req.DeliveryScript)
Member:

Q: how does the link know about this updated script?


// Notify peer.Brontide that we've sent the Shutdown message so that
// the coop close state machine can advance.
closeReq := l.localCloseReq
if closeReq == nil {
Member:

The closeReq could only be nil if we are retransmitting the shutdown, right?

// receives the Shutdown. This happens before the peer can reply with a
// ClosingSigned because we haven't sent Shutdown to the peer at this
// point.
l.cfg.NotifySendingShutdown(closeReq, l.quit)
Member:

This creates a goroutine as NotifySendingShutdown -> handleLocalCloseReq -> ProcessCloseMsg, which marks the channel as disabled and sends a channel update. I'm not sure about the implications here, just wondering whether we should wait for NotifySendingShutdown to be finished first.

n.aliceChannelLink.cfg.RetransmitShutdown = true

err = n.start()
require.NoError(t, err)
Member:

I think we need to check l.shutdownInit == true to test if it's retransmitted.

// If we need to retransmit Shutdown, create a ChanCloser and give it
// the Shutdown message.
retransmitShutdown := lnChan.State().HasSentShutdown()
if retransmitShutdown {
Member:

Can we do NotifySendingShutdown in link instead?

@Roasbeef (Member) left a comment:

Catching up with this PR again. I stopped a bit short, as I don't agree that the link should grow to understand more details about the shutdown procedure. This expands the set of responsibilities of the link beyond channel state machine driving and payment forwarding. It also makes it harder to test the shutdown-specific aspects, as we now have another sub-system that has gained responsibilities critical to proper operation of the shutdown.

Instead, I think that we should keep the knowledge the link has minimal. Aside from the assertions re message ordering, the main things we want from this are:

  • The shutdown procedure restarts if we shutdown before things were finalized.
    • Instead of the link transmitting (which means we need the link alive for channels that need to resume a co-op close). The channel state machine can do that.
  • The shutdown request (from RPC) should be acknowledged, with a persistent write to disk that we should shut down the channel. Once the coast is clear, shutdown should start.
    • Rather than the link taking on this responsibility (including sending the message), it should just take the request, then start to cancel back all HTLCs. Once no HTLCs are present (and also the link is marked as not eligible to forward), then it should respond back to the channel close state machine to kick things off.

Overall, I think by not introducing link into the critical flow, and by keeping the existing linear state transition in the state machine, we can simplify a lot of this logic, thereby making it easier to review and test.
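
A sketch of the notification direction being proposed (all names hypothetical): the link only cancels back HTLCs and, once clean, signals the chan closer state machine, rather than sending Shutdown itself.

```go
package main

import "fmt"

// linkSignal is a hypothetical async handoff from the link to the chan closer
// state machine.
type linkSignal struct {
	// coopCloseReady is closed by the link once no HTLCs remain and the
	// link is marked not eligible to forward.
	coopCloseReady chan struct{}
}

func newLinkSignal() *linkSignal {
	return &linkSignal{coopCloseReady: make(chan struct{})}
}

// markClean is called from the link's main goroutine once the commitment is
// clean; the chan closer selects on coopCloseReady to kick off the shutdown
// flow and message sending itself.
func (s *linkSignal) markClean() { close(s.coopCloseReady) }

func main() {
	s := newLinkSignal()
	s.markClean()
	<-s.coopCloseReady
	fmt.Println("chan closer may begin the shutdown flow")
}
```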

return err
}

channel.LocalShutdownScript = deliveryScript
Member:

What about also persisting the last offer we sent over? No problem doing it in another PR (we should make an issue for it).

@@ -818,6 +822,10 @@ type OpenChannel struct {
// default ShortChannelID. This is only set for zero-conf channels.
confirmedScid lnwire.ShortChannelID

// sentShutdown denotes whether or not we've sent the Shutdown message
// for this channel.
sentShutdown bool
Member:

Can alternatively store a nested TLV here that can be extended to store the delivery script sent, fee range, last offer, etc.
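
A sketch of what that nested record could look like with lnd's tlv package (type numbers and field set are illustrative only):

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/lightningnetwork/lnd/tlv"
)

// encodeShutdownInfo bundles the shutdown-related state into one extensible
// TLV stream; new fields (fee range, etc.) can be added as new record types.
func encodeShutdownInfo(deliveryScript []byte, lastOffer uint64) ([]byte, error) {
	stream, err := tlv.NewStream(
		tlv.MakePrimitiveRecord(tlv.Type(1), &deliveryScript),
		tlv.MakePrimitiveRecord(tlv.Type(3), &lastOffer),
	)
	if err != nil {
		return nil, err
	}
	var b bytes.Buffer
	if err := stream.Encode(&b); err != nil {
		return nil, err
	}
	return b.Bytes(), nil
}

func main() {
	blob, err := encodeShutdownInfo([]byte{0x00, 0x14}, 1234)
	fmt.Println(len(blob), err)
}
```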

// PersistDeliveryScript is used during the cooperative close flow to persist
// a script sent in Shutdown when we did not set an upfront shutdown script
// during the funding flow.
func (c *OpenChannel) PersistDeliveryScript(
Member:

Don't see this called in this commit fwiw.


// peerClosingSigned stores a single ClosingSigned that is received
// while the channel is still not in a clean state.
peerClosingSigned *lnwire.ClosingSigned
Member:

IIUC, this would only ever be sent once we're in a "clean" slate; here "clean" is defined as us having sent a shutdown message (so it's clean and we're ready for the other party).

// Set the fee baseline now that we know the remote's delivery script.
c.initFeeBaseline()

// Before continuing, mark the channel as cooperatively closed with a
Member:

So then all the link stuff is handled at another layer?


@@ -413,6 +426,21 @@ type channelLink struct {
quit chan struct{}
}

// chanCloseCtx contains a ChanClose pointer as well as a buffered error chan.
type chanCloseCtx struct {
closeReq *ChanClose
Member:

Yeah, we already have localCloseReq stored above?

// Determine if upfront shutdown has been violated. If
// it hasn't, replace the DeliveryAddr with the
// user-provided one.
deliveryAddr, err := chooseDeliveryScript(
Member:

Should stay in the chan closer imo. Co-op close is distinct from normal channel operation as it starts after normal channel operation has ceased.

}

if len(deliveryAddr) != 0 {
l.cfg.DeliveryAddr = deliveryAddr
Member:

Don't think this should be here, but if it's just stashing, then it can live in the channelLink struct instead of the main config (this isn't externally set).

// subsystem that Shutdown is about to be sent. This is not used during
// retransmission. It takes a quit chan that is used as a precaution to
// avoid deadlock.
NotifySendingShutdown func(req *ChanClose, quit chan struct{})
Member:

I think this flow should be reversed still: the link should gain an async way (which already exists) to notify the chan closer state machine that it can enter the shutdown flow.

@Crypt-iQ (Collaborator Author) commented Feb 16, 2023:

  * Instead of the link transmitting (which means we need the link alive for channels that need to resume a co-op close). The channel state machine can do that.

The link isn't always alive when resuming coop close - see

msgs = append(msgs, shutdownMsg)

  * Rather than the link taking on this responsibility (including sending the message), it should just take the request, then start to cancel back all HTLCs. Once no HTLCs are present (and also the link is marked as not eligible to forward), then it should respond back to the channel close state machine to kick things off.

The coop close flow allows HTLCs to be present on the commitment transactions during shutdown. They just need to be off before closing_signed. Waiting for all HTLCs to be cleared before sending shutdown may end up stalling coop close for a long time because the peer is never notified that you want to initiate a coop close. This means the peer isn't subject to this BOLT#02 requirement:

* once there are no outstanding updates on the peer, UNLESS it has already sent a shutdown:
    * MUST reply to a shutdown message with a shutdown

However, if we wait for our offered HTLCs to be resolved, that should be fine.

Overall, I think by not introducing link into the critical flow, and by keeping the existing linear state transition in the state machine, we can simplify a lot of this logic, thereby making it easier to review and test.

I went down the other suggested path when I first started writing this, and the issue I found was that, given the async nature of both the goroutine managing the ChanCloser and the goroutine managing the link (htlcManager), it was not possible to enforce the proper message ordering.

  1. If a peer sends shutdown followed by update_add_htlc, this is disallowed by the spec. However, it's possible that update_add_htlc reaches the link before some shutdown signal from the ChanCloser reaches the link. This means that LND will sometimes accept this message ordering even though it is disallowed. We can choose to be lenient here.
  2. If a peer sends update_add_htlc followed by shutdown, this is allowed by the spec. It's possible that some shutdown "signal" from the ChanCloser reaches the link before the update_add_htlc. If the link is strict, it will fail the channel and force close even though the peer was spec-compliant.

Points 1 & 2 are addressed in the current code by ensuring that the link handles shutdown by taking it from the mailbox.

To summarize, the alternative approach needs the following:

  • Wait for our offered HTLCs to be resolved before sending shutdown, even though we can send shutdown almost immediately. This is straightforward.
    • This allows the link to signal to the ChanCloser when to send shutdown
  • Figure out the ChanCloser/link async interaction with message order (points 1 & 2 above). This is not straightforward.
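
To illustrate points 1 and 2, a toy sketch of why processing Shutdown through the link's mailbox preserves the peer's actual wire order (types are stand-ins):

```go
package main

import "fmt"

type msg interface{}

type addHTLC struct{}
type shutdown struct{}

// handle consumes messages in mailbox order, so "shutdown then add" is
// reliably rejected while "add then shutdown" is reliably accepted. A
// separate ChanCloser-to-link signal could race either way.
func handle(mailbox []msg) {
	var shutdownSeen bool
	for _, m := range mailbox {
		switch m.(type) {
		case shutdown:
			shutdownSeen = true
		case addHTLC:
			if shutdownSeen {
				fmt.Println("fail channel: add after shutdown")
				return
			}
			fmt.Println("forward HTLC")
		}
	}
}

func main() {
	handle([]msg{addHTLC{}, shutdown{}}) // allowed by the spec
	handle([]msg{shutdown{}, addHTLC{}}) // spec violation, fail
}
```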

@lightninglabs-deploy:
@Crypt-iQ, remember to re-request review from reviewers when ready
