multi: implement spec-compliant coop close #6760
Conversation
Force-pushed 0766b7a → 9ae20e6
Force-pushed 2998182 → 75e94d0
Force-pushed 75e94d0 → 2b9fc67
Force-pushed 2b9fc67 → e25691b
Cool work!
Still reviewing the PR, meanwhile the third commit is a bit large tbh😂
I want to split up the large htlcswitch+chancloser+peer commit
I think maybe we could split it by taking hints from the commit message? Something like making each package ready for the new state, and then connecting them using the new state.
This is accomplished by:
* passing Shutdown messages to the link when appropriate
* having the link tell peer.Brontide when closing_signed can be sent
* handling retransmission of Shutdown
* persisting the delivery scripts for retransmission
* reworking the ChanCloser such that closing_signed isn't immediately sent after sending and receiving Shutdown
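The coordination the bullets above describe can be sketched roughly as follows. This is a minimal illustration with hypothetical names (`link`, `notifyCoopReady`), not the actual lnd types: the link signals the peer, via a callback, once both sides have exchanged Shutdown and no updates are pending, which is the point at which closing_signed may be sent.

```go
package main

import "fmt"

// link models the state the link tracks before it may tell the peer
// that closing_signed can be sent. All names here are illustrative.
type link struct {
	shutdownSent     bool
	shutdownReceived bool
	pendingUpdates   int

	// notifyCoopReady stands in for the peer-supplied hook.
	notifyCoopReady func()
}

// maybeSignalCoopReady fires the callback only once both Shutdowns
// have been exchanged and the commitment has no pending updates.
func (l *link) maybeSignalCoopReady() {
	if l.shutdownSent && l.shutdownReceived && l.pendingUpdates == 0 {
		l.notifyCoopReady()
	}
}

func main() {
	ready := false
	l := &link{notifyCoopReady: func() { ready = true }}

	l.shutdownSent = true
	l.maybeSignalCoopReady() // not yet: Shutdown not received
	fmt.Println(ready)

	l.shutdownReceived = true
	l.maybeSignalCoopReady() // now closing_signed may be sent
	fmt.Println(ready)
}
```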
htlcswitch/link.go
Outdated
l.channel.LocalUpfrontShutdownScript(),
)
// Flip the shutdownInit and shutdownSent bools.
are we allowed to init and send the msg multiple times?
It's undefined in the spec, meaning anything can happen (peer disconnects or something else), but the code here shouldn't be sending shutdown multiple times.
First pass, agree that the third commit needs some breaking up. Lots of moving pieces makes it difficult to review all in one batch.
// DeliveryAddr is the address to use in Shutdown if an upfront script
// has not been set.
DeliveryAddr lnwire.DeliveryAddress
Link config seems like a weird place for this?
I put this here because if there is no upfront shutdown script, the link needs to be able to send an address in Shutdown. An alternative would be to just pass a closure from peer/brontide.go instead of having the peer/brontide.go code generate it and then store it in the link config.
Agree with above comment, the link has already been created well before we even know what the upfront shutdown addr is. Generally I don't think we should add more logic to the link (which already has a lot of critical operations), and instead we can use the existing goroutine in the peer to manage the shutdown state.
htlcswitch/link.go
Outdated
// upfront shutdown script if it exists, and the
// provided delivery address otherwise. This is used in
// case we did not initiate the coop close.
upfrontAddr := l.channel.LocalUpfrontShutdownScript()
Couldn't we use chooseDeliveryScript here instead? That simplifies needing to re-check the upfront shutdown script again.
not sure what you mean here?
I don't think we can do that, because if we were to call chooseDeliveryScript(l.channel.LocalUpfrontShutdownScript(), l.cfg.DeliveryAddr), it would return a mismatch error whenever the upfront script exists, since l.cfg.DeliveryAddr will always be set.
htlcswitch/link.go
Outdated
// shutdownInit bool and wait until there are no more
// updates to send. We should not forward HTLC's to our
// peer. Only cancels and update_fee messages are
// allowed. The peer.Brontide
Comment incomplete?
Force-pushed e25691b → f5f7c02
Tried my best to review the commits; got a bit lost at the end, so some of the comments may not make sense 🤦🏻. Will review it again once the third commit is split into smaller parts.
Basically I tried to review the third commit in five parts:
- retransmit shutdown message
- allow updates after shutdown is sent
- a new phase to process closing signed, particularly changes around coopCloseReady
- shutdownRequest-related refactor, added NotifyShouldShutdown
- newly added shutdown-related states for the link
Maybe it could be broken up along those lines?
channeldb/channel.go
Outdated
@@ -1364,6 +1371,51 @@ func (c *OpenChannel) SecondCommitmentPoint() (*btcec.PublicKey, error) {
return input.ComputeCommitmentPoint(revocation[:]), nil
}

// MarkShutdownSent is used to mark that we've sent a Shutdown message for this
// channel. This is so we can retransmit Shutdown.
"This is so"?
htlcswitch/link.go
Outdated
// The mutex is only accessed if:
// - a write is occurring
// - a read is occurring from hasReceivedShutdown
// all reads from the htlcManager goroutine should be race-free.
Can now use atomic.Bool to avoid needing the mutex.
The mutex is only accessed in production code if a write occurs, which happens in handleUpstreamMsg. The hasReceivedShutdown function is only called in test code. If we make this an atomic.Bool, then all of the read-only code that previously didn't need to access the mutex (since the reads are in the same goroutine as the write, except for the test-only hasReceivedShutdown) will now need to use atomics. Is there that much of a noticeable difference in performance? If so, will change.
Also curious, so I created a bench test; turns out atomic is 60x faster on writes and 1000x faster on reads. I wasn't thinking about performance tho, but more about code health, as in avoiding mutexes so we can care less about deadlocks and their side effects etc.
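For reference, the suggested replacement looks roughly like this. It is a standalone sketch with illustrative names (`linkState` and its methods are not the actual link type); `sync/atomic.Bool` (Go 1.19+) gives race-free access with no mutex to forget to unlock.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// linkState sketches replacing a mutex-guarded bool with atomic.Bool.
type linkState struct {
	shutdownReceived atomic.Bool
}

// markShutdownReceived is the single writer (handleUpstreamMsg in the
// discussion above).
func (l *linkState) markShutdownReceived() {
	l.shutdownReceived.Store(true)
}

// hasReceivedShutdown is safe to call from any goroutine, including
// test code, without taking a lock.
func (l *linkState) hasReceivedShutdown() bool {
	return l.shutdownReceived.Load()
}

func main() {
	var l linkState
	fmt.Println(l.hasReceivedShutdown()) // false before the write
	l.markShutdownReceived()
	fmt.Println(l.hasReceivedShutdown()) // true after the write
}
```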
// If we need to retransmit Shutdown, create a ChanCloser and give it
// the Shutdown message.
retransmitShutdown := lnChan.State().HasSentShutdown()
why do we need to retransmit shutdown when the channel has sent shutdown?
This will retransmit shutdown if we've ever sent shutdown in the past
Does it mean we are sending multiple shutdowns? Suppose trySendingShutdown succeeds and the node restarts; then it will send another shutdown.
this is ok as long as it's not in the same p2p connection
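The restart path under discussion can be sketched as below. All types are hypothetical stand-ins (the real flag is the one persisted by MarkShutdownSent): on a fresh connection, after the channel_reestablish dance, we retransmit Shutdown only if a previous connection already sent it, reusing the persisted delivery script so the message is identical.

```go
package main

import "fmt"

// channelState stands in for the persisted channel state.
type channelState struct {
	sentShutdown   bool
	deliveryScript []byte
}

func (c *channelState) HasSentShutdown() bool { return c.sentShutdown }

// onReestablish decides whether to resend Shutdown on a new
// connection. Resending across connections is allowed; resending
// within one connection is not.
func onReestablish(c *channelState) (retransmit bool, script []byte) {
	if !c.HasSentShutdown() {
		return false, nil
	}
	// Reuse the persisted script so the retransmitted Shutdown
	// carries the same script as before, as BOLT #2 requires.
	return true, c.deliveryScript
}

func main() {
	c := &channelState{sentShutdown: true, deliveryScript: []byte{0x00, 0x14}}
	retransmit, script := onReestablish(c)
	fmt.Println(retransmit, len(script))
}
```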
// The peer.Brontide that has access to this link will
// call RemoveLink which will stop this link. Returning
// here ensures no more channel updates can occur.
I think an info log is needed here.
htlcswitch/link.go
Outdated
// We've received a Shutdown message from the remote peer.
// We'll set the shutdownReceived bool and cancel back any new
// HTLC's received after this point instead of forwarding them.
if l.shutdownReceived {
I think the spec only says the sender "MUST NOT send multiple shutdown messages", but the receiver side is not defined.
This defines it for LND here. If an implementation is sending multiple shutdown messages, it has no guarantees on what can happen, so to keep things simple we just fail here.
// shutdownInit is a bool that is set when we've initiated a coop
// close.
shutdownInit bool
Have we thought about giving the link a state and advancing its states? That probably could make things a bit easier and more abstracted.
I think we should do that in a different PR
Force-pushed 4c3c164 → d94c709
I think I addressed most of the comments and I split up the commits.
This is indeed a lot of change! Halfway there; sending out my comments now in case they get lost.
rpcserver.go
Outdated
@@ -4233,9 +4233,10 @@ func createRPCOpenChannel(r *rpcServer, dbChannel *channeldb.OpenChannel,
channel.PushAmountSat = uint64(amt)
}

if len(dbChannel.LocalShutdownScript) > 0 {
if len(dbChannel.GetLocalShutdownScript()) > 0 {
nit: could use a variable here to avoid calling GetLocalShutdownScript twice.
lnwallet/chancloser/chancloser.go
Outdated
}

// Process the peer's ClosingSigned.
msgs, closeFin, err := c.ProcessCloseMsg(c.peerClosingSigned, true)
can just return c.ProcessCloseMsg here.
peer/brontide.go
Outdated
chanCloser, err := p.createChanCloser(
lnChan, deliveryScript, feePerKw, nil, locallyInitiated,
lnChan, deliveryScript, feePerKw, nil, locallyInitiated, true,
maybe we could call chanCloser.ChannelClean() here to avoid having yet another variable cleanOnRecv?
ChannelClean has to be called at a specific point: when we have processed our local Shutdown and right after we receive the peer's Shutdown. This is only used if we're restarting the coop close flow and ChannelClean was called in a previous iteration.
but this is the only place where cleanOnRecv is true?
yeah it's the only place. I'll have to think if we can remove it by instead checking the channel's internal states
lnwallet/chancloser/chancloser.go
Outdated
// channelClean is true if the peer.Brontide has signaled that the
// channel is no longer operational and is in a clean state.
channelClean bool
I haven't fully comprehended the meaning of this channelClean state. To me it seems it's closely tied to ChanStatusCoopBroadcasted, since it will only be set when MarkCoopBroadcasted is called, and I guess we could use that instead.
There is one scenario in which ChanStatusCoopBroadcasted is true but channelClean is not: on a restart. We can get around this, but I'd rather keep the bool for now.
I assume that scenario exists because we have the field channelClean in the first place and it defaults to the empty value false...
yeah that's why
return
}

// Otherwise, set the shutdownReceived bool.
This "don't allow a second shutdown message" logic has already been implemented in chancloser, and I think we should avoid duplicating it.
This is done to stop the link rather than have the chancloser stop the link.
I've done my best to review, but I don't think I'm familiar enough with this part of the codebase to approve a ~3k LOC change.
server.go
Outdated
@@ -644,7 +644,8 @@ func newServer(cfg *Config, listenAddrs []net.Addr,
return
}

peer.HandleLocalCloseChanReqs(request)
serverQuit := make(chan struct{})
Why do we need to pass in a quit channel if we never close or use it?
htlcswitch/link.go
Outdated
if err != nil {
	l.fail(LinkFailureError{
		code: ErrInternalError,
	}, format, err)
This format/error is just used by l.fail to log the error down the line. Seems more typical to just return an error from trySendingShutdown (can wrap the error if desired), then add some generic formatting string here if we want it.
lntest/itest/lnd_coopclose_test.go
Outdated
// Sleep again to ensure the ChannelArbitrator can advance to
// StateFullyResolved.
time.Sleep(10 * time.Second)
Use a wait / predicate until the closed channel shows up in Alice + Bob's close channel lists?
I added a predicate here, but for the other sleep I'm not sure it's possible. We need to start the coop close flow without settling too early, because we want to test the behavior of trying to coop close while the HTLC is in-flight. I'm not aware of any RPC signals we can use to determine when Shutdown has been sent. If there were, we'd wait to send and receive Shutdown, and then call SettleInvoice.
Finished my first round. Left some questions and will do another round once answered!
if l.isCoopReady() {
	l.cfg.NotifyCoopReady(l.ChannelPoint(), l.quit)

// The peer.Brontide that has access to this link will
this is no longer true, as the link is now shut down here.
The link isn't actually stopped here; only the htlcManager goroutine is closed. We could instead set a boolean to make it spin in a for loop. The peer.Brontide will call beginClosingSigned, which will call RemoveLink, which will call link.Stop and remove it from the Switch's maps.
// Mark that we've sent the Shutdown message before actually sending
// it. This is also used for retransmission to know if we need to send
// a Shutdown on restart.
err := l.channel.State().MarkShutdownSent()
If the purpose is to survive from a restart, I think we need to mark it after we've successfully sent it?
If we did this, we could crash after we've sent, but before we've marked it. When we come back up, we wouldn't retransmit it and we would be in violation of the spec
hmm...but since we haven't marked it we'd resend it again when we come back up?
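The ordering argument in this thread can be made concrete with a small sketch (hypothetical `db` and `wire` types, not lnd's): persisting the flag before the wire write means a crash between the two steps at worst causes an extra Shutdown on the next connection, which is tolerated, rather than a missing retransmission, which would violate the spec.

```go
package main

import (
	"errors"
	"fmt"
)

// db stands in for the persisted channel state.
type db struct{ shutdownSent bool }

func (d *db) MarkShutdownSent() error { d.shutdownSent = true; return nil }

// wire stands in for the p2p connection, which can fail at any time.
type wire struct{ fail bool }

func (w *wire) Send(msg string) error {
	if w.fail {
		return errors.New("connection died")
	}
	return nil
}

// sendShutdown persists intent first, then writes to the wire.
func sendShutdown(d *db, w *wire) error {
	if err := d.MarkShutdownSent(); err != nil {
		return err
	}
	return w.Send("shutdown")
}

func main() {
	d, w := &db{}, &wire{fail: true}
	err := sendShutdown(d, w)
	// Even though the send failed, the flag is set, so the next
	// connection retransmits rather than staying silent.
	fmt.Println(err != nil, d.shutdownSent)
}
```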
c.cfg.ChainParams,
); err != nil {
	return nil, false, err
// Check that we're not receiving two Shutdowns.
Will there be a case where the node sends the Shutdown, stops, and resends it again? A broader question is, duplicate msgs seem to be unavoidable, in gossip we just ignore them, can we ignore the shutdown too?
lnwallet/chancloser/chancloser.go
Outdated
// Set the fee baseline now that we know the remote's delivery script.
c.initFeeBaseline()

c.channelClean = true
I mean if the node restarts, these ephemeral states are lost.
Force-pushed d76ec83 → bd0ce2a
Currently, if upfront shutdown is not used, the delivery script isn't persisted. This leads to a spec violation upon retransmission of Shutdown because LND will send a different delivery script. This change paves the way for persisting the delivery script so that LND is not in violation of the spec here.
This commit changes the ChanCloser logic in several ways:
* ChanCloser no longer returns a Shutdown that the caller can send. This is so that a caller managing the ChanCloser can determine when the Shutdown can be sent; the caller then passes the Shutdown to the ChanCloser before sending it out.
* ProcessCloseMsg now takes a boolean called remote that denotes whether the passed message is from the peer or from us. This is used in the aforementioned case where the caller passes the Shutdown message directly to the ChanCloser instead of the ChanCloser generating the Shutdown message and returning it.
* A new method called ChannelClean is introduced. This can be used by a caller to tell the ChanCloser when fee negotiation can begin; at this point, the ChanCloser may return a ClosingSigned message to send. This is mostly to change the lifecycle of the ChanCloser so that it doesn't immediately advance to the closeFeeNegotiation step until it's appropriate.
* Various sanity checks are added to ensure that proper state transitions are occurring.
When cleanOnRecv is set, the ChanCloser will immediately call ChannelClean upon receiving Shutdown from the peer. This is only used when we are restarting the coop close flow and the channel is already in a "clean" state. Specifically, this occurs when the channel has ChanStatusCoopBroadcasted but hasn't completed fee negotiation. We can call ChannelClean because the channel MUST be clean at this point.
This commit contains two changes:
* HandleLocalCloseChanReqs now takes a quit channel as a precautionary measure to avoid deadlock, and listens on it.
* A function called HandleCoopReady is introduced that a caller uses to notify the peer code when a channel's state is considered "clean" and fee negotiation can begin. Internally, this calls ChannelClean on the associated ChanCloser and fee negotiation begins. This function also takes a quit channel as a precautionary measure to avoid deadlock.
Test code in brontide_test.go is also modified to use the exported functions instead of accessing the Brontide struct's local variables. One test is not modified because it will be removed in a later commit.
This commit passes any Shutdown messages to the associated ChannelLink. Note that the Shutdown message is still passed along the chanCloseMsgs channel and will be given to the ChanCloser for processing. If the call to ProcessCloseMsg in peer/brontide.go fails due to validation, the coop close process won't happen, but the link will continue to behave as if it will. This will become clearer in a later commit, but it is fine because the link will gracefully shut down and be removed from the HtlcSwitch. Note also that the test-only hasReceivedShutdown function is not used until a later commit which adds link tests.
This commit changes several things in the htlcswitch, chancloser, and peer packages:
* The ChanCloser's ShutdownChan method has been removed.
* The first change is that the ShutdownIfChannelClean function has been removed from the ChannelLink interface. Its functionality has largely been replaced by the NotifyShouldShutdown function, which notifies the link that it should begin winding down and attempt to send Shutdown once there are no more pending updates.
* The second change is that some local channelLink variables are added. These are populated when the NotifyShouldShutdown function is called. The localCloseReq variable stores the passed *ChanClose so that we can refer to its fields later when we actually send the Shutdown message. The shutdownInit bool is so that the channelLink knows to enter the Shutdown phase when there are no pending updates.
* The third change is that chooseDeliveryScript has been moved from peer/brontide.go to htlcswitch/link.go. This means that the link now checks whether the local user has violated upfront shutdown by putting an invalid shutdown script in the close request. This change also necessitated moving ErrUpfrontShutdownScriptMismatch from lnwallet/chancloser/chancloser.go to htlcswitch/link.go.
* The fourth change is that ChannelLinkConfig now contains a DeliveryAddr variable that is populated by the link-creation function in peer/brontide.go. This is used in a future commit so that when upfront shutdown isn't used, the link still has a script to send in the Shutdown message.
* The fifth change is that the tryLinkShutdown function has been removed from peer/brontide.go. Since the link will still be up during the shutdown "phase" of coop close, it doesn't make sense to attempt to tear down the link. A future change will ensure that the link is only stopped when it is in a "clean" state.
* The sixth change is that the handleLocalCloseReq function in peer/brontide.go is modified to account for future logic. If the ChanCloser already exists in the activeChanCloses map, we call the newly-introduced ChanCloser.SetLocalScript. This is needed because if the peer sent Shutdown first, the Brontide code will have created a new ChanCloser with a freshly generated script via genDeliveryScript. However, this script may not match either 1) what the user requested or 2) the upfront shutdown script. In that case, it's necessary to "override" the generated script with the script from the ChanClose parameter.
* TestChooseDeliveryScript has been moved to htlcswitch/link_test.go.
* TestCustomShutdownScript has been temporarily deleted and will be added to the htlcswitch test suite in a later commit.
This commit allows the channelLink to send out a Shutdown message when there are no pending local updates and either the local user or remote peer has initiated the coop close flow.
* The ChannelLinkConfig has a new notification function called NotifySendingShutdown, which notifies the peer/brontide.go code that the channelLink is about to send the Shutdown message to the remote peer. This is useful because it lets us advance the associated ChanCloser's state machine first. If we instead called this function *after* sending Shutdown to the remote peer, the remote peer could send us a ClosingSigned message before the ChanCloser for this channel was ready for it. This ordering ensures that no races like this can occur.
* The ChannelLinkConfig has another new notification function called NotifyCoopReady. This is called when the link has both sent and received Shutdown and the commitment state is completely "clean" of any updates. This function tells the peer/brontide.go code to tear down the link and begin fee negotiation.
* The main change is that the htlcManager goroutine now includes checks for whether it can send Shutdown and, if it has already sent Shutdown, whether it can start winding down the link and cleaning up the htlcManager goroutine.
* If the channelLink is 1) in the "shutdown phase", 2) hasn't yet sent Shutdown, and 3) has no pending updates, then Shutdown is sent.
* If a local upfront shutdown script exists, it is used in the Shutdown message. If one doesn't exist, the ChannelLinkConfig's DeliveryAddr variable is persisted to the database and used in the Shutdown message. It is persisted before sending Shutdown so that if a restart occurs, the same script is sent upon retransmission. This is a requirement of BOLT #2. The retransmission logic is in a later commit.
* After the script has been chosen, MarkShutdownSent is called. This sets a boolean in the database and ensures that we will send a Shutdown on reconnect, which is a requirement of BOLT #2.
* A *ChanClose is populated and passed to NotifySendingShutdown, which makes sure the related ChanCloser advances state before the Shutdown is actually sent.
* After the above steps are complete, the Shutdown message is finally sent to the remote peer.
* Various tests are added which exercise different Shutdown-related scenarios.
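The send ordering this commit message walks through can be sketched as below. The helper names and callback-based shape are hypothetical, not the actual link code: pick the script, persist it when it isn't the upfront one, mark the send, notify the peer's ChanCloser, and only then write Shutdown to the wire.

```go
package main

import "fmt"

// trySendShutdown sketches the ordering: persist (if needed), mark,
// notify the ChanCloser, then send. The injected functions stand in
// for database writes, the NotifySendingShutdown hook, and the wire.
func trySendShutdown(upfront, configured []byte,
	persist func([]byte) error,
	markSent func() error,
	notifySending func(),
	send func([]byte) error) error {

	script := upfront
	if len(script) == 0 {
		// No upfront script: persist the configured address so a
		// restart retransmits the same script (BOLT #2).
		script = configured
		if err := persist(script); err != nil {
			return err
		}
	}
	if err := markSent(); err != nil {
		return err
	}
	// Advance the ChanCloser first so a fast closing_signed reply
	// never races ahead of its state machine.
	notifySending()
	return send(script)
}

func main() {
	var steps []string
	err := trySendShutdown(nil, []byte{0x51},
		func([]byte) error { steps = append(steps, "persist"); return nil },
		func() error { steps = append(steps, "mark"); return nil },
		func() { steps = append(steps, "notify") },
		func([]byte) error { steps = append(steps, "send"); return nil },
	)
	fmt.Println(err == nil, steps)
}
```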
This commit allows the link-construction function addLink in peer/brontide.go to set a boolean RetransmitShutdown to true. When this is true, the link will retransmit Shutdown using the persisted delivery script. This happens right after the ChannelReestablish dance. When the peer/brontide.go sets this boolean to true, it creates a ChanCloser and feeds an identical Shutdown message to the ChanCloser so that it can advance state. A link-level test is included that exercises the retransmission logic.
This commit modifies the link so that if 1) the user has initiated Shutdown (shutdownInit is set), or 2) we have received Shutdown from the remote peer, the link will fail back any HTLCs from the HtlcSwitch. This is so that the link can come to a stopping point and wind down. A test is included that exercises this logic.
This commit makes the link fail and send an Error message if the peer sends an HTLC after sending a Shutdown message. Doing so is a spec violation, so no spec-compliant peer should do this. A unit test is added that exercises this logic.
This changes the HtlcSwitch to use NotifyShouldShutdown instead of the now-deleted LocalChannelClose function.
This check is no longer necessary since we are able to initiate the cooperative close flow while we have HTLCs on the commitment transaction.
Force-pushed bd0ce2a → e053e15
Force-pushed e053e15 → d4c3b35
Getting close! Posting a few comments now in case they get lost while I'm still reviewing.
@@ -705,6 +705,15 @@ func (c *ChanCloser) ProcessCloseMsg(msg lnwire.Message, remote bool) (
chancloserLog.Infof("ChannelPoint(%v): entering fee "+
	"negotiation", c.chanPoint)

if remote && c.isChannelClean() {
commit message doesn't match the change.
IIUC, we should have already called this (in the peer) since we receive a shutdown message that prompts the chan closer to be created in the first place.
// We've received a Shutdown message from the remote peer.
// We'll set the shutdownReceived bool and cancel back any new
// HTLC's received after this point instead of forwarding them.
if l.shutdownReceived.Load() {
Could use l.hasReceivedShutdown()
}
err = fmt.Errorf("unable to process close msg: %w",
	err)
peerLog.Error(err)
I think we want to use p.log so we know which peer failed from the log.
// provided, generate a fresh script.
if len(deliveryScript) == 0 {
	deliveryScript, err = p.genDeliveryScript()
chanCloser, ok := p.activeChanCloses[chanID]
ideally we'd have unit tests for this, but I don't think we have unit tests for handleLocalCloseReq in the first place so nvm in this pr...
// generates the script
// - the Shutdown that the link sends will use a
// different script.
chanCloser.SetLocalScript(req.DeliveryScript)
Q: how does link know this updated script?
// Mark that we've sent the Shutdown message before actually sending
// it. This is also used for retransmission to know if we need to send
// a Shutdown on restart.
err := l.channel.State().MarkShutdownSent()
hmm...but since we haven't marked it we'd resend it again when we come back up?
// Notify peer.Brontide that we've sent the Shutdown message so that
// the coop close state machine can advance.
closeReq := l.localCloseReq
if closeReq == nil {
The closeReq could only be nil if we are retransmitting the shutdown, right?
// receives the Shutdown. This happens before the peer can reply with a
// ClosingSigned because we haven't sent Shutdown to the peer at this
// point.
l.cfg.NotifySendingShutdown(closeReq, l.quit)
This creates a goroutine as NotifySendingShutdown -> handleLocalCloseReq -> ProcessCloseMsg, which marks the channel as disabled and sends a channel update. I'm not sure about the implications here, just wondering whether we should wait for NotifySendingShutdown to be finished first.
n.aliceChannelLink.cfg.RetransmitShutdown = true

err = n.start()
require.NoError(t, err)
i think we need to check l.shutdownInit == true to test if it's retransmitted.
// If we need to retransmit Shutdown, create a ChanCloser and give it
// the Shutdown message.
retransmitShutdown := lnChan.State().HasSentShutdown()
if retransmitShutdown {
Can we do NotifySendingShutdown in the link instead?
Catching up with this PR again, stopped a bit short as I don't agree that the link should grow to understand more details about the shutdown procedure. This expands the set of responsibilities of the link beyond channel state machine driving and payment forwarding. This also makes it harder to test the shutdown-specific aspects, as we now have another sub-system that has gained responsibilities critical to proper operation of the shutdown.
Instead, I think that we should keep minimal knowledge we have in the link. Aside from the assertions re message ordering, the main things we want from this are:
- The shutdown procedure restarts if we shutdown before things were finalized.
- Instead of the link transmitting (which means we need the link alive for channels that need to resume a co-op close). The channel state machine can do that.
- The shutdown request (from RPC) should be acknowledged, with a persistent write to disk that we should shut down the channel. Once the coast is clear, shutdown should start.
- Rather than the link taking on this responsibility (including sending the message), it should just take the request, then start to cancel back all HTLCs. Once no HTLCs are present (and also the link is marked as not eligible to forward), then it should respond back to the channel close state machine to kick things off.
Overall, I think by not introducing link into the critical flow, and by keeping the existing linear state transition in the state machine, we can simplify a lot of this logic, thereby making it easier to review and test.
return err
}

channel.LocalShutdownScript = deliveryScript
What about also persisting the last offer we sent over? Np if it's done in another PR (should make an issue for it).
@@ -818,6 +822,10 @@ type OpenChannel struct {
	// default ShortChannelID. This is only set for zero-conf channels.
	confirmedScid lnwire.ShortChannelID

	// sentShutdown denotes whether or not we've sent the Shutdown message
	// for this channel.
	sentShutdown bool
Can alternatively store a nested TLV here that can be extended to store: the delivery script sent, fee range, last offer, etc.
// PersistDeliveryScript is used during the cooperative close flow to persist
// a script sent in Shutdown when we did not set an upfront shutdown script
// during the funding flow.
func (c *OpenChannel) PersistDeliveryScript(
Don't see this called in this commit fwiw.
// peerClosingSigned stores a single ClosingSigned that is received
// while the channel is still not in a clean state.
peerClosingSigned *lnwire.ClosingSigned
IIUC, this would only ever be sent once we're in a "clean" slate, here "clean" is defined as us having sent a shutdown message (so it's clean and we're ready for the other party).
// Set the fee baseline now that we know the remote's delivery script.
c.initFeeBaseline()

// Before continuing, mark the channel as cooperatively closed with a
So then all the link stuff is handled at another layer?
// DeliveryAddr is the address to use in Shutdown if an upfront script
// has not been set.
DeliveryAddr lnwire.DeliveryAddress
Agree with above comment, the link has already been created well before we even know what the upfront shutdown addr is. Generally I don't think we should add more logic to the link (which already has a lot of critical operations), and instead we can use the existing goroutine in the peer to manage the shutdown state.
@@ -413,6 +426,21 @@ type channelLink struct {
	quit chan struct{}
}

// chanCloseCtx contains a ChanClose pointer as well as a buffered error chan.
type chanCloseCtx struct {
	closeReq *ChanClose
Yeh we already have localCloseReq stored above?
// Determine if upfront shutdown has been violated. If
// it hasn't, replace the DeliveryAddr with the
// user-provided one.
deliveryAddr, err := chooseDeliveryScript(
Should stay in the chan closer imo. Co-op close is distinct from normal channel operation as it starts after normal channel operation has ceased.
}

if len(deliveryAddr) != 0 {
	l.cfg.DeliveryAddr = deliveryAddr
Don't think this should be here, but if it's just stashing, then it can live in the channelLink struct instead of the main config (this isn't externally set).
// subsystem that Shutdown is about to be sent. This is not used during
// retransmission. It takes a quit chan that is used as a precaution to
// avoid deadlock.
NotifySendingShutdown func(req *ChanClose, quit chan struct{})
I think this flow should be reversed still: the link should gain an async way (which already exists) to notify the chan closer state machine that it can enter the shutdown flow.
- The link isn't always alive when resuming coop close - see Line 831 in d4c3b35.
- The coop close flow allows HTLCs to be present on the commitment transactions during … However, if we wait for our offered HTLCs to be resolved, that should be fine.
- I went down the other path suggested when I first started writing this, and the issue I found was that it was not possible given the async nature of both the goroutine managing the …

Points 1 & 2 are addressed in the current code by ensuring that the link handles … To summarize, the alternative approach needs the following:
@Crypt-iQ, remember to re-request review from reviewers when ready
Implements lightning/bolts#970
The main changes are that:

* the link now handles the Shutdown message from the peer
* when the channel passes IsChannelClean(), it signals to peer.Brontide that fee negotiation can begin and the link shuts down
* if Shutdown was sent, it will always be retransmitted
* the ChanCloser is reworked a bit so that closing_signed is no longer immediately sent after receiving and sending Shutdown
I included an integration test and have tested this in a local regtest environment, and my tests passed. The integration test can probably be removed since it relies on time.Sleep(...) and is probably going to be flaky in CI. But it's included to show that the flow can work.

Fixes #6039