
multi: implement spec-compliant coop close #6760

Closed · wants to merge 14 commits

Conversation


@Crypt-iQ Crypt-iQ commented Jul 22, 2022

Implements lightning/bolts#970

The main changes are that:

  • ChannelLink can receive a Shutdown message from the peer
    • If it receives it from the remote peer or from the rpcserver, it will enter a wind-down mode where new HTLCs from the peer won't be processed (there is a minor exception if we're mid state-transition)
  • Once ChannelLink is totally clean via IsChannelClean(), it signals to peer.Brontide that fee negotiation can begin and the link shuts down.
  • If a Shutdown was sent, it will always be retransmitted.
  • The ChanCloser is reworked a bit so that closing_signed is no longer immediately sent after receiving and sending Shutdown
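
To make the gating concrete, here is a minimal, self-contained sketch of that wind-down check; the struct and helper are stand-ins, with only the field names shutdownInit and shutdownReceived taken from the PR:

```go
package main

import "fmt"

// Stand-in for the PR's channelLink; only the two flag names come from the
// actual diff.
type channelLink struct {
	shutdownInit     bool // we initiated the coop close (e.g. via rpcserver)
	shutdownReceived bool // the remote peer sent us Shutdown
}

// canForwardHTLC reports whether a new HTLC from the switch may still be
// forwarded: once either side has entered the shutdown phase, new HTLCs are
// failed back instead.
func (l *channelLink) canForwardHTLC() bool {
	return !l.shutdownInit && !l.shutdownReceived
}

func main() {
	l := &channelLink{shutdownReceived: true}
	fmt.Println(l.canForwardHTLC()) // false: cancel back instead of forwarding
}
```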

I included an integration test and have tested this in a local regtest environment, where my tests passed. The integration test can probably be removed since it relies on time.Sleep(...) and is likely to be flaky in CI, but it's included to show that the flow works.

Fixes #6039

@Crypt-iQ Crypt-iQ added labels p2p (code related to the peer-to-peer behaviour), channel closing (related to the closing of channels cooperatively and uncooperatively), and spec on Jul 22, 2022
@Crypt-iQ Crypt-iQ force-pushed the coop_close_970 branch 5 times, most recently from 0766b7a to 9ae20e6 on July 27, 2022 21:00
@Roasbeef Roasbeef added this to the v0.16.0 milestone Aug 9, 2022
@saubyk saubyk removed this from the v0.16.0 milestone Aug 27, 2022
@Crypt-iQ Crypt-iQ force-pushed the coop_close_970 branch 2 times, most recently from 2998182 to 75e94d0 on October 13, 2022 23:57
@Crypt-iQ Crypt-iQ marked this pull request as ready for review October 13, 2022 23:57
@yyforyongyu yyforyongyu self-requested a review October 14, 2022 03:56
@Neil-LL Neil-LL requested a review from Roasbeef November 8, 2022 18:13
@yyforyongyu (Member) left a comment:

Cool work!

Still reviewing the PR, meanwhile the third commit is a bit large tbh😂

I want to split up the large htlcswitch+chancloser+peer commit

I think maybe we could split it by taking hints from the commit message? Something like making each package ready for the new state, and then connecting them using the new state.

This is accomplished by:
* passing Shutdown messages to the link when appropriate
* having the link tell peer.Brontide when closing_signed can be sent
* handling retransmission of Shutdown
* persisting the delivery scripts for retransmission
* reworking the ChanCloser such that closing_signed isn't immediately
sent after sending and receiving Shutdown

channeldb/channel.go (resolved)
channeldb/channel.go (resolved)
htlcswitch/link.go (resolved, outdated)
l.channel.LocalUpfrontShutdownScript(),
)

// Flip the shutdownInit and shutdownSent bools.
Member:

are we allowed to init and send the msg multiple times?

Collaborator Author:

It's undefined in the spec, meaning anything can happen (peer disconnects or something else), but the code here shouldn't send shutdown multiple times.

htlcswitch/link.go (four resolved threads, outdated)
@carlaKC (Collaborator) left a comment:

First pass, agree that the third commit needs some breaking up. Lots of moving pieces makes it difficult to review all in one batch.

peer/brontide.go (resolved, outdated)
Comment on lines +315 to +323
// DeliveryAddr is the address to use in Shutdown if an upfront script
// has not been set.
DeliveryAddr lnwire.DeliveryAddress
Collaborator:

Link config seems like a weird place for this?

Collaborator Author:

I put this here because if there is no upfront shutdown script, the link needs to be able to send an address in Shutdown. An alternative would be to just pass a closure from peer/brontide.go instead of having the peer/brontide.go code generate it and then store it in the link config

Member:

Agree with above comment, the link has already been created well before we even know what the upfront shutdown addr is. Generally I don't think we should add more logic to the link (which already has a lot of critical operations), and instead we can use the existing goroutine in the peer to manage the shutdown state.

htlcswitch/link.go (two resolved threads, outdated)
// upfront shutdown script if it exists, and the
// provided delivery address otherwise. This is used in
// case we did not initiate the coop close.
upfrontAddr := l.channel.LocalUpfrontShutdownScript()
Collaborator:

Couldn't we use chooseDeliveryScript here rather? Simplifies needing to re-check the upfront shutdown again.

Collaborator Author:

not sure what you mean here?

Collaborator Author:

I don't think we can do that: if we were to call chooseDeliveryScript(l.channel.LocalUpfrontShutdownScript(), l.cfg.DeliveryAddr), it would return a mismatch error whenever the upfront script exists, since l.cfg.DeliveryAddr will always be set.
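
For reference, a self-contained sketch of the chooseDeliveryScript semantics being discussed (simplified to plain byte slices; the real helper operates on lnwire.DeliveryAddress):

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

var errUpfrontShutdownScriptMismatch = errors.New(
	"requested script does not match upfront shutdown script")

// chooseDeliveryScript mirrors the helper discussed above: prefer the upfront
// script, fall back to the requested one, and error on a mismatch. This is
// why calling it with an always-set DeliveryAddr errors whenever an upfront
// script exists and differs.
func chooseDeliveryScript(upfront, requested []byte) ([]byte, error) {
	if len(upfront) == 0 {
		return requested, nil
	}
	if len(requested) == 0 {
		return upfront, nil
	}
	if !bytes.Equal(upfront, requested) {
		return nil, errUpfrontShutdownScriptMismatch
	}
	return upfront, nil
}

func main() {
	_, err := chooseDeliveryScript([]byte{0x00}, []byte{0x01})
	fmt.Println(err) // mismatch, as described in the comment above
}
```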

// shutdownInit bool and wait until there are no more
// updates to send. We should not forward HTLC's to our
// peer. Only cancels and update_fee messages are
// allowed. The peer.Brontide
Collaborator:

Comment incomplete?

@yyforyongyu (Member) left a comment:

Tried my best to review the commits; got a bit lost in the end, so some of the comments may not make sense🤦🏻. Will review it again once the third commit is split into smaller parts.

Basically I tried to review the third commit in five parts,

  1. retransmit shutdown message
  2. allow updates after shutdown is sent
  3. a new phase to process closing signed, particularly changes around coopCloseReady
  4. shutdownRequest related refactor, added NotifyShouldShutdown
  5. newly added shutdown-related states for link

Maybe it could be broken up along those lines?

@@ -1364,6 +1371,51 @@ func (c *OpenChannel) SecondCommitmentPoint() (*btcec.PublicKey, error) {
return input.ComputeCommitmentPoint(revocation[:]), nil
}

// MarkShutdownSent is used to mark that we've sent a Shutdown message for this
// channel. This is so we can retransmit Shutdown.
Member:

This is so?

// The mutex is only accessed if:
// - a write is occurring
// - a read is occurring from hasReceivedShutdown
// all reads from the htlcManager goroutine should be race-free.
Member:

Can now use atomic.Bool to avoid making the mutex.

Collaborator Author:

The mutex is only accessed in production code if a write occurs, which happens in handleUpstreamMsg. The hasReceivedShutdown function is only called in test code. If we make this an atomic.Bool, then all of the read-only code that previously didn't need the mutex (since the reads are in the same goroutine as the write, except for the test-only hasReceivedShutdown) will now need to use atomics. Is there that much of a noticeable difference in performance? If so, I'll change it.

Member:

Also curious, so I created a bench test; it turns out atomic is 60x faster for writes and 1000x faster for reads.

I wasn't thinking about performance though, but more about code health: avoiding mutexes means we worry less about deadlocks and their side effects.
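
A minimal sketch of the atomic.Bool variant being suggested (Go 1.19+; the link type is a stand-in, and hasReceivedShutdown is the test-only accessor named above):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Stand-in link type: the mutex-guarded bool becomes an atomic.Bool, making
// every read and write race-free without a lock.
type link struct {
	shutdownReceived atomic.Bool
}

// Store in the upstream-message handler...
func (l *link) markShutdownReceived() { l.shutdownReceived.Store(true) }

// ...and Load from any goroutine, including the test-only accessor.
func (l *link) hasReceivedShutdown() bool { return l.shutdownReceived.Load() }

func main() {
	var l link
	l.markShutdownReceived()
	fmt.Println(l.hasReceivedShutdown()) // true
}
```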

htlcswitch/link.go (resolved)
htlcswitch/link.go (resolved, outdated)

// If we need to retransmit Shutdown, create a ChanCloser and give it
// the Shutdown message.
retransmitShutdown := lnChan.State().HasSentShutdown()
Member:

why do we need to retransmit shutdown when the channel has sent shutdown?

Collaborator Author:

This will retransmit shutdown if we've ever sent shutdown in the past

Member:

Does it mean we are sending multiple shutdowns? Suppose trySendingShutdown succeeds, the node restarts, then it will send another shutdown.

Collaborator Author:

This is OK as long as it's not in the same p2p connection.
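
A small sketch of that reconnect behaviour (storage and wiring are stand-ins; only HasSentShutdown's role comes from the PR):

```go
package main

import "fmt"

// Stand-in for the persisted channel state; in the PR this is
// lnChan.State().HasSentShutdown().
type channelState struct{ sentShutdown bool }

func (c *channelState) hasSentShutdown() bool { return c.sentShutdown }

// onReconnect retransmits Shutdown on the new connection. Sending it once per
// p2p connection is fine; the spec's "MUST NOT send multiple shutdown
// messages" applies within a single connection.
func onReconnect(c *channelState, send func(msg string)) {
	if c.hasSentShutdown() {
		send("shutdown")
	}
}

func main() {
	c := &channelState{sentShutdown: true}
	onReconnect(c, func(m string) { fmt.Println("sending", m) })
}
```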


// The peer.Brontide that has access to this link will
// call RemoveLink which will stop this link. Returning
// here ensures no more channel updates can occur.
Member:

I think an info log is needed here.

htlcswitch/link.go (two resolved threads, outdated)
// We've received a Shutdown message from the remote peer.
// We'll set the shutdownReceived bool and cancel back any new
// HTLC's received after this point instead of forwarding them.
if l.shutdownReceived {
Member:

I think the specs only say the sender MUST NOT send multiple shutdown messages, but the receiver side is not defined.

Collaborator Author:

This defines it for LND here. If an implementation is sending multiple shutdown messages, it has no guarantees on what can happen. So to keep things simple, we just fail here.


// shutdownInit is a bool that is set when we've initiated a coop
// close.
shutdownInit bool
Member:

Have we thought about giving the link a state and advancing its states? That probably could make things a bit easier and more abstracted.

Collaborator Author:

I think we should do that in a different PR

@Crypt-iQ Crypt-iQ force-pushed the coop_close_970 branch 3 times, most recently from 4c3c164 to d94c709 Compare January 17, 2023 19:09
@Crypt-iQ (Collaborator Author):

I think I addressed most of the comments and I split up the commits.

@yyforyongyu (Member) left a comment:

This is indeed a lot of change! Halfway there; sending out my comments now in case they get lost.

rpcserver.go Outdated
@@ -4233,9 +4233,10 @@ func createRPCOpenChannel(r *rpcServer, dbChannel *channeldb.OpenChannel,
channel.PushAmountSat = uint64(amt)
}

if len(dbChannel.LocalShutdownScript) > 0 {
if len(dbChannel.GetLocalShutdownScript()) > 0 {
Member:

nit: could use a variable here to avoid calling GetLocalShutdownScript twice.

}

// Process the peer's ClosingSigned.
msgs, closeFin, err := c.ProcessCloseMsg(c.peerClosingSigned, true)
Member:

can just return c.ProcessCloseMsg here.

lnwallet/chancloser/chancloser.go (three resolved threads)
peer/brontide.go Outdated
chanCloser, err := p.createChanCloser(
lnChan, deliveryScript, feePerKw, nil, locallyInitiated,
lnChan, deliveryScript, feePerKw, nil, locallyInitiated, true,
Member:

maybe we could call chanCloser.ChannelClean() here to avoid having yet another variable cleanOnRecv?

Collaborator Author:

ChannelClean has to be called at a specific point - when we have processed our local Shutdown and right after we receive the peer's Shutdown. This is only used if restarting the coop close flow and ChannelClean was called in a previous iteration.

Member:

but this is the only place where cleanOnRecv is true?

Collaborator Author:

yeah it's the only place. I'll have to think if we can remove it by instead checking the channel's internal states


// channelClean is true if the peer.Brontide has signaled that the
// channel is no longer operational and is in a clean state.
channelClean bool
Member:

I haven't fully comprehended the meaning of this channelClean state. To me it seems it's closely tied to ChanStatusCoopBroadcasted since it will only be set when MarkCoopBroadcasted is called, and I guess we can use that instead.

Collaborator Author:

There is one scenario in which ChanStatusCoopBroadcasted is true but channelClean is not true - on a restart. We can get around this, but I'd rather keep the bool for now

Member:

I assume that scenario exists because we have the field channelClean in the first place, and it defaults to the empty value false...

Collaborator Author:

yeah that's why

peer/brontide.go (resolved)
return
}

// Otherwise, set the shutdownReceived bool.
Member:

This "don't allow a second shutdown message" logic has already been implemented in chancloser and I think we should avoid duplicating it.

Collaborator Author:

This is done to stop the link rather than have the chancloser stop the link

htlcswitch/linktestscoop (resolved, outdated)
@carlaKC (Collaborator) left a comment:

Done my best to review, but I don't think I'm familiar enough with this part of the codebase to approve a ~3k LOC change.

lnwallet/channel.go (resolved)
channeldb/channel.go (two resolved threads, outdated)
lnwallet/chancloser/chancloser.go (resolved)
server.go Outdated
@@ -644,7 +644,8 @@ func newServer(cfg *Config, listenAddrs []net.Addr,
return
}

peer.HandleLocalCloseChanReqs(request)
serverQuit := make(chan struct{})
Collaborator:

Why do we need to pass in a quit channel if we never close or use it?

if err != nil {
l.fail(LinkFailureError{
code: ErrInternalError,
}, format, err)
Collaborator:

This format/error is just used by l.fail to log the error down the line. Seems more typical to just return an error from trySendingShutdown (can wrap the error if desired) then add some generic formatting string here if we want it.
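
A sketch of that shape (all helpers are stand-ins for the PR's code):

```go
package main

import (
	"errors"
	"fmt"
)

// persistDeliveryScript is a stand-in for whatever step can fail inside the
// PR's trySendingShutdown.
func persistDeliveryScript() error {
	return errors.New("disk full") // stand-in failure
}

// trySendingShutdown returns a wrapped error instead of taking a format
// string; the caller adds one generic message when failing the link.
func trySendingShutdown() error {
	if err := persistDeliveryScript(); err != nil {
		return fmt.Errorf("unable to persist delivery script: %w", err)
	}
	return nil
}

func main() {
	if err := trySendingShutdown(); err != nil {
		// In the link this would be:
		// l.fail(LinkFailureError{code: ErrInternalError},
		//     "unable to send shutdown: %v", err)
		fmt.Printf("failing link: %v\n", err)
	}
}
```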


// Sleep again to ensure the ChannelArbitrator can advance to
// StateFullyResolved.
time.Sleep(10 * time.Second)
Collaborator:

Use a wait / predicate until the closed channel shows up in Alice + Bob's close channel lists?

Collaborator Author:

I added a predicate here, but for the other sleep I'm not sure it's possible. We need to start the coop close flow without settling too early because we want to test the behavior of trying to coop close while the HTLC is in-flight. I'm not aware of any RPC signals we can use to determine when Shutdown has been sent. If there were, we'd wait to send and receive Shutdown, and then call SettleInvoice.
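
For the predicate that was added, something along these lines is the usual itest pattern (a sketch using lnd's lntest/wait package; listClosed is a hypothetical stand-in for the RPC the test would poll):

```go
package itest

import (
	"errors"
	"time"

	"github.com/lightningnetwork/lnd/lntest/wait"
)

// waitForCoopClose polls until the closed channel shows up in the node's
// closed-channels list instead of sleeping for a fixed duration.
func waitForCoopClose(listClosed func() (int, error)) error {
	return wait.NoError(func() error {
		numClosed, err := listClosed()
		if err != nil {
			return err
		}
		if numClosed == 0 {
			return errors.New("channel not yet in closed list")
		}
		return nil
	}, 30*time.Second)
}
```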

@yyforyongyu (Member) left a comment:

Finished my first round. Left some questions and will do another round once answered!

if l.isCoopReady() {
l.cfg.NotifyCoopReady(l.ChannelPoint(), l.quit)

// The peer.Brontide that has access to this link will
Member:

This is no longer true, as the link is now shut down here.

Collaborator Author:

The link isn't actually stopped here, only the htlcManager goroutine is closed. We could instead set a boolean to make it spin in a for loop. The peer.Brontide will call beginClosingSigned which will call RemoveLink which will call link.Stop and remove it from the Switch's maps

htlcswitch/link.go (resolved)
// Mark that we've sent the Shutdown message before actually sending
// it. This is also used for retransmission to know if we need to send
// a Shutdown on restart.
err := l.channel.State().MarkShutdownSent()
Member:

If the purpose is to survive from a restart, I think we need to mark it after we've successfully sent it?

Collaborator Author:

If we did this, we could crash after we've sent, but before we've marked it. When we come back up, we wouldn't retransmit it and we would be in violation of the spec

Member:

hmm...but since we haven't marked it we'd resend it again when we come back up?
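
To spell out the ordering argument, a minimal sketch (the state type and sender are stand-ins; MarkShutdownSent/HasSentShutdown are the PR's names):

```go
package main

import "fmt"

// Stand-in for the persisted channel state; in lnd markShutdownSent is a
// database write.
type channelState struct{ sentShutdown bool }

func (c *channelState) markShutdownSent() error {
	c.sentShutdown = true
	return nil
}

// sendShutdown persists the flag *before* sending. A crash between the two
// steps then causes at worst a retransmission on the next connection (which
// the spec permits) rather than a missed retransmission (which it forbids).
func sendShutdown(c *channelState, send func(msg string) error) error {
	if err := c.markShutdownSent(); err != nil {
		return err
	}
	// If we crash here, HasSentShutdown is already true on disk and the
	// message is simply retransmitted on restart.
	return send("shutdown")
}

func main() {
	c := &channelState{}
	err := sendShutdown(c, func(m string) error {
		fmt.Println("send", m)
		return nil
	})
	fmt.Println(c.sentShutdown, err) // true <nil>
}
```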

htlcswitch/link.go (two resolved threads, outdated)
c.cfg.ChainParams,
); err != nil {
return nil, false, err
// Check that we're not receiving two Shutdowns.
Member:

Will there be a case where the node sends the Shutdown, stops, and resends it again? A broader question: duplicate msgs seem to be unavoidable; in gossip we just ignore them, so can we ignore the shutdown too?

// Set the fee baseline now that we know the remote's delivery script.
c.initFeeBaseline()

c.channelClean = true
Member:

I mean if the node restarts, these ephemeral states are lost.

peer/brontide.go (resolved)

Currently, if upfront shutdown is not used, the delivery script
isn't persisted. This leads to a spec violation upon retransmission
of Shutdown because LND will send a different delivery script. This
change paves the way for persisting the delivery script so that LND
is not in violation of the spec here.

This commit changes the ChanCloser logic in several ways:
* ChanCloser no longer returns a Shutdown that the caller can send
  * This is so that a caller that manages ChanCloser can determine
    when the Shutdown can be sent. The caller will then pass the
    Shutdown to the ChanCloser before they send it out.
* ProcessCloseMsg now takes a boolean called remote that denotes
  whether the passed message is from the peer or from us.
  * This is used in the aforementioned case where the caller passes
    the Shutdown message directly to the ChanCloser instead of the
    ChanCloser generating the Shutdown message and returning it.
* A new method called ChannelClean is introduced.
  * This can be used by a caller to tell the ChanCloser when fee
    negotiation can begin. At this point, the ChanCloser may return
    a ClosingSigned message to send.
  * This is mostly to change the lifecycle of the ChanCloser so that
    the ChanCloser doesn't immediately advance to the
    closeFeeNegotiation step until it's appropriate.
* Various sanity checks are added to ensure that proper state
  transitions are occurring.

When cleanOnRecv is set, the ChanCloser will immediately call
ChannelClean upon receiving Shutdown from the peer. This is only
used when we are restarting the coop close flow and the channel is
already in a "clean" state. Specifically, this occurs when the
channel has ChanStatusCoopBroadcasted but hasn't completed fee
negotiation. We can call ChannelClean because the channel MUST be
clean at this point.

This commit contains two changes:
* HandleLocalCloseChanReqs now takes a quit channel as a precautionary
  measure to avoid deadlock and listens on it.
* A function called HandleCoopReady is introduced that is used by a
  caller to notify the peer code when a channel's state is considered
  "clean" and fee negotiation can begin. Internally, this will call
  ChannelClean on the associated ChanCloser and fee negotiation will
  begin. This function also takes a quit channel as a precautionary
  measure to avoid deadlock.
* Test code in brontide_test.go is also modified to use the exported
  functions instead of accessing the Brontide struct's local variables.
  One test is not modified because it will be removed in a later commit.

This commit passes any Shutdown messages to the associated ChannelLink.
Note that the Shutdown message is still passed along the chanCloseMsgs
channel and will be given to the ChanCloser for processing. If the
call to ProcessCloseMsg in peer/brontide.go fails due to validation,
the coop close process won't happen, but the link will continue to
behave as if it will happen. This will become more clear in a later
commit, but this is fine because the link will gracefully shut down and
be removed from the HtlcSwitch.

Note also that the test-only hasReceivedShutdown function is not used
until a later commit which adds link tests.

This commit changes several things in the htlcswitch, chancloser, and
peer packages.
* The ChanCloser's ShutdownChan method has been removed.
* The first change is that the ShutdownIfChannelClean function has
  been removed from the ChannelLink interface. Its functionality has
  largely been replaced by the NotifyShouldShutdown function. This
  function will be used to notify the link when it should begin to
  wind down the link and attempt to send Shutdown after there are no
  more pending updates.
* The second change is that some local channelLink variables are added.
  These are populated when the NotifyShouldShutdown function is called.
  The localCloseReq variable stores the passed *ChanClose so that we can
  refer to its fields later when we actually send the Shutdown message.
  The shutdownInit bool is so that the channelLink knows to enter the
  Shutdown phase when there are no pending updates.
* The third change is that chooseDeliveryScript has been moved from
  peer/brontide.go to htlcswitch/link.go. This means that the link now
  checks whether the local user has violated upfront shutdown by putting
  an invalid shutdown script in the close request. This change also
  necessitated the move of ErrUpfrontShutdownScriptMismatch from
  lnwallet/chancloser/chancloser.go to htlcswitch/link.go.
* The fourth change is that the ChannelLinkConfig now contains a DeliveryAddr
  variable that is populated by the link-creation function in
  peer/brontide.go. This is used in a future commit so that when
  upfront shutdown isn't used, the link still has a script to send in
  the Shutdown message.
* The fifth change is that the tryLinkShutdown function has been removed
  from peer/brontide.go. Since the link will still be up during the
  shutdown "phase" of coop close, it doesn't make sense to attempt to
  tear down the link. A future change will ensure that the link is only
  stopped when the link is in a "clean" state.
* The sixth change is that the handleLocalCloseReq function in
  peer/brontide.go is modified to account for future logic. If the
  ChanCloser already exists in the activeChanCloses map, we will call
  the newly-introduced ChanCloser.SetLocalScript. This is needed because
  if the peer sent Shutdown first, the Brontide code will create a new
  ChanCloser with a freshly generated script via genDeliveryScript.
  However, this script may not match what either 1) the user requested
  or 2) the upfront shutdown script. In this case, it's necessary to
  "override" the generated script with the script from ChanClose
  parameter.
* TestChooseDeliveryScript has been moved to htlcswitch/link_test.go.
* TestCustomShutdownScript has been temporarily deleted and will be
  added to the htlcswitch test suite in a later commit.

This commit allows the channelLink to send out a Shutdown message
when there are no pending local updates and either the local user
or remote peer has initiated the coop close flow.
* The ChannelLinkConfig has a new notification function called
  NotifySendingShutdown which will notify the peer/brontide.go code
  that the channelLink is about to send the Shutdown message to the
  remote peer. This is useful because it lets us advance the
  associated ChanCloser's state machine first. If we instead called
  this function *after* sending Shutdown to the remote peer, it
  would be possible that the remote peer sends us a ClosingSigned
  message when the ChanCloser for this channel wasn't ready for it.
  This ordering ensures that no races like this can occur.
* The ChannelLinkConfig has another new notification function called
  NotifyCoopReady. This is called when the link has both sent and
  received Shutdown and when the commitment state is completely
  "clean" of any updates. This function tells the peer/brontide.go
  code to tear down the link and begin fee negotiation.
* The main change is that the htlcManager goroutine now includes
  checks to see whether it can send Shutdown and, if it has already
  sent Shutdown, if it can start to wind down the link and clean up
  the htlcManager goroutine.
  * If the channelLink is 1) in the "shutdown phase", 2) hasn't yet
    sent Shutdown, and 3) has no pending updates, then Shutdown is
    sent.
  * If a local upfront shutdown script exists, that will be used in
    the Shutdown message. If one doesn't exist, the ChannelLinkConfig's
    DeliveryAddr variable will be persisted to the database and used
    in the Shutdown message. It is persisted before sending Shutdown
    so that if a restart occurs, the same script will be sent upon
    retransmission. This is a requirement of BOLT#02. The retransmission
    logic is in a later commit.
  * After the script has been chosen, MarkShutdownSent is called.
    This sets a boolean in the database and ensures that we will
    send a Shutdown on reconnect. This is a requirement of BOLT#02.
  * A *ChanClose is populated and passed to NotifySendingShutdown, which
    makes sure the related ChanCloser advances state before the
    Shutdown is actually sent.
  * After the above steps are complete, the Shutdown message is
    finally sent to the remote peer.
* Various tests are added which exercise different Shutdown-related
  scenarios.

This commit allows the link-construction function addLink in
peer/brontide.go to set a boolean RetransmitShutdown to true. When
this is true, the link will retransmit Shutdown using the persisted
delivery script. This happens right after the ChannelReestablish
dance. When the peer/brontide.go sets this boolean to true, it creates
a ChanCloser and feeds an identical Shutdown message to the ChanCloser
so that it can advance state. A link-level test is included that
exercises the retransmission logic.

This commit modifies the link so that if 1) the user has initiated
Shutdown (shutdownInit is set), or 2) we have received Shutdown from the
remote peer, the link will fail back any HTLCs from the HtlcSwitch. This
is so that the link can come to a stopping point and wind down. A test
is included that exercises this logic.

This commit makes the link fail and send an Error message if the peer
sends an HTLC after sending a Shutdown message. Doing so is a spec
violation, so no spec-compliant peer should do this. A unit test is
added that exercises this logic.

This changes the HtlcSwitch to use NotifyShouldShutdown instead of the
now-deleted LocalChannelClose function.

This check is no longer necessary since we are able to initiate the
cooperative close flow while we have HTLCs on the commitment
transaction.
@yyforyongyu (Member) left a comment:

Getting close! Posting a few comments first in case they get lost while I'm still reviewing.

@@ -705,6 +705,15 @@ func (c *ChanCloser) ProcessCloseMsg(msg lnwire.Message, remote bool) (
chancloserLog.Infof("ChannelPoint(%v): entering fee "+
"negotiation", c.chanPoint)

if remote && c.isChannelClean() {
Member:

commit message doesn't match the change.

Member:

IIUC, we should have already called this (in the peer) since we receive a shutdown message that prompts the chan closer to be created in the first place.

// We've received a Shutdown message from the remote peer.
// We'll set the shutdownReceived bool and cancel back any new
// HTLC's received after this point instead of forwarding them.
if l.shutdownReceived.Load() {
Member:

Could use l.hasReceivedShutdown()

}
err = fmt.Errorf("unable to process close msg: %w",
err)
peerLog.Error(err)
Member:

I think we want to use p.log so we know which peer failed from the log.

// provided, generate a fresh script.
if len(deliveryScript) == 0 {
deliveryScript, err = p.genDeliveryScript()
chanCloser, ok := p.activeChanCloses[chanID]
Member:

Ideally we'd have unit tests for this, but I don't think we have unit tests for handleLocalCloseReq in the first place, so nvm in this PR...

// generates the script
// - the Shutdown that the link sends will use a
// different script.
chanCloser.SetLocalScript(req.DeliveryScript)
Member:

Q: how does the link know about this updated script?


// Notify peer.Brontide that we've sent the Shutdown message so that
// the coop close state machine can advance.
closeReq := l.localCloseReq
if closeReq == nil {
Member:

The closeReq could only be nil if we are retransmitting the shutdown, right?

// receives the Shutdown. This happens before the peer can reply with a
// ClosingSigned because we haven't sent Shutdown to the peer at this
// point.
l.cfg.NotifySendingShutdown(closeReq, l.quit)
Member:

This creates a goroutine as NotifySendingShutdown -> handleLocalCloseReq -> ProcessCloseMsg, which marks the channel as disabled and sends a channel update. I'm not sure about the implications here, just wondering whether we should wait for NotifySendingShutdown to be finished first.

n.aliceChannelLink.cfg.RetransmitShutdown = true

err = n.start()
require.NoError(t, err)
Member:

I think we need to check l.shutdownInit == true to test if it's retransmitted.

// If we need to retransmit Shutdown, create a ChanCloser and give it
// the Shutdown message.
retransmitShutdown := lnChan.State().HasSentShutdown()
if retransmitShutdown {
Member:

Can we do NotifySendingShutdown in link instead?

@Roasbeef (Member) left a comment:

Catching up with this PR again. I stopped a bit short, as I don't agree that the link should grow to understand more details about the shutdown procedure. This expands the set of responsibilities of the link beyond channel state machine driving and payment forwarding. It also makes it harder to test the shutdown-specific aspects, as we now have another sub-system that has gained responsibilities critical to proper operation of the shutdown.

Instead, I think that we should keep the knowledge the link has minimal. Aside from the assertions re message ordering, the main things we want from this are:

  • The shutdown procedure restarts if we shutdown before things were finalized.
    • Instead of the link transmitting (which means we need the link alive for channels that need to resume a co-op close). The channel state machine can do that.
  • The shutdown request (from RPC) should be acknowledged, with a persistent write to disk that we should shut down the channel. Once the coast is clear, shutdown should start.
    • Rather than the link taking on this responsibility (including sending the message), it should just take the request, then start to cancel back all HTLCs. Once no HTLCs are present (and also the link is marked as not eligible to forward), then it should respond back to the channel close state machine to kick things off.

Overall, I think by not introducing link into the critical flow, and by keeping the existing linear state transition in the state machine, we can simplify a lot of this logic, thereby making it easier to review and test.
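
A sketch of the notification direction being proposed (all names hypothetical): the link only cancels back HTLCs and, once clean, signals the chan closer state machine, rather than sending Shutdown itself.

```go
package main

import "fmt"

// linkSignal is a hypothetical async handoff from the link to the chan closer
// state machine.
type linkSignal struct {
	// coopCloseReady is closed by the link once no HTLCs remain and the
	// link is marked not eligible to forward.
	coopCloseReady chan struct{}
}

func newLinkSignal() *linkSignal {
	return &linkSignal{coopCloseReady: make(chan struct{})}
}

// markClean is called from the link's main goroutine once the commitment is
// clean; the chan closer selects on coopCloseReady to kick off the shutdown
// flow and message sending itself.
func (s *linkSignal) markClean() { close(s.coopCloseReady) }

func main() {
	s := newLinkSignal()
	s.markClean()
	<-s.coopCloseReady
	fmt.Println("chan closer may begin the shutdown flow")
}
```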

return err
}

channel.LocalShutdownScript = deliveryScript
Member:

What about also persisting the last offer we sent over? No problem doing it in another PR (we should make an issue for it).

@@ -818,6 +822,10 @@ type OpenChannel struct {
// default ShortChannelID. This is only set for zero-conf channels.
confirmedScid lnwire.ShortChannelID

// sentShutdown denotes whether or not we've sent the Shutdown message
// for this channel.
sentShutdown bool
Member:

Can alternatively store a nested TLV here that can be extended to store the delivery script sent, fee range, last offer, etc.
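
A sketch of what that nested record could look like with lnd's tlv package (type numbers and field set are illustrative only):

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/lightningnetwork/lnd/tlv"
)

// encodeShutdownInfo bundles the shutdown-related state into one extensible
// TLV stream; new fields (fee range, etc.) can be added as new record types.
func encodeShutdownInfo(deliveryScript []byte, lastOffer uint64) ([]byte, error) {
	stream, err := tlv.NewStream(
		tlv.MakePrimitiveRecord(tlv.Type(1), &deliveryScript),
		tlv.MakePrimitiveRecord(tlv.Type(3), &lastOffer),
	)
	if err != nil {
		return nil, err
	}
	var b bytes.Buffer
	if err := stream.Encode(&b); err != nil {
		return nil, err
	}
	return b.Bytes(), nil
}

func main() {
	blob, err := encodeShutdownInfo([]byte{0x00, 0x14}, 1234)
	fmt.Println(len(blob), err)
}
```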

// PersistDeliveryScript is used during the cooperative close flow to persist
// a script sent in Shutdown when we did not set an upfront shutdown script
// during the funding flow.
func (c *OpenChannel) PersistDeliveryScript(
Member:

Don't see this called in this commit fwiw.


// peerClosingSigned stores a single ClosingSigned that is received
// while the channel is still not in a clean state.
peerClosingSigned *lnwire.ClosingSigned
Member:

IIUC, this would only ever be sent once we're in a "clean" slate; here "clean" is defined as us having sent a shutdown message (so it's clean and we're ready for the other party).

// Set the fee baseline now that we know the remote's delivery script.
c.initFeeBaseline()

// Before continuing, mark the channel as cooperatively closed with a
Member:

So then all the link stuff is handled at another layer?


@@ -413,6 +426,21 @@ type channelLink struct {
quit chan struct{}
}

// chanCloseCtx contains a ChanClose pointer as well as a buffered error chan.
type chanCloseCtx struct {
closeReq *ChanClose
Member:

Yeah, we already have localCloseReq stored above?

// Determine if upfront shutdown has been violated. If
// it hasn't, replace the DeliveryAddr with the
// user-provided one.
deliveryAddr, err := chooseDeliveryScript(
Member:

Should stay in the chan closer imo. Co-op close is distinct from normal channel operation as it starts after normal channel operation has ceased.

}

if len(deliveryAddr) != 0 {
l.cfg.DeliveryAddr = deliveryAddr
Member:

Don't think this should be here, but if it's just stashing, then it can live in the channelLink struct instead of the main config (this isn't externally set).

// subsystem that Shutdown is about to be sent. This is not used during
// retransmission. It takes a quit chan that is used as a precaution to
// avoid deadlock.
NotifySendingShutdown func(req *ChanClose, quit chan struct{})
Member:

I think this flow should be reversed still: the link should gain an async way (which already exists) to notify the chan closer state machine that it can enter the shutdown flow.

@Crypt-iQ (Collaborator Author) commented Feb 16, 2023:

  * Instead of the link transmitting (which means we need the link alive for channels that need to resume a co-op close). The channel state machine can do that.

The link isn't always alive when resuming coop close - see

msgs = append(msgs, shutdownMsg)

  * Rather than the link taking on this responsibility (including sending the message), it should just take the request, then start to cancel back all HTLCs. Once no HTLCs are present (and also the link is marked as not eligible to forward), then it should respond back to the channel close state machine to kick things off.

The coop close flow allows HTLCs to be present on the commitment transactions during shutdown. They just need to be off before closing_signed. Waiting for all HTLCs to be cleared before sending shutdown may end up stalling coop close for a long time because the peer is never notified that you want to initiate a coop close. This means the peer isn't subject to this BOLT#02 requirement:

* once there are no outstanding updates on the peer, UNLESS it has already sent a shutdown:
    * MUST reply to a shutdown message with a shutdown

However, if we wait for our offered HTLCs to be resolved, that should be fine.

Overall, I think by not introducing link into the critical flow, and by keeping the existing linear state transition in the state machine, we can simplify a lot of this logic, thereby making it easier to review and test.

I went down the other suggested path when I first started writing this, and the issue I found was that, given the async nature of both the goroutine managing the ChanCloser and the goroutine managing the link (htlcManager), it was not possible to enforce the proper message ordering.

  1. If a peer sends shutdown followed by update_add_htlc, this is disallowed by the spec. However, it's possible that update_add_htlc reaches the link before some shutdown signal from the ChanCloser reaches the link. This means that LND will sometimes accept this message ordering even though it is disallowed. We can choose to be lenient here.
  2. If a peer sends update_add_htlc followed by shutdown, this is allowed by the spec. It's possible that some shutdown "signal" from the ChanCloser reaches the link before the update_add_htlc. If the link is strict, it will fail the channel and force close even though the peer was spec-compliant.

Points 1 & 2 are addressed in the current code by ensuring that the link handles shutdown by taking it from the mailbox.

To summarize, the alternative approach needs the following:

  • Wait for our offered HTLCs to be resolved before sending shutdown, even though we can send shutdown almost immediately. This is straightforward.
    • This allows the link to signal to the ChanCloser when to send shutdown
  • Figure out the ChanCloser/link async interaction with message order (points 1 & 2 above). This is not straightforward.
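
To illustrate points 1 and 2, a toy sketch of why processing Shutdown through the link's mailbox preserves the peer's actual wire order (types are stand-ins):

```go
package main

import "fmt"

type msg interface{}

type addHTLC struct{}
type shutdown struct{}

// handle consumes messages in mailbox order, so "shutdown then add" is
// reliably rejected while "add then shutdown" is reliably accepted. A
// separate ChanCloser-to-link signal could race either way.
func handle(mailbox []msg) {
	var shutdownSeen bool
	for _, m := range mailbox {
		switch m.(type) {
		case shutdown:
			shutdownSeen = true
		case addHTLC:
			if shutdownSeen {
				fmt.Println("fail channel: add after shutdown")
				return
			}
			fmt.Println("forward HTLC")
		}
	}
}

func main() {
	handle([]msg{addHTLC{}, shutdown{}}) // allowed by the spec
	handle([]msg{shutdown{}, addHTLC{}}) // spec violation, fail
}
```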

@lightninglabs-deploy:
@Crypt-iQ, remember to re-request review from reviewers when ready
