server: start htlcswitch early in the pipeline #6214
Force-pushed from 7035c5c to ffe7800.
Pretty sure we can also fix #2998 at the same time by handling StateContractClosed
explicitly here:
lnd/contractcourt/channel_arbitrator.go, lines 465 to 470 in 60625b6:
    switch c.state {
    case StateDefault:
        fallthrough
    case StateBroadcastCommit:
        fallthrough
    case StateCommitmentBroadcasted:
edit: can confirm the above suggestion with this pr fixes the resolutionmsg issue. this is because if the proper trigger is set, the HtlcFailNowActions get recalculated (usually they are not for the chainTrigger), so even if lnd goes down during the hand-off, the hand-off will occur again on start-up and the switch can handle duplicates
My previous comment is a little wrong: this won't fully fix #2998 but my suggestion does make it better. The switch doesn't persist them, so if they are held in a mailbox after the hand-off and lnd crashes, the switch won't get them again. The hand-off could be changed so that they're fully handled before returning but out of scope here.
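To make the suggestion concrete, here is a minimal sketch of what "handling StateContractClosed explicitly" could look like. Everything in it is a toy stand-in: the names arbState, trigger, closeTrigger and startTrigger are hypothetical and are not lnd's actual types or code, and the single closeTrigger collapses whatever close-specific triggers the real arbitrator picks between. The only point is that adding the state to the fallthrough chain makes it start with a non-chain trigger.

```go
package main

import "fmt"

// arbState is a hypothetical, simplified stand-in for the arbitrator state.
type arbState int

const (
	stateDefault arbState = iota
	stateBroadcastCommit
	stateCommitmentBroadcasted
	stateContractClosed
	stateFullyResolved
)

// trigger is a hypothetical stand-in for the state machine trigger.
type trigger string

const (
	chainTrigger trigger = "chain"
	// closeTrigger stands in for any close-specific trigger (assumption).
	closeTrigger trigger = "close"
)

// startTrigger picks the trigger used for the first state transition at
// start-up. pendingClose is assumed to mean the channel is already closing.
func startTrigger(state arbState, pendingClose bool) trigger {
	if !pendingClose {
		return chainTrigger
	}

	switch state {
	case stateDefault:
		fallthrough
	case stateBroadcastCommit:
		fallthrough
	case stateCommitmentBroadcasted:
		fallthrough
	case stateContractClosed:
		// The added case: a restart in this state now also gets the
		// close-specific trigger instead of the default chain trigger.
		return closeTrigger
	}

	return chainTrigger
}

func main() {
	fmt.Println(startTrigger(stateContractClosed, true))
	fmt.Println(startTrigger(stateFullyResolved, true))
}
```

With a close-specific trigger set, the actions that are normally skipped for the chain trigger would be recalculated on restart, which is what lets the hand-off happen again.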
Force-pushed from ffe7800 to 89149b8.
You mean we just fallthrough here for
I think it can be left to another pr that fixes the issue i mentioned
Force-pushed from 89149b8 to 6555564.
nit: third commit message could be changed, but code is good
Force-pushed from 6555564 to 3862fe1.
Also unified the log messages.
Force-pushed from 3862fe1 to 72548ea.
@Roasbeef: review reminder
LGTM 👑
After a recent change, a consistent failure has been showing up in the test revoked_uncooperative_close_retribution_zero_value_remote_output in the build with backend=bitcoind dbbackend=postgres: the lnd node was stuck at starting the channel arbitrator. Here's what could happen.
Suppose a node starts with a breached channel in state StateContractClosed. In advanceState, when we begin with StateContractClosed, it sends a msg to htlcswitch via c.cfg.DeliverResolutionMsg. htlcswitch hasn't started yet, so that call never returns and the node gets stuck.
This PR fixes it by starting htlcswitch early. Checking the dependencies of htlcswitch.Start(): it makes a few db queries, and it must be started after Notifier since it needs to subscribe to new blocks. There's also a callsite on s.chanRouter via htlcswitch.Start() -> s.ForwardPackets -> s.cfg.FetchLastChannelUpdate -> s.chanRouter.GetChannelByID. However, I think this doesn't require s.chanRouter to be started since it's just querying the db.
To end the startup dependency issue, I think we need to rely on checking states from disk and not from other subservers.