server: start htlcswitch early in the pipeline #6214

yyforyongyu · 2022-01-30T04:26:05Z

After a recent change, the test revoked_uncooperative_close_retribution_zero_value_remote_output has been failing consistently in the build with backend=bitcoind dbbackend=postgres: the lnd node gets stuck while starting the channel arbitrator. Here's what could happen.

Suppose a node starts with a breached channel whose arbitrator is in the state StateContractClosed:

The chain arbitrator starts, which in turn starts the channel arbitrator.
When starting the channel arbitrator, we advance the channel state.
In that advanceState call, when we begin with StateContractClosed, it sends a message to htlcswitch via c.cfg.DeliverResolutionMsg.
htlcswitch hasn't started yet, so that call never returns and the node gets stuck.

This PR fixes it by starting htlcswitch earlier. Looking at the dependencies of htlcswitch.Start(): it makes a few db queries, and it must be started after the Notifier since it needs to subscribe to new blocks. There's also a call path into s.chanRouter via htlcswitch.Start() -> s.ForwardPackets -> s.cfg.FetchLastChannelUpdate -> s.chanRouter.GetChannelByID. However, I think this doesn't require s.chanRouter to be started, since it's just querying the db.
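
The ordering constraints in this description (Notifier before htlcswitch, htlcswitch before the chain arbitrator) can be written down and checked mechanically. This is a sketch under stated assumptions: `checkStartOrder` and the service names are hypothetical, and the "old order" is only an illustration of an order with the switch after the arbitrator, not a copy of server.go.

```go
package main

import "fmt"

// checkStartOrder returns an error if any service in order is started
// before one of the services it depends on, or if a dependency is
// missing from order altogether.
func checkStartOrder(order []string, deps map[string][]string) error {
	pos := make(map[string]int, len(order))
	for i, name := range order {
		pos[name] = i
	}

	// Walk the slice rather than the map so the result is deterministic.
	for _, name := range order {
		for _, dep := range deps[name] {
			p, ok := pos[dep]
			if !ok {
				return fmt.Errorf("%s depends on %s, which is never started", name, dep)
			}
			if p > pos[name] {
				return fmt.Errorf("%s starts before its dependency %s", name, dep)
			}
		}
	}
	return nil
}

func main() {
	// Hypothetical dependency table based on the PR description.
	deps := map[string][]string{
		"htlcSwitch": {"notifier"},
		"chainArb":   {"htlcSwitch"},
	}

	oldOrder := []string{"notifier", "chainArb", "htlcSwitch"}
	newOrder := []string{"notifier", "htlcSwitch", "chainArb"}

	fmt.Println("old order:", checkStartOrder(oldOrder, deps))
	fmt.Println("new order:", checkStartOrder(newOrder, deps))
}
```

A check like this only catches ordering mistakes that are declared in the table; it does not replace reading state from disk.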

To end the startup dependency issue for good, I think we need to rely on checking state from disk rather than from other subservers.

Crypt-iQ

Pretty sure we can also fix #2998 at the same time by handling StateContractClosed explicitly here:

lnd/contractcourt/channel_arbitrator.go

Lines 465 to 470 in 60625b6

    switch c.state {
    case StateDefault:
    	fallthrough
    case StateBroadcastCommit:
    	fallthrough
    case StateCommitmentBroadcasted:

Edit: I can confirm that the above suggestion, combined with this PR, fixes the ResolutionMsg issue. If the proper trigger is set, the HtlcFailNowActions get recalculated (usually they are not for the chainTrigger), so even if lnd goes down during the hand-off, the hand-off will occur again on startup, and the switch can handle duplicates.

server.go

Crypt-iQ · 2022-02-01T15:22:29Z

My previous comment is a little wrong: this won't fully fix #2998, but my suggestion does make it better. The switch doesn't persist the resolution messages, so if they are held in a mailbox after the hand-off and lnd crashes, the switch won't get them again. The hand-off could be changed so that they're fully handled before returning, but that's out of scope here.

yyforyongyu · 2022-02-07T09:13:14Z

Pretty sure we can also fix #2998 at the same time by handling StateContractClosed explicitly here:

You mean we just fallthrough here for StateContractClosed?

Crypt-iQ · 2022-02-07T16:14:25Z

Pretty sure we can also fix #2998 at the same time by handling StateContractClosed explicitly here:

You mean we just fallthrough here for StateContractClosed?

I think it can be left to another PR that fixes the issue I mentioned.

Crypt-iQ

nit: third commit message could be changed, but code is good

server.go

Also unified the log messages.

lightninglabs-deploy · 2022-02-18T14:02:36Z

@Roasbeef: review reminder

Roasbeef

LGTM 👑

yyforyongyu requested review from Roasbeef and Crypt-iQ January 30, 2022 04:26

yyforyongyu force-pushed the server-start-order branch from 7035c5c to ffe7800 Compare January 30, 2022 05:57

yyforyongyu mentioned this pull request Jan 30, 2022

itest: fix previously known test flakes #5940

Closed

Crypt-iQ reviewed Jan 31, 2022


joostjager mentioned this pull request Jan 31, 2022

lnrpc,routing: add an always on mode to HTLC interceptor #6165

Closed

yyforyongyu force-pushed the server-start-order branch from ffe7800 to 89149b8 Compare February 7, 2022 09:08

yyforyongyu force-pushed the server-start-order branch from 89149b8 to 6555564 Compare February 8, 2022 07:42

Crypt-iQ approved these changes Feb 8, 2022


yyforyongyu force-pushed the server-start-order branch from 6555564 to 3862fe1 Compare February 8, 2022 18:10

yyforyongyu added 3 commits February 11, 2022 21:17

multi: add logs when subservers are starting

1ad6bbf

Also unified the log messages.

funding: fix make lint

1aaa1d8

server: start htlcSwitch before chainArb

72548ea

yyforyongyu force-pushed the server-start-order branch from 3862fe1 to 72548ea Compare February 11, 2022 13:28

Crypt-iQ mentioned this pull request Feb 11, 2022

htlcswitch+lntest: create resolutionStore to persist ResolutionMsg #6250

Merged

Roasbeef added this to the v0.15.0 milestone Feb 24, 2022

Roasbeef approved these changes Feb 24, 2022


Roasbeef enabled auto-merge February 24, 2022 23:31

Roasbeef disabled auto-merge February 24, 2022 23:32

Roasbeef merged commit 10fba3d into lightningnetwork:master Feb 24, 2022

yyforyongyu deleted the server-start-order branch February 25, 2022 05:43

Crypt-iQ mentioned this pull request Jun 13, 2022

LND v0.14.2-beta fails to startup completely #6638

Closed

yyforyongyu mentioned this pull request Nov 21, 2023

[hardening]: synchronously handle on-chain events during startup #8166

Open
