lnd: implement "safe mode" node stand up #3287

Roasbeef · 2019-07-10T02:33:42Z

Although we now have the proper base set of tools in place (SCB) to enable nodes to safely reclaim their channels in the event of data loss, it's still possible that a node boots up with stale data. If this is the case, then the node is at risk of breaching the other channel peer inadvertently. Due to systems like the contractcourt which will automatically force close channels that have expired HTLCs, this can happen in an automated fashion on restarts.

Rather than resuming normal operation when one knows they may be restoring with an outdated state, we can instead implement a "safe mode" of sorts. When users boot up in this mode, all commitment broadcasts are forbidden. Once lnd has booted up, the user can then examine the set of channel states to see if they become borked once we connect to peers (indicative of local data loss).

Steps To Completion

Add new --safemode config parameter to the lnd binary.
If safe mode is enabled, reject all RPC level force close requests.
If safe mode is enabled, reject all automated force close requests by channel arbitrators.
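
The three steps above could be sketched as a single gate that both the RPC server and the channel arbitrators consult before broadcasting a commitment. This is only an illustration under assumptions: `ForceCloseGate`, `CloseSource`, and `ErrSafeMode` are hypothetical names, not existing lnd types, and the `SafeMode` field stands in for whatever the proposed `--safemode` parameter would populate.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSafeMode is a hypothetical sentinel error for refused force closes.
var ErrSafeMode = errors.New("node is in safe mode: commitment broadcast forbidden")

// CloseSource records who asked for the force close.
type CloseSource string

const (
	SourceRPC        CloseSource = "rpc"
	SourceArbitrator CloseSource = "channel arbitrator"
)

// ForceCloseGate is an assumed single choke point, not an lnd type.
type ForceCloseGate struct {
	SafeMode bool // would be set from the proposed --safemode parameter
}

// Allow returns nil when the force close may proceed, and an error that
// wraps ErrSafeMode when safe mode is enabled.
func (g ForceCloseGate) Allow(src CloseSource, chanPoint string) error {
	if !g.SafeMode {
		return nil
	}
	return fmt.Errorf("%w: rejecting %s force close of %s", ErrSafeMode, src, chanPoint)
}

func main() {
	gate := ForceCloseGate{SafeMode: true}
	for _, src := range []CloseSource{SourceRPC, SourceArbitrator} {
		if err := gate.Allow(src, "txid:0"); err != nil {
			fmt.Println(err)
		}
	}
}
```

Routing both request paths through one check keeps the safe mode decision in a single place, so a new broadcast path cannot bypass it by accident.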

mlerner · 2019-07-28T04:13:34Z

I am going to take a shot at working on this.

alexbosworth · 2019-10-15T13:38:38Z

It would be nice to have a flag on force close to do a "dangerous" force close

mlerner · 2019-10-15T16:43:05Z

@alexbosworth: Is the case in which that would be helpful when you want to make sure that no other channels are force-closed besides ones you specifically choose? I could see that being useful.

What do you think about adding a confirmation message in the case of a "dangerous force close" command received over RPC (I'm not sure if there is a precedent for that type of user interaction)? Also, I'm assuming that automated force closing requests would still be prohibited.

Alternatively, one could argue that "safe mode" should prevent a user from performing dangerous operations, and that the user should restart lnd without "safe mode" on if they want to do dangerous things like force close channels with a node in an outdated state - part of the goal of this feature is to allow a user to start lnd when they "know it is in an outdated state".

alexbosworth · 2019-10-23T13:25:37Z

The use case I am specifically thinking about is one where you have an out-of-date backup and you want to use it to recover funds.

I'm not sure if this is handled in this PR, but blanket banning force closes seems risky to me in the event of race conditions relating to HTLC resolution.

So where I would see using this is:

User has an out of date backup
They load their out of date backup into safe mode
They recover as much as they can in safe mode, knowing that they are protected from breaching
When that is finished, they decide for themselves the risk of force closing with unresponsive peers, hopefully after a long period of time of no connectivity
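
A minimal sketch of how the "dangerous" force close discussed above might interact with safe mode, assuming (as suggested earlier in the thread) that automated requests stay prohibited while a user can explicitly override for a specific channel. `ForceCloseRequest`, its `Dangerous` field, and `checkForceClose` are hypothetical and do not correspond to lnd's actual RPC messages.

```go
package main

import (
	"errors"
	"fmt"
)

// errSafeMode is a hypothetical sentinel for closes refused by safe mode.
var errSafeMode = errors.New("force close rejected: node is in safe mode")

// ForceCloseRequest is an illustrative request type, not lnd's RPC message.
type ForceCloseRequest struct {
	ChanPoint string
	Automated bool // true when issued by a channel arbitrator
	Dangerous bool // hypothetical explicit override supplied by the user
}

// checkForceClose decides whether a force close may proceed.
func checkForceClose(safeMode bool, req ForceCloseRequest) error {
	switch {
	case !safeMode:
		return nil
	case req.Automated:
		// Automated closes stay forbidden; the override is for users only.
		return fmt.Errorf("%w: automated close of %s", errSafeMode, req.ChanPoint)
	case req.Dangerous:
		return nil
	default:
		return fmt.Errorf("%w: set the dangerous override to close %s",
			errSafeMode, req.ChanPoint)
	}
}

func main() {
	reqs := []ForceCloseRequest{
		{ChanPoint: "txid:0"},
		{ChanPoint: "txid:0", Dangerous: true},
		{ChanPoint: "txid:1", Automated: true, Dangerous: true},
	}
	for _, r := range reqs {
		fmt.Printf("%+v -> %v\n", r, checkForceClose(true, r))
	}
}
```

With this shape, the user in the workflow above can recover what they can under protection and then opt in to the risk channel by channel, without restarting the node outside safe mode.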

halseth · 2019-10-31T09:34:36Z

Should possibly also reject channel state updates.

cfromknecht · 2019-10-31T17:29:54Z

> Should possibly also reject channel state updates.

This is already done for restored channels

(sorry, tabbed to Close and Comment lol)

halseth · 2019-11-01T07:53:23Z

> > Should possibly also reject channel state updates.
>
> This is already done for restored channels

Yeah, but in this case regular channels won't be marked borked/restored, so they can still have updates.

cfromknecht · 2019-11-02T20:25:47Z

why wouldn't they? isn't this supposed to be used after restoring w/ SCB?

Crypt-iQ · 2019-11-04T00:33:56Z

It's a little unclear when it's safe to leave "safe mode" and resume normal operation. If our node has a bad state, contacts the other peer to initiate a force close, and then leaves safe mode before the peer's force close tx is confirmed, it's still possible for our node to force close with that bad state.

Also I agree with @alexbosworth above that if all force closes are banned, there could be some legitimate, synced channels which need to be force closed but aren't.

halseth · 2019-11-07T09:24:40Z

> why wouldn't they? isn't this supposed to be used after restoring w/ SCB?

If you are restoring from SCBs then I don't think safe mode is necessary, since you don't have any toxic data.

Crypt-iQ · 2019-11-07T12:54:20Z

> > why wouldn't they? isn't this supposed to be used after restoring w/ SCB?
>
> If you are restoring from SCBs then I don't think safe mode is necessary, since you don't have any toxic data.

Yup, we reviewed this PR in the lnd review club, and conner also suggested disabling several other features in safe mode: no bootstrap, no graph sync, and no channel acceptance, in addition to no force closures.

ziggie1984 · 2024-02-01T10:28:12Z

I think users don't always know whether they are in an old state, so I wonder if it makes sense to delay channel_arbitrator actions (e.g. going on chain for an expired HTLC) and instead at least wait for the peer connection to come up. There, a wrong channel state would cause our peer to force close the channel, which would probably keep us from going on chain with the wrong state.
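
The delay described here could look roughly like the following: before acting, the arbitrator waits for the peer to come online, bounded by a timeout so an unreachable peer cannot hold up HTLC resolution forever. The timeout is an assumption added for this sketch (the comment does not specify one), and `waitForPeer` and the `peerOnline` channel are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// waitForPeer blocks until peerOnline is closed or the timeout elapses,
// whichever happens first. It returns true if the peer connected in time.
func waitForPeer(peerOnline <-chan struct{}, timeout time.Duration) bool {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-peerOnline:
		return true
	case <-timer.C:
		return false
	}
}

func main() {
	peerOnline := make(chan struct{})

	// Simulate the peer connecting shortly after startup.
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(peerOnline)
	}()

	if waitForPeer(peerOnline, time.Second) {
		fmt.Println("peer connected: let channel reestablishment run before going on chain")
	} else {
		fmt.Println("peer unreachable: fall back to the normal arbitrator action")
	}
}
```

How long to wait is the hard part: the bound has to stay well inside the HTLC's remaining safety margin, which this sketch does not model.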

Roasbeef · 2024-02-01T20:48:01Z

@ziggie1984 good point. One way would be to have a mode to start in safe mode (so ppl could do it all the time), then later on check an endpoint to see if any actions would've been executed, then allow an API call to upgrade to regular operation.

> so I wonder if it makes sense to delay the channel_arbitrator actions like e.g. going on chain for an expired HTLC but instead at least wait for the peer connection to build up

FWIW, this would be the opposite of what was suggested in: #8166

I think a middle ground could make sense though. Need to think about it further.
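
The "start in safe mode, inspect, then upgrade" flow suggested above might be sketched as a small controller that records the actions it held back. Everything here is hypothetical (`SafeModeController`, `DeferredAction`, `Submit`, `Deferred`, `Upgrade`); in particular, whether held actions should be replayed automatically on upgrade is left open in the thread, so this sketch just hands them back to the caller.

```go
package main

import (
	"fmt"
	"sync"
)

// DeferredAction describes a broadcast that safe mode held back.
type DeferredAction struct {
	ChanPoint string
	Reason    string
}

// SafeModeController is a hypothetical component that starts in safe mode.
type SafeModeController struct {
	mu       sync.Mutex
	safe     bool
	deferred []DeferredAction
}

func NewSafeModeController() *SafeModeController {
	return &SafeModeController{safe: true}
}

// Submit runs broadcast in normal mode. In safe mode it only records the
// action and reports false.
func (c *SafeModeController) Submit(a DeferredAction, broadcast func(DeferredAction)) bool {
	c.mu.Lock()
	if c.safe {
		c.deferred = append(c.deferred, a)
		c.mu.Unlock()
		return false
	}
	c.mu.Unlock()
	broadcast(a)
	return true
}

// Deferred returns a copy of the held actions (the "check an endpoint" step).
func (c *SafeModeController) Deferred() []DeferredAction {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]DeferredAction(nil), c.deferred...)
}

// Upgrade switches to regular operation and returns the actions that were
// held back, leaving the decision to replay them to the caller.
func (c *SafeModeController) Upgrade() []DeferredAction {
	c.mu.Lock()
	defer c.mu.Unlock()
	held := c.deferred
	c.safe, c.deferred = false, nil
	return held
}

func main() {
	ctl := NewSafeModeController()
	broadcast := func(a DeferredAction) { fmt.Println("broadcasting for", a.ChanPoint) }

	ctl.Submit(DeferredAction{ChanPoint: "txid:0", Reason: "expired HTLC"}, broadcast)
	fmt.Println("held back:", ctl.Deferred())

	ctl.Upgrade()
	ctl.Submit(DeferredAction{ChanPoint: "txid:1", Reason: "expired HTLC"}, broadcast)
}
```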

morehouse · 2024-03-28T19:09:38Z

Perhaps safe mode could be automatically enabled on startup if the node is more than X blocks behind the chain. The more blocks behind, the more likely the DB is out-of-date.

This would allow #8166 DoS protections in the crash and restart case, while a node that's been offline for a while will use safe mode until upgraded.
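
As a rough sketch of this heuristic, the startup check reduces to comparing the node's last synced height against the current chain tip. The function name, the parameter names, and the 144-block threshold below are assumptions for illustration; the comment leaves X unspecified.

```go
package main

import "fmt"

// shouldStartInSafeMode reports whether the node is more than maxBlocksBehind
// blocks behind the chain tip. All arguments are block heights or counts.
func shouldStartInSafeMode(lastSyncedHeight, chainTipHeight, maxBlocksBehind uint32) bool {
	// Guard against underflow if the tip is not ahead of our view.
	if chainTipHeight <= lastSyncedHeight {
		return false
	}
	return chainTipHeight-lastSyncedHeight > maxBlocksBehind
}

func main() {
	// Assumed threshold: roughly one day of blocks at ten minutes each.
	const maxBlocksBehind = 144

	// A quick crash and restart: only a few blocks behind.
	fmt.Println(shouldStartInSafeMode(800000, 800003, maxBlocksBehind))

	// A node that has been offline for days.
	fmt.Println(shouldStartInSafeMode(800000, 800500, maxBlocksBehind))
}
```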

mlerner mentioned this issue Oct 15, 2019: lnd: implement "safe mode" node stand up #3601 (Closed, 3 tasks)

cfromknecht closed this as completed Oct 31, 2019

cfromknecht reopened this Oct 31, 2019

Roasbeef added this to the 0.10.0 milestone Jan 14, 2020

Roasbeef added the v0.10 label Jan 17, 2020

Roasbeef modified the milestones: 0.10.0, 0.11.0 Mar 10, 2020

cfromknecht added v0.11 and removed v0.10 labels Apr 21, 2020

Roasbeef removed this from the 0.11.0 milestone May 14, 2020

Roasbeef added this to the 0.12.0 milestone Jun 9, 2020

Roasbeef modified the milestones: 0.12.0, 0.13.0 Nov 4, 2020

Roasbeef removed this from the 0.13.0 milestone Jan 20, 2021

yyforyongyu removed the v0.11 label Feb 27, 2023

morehouse mentioned this issue Mar 29, 2024: [hardening]: synchronously handle on-chain events during startup #8166 (Open)

Roasbeef mentioned this issue Apr 1, 2024: [bug]: data loss and penalty txs after doing reboot -f while lnd had been stopped 30 minutes earlier #8607 (Open)
