lnd: implement "safe mode" node stand up #3287

Open
3 tasks
Roasbeef opened this issue Jul 10, 2019 · 14 comments
Labels
commitments (commitment transactions containing the state of the channel) · intermediate (issues suitable for developers moderately familiar with the codebase and LN) · P3 (might get fixed, nice to have) · safety (general label for issues/PRs related to the safety of using the software)

Comments

@Roasbeef
Member

Although we now have the proper base set of tools in place (SCB) to enable nodes to safely reclaim their channels in the event of data loss, it's still possible that a node boots up with stale data. If this is the case, then the node is at risk of breaching the other channel peer inadvertently. Due to systems like the contractcourt which will automatically force close channels that have expired HTLCs, this can happen in an automated fashion on restarts.

Rather than resuming normal operation if one knows they may be restoring with an outdated state, we can instead implement a "safe mode" of sorts. When users boot up in this mode, all commitment broadcasts are forbidden. Once lnd has booted up, the user can then examine the set of channel states to see if they become borked once we connect to peers (indicative of local data loss).

Steps To Completion

  • Add new --safemode config parameter to the lnd binary.

  • If safe mode is enabled, reject all RPC level force close requests.

  • If safe mode is enabled, reject all automated force close requests by channel arbitrators.
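
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Config`, `CloseSource`, `CheckForceClose`, and `ErrSafeMode` are hypothetical names and do not correspond to existing lnd identifiers. The idea is a single guard that both the RPC handler and the channel arbitrators would consult before broadcasting a commitment.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSafeMode is a hypothetical sentinel error returned whenever a
// commitment broadcast is refused because safe mode is active.
var ErrSafeMode = errors.New("safe mode: commitment broadcast forbidden")

// CloseSource says who asked for the force close (hypothetical type).
type CloseSource int

const (
	// SourceRPC is a force close requested by the user over RPC.
	SourceRPC CloseSource = iota
	// SourceArbitrator is an automated request from a channel arbitrator.
	SourceArbitrator
)

// String names the source for error messages.
func (s CloseSource) String() string {
	if s == SourceArbitrator {
		return "arbitrator"
	}
	return "rpc"
}

// Config mirrors the proposed --safemode parameter. The field name is an
// assumption made for this sketch.
type Config struct {
	SafeMode bool
}

// CheckForceClose rejects every force close, regardless of its source,
// while safe mode is enabled. It returns nil when the close may proceed.
func (c Config) CheckForceClose(src CloseSource) error {
	if !c.SafeMode {
		return nil
	}
	return fmt.Errorf("%w: %s force close rejected", ErrSafeMode, src)
}
```

Callers would check the returned error with `errors.Is(err, ErrSafeMode)` and skip the broadcast when it matches.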

@Roasbeef Roasbeef added intermediate Issues suitable for developers moderately familiar with the codebase and LN commitments Commitment transactions containing the state of the channel safety General label for issues/PRs related to the safety of using the software P3 might get fixed, nice to have labels Jul 10, 2019
@mlerner
Contributor

mlerner commented Jul 28, 2019

I am going to take a shot at working on this.

@alexbosworth
Contributor

It would be nice to have a flag on force close to do a "dangerous" force close

@mlerner
Contributor

mlerner commented Oct 15, 2019

@alexbosworth: Is the case in which that would be helpful when you want to make sure that no other channels are force-closed besides ones you specifically choose? I could see that being useful.

What do you think about adding a confirmation message in the case of a "dangerous force close" command received over RPC (I'm not sure if there is a precedent for that type of user interaction)? Also, I'm assuming that automated force closing requests would still be prohibited.

Alternatively, one could argue that "safe mode" should prevent a user from performing dangerous operations, and that the user should restart lnd without "safe mode" on if they want to do dangerous things like force close channels with a node in an outdated state - part of the goal of this feature is to allow a user to start lnd when they "know it is in an outdated state".

@alexbosworth
Contributor

The use case I am specifically thinking about is one where you have an out of date backup and you want to use it to recover funds

I'm not sure if this is handled in this PR, but blanket banning force closes seems risky to me in the event of race conditions relating to HTLC resolution

So where I would see using this is:

  1. User has an out of date backup
  2. They load their out of date backup into safe mode
  3. They recover as much as they can in safe mode, knowing that they are protected from breaching
  4. When that is finished, they decide for themselves the risk of force closing with unresponsive peers, hopefully after a long period of time of no connectivity
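
One way to picture this workflow in code is a per-request override, so that automated force closes stay blocked while the operator can still opt in for a specific channel. This is a sketch under assumptions: `ForceCloseRequest`, its `Dangerous` and `Automated` fields, and `Authorize` are hypothetical and not part of lnd's RPC.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSafeMode is a hypothetical sentinel error for refused force closes.
var ErrSafeMode = errors.New("safe mode: force close rejected")

// ForceCloseRequest is a hypothetical request shape used for this sketch.
type ForceCloseRequest struct {
	ChannelPoint string // channel to close, e.g. "txid:index"
	Automated    bool   // true when issued by a channel arbitrator
	Dangerous    bool   // explicit operator override (assumed flag)
}

// Authorize decides whether a force close may proceed:
//   - outside safe mode, everything is allowed;
//   - in safe mode, automated requests are always rejected;
//   - in safe mode, manual requests pass only with the Dangerous override.
func Authorize(safeMode bool, req ForceCloseRequest) error {
	switch {
	case !safeMode:
		return nil
	case req.Automated:
		return fmt.Errorf("%w: automated close of %s", ErrSafeMode, req.ChannelPoint)
	case req.Dangerous:
		return nil
	default:
		return fmt.Errorf("%w: manual close of %s needs the override", ErrSafeMode, req.ChannelPoint)
	}
}
```

In this sketch an automated request cannot use the override, which matches the assumption raised earlier in the thread that automated force closes would remain prohibited.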

@halseth
Contributor

halseth commented Oct 31, 2019

Should possibly also reject channel state updates.

@cfromknecht
Contributor

cfromknecht commented Oct 31, 2019

Should possibly also reject channel state updates.

This is already done for restored channels

(sorry, tabbed to Close and Comment lol)

@halseth
Contributor

halseth commented Nov 1, 2019

Should possibly also reject channel state updates.

This is already done for restored channels

Yeah, but in this case regular channels won't be marked borked/restored, so they can still have updates.

@cfromknecht
Contributor

why wouldn't they? isn't this supposed to be used after restoring w/ SCB?

@Crypt-iQ
Collaborator

Crypt-iQ commented Nov 4, 2019

It's a little unclear when it's possible to leave "safe mode" and resume normal operation. If our node has a bad state, contacts the other peer to initiate a force close, and then leaves safe mode before the peer's force close tx is confirmed, it's possible for our node to force close.

Also I agree with @alexbosworth above that if all force closes are banned, there could be some legitimate, synced channels which need to be force closed but aren't.

@halseth
Contributor

halseth commented Nov 7, 2019

why wouldn't they? isn't this supposed to be used after restoring w/ SCB?

If you are restoring from SCBs then I don't think safe mode is necessary, since you don't have any toxic data.

@Crypt-iQ
Collaborator

Crypt-iQ commented Nov 7, 2019

why wouldn't they? isn't this supposed to be used after restoring w/ SCB?

If you are restoring from SCBs then I don't think safe mode is necessary, since you don't have any toxic data.

Yup, we reviewed this PR in the lnd review club, and conner also suggested disabling several other features in safe mode: no bootstrap, no graph sync, and no channel acceptance, in addition to no force closures.

@Roasbeef Roasbeef added this to the 0.10.0 milestone Jan 14, 2020
@Roasbeef Roasbeef added the v0.10 label Jan 17, 2020
@Roasbeef Roasbeef modified the milestones: 0.10.0, 0.11.0 Mar 10, 2020
@cfromknecht cfromknecht added v0.11 and removed v0.10 labels Apr 21, 2020
@Roasbeef Roasbeef removed this from the 0.11.0 milestone May 14, 2020
@Roasbeef Roasbeef added this to the 0.12.0 milestone Jun 9, 2020
@Roasbeef Roasbeef modified the milestones: 0.12.0, 0.13.0 Nov 4, 2020
@Roasbeef Roasbeef removed this from the 0.13.0 milestone Jan 20, 2021
@yyforyongyu yyforyongyu removed the v0.11 label Feb 27, 2023
@ziggie1984
Collaborator

I think users don't always know whether they are in an old state, so I wonder if it makes sense to delay the channel_arbitrator actions like e.g. going on chain for an expired HTLC but instead at least wait for the peer connection to build up, because there a wrong state of the channel would cause our peer to force-close the channel, probably preventing us from going on-chain with the wrong state.
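
A rough sketch of such a delay, assuming a hypothetical `reestablished` channel that is closed once the peer connection and channel reestablishment have completed, plus a grace period so that an unreachable peer cannot stall the arbitrator forever. `WaitForPeer` is an illustrative name, not an lnd function.

```go
package main

import "time"

// WaitForPeer blocks until the peer has reestablished the channel or the
// grace period runs out. It reports true when the peer came back in time,
// which is the moment a stale local state would be detected. It reports
// false on timeout, after which the caller falls back to its normal
// on-chain handling.
func WaitForPeer(reestablished <-chan struct{}, grace time.Duration) bool {
	timer := time.NewTimer(grace)
	defer timer.Stop()

	select {
	case <-reestablished:
		return true
	case <-timer.C:
		return false
	}
}
```

The grace period is the knob that trades protection against stale state for how long an expired HTLC may sit unresolved, so it would have to stay well below the HTLC's remaining safety margin.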

@Roasbeef
Member Author

Roasbeef commented Feb 1, 2024

@ziggie1984 good point. One way would be to have a mode to start in safe mode (so ppl could do it all the time), then later on check an endpoint to see if any actions would've been executed, then allow an API call to upgrade to regular operation.

so I wonder if it makes sense to delay the channel_arbitrator actions like e.g. going on chain for an expired HTLC but instead at least wait for the peer connection to build up

FWIW, this would be the opposite of what was suggested in: #8166

I think a middle ground could make sense though. Need to think about it further.
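
To make the "start in safe mode, inspect, then upgrade" idea concrete, here is a minimal sketch. It assumes a hypothetical `Gate` that every would-be action passes through. `Action`, `Attempt`, `Suppressed`, and `Upgrade` are names invented for illustration and do not exist in lnd.

```go
package main

import "sync"

// Action describes something the node wanted to do while in safe mode
// (hypothetical type for this sketch).
type Action struct {
	ChannelPoint string
	Reason       string
}

// Gate starts in safe mode, records actions instead of running them, and
// can later be upgraded to regular operation.
type Gate struct {
	mu         sync.Mutex
	safe       bool
	suppressed []Action
}

// NewGate returns a gate that is in safe mode from the start.
func NewGate() *Gate {
	return &Gate{safe: true}
}

// Attempt reports whether the action may run now. In safe mode the action
// is recorded and false is returned.
func (g *Gate) Attempt(a Action) bool {
	g.mu.Lock()
	defer g.mu.Unlock()

	if g.safe {
		g.suppressed = append(g.suppressed, a)
		return false
	}
	return true
}

// Suppressed returns a copy of the actions that would have been executed,
// which is what the inspection endpoint would expose.
func (g *Gate) Suppressed() []Action {
	g.mu.Lock()
	defer g.mu.Unlock()

	return append([]Action(nil), g.suppressed...)
}

// Upgrade switches to regular operation and hands back the recorded
// actions so the caller can decide whether to replay them.
func (g *Gate) Upgrade() []Action {
	g.mu.Lock()
	defer g.mu.Unlock()

	g.safe = false
	pending := g.suppressed
	g.suppressed = nil
	return pending
}
```

Whether the recorded actions should be replayed automatically after the upgrade is left open here, since that is exactly the trade-off against #8166 mentioned above.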

@morehouse
Collaborator

Perhaps safe mode could be automatically enabled on startup if the node is more than X blocks behind the chain. The more blocks behind, the more likely the DB is out-of-date.

This would allow #8166 DoS protections in the crash and restart case, while a node that's been offline for a while will use safe mode until upgraded.
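
As a sketch of that heuristic, assuming the node knows the height it last synced to and the current chain tip, the check could be as small as the function below. `ShouldStartInSafeMode` and the `maxLag` threshold are hypothetical, and the value of X is left to the caller.

```go
package main

// ShouldStartInSafeMode reports whether the node is more than maxLag
// blocks behind the chain tip. A tip at or below the last synced height
// never triggers safe mode.
func ShouldStartInSafeMode(lastSyncedHeight, chainTipHeight, maxLag uint32) bool {
	if chainTipHeight <= lastSyncedHeight {
		return false
	}
	return chainTipHeight-lastSyncedHeight > maxLag
}
```

With a threshold of 144 blocks, for example, a node that crashed and restarted 5 blocks later would boot normally, while one that was offline for 200 blocks would come up in safe mode.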

9 participants