Postgres v13.3 or v14.1 backend failing with LND v0.14.1-beta #6103
Comments
I can reproduce this too. Will take a look.
I may have found the issue. A context is cancelled after exiting the function: lnd/kvdb/postgres/readwrite_tx.go Line 163 in 088970e
But I believe that the subsequent
Workaround for now is to set timeout to 0.
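The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not lnd's code: `fakeRows`, `queryWithLocalTimeout` and `queryWithoutTimeout` are hypothetical names, and the sketch assumes that a timeout of 0 means no cancellable context is created at all. It shows how a cursor that outlives the function which created (and deferred the cancel of) its context fails with "context canceled" on first use.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fakeRows is a hypothetical stand-in for a database cursor. Like a real
// cursor, it stays bound to the context it was created with.
type fakeRows struct {
	ctx  context.Context
	left int
}

// Next reports whether another row is available. It fails as soon as the
// context that the cursor was created with is done.
func (r *fakeRows) Next() (bool, error) {
	if err := r.ctx.Err(); err != nil {
		return false, err
	}
	if r.left == 0 {
		return false, nil
	}
	r.left--
	return true, nil
}

// queryWithLocalTimeout creates the timeout context locally and cancels it
// on return, so the returned cursor is already unusable.
func queryWithLocalTimeout(timeout time.Duration) *fakeRows {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // runs before the caller ever touches the cursor

	return &fakeRows{ctx: ctx, left: 3}
}

// queryWithoutTimeout models the "timeout 0" workaround (assumption: no
// cancellable context is created in that case).
func queryWithoutTimeout() *fakeRows {
	return &fakeRows{ctx: context.Background(), left: 3}
}

func main() {
	_, err := queryWithLocalTimeout(time.Minute).Next()
	fmt.Println("with timeout:", err)

	_, err = queryWithoutTimeout().Next()
	fmt.Println("timeout 0:", err)
}
```

Note that the timeout value itself is irrelevant here: the cursor fails even with a generous one-minute timeout, because the deferred cancel fires on return.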
I still have doubts about the timeouts. All the different levels of timeouts are really quite complicated, and we also use them in an anti-pattern where we use a global context. See also #5366 (comment). My proposal for a fix would be to remove the timeout config flag completely. The issue mentioned above can be fixed without too much trouble, but it seems safer to not do timeouts at all. There is no good way to recover from a timeout anyway.
Never mind, I see it's related to the timeout setting.
@joostjager so currently it's recommended to set timeout to 0 if we want to test with pg? Or should we hold tight until the issue is fixed before continuing to test?
Verified removing "db.postgres.timeout" configuration results in LND not crashing.
@guggero what are your thoughts on the timeout issue?
The problem isn't the timeout itself but the lnd/kvdb/postgres/readwrite_tx.go Line 163 in 088970e
If you look at how
What we instead need to do is to keep a list of all
That is indeed what I tried to explain in my comment above, "a context is cancelled after exiting the function". Just setting timeout to zero prevents the problem from occurring because in that case
A list of all cancels on the tx level will probably solve it, but an alternative that may have less impact is to require a context to be passed into 'our'
A third option could be to fold
However, don't you think that the timeout setting in general can cause more harm than good? Users could set it to a value that is too low and see their node stop or, in the worst case, reach an inconsistent state, because these error paths are relatively new.
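The "list of all cancels on the tx level" option could look roughly like the sketch below. This is an illustration under assumptions, not the lnd implementation: the `tx` type, its `newContext` helper and the `Commit` method are hypothetical, and the sketch again assumes a timeout of 0 disables the timeout. The transaction hands out one context per operation, remembers each cancel function, and only calls them once the transaction ends.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// tx is a hypothetical transaction wrapper that owns the cancel functions of
// every context created on its behalf.
type tx struct {
	timeout time.Duration
	cancels []context.CancelFunc
}

// newContext returns a context for a single operation. The cancel function
// is stored on the transaction instead of being deferred by the caller.
func (t *tx) newContext() context.Context {
	if t.timeout == 0 {
		// Assumption: a zero timeout means no timeout at all.
		return context.Background()
	}

	ctx, cancel := context.WithTimeout(context.Background(), t.timeout)
	t.cancels = append(t.cancels, cancel)

	return ctx
}

// Commit ends the transaction and only then releases all contexts.
func (t *tx) Commit() {
	for _, cancel := range t.cancels {
		cancel()
	}
	t.cancels = nil
}

func main() {
	txn := &tx{timeout: time.Minute}
	ctx := txn.newContext()

	fmt.Println("before commit:", ctx.Err())
	txn.Commit()
	fmt.Println("after commit:", ctx.Err())
}
```

With this shape a cursor created inside the transaction stays usable until the transaction finishes, at the cost of holding on to every cancel function for the lifetime of the transaction.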
Ah yes, we were talking about the same thing then. I like the first option where the
I'm not sure about whether we should have the timeout in general or not. Perhaps it's nice to have for now while we're still figuring out how to best run with Postgres but don't need it in the future? But it seems it's hard to even choose a value that makes sense if the graph cache already takes multiple minutes to load...
Ok, will create a PR to pull up the context to the next level. I also saw that it is only used a few times, so no big deal. The graph cache may take minutes to load, but that won't lead to a timeout because each operation within that loading completes in time. There is no timeout on the transaction itself. There is a gap there though, because afaik we also don't set a timeout on the transaction: lnd/kvdb/postgres/readwrite_tx.go Line 42 in 088970e
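Pulling the context up one level can be sketched as follows. This is a hedged illustration and not the code of the PR mentioned above: `cursor`, `query` and `countRows` are hypothetical names. The query helper no longer creates its own context. The caller creates it, keeps it alive while it reads the results, and cancels it only when it is done.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// cursor is a hypothetical result iterator bound to a context.
type cursor struct {
	ctx  context.Context
	left int
}

// Next fails once the context of the cursor is done.
func (c *cursor) Next() (bool, error) {
	if err := c.ctx.Err(); err != nil {
		return false, err
	}
	if c.left == 0 {
		return false, nil
	}
	c.left--
	return true, nil
}

// query takes the context from its caller instead of creating one itself.
func query(ctx context.Context, n int) *cursor {
	return &cursor{ctx: ctx, left: n}
}

// countRows owns the context for the whole read, so the deferred cancel only
// runs after the cursor has been fully consumed.
func countRows(timeout time.Duration, n int) (int, error) {
	ctx := context.Background()
	if timeout > 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, timeout)
		defer cancel()
	}

	c := query(ctx, n)
	count := 0
	for {
		ok, err := c.Next()
		if err != nil {
			return count, err
		}
		if !ok {
			return count, nil
		}
		count++
	}
}

func main() {
	n, err := countRows(time.Minute, 3)
	fmt.Println(n, err)
}
```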
Yes, we can think a bit longer about the timeout question. Maybe we can advise against setting it to a value for now. |
Background
I'm able to successfully run a node with bbolt/boltdb.
When I try to run a node with postgres, after doing a lncli create I get an error "Unable to process chain backend block connected notification: context canceled". Note it does successfully create the tables before crashing:
Can reproduce this with clean environments. Verified Bitcoind node, postgres db is in working order.
Your environment
lnd: v0.14.1-beta
Operating system (uname -a on *Nix): ubuntu 20.04
btcd, bitcoind, or other backend: bitcoind v22.0
postgres version 13.3
Steps to reproduce
Expected behaviour
No crash
Actual behaviour
crash
With postgres v13.3: [ERR] LNWL: Unable to process chain backend block connected notification: context canceled
With postgres v14.1: "error loading chain control: unable to create chain control: unable to create wallet: context canceled"
Logs when I use bbolt/boltdb
Logs when using postgres v13.3
Logs when using postgres v14.1