Zero-conf channel sometimes stuck in CHANNELD_AWAITING_LOCKIN: Owning subdaemon channeld died #5808
Comments
I had this issue on macOS. CLN was running in a directory that was not indexed by Spotlight. Re-enabling file indexing on that directory seems to have resolved the problem for me, so I guess the channeld daemon dying had something to do with unindexed file access.
Channeld dying is a badly phrased log message that stems from us losing the connection with the peer. This case is strange, since you seem to be controlling both endpoints. Maybe this is related to the FD mixup we are hunting down on macOS, @ddustin?
It appears to happen more often when Spotlight decides to index files, with mds_stores actively consuming a lot of CPU and disk reads/writes. I don't know whether that helps. I think it must be a race somewhere: not necessarily the log line itself, but the fact that the channel is stuck in CHANNELD_AWAITING_LOCKIN.
It definitely rhymes
^ This makes me think it's related. Are you able to recreate the problem easily? It would be nice to see if changing some constants affects how often it fails.
In subd.c around line 602, change the waitpid() call, i.e. remove the WNOHANG (a hedged sketch of what that change looks like follows this comment).
In openingd.c around line 1557, change … to …
In dualopend.c around line 4076, change … to …
Are you opening a legacy v1 channel with this? It would be nice to know whether the problem happens for both dualopend and the legacy open.
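For readers following along, here is a minimal C sketch of the difference the WNOHANG suggestion makes. The reap_child() helper and its surrounding logic are hypothetical, assumed for illustration; this is not CLN's actual subd.c code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>

/* Hypothetical child-reaping helper, loosely modelled on the kind of
 * waitpid() call under discussion; not CLN's actual subd.c code. */
static void reap_child(pid_t pid)
{
	int status;

	/* With WNOHANG, waitpid() returns 0 immediately if the child has
	 * not changed state yet, so a dying subdaemon can be missed if we
	 * poll at the wrong moment (one way a race could hide here). */
	pid_t ret = waitpid(pid, &status, WNOHANG);

	/* Removing WNOHANG, i.e. passing 0 for the options argument,
	 * makes the call block until the child actually changes state:
	 *
	 *	pid_t ret = waitpid(pid, &status, 0);
	 */
	if (ret == pid && WIFEXITED(status))
		printf("child %d exited with status %d\n",
		       (int)pid, WEXITSTATUS(status));
}
```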
I can't reproduce the issue in any controlled way; I can only say it happens sometimes. It does appear that when it does happen, it happens quite often within that period. I'm currently using CLN v22.11.
I'm not sure whether that's a legacy v1 channel; is it? There are some timing options set: … I'll change those values in …
Cool 👍. I wonder if you can recreate it more easily if you put the CPU under heavy load while it's running.
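In case it helps to reproduce under load, here is a minimal C sketch of a CPU load generator: it forks a few busy-looping children. NCHILDREN is an arbitrary, hypothetical choice; any stress tool would do just as well.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

#define NCHILDREN 4	/* arbitrary: one busy child per core to load */

int main(void)
{
	for (int i = 0; i < NCHILDREN; i++) {
		pid_t pid = fork();
		if (pid == 0) {
			/* Child: spin forever to keep one core busy. */
			volatile unsigned long x = 0;
			for (;;)
				x++;
		} else if (pid < 0) {
			perror("fork");
			return 1;
		}
	}
	printf("spinning %d children; stop with Ctrl-C\n", NCHILDREN);
	/* Parent blocks so a single Ctrl-C kills the whole group. */
	for (int i = 0; i < NCHILDREN; i++)
		wait(NULL);
	return 0;
}
```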
@ddustin it doesn't help. I'm currently running v22.11.1 with the changes you suggested. Here's a case with a 'normal' channel open:
Accepting side
Opening side
Just a wild guess, but could it have something to do with that block coming in?
@ddustin Would it be an idea to use the old debugging technique …?
I think this is the right place to report. I have the same situation, and I've read the docs for …
These levels are actually below the purging threshold. My conclusion is therefore that CLN has a totally unreliable feerate estimation algorithm, and this issue is a result of its inability to correctly determine even minimal feerates.
Issue and Steps to Reproduce
A CLN v22.11 node opens a zero-conf channel to another CLN v22.11 node. When running my integration tests on regtest, the opened channel regularly gets stuck in CHANNELD_AWAITING_LOCKIN. The most relevant log line is:
2022-12-10T20:46:28.159Z INFO 036cf43e0ede57d72dfeabd0795fdc5f76244b396c9f19333e2ed1d89fcce90894-chan#2: Peer transient failure in CHANNELD_AWAITING_LOCKIN: channeld: Owning subdaemon channeld died (0)
There appears to be no recovery from the dying channeld subdaemon at this point.
Here are the relevant logs:
Failing flow
Channel funder
Accepting peer
Successful flow
Channel funder
Accepting peer