
New channel doesn't become active without a disconnect #5366

Closed
openoms opened this issue Jul 1, 2022 · 13 comments · Fixed by #5261

Comments

@openoms
Contributor

openoms commented Jul 1, 2022

CLN 0.11.2 on aarch64 Debian

Opened a channel to a working, online LND node.
There were no errors with bitcoind or lightningd.
Even after 16 confirmations the state was still CHANNELD_AWAITING_LOCKIN.

A simple forced disconnect solved it:
lightning-cli disconnect NODE_ID true

Logs:

INFO    NODE_ID-chan#82: Peer transient failure in CHANNELD_AWAITING_LOCKIN: disconnect command
INFO    NODE_ID-chan#82: State changed from CHANNELD_AWAITING_LOCKIN to CHANNELD_NORMAL

The channel should switch to CHANNELD_NORMAL automatically.
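
For reference, a minimal sketch of the check-and-workaround cycle (NODE_ID is a placeholder for the peer's pubkey):

# inspect the stuck channel's state (look for CHANNELD_AWAITING_LOCKIN)
lightning-cli listpeers NODE_ID | grep '"state"'
# force the disconnect; lightningd should then reconnect and retry
lightning-cli disconnect NODE_ID true
# if it does not reconnect on its own, reconnect manually
lightning-cli connect NODE_ID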

Discussed in:
https://t.me/lightningd/14165

Another user experiencing the same:
https://t.me/lightningd/14132

@zerofeerouting
Contributor

zerofeerouting commented Jul 1, 2022

It seems I am having the same issue:

2022-07-01T10:35:55.331Z INFO   [peer-pubkey]-channeld-chan#16792: Peer connection lost
2022-07-01T10:35:55.332Z INFO    [peer-pubkey]-chan#16792: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (62208)

This is what the LND peer's log says:

2022-07-01 12:29:37.247 [INF] PEER: unable to read message from 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735: read tcp 192.168.255.10:43282->144.76.24.71:9735: use of closed network connection
2022-07-01 12:29:37.353 [INF] SRVR: Established connection to: 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735
2022-07-01 12:29:37.353 [INF] SRVR: Finalizing connection to 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735, inbound=false
2022-07-01 12:29:37.399 [INF] PEER: NodeKey(038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5) loading ChannelPoint([txid:outnum])
2022-07-01 12:29:37.399 [INF] PEER: Negotiated chan series queries with 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:29:37.400 [INF] DISC: Creating new GossipSyncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:29:37.448 [INF] DISC: Removing GossipSyncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:29:57.244 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found
2022-07-01 12:30:18.886 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found
2022-07-01 12:30:39.219 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found
2022-07-01 12:31:00.375 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found
2022-07-01 12:31:21.257 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found

UPDATE: according to my node the channel was opened and in CHANNELD_NORMAL, but for the remote side it was still pending, so my report might be an unrelated LND issue. The channel became active for the counterparty after restarting LND.

@TrezorHannes

Logging this here for completeness.

The disconnect from the CLN node didn't have any effect; the LND gossip syncer was still hung up on a closed network connection:

2022-07-01 12:50:45.188 [INF] SRVR: Disconnecting from 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735
2022-07-01 12:50:45.188 [INF] PEER: disconnecting 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735, reason: server: disconnecting peer 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735
2022-07-01 12:50:45.188 [INF] PEER: unable to read message from 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735: read tcp 192.168.255.10:38824->144.76.24.71:9735: use of closed network connection
2022-07-01 12:50:45.263 [INF] SRVR: Established connection to: 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735
2022-07-01 12:50:45.263 [INF] SRVR: Finalizing connection to 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735, inbound=false
2022-07-01 12:50:45.288 [INF] PEER: NodeKey(038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5) loading ChannelPoint(bf10d408150927b999d99c84b9bbcbfd4a7158df3f2ca2c1c6c80b2b938e99a9:1)
2022-07-01 12:50:45.288 [INF] PEER: Negotiated chan series queries with 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:50:45.288 [INF] DISC: Creating new GossipSyncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:50:45.389 [INF] DISC: Removing GossipSyncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:50:56.217 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found
2022-07-01 12:51:17.219 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found
2022-07-01 12:51:38.164 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found
2022-07-01 12:51:39.615 [INF] SRVR: Disconnecting from 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735
2022-07-01 12:51:39.615 [INF] PEER: disconnecting 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735, reason: server: disconnecting peer 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735
2022-07-01 12:51:39.615 [INF] PEER: unable to read message from 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735: read tcp 192.168.255.10:41696->144.76.24.71:9735: use of closed network connection
2022-07-01 12:51:39.684 [INF] SRVR: Established connection to: 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735
2022-07-01 12:51:39.684 [INF] SRVR: Finalizing connection to 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@144.76.24.71:9735, inbound=false
2022-07-01 12:51:39.713 [INF] PEER: NodeKey(038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5) loading ChannelPoint(bf10d408150927b999d99c84b9bbcbfd4a7158df3f2ca2c1c6c80b2b938e99a9:1)
2022-07-01 12:51:39.714 [INF] PEER: Negotiated chan series queries with 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:51:39.714 [INF] DISC: Creating new GossipSyncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:51:39.816 [INF] DISC: Removing GossipSyncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:51:43.608 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found
2022-07-01 12:51:58.244 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found
2022-07-01 12:52:19.860 [WRN] RPCS: Gossip syncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 not found

The channel info didn't show any issues like outdated last_update timestamps (see my other CLN peer channel issue logged here with LND), so I'm not sure what was causing my gossip to get stuck:

lncli getchaninfo 817099867229519873
{
    "channel_id": "817099867229519873",
    "chan_point": "bf10d408150927b999d99c84b9bbcbfd4a7158df3f2ca2c1c6c80b2b938e99a9:1",
    "last_update": 1656669849,
    "node1_pub": "037f66e84e38fc2787d578599dfe1fcb7b71f9de4fb1e453c5ab85c05f5ce8c2e3",
    "node2_pub": "038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5",
    "capacity": "20000000",
    "node1_policy": {
        "time_lock_delta": 40,
        "min_htlc": "1000",
        "fee_base_msat": "1000",
        "fee_rate_milli_msat": "500",
        "disabled": false,
        "max_htlc_msat": "20000000000",
        "last_update": 1656669849
    },
    "node2_policy": {
        "time_lock_delta": 34,
        "min_htlc": "1",
        "fee_base_msat": "0",
        "fee_rate_milli_msat": "0",
        "disabled": false,
        "max_htlc_msat": "19800000000",
        "last_update": 1656669574
    }
}

I had to restart LND to get the GossipSyncer into a healthy state:

2022-07-01 12:54:20.779 [INF] SRVR: Finalizing connection to 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@192.168.255.1:33322, inbound=true
2022-07-01 12:54:20.790 [INF] PEER: NodeKey(038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5) loading ChannelPoint(bf10d408150927b999d99c84b9bbcbfd4a7158df3f2ca2c1c6c80b2b938e99a9:1)
2022-07-01 12:54:20.797 [INF] PEER: disconnecting 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@192.168.255.1:33322, reason: server: disconnecting peer 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@192.168.255.1:33322
2022-07-01 12:54:20.835 [INF] SRVR: Finalizing connection to 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@192.168.255.1:40432, inbound=true
2022-07-01 12:54:20.840 [INF] PEER: NodeKey(038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5) loading ChannelPoint(bf10d408150927b999d99c84b9bbcbfd4a7158df3f2ca2c1c6c80b2b938e99a9:1)
2022-07-01 12:54:20.889 [INF] PEER: Negotiated chan series queries with 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:54:20.889 [INF] DISC: Creating new GossipSyncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:54:20.909 [INF] PEER: Negotiated chan series queries with 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-07-01 12:54:22.026 [INF] DISC: GossipSyncer(038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5): applying gossipFilter(start=2022-07-01 12:54:22.026420125 +0200 CEST m=+80.239078896, end=2158-08-07 19:22:37.026420125 +0200 CEST)

@cdecker
Member

cdecker commented Jul 1, 2022

Any debug logs from the CLN node? Specifically we're looking for peer_out FUNDING_LOCKED and/or peer_in FUNDING_LOCKED, which signal the transition from awaiting lockin to normal; only once both sides have sent and received it can we switch state.
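
If io-level logs are available, a grep along these lines should surface the exchange (the log path is an assumption; adjust to wherever log-file points):

# both directions of the funding_locked exchange named above
grep -E 'peer_(in|out).*FUNDING_LOCKED' ~/.lightning/bitcoin/lightningd.log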

@zerofeerouting
Contributor

Unfortunately my log level is INFO at the moment. Maybe @openoms can be of help.

@wtogami
Contributor

wtogami commented Jul 2, 2022

#5349
If you're trying to get debug logs from one peer in particular, without the firehose of all peers, this PR allows you to restrict log-level=debug to just one peer pubkey.
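
A sketch of what that would look like in the config file (NODE_ID again a placeholder pubkey; syntax per #5349, so this assumes a CLN build that includes the PR):

# keep the global level quiet, but log one peer at debug
log-level=info
log-level=debug:NODE_ID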

@cdecker
Member

cdecker commented Jul 3, 2022

The problem here is that selectively enabling logs for one peer would require knowing ahead of time that the issue might occur, which we don't, so I don't see how per-peer log level selection helps here. This might also happen while the operator is sleeping, e.g., during a slow confirmation or an incoming funding request; how could they enable logging for that? I think the firehose method is the one that'll ultimately get us an example.
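
The firehose variant is simply the global setting, e.g. in the config file:

# io additionally records raw peer messages such as FUNDING_LOCKED,
# at the cost of very verbose logs; debug is the next step down
log-level=io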

@openoms
Contributor Author

openoms commented Jul 4, 2022

I have set the logs to io and opened a channel on testnet to my LND v0.15.0. Could not reproduce the problem.
Got "state": "CHANNELD_NORMAL" after locking in at 9 confirmations.

@devastgh

devastgh commented Jul 4, 2022

I have set the logs to io and opened a channel on testnet to my LND v0.15.0. Could not reproduce the problem. Got "state": "CHANNELD_NORMAL" after locking in at 9 confirmations.

It's totally random for me too. Sometimes it happens.

@btweenthebars

I thought this had already been fixed in 0.12, but I have a node-initiated channel to me that sat in "CHANNELD_AWAITING_LOCKIN: Sent reestablish, waiting for theirs" for a long time; as soon as I force disconnected/reconnected, it immediately locked in.

Maybe this issue is still lurking around and could be the root cause of other problems, like a stale connection leading to a force close.

@JssDWt
Contributor

JssDWt commented Dec 12, 2022

Having this issue regularly on regtest here: #5808

@olwe1

olwe1 commented Jan 11, 2023

I have the same issue, but the disconnect/connect trick didn't change the state from CHANNELD_AWAITING_LOCKIN.
I also tried to restart CLN, without success. I'm using Core Lightning v22.11.1.

EDIT: It worked after the third attempt to disconnect from the peer.
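
Since a single disconnect doesn't always do it, one way to script the retries (a rough sketch, with NODE_ID a placeholder; it assumes the state string shows up in listpeers output as above):

# keep forcing disconnects until the channel leaves CHANNELD_AWAITING_LOCKIN
while lightning-cli listpeers NODE_ID | grep -q CHANNELD_AWAITING_LOCKIN; do
    lightning-cli disconnect NODE_ID true
    sleep 30
done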

@cdecker
Member

cdecker commented Jan 13, 2023

Notice that the issue here manifests when we are waiting on the channel_ready message (formerly funding_locked) from the peer. If that peer does not send the message on reconnect, we won't progress to the CHANNELD_NORMAL state. So it's not just our local behavior that causes this, but also the peer's behavior.
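
For releases using the renamed message, the equivalent io-log check would be roughly (assuming the logged name follows the wire message, as FUNDING_LOCKED does above):

grep -iE 'peer_(in|out).*channel_ready' lightningd.log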

@Sjors
Contributor

Sjors commented Jun 26, 2024

I opened a channel today with v24.05; no idea what node the other side is running (my guess is LND). It remained stuck in CHANNELD_AWAITING_LOCKIN after confirmation. The disconnect-and-reconnect trick above fixed it.

Every time a new block came in, the log showed:

2024-06-26T16:38:26.293Z DEBUG   ...-chan#13945: Funding tx ... depth 6 of 3
2024-06-26T16:38:26.293Z DEBUG   ...-chan#13945: Sending towire_channeld_funding_depth with channel state CHANNELD_AWAITING_LOCKIN

At the three-confirmation mark I don't see anything interesting (within grep -n3 distance):

DEBUG   lightningd: Got depth change 2->3 for ...
DEBUG   ...-chan#13945: Funding tx ... depth 3 of 3
DEBUG   ...-chan#13945: Sending towire_channeld_funding_depth with channel state CHANNELD_AWAITING_LOCKIN
DEBUG   lightningd: Feerate estimate for 2 blocks set to 6138 (was 6137)
