Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto reconnect fails with some peers #5887

Closed
rkfg opened this issue Oct 24, 2021 · 8 comments · Fixed by #5925
Closed

Auto reconnect fails with some peers #5887

rkfg opened this issue Oct 24, 2021 · 8 comments · Fixed by #5925
Assignees
Labels
networking p2p Code related to the peer-to-peer behaviour

Comments

@rkfg
Copy link

rkfg commented Oct 24, 2021

Background

After restart lnd doesn't connect to some nodes. I use hybrid mode so Tor shouldn't be used for these. The channels are stuck in reconnect loop with little to no information. However, if I manually issue the connect command (using lncli or RTL) they connect successfully and stay online until the next lnd restart. Maybe there are some significant differences between the auto reconnect loop and the manual one? Could be related to #5632

Your environment

  • version of lnd: healthcheck/v1.2.0-1-gcac8da819
  • which operating system (uname -a on *Nix): Raspbian 5.10.63-v7l+
  • version of btcd, bitcoind, or other backend: bitcoind v22.0

Steps to reproduce

Here's the log (there might be some unrelated messages for different channels here, the node lnd has trouble connecting to is 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2, the channel point is 4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1):

Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.630 [INF] SRVR: Established connection to: 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.630 [INF] SRVR: Finalizing connection to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735, inbound=false
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.808 [INF] PEER: NodeKey(026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2) loading ChannelPoint(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1)
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.809 [INF] HSWC: Removing channel link with ChannelID(7ebe4475a67db4dca4bbae9bbb7170c27e025a1d393c587969fe989e6b74534d)
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.809 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): starting
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.809 [INF] HSWC: Trimming open circuits for chan_id=692594:688:1, start_htlc_id=2091
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.809 [INF] HSWC: Adding live link chan_id=7ebe4475a67db4dca4bbae9bbb7170c27e025a1d393c587969fe989e6b74534d, short_chan_id=692594:688:1
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.809 [INF] NTFN: New block epoch subscription
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.810 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1)
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.810 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): HTLC manager started, bandwidth=2001896044 mSAT
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.810 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): attempting to re-synchronize
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.810 [INF] PEER: Negotiated chan series queries with 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.810 [INF] DISC: Creating new GossipSyncer for peer=026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2
Oct 24 15:49:05 fullnode lnd[6231]: 2021-10-24 15:49:05.812 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): received re-establishment message from remote side
Oct 24 15:49:06 fullnode lnd[6231]: 2021-10-24 15:49:06.381 [INF] PEER: unable to read message from 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735: EOF
Oct 24 15:49:06 fullnode lnd[6231]: 2021-10-24 15:49:06.381 [INF] PEER: disconnecting 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735, reason: read handler closed
Oct 24 15:49:06 fullnode lnd[6231]: 2021-10-24 15:49:06.583 [INF] DISC: Removing GossipSyncer for peer=03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0
Oct 24 15:49:06 fullnode lnd[6231]: 2021-10-24 15:49:06.583 [INF] HSWC: Removing channel link with ChannelID(4125fbc74e89754752b3365899eea7fa77ea80e12780f2a9c88677e9ee04651d)
Oct 24 15:49:06 fullnode lnd[6231]: 2021-10-24 15:49:06.583 [INF] HSWC: ChannelLink(1c6504eee97786c8a9f28027e180ea77faa7ee995836b3524775894ec7fb2541:1): stopping
Oct 24 15:49:06 fullnode lnd[6231]: 2021-10-24 15:49:06.584 [INF] HSWC: ChannelLink(1c6504eee97786c8a9f28027e180ea77faa7ee995836b3524775894ec7fb2541:1): exited
Oct 24 15:49:07 fullnode lnd[6231]: 2021-10-24 15:49:07.689 [INF] SRVR: Established connection to: 03271338633d2d37b285dae4df40b413d8c6c791fbee7797bc5dc70812196d7d5c@3.95.117.200:9735
Oct 24 15:49:07 fullnode lnd[6231]: 2021-10-24 15:49:07.690 [INF] SRVR: Finalizing connection to 03271338633d2d37b285dae4df40b413d8c6c791fbee7797bc5dc70812196d7d5c@3.95.117.200:9735, inbound=false
Oct 24 15:49:07 fullnode lnd[6231]: 2021-10-24 15:49:07.710 [INF] PEER: unable to read message from 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735: EOF
Oct 24 15:49:07 fullnode lnd[6231]: 2021-10-24 15:49:07.710 [INF] PEER: disconnecting 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735, reason: read handler closed

After issuing lncli connect 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735:

Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.355 [INF] SRVR: Established connection to: 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.355 [INF] SRVR: Finalizing connection to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735, inbound=false
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.522 [INF] PEER: NodeKey(026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2) loading ChannelPoint(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1)
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.523 [INF] HSWC: Removing channel link with ChannelID(7ebe4475a67db4dca4bbae9bbb7170c27e025a1d393c587969fe989e6b74534d)
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.523 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): starting
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.524 [INF] HSWC: Trimming open circuits for chan_id=692594:688:1, start_htlc_id=2091
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.524 [INF] HSWC: Adding live link chan_id=7ebe4475a67db4dca4bbae9bbb7170c27e025a1d393c587969fe989e6b74534d, short_chan_id=692594:688:1
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.524 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): HTLC manager started, bandwidth=2001896044 mSAT
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.524 [INF] NTFN: New block epoch subscription
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.524 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): attempting to re-synchronize
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.524 [INF] PEER: Negotiated chan series queries with 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.525 [INF] DISC: Creating new GossipSyncer for peer=026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.524 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1)
Oct 24 15:56:45 fullnode lnd[6231]: 2021-10-24 15:56:45.526 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): received re-establishment message from remote side

After that the peer is connected and stays connected. There are 4-5 peers like that that I'm connected to. The majority of them (about 40 peers) work fine. However, if it's possible to connect to them reliably at all, just not automatically, it means the issue is still on our side.

Expected behaviour

lnd should connect by itself with no manual intervention.

Actual behaviour

I have to connect manually to those peers.

@Roasbeef Roasbeef added networking p2p Code related to the peer-to-peer behaviour labels Oct 25, 2021
@rkfg
Copy link
Author

rkfg commented Oct 28, 2021

Some more info after increasing debug level for SRVR and PEER to debug:

Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.369 [DBG] SRVR: Attempting to re-establish persistent connection to peer 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.698 [INF] SRVR: Established connection to: 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.699 [INF] SRVR: Finalizing connection to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735, inbound=false
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.702 [DBG] PEER: Sending Init to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.857 [DBG] PEER: Received Init from 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.857 [DBG] PEER: Loaded 1 active channels from database with NodeKey(026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2)
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.873 [INF] PEER: NodeKey(026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2) loading ChannelPoint(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1)
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [INF] HSWC: Removing channel link with ChannelID(7ebe4475a67db4dca4bbae9bbb7170c27e025a1d393c587969fe989e6b74534d)
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): starting
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [INF] HSWC: Trimming open circuits for chan_id=692594:688:1, start_htlc_id=2147
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [INF] HSWC: Adding live link chan_id=7ebe4475a67db4dca4bbae9bbb7170c27e025a1d393c587969fe989e6b74534d, short_chan_id=692594:688:1
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [INF] NTFN: New block epoch subscription
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1)
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [INF] PEER: Negotiated chan series queries with 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [INF] DISC: Creating new GossipSyncer for peer=026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [DBG] PEER: Sending NodeAnnouncement(node=021c3ec6432d2b9b5abcb01dd64b3a8f2afe3ba7d8c021f63ffd0a994cd3bc9b88, update_time=2021-10-28 13:51:53 +0300 MSK) to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [DBG] SRVR: Notifying that peer 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735 is online
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.875 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): HTLC manager started, bandwidth=2584591793 mSAT
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.876 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): attempting to re-synchronize
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.876 [DBG] PEER: Received ChannelReestablish(next_local_height=16269, remote_tail_height=16269) from 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.877 [DBG] PEER: Sending ChannelReestablish(next_local_height=16270, remote_tail_height=16268) to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.877 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): received re-establishment message from remote side
Oct 28 13:56:52 fullnode lnd[4723]: 2021-10-28 13:56:52.877 [DBG] PEER: Received NodeAnnouncement(node=026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2, update_time=2021-10-27 19:12:09 +0300 MSK) from 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735

After this about 3 seconds pass and I get:

Oct 28 13:56:55 fullnode lnd[4723]: 2021-10-28 13:56:55.929 [INF] PEER: unable to read message from 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735: EOF
Oct 28 13:56:55 fullnode lnd[4723]: 2021-10-28 13:56:55.930 [INF] PEER: disconnecting 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735, reason: read handler closed
Oct 28 13:56:56 fullnode lnd[4723]: 2021-10-28 13:56:56.132 [DBG] SRVR: Peer 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735 has been disconnected
Oct 28 13:56:56 fullnode lnd[4723]: 2021-10-28 13:56:56.132 [INF] DISC: Removing GossipSyncer for peer=026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2

The node in question is Boltz, they seem to use lnd as well. Manual connection works fine as before (the node stays connected):

Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.437 [DBG] SRVR: Connecting to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.633 [DBG] PEER: Received Pong(pong_bytes=00000000000000000000000000000000) from 023b826b81bc85998ff5fef73965bd8abb277f8ed0e3363b1172b637b59b911208@szumm24poped4nzi56zf2zvlywysmvshhnjmy3t7u25w6dben4huwjyd.onion:9735
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.757 [INF] SRVR: Established connection to: 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.757 [INF] SRVR: Finalizing connection to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735, inbound=false
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.761 [DBG] PEER: Sending Init to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.910 [DBG] PEER: Received Init from 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.911 [DBG] PEER: Loaded 1 active channels from database with NodeKey(026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2)
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.924 [INF] PEER: NodeKey(026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2) loading ChannelPoint(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1)
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.925 [INF] HSWC: Removing channel link with ChannelID(7ebe4475a67db4dca4bbae9bbb7170c27e025a1d393c587969fe989e6b74534d)
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.925 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): starting
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.925 [INF] HSWC: Trimming open circuits for chan_id=692594:688:1, start_htlc_id=2147
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.925 [INF] HSWC: Adding live link chan_id=7ebe4475a67db4dca4bbae9bbb7170c27e025a1d393c587969fe989e6b74534d, short_chan_id=692594:688:1
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.925 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1)
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.925 [INF] NTFN: New block epoch subscription
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.925 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): HTLC manager started, bandwidth=2584591793 mSAT
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.925 [INF] HSWC: ChannelLink(4c53746b9e98fe6979583c391d5a027ec27071bb9baebba4dcb47da67544be7e:1): attempting to re-synchronize
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.925 [DBG] PEER: Sending NodeAnnouncement(node=021c3ec6432d2b9b5abcb01dd64b3a8f2afe3ba7d8c021f63ffd0a994cd3bc9b88, update_time=2021-10-28 13:51:53 +0300 MSK) to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.925 [DBG] SRVR: Notifying that peer 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735 is online
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.926 [INF] PEER: Negotiated chan series queries with 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.926 [INF] DISC: Creating new GossipSyncer for peer=026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.926 [DBG] PEER: Sending ChannelReestablish(next_local_height=16270, remote_tail_height=16268) to 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.927 [DBG] PEER: Received ChannelReestablish(next_local_height=16269, remote_tail_height=16269) from 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735
Oct 28 14:02:02 fullnode lnd[4723]: 2021-10-28 14:02:02.927 [DBG] PEER: Received NodeAnnouncement(node=026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2, update_time=2021-10-27 19:12:09 +0300 MSK) from 026165850492521f4ac8abd9bd8088123446d126f648ca35e60f88177dc149ceb2@104.196.200.39:9735

Another misbehaving channel log:

Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.843 [INF] SRVR: Established connection to: 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.844 [INF] SRVR: Finalizing connection to 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735, inbound=false
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.846 [DBG] PEER: Sending Init to 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.922 [DBG] PEER: Received Init from 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.923 [DBG] PEER: Loaded 1 active channels from database with NodeKey(03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0)
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.936 [INF] PEER: NodeKey(03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0) loading ChannelPoint(1c6504eee97786c8a9f28027e180ea77faa7ee995836b3524775894ec7fb2541:1)
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.936 [INF] HSWC: Removing channel link with ChannelID(4125fbc74e89754752b3365899eea7fa77ea80e12780f2a9c88677e9ee04651d)
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.936 [INF] HSWC: ChannelLink(1c6504eee97786c8a9f28027e180ea77faa7ee995836b3524775894ec7fb2541:1): starting
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.937 [INF] HSWC: Trimming open circuits for chan_id=694697:848:1, start_htlc_id=4404
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.937 [INF] HSWC: Adding live link chan_id=4125fbc74e89754752b3365899eea7fa77ea80e12780f2a9c88677e9ee04651d, short_chan_id=694697:848:1
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.937 [INF] HSWC: ChannelLink(1c6504eee97786c8a9f28027e180ea77faa7ee995836b3524775894ec7fb2541:1): HTLC manager started, bandwidth=1850755224 mSAT
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.937 [INF] HSWC: ChannelLink(1c6504eee97786c8a9f28027e180ea77faa7ee995836b3524775894ec7fb2541:1): attempting to re-synchronize
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.937 [DBG] SRVR: Notifying that peer 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735 is online
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.937 [DBG] PEER: Sending NodeAnnouncement(node=021c3ec6432d2b9b5abcb01dd64b3a8f2afe3ba7d8c021f63ffd0a994cd3bc9b88, update_time=2021-10-28 13:51:53 +0300 MSK) to 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.937 [INF] NTFN: New block epoch subscription
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.937 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(1c6504eee97786c8a9f28027e180ea77faa7ee995836b3524775894ec7fb2541:1)
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.938 [INF] PEER: Negotiated chan series queries with 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.938 [DBG] PEER: Sending ChannelReestablish(next_local_height=11077, remote_tail_height=11073) to 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.938 [INF] DISC: Creating new GossipSyncer for peer=03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.939 [DBG] PEER: Received NodeAnnouncement(node=03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0, update_time=2021-10-28 00:52:26 +0300 MSK) from 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.939 [DBG] PEER: Received ChannelReestablish(next_local_height=11074, remote_tail_height=11076) from 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735
Oct 28 14:01:07 fullnode lnd[4723]: 2021-10-28 14:01:07.940 [INF] HSWC: ChannelLink(1c6504eee97786c8a9f28027e180ea77faa7ee995836b3524775894ec7fb2541:1): received re-establishment message from remote side

Then after 2 seconds:

Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.026 [INF] PEER: unable to read message from 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735: EOF
Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.026 [INF] PEER: disconnecting 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735, reason: read handler closed
Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.071 [DBG] PEER: Received Pong(pong_bytes=00000000000000000000000000000000) from 02293138a23fc559325490324785fe2aebb37b6271d5cefca64891ad608dd4fbdb@mtwa3eu5rugrfa67eoonjkn42j3p52qvxbhejcpbfwaanouwyp3d6wad.onion:9735
Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.219 [DBG] PEER: Received Pong(pong_bytes=00000000000000000000000000000000) from 02bba58dc4258b7e7b0afd061babe7d53fb4460a889246c5b9a58bda94d4a04197@zhw76i76qwnytbuas7ryg4x2yjhhnhvfwjud4h4zme75p5bixgzvm4qd.onion:9735
Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.227 [DBG] SRVR: Peer 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735 has been disconnected
Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.228 [INF] DISC: Removing GossipSyncer for peer=03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0
Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.228 [INF] HSWC: Removing channel link with ChannelID(4125fbc74e89754752b3365899eea7fa77ea80e12780f2a9c88677e9ee04651d)
Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.228 [INF] HSWC: ChannelLink(1c6504eee97786c8a9f28027e180ea77faa7ee995836b3524775894ec7fb2541:1): stopping
Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.228 [INF] HSWC: ChannelLink(1c6504eee97786c8a9f28027e180ea77faa7ee995836b3524775894ec7fb2541:1): exited
Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.228 [DBG] SRVR: Notifying that peer 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735 is offline
Oct 28 14:01:09 fullnode lnd[4723]: 2021-10-28 14:01:09.228 [DBG] SRVR: removing peer 03a09f56bba3d2c200cc55eda2f1f069564a97c1fb74345e1560e2868a8ab3d7d0@62.171.139.240:9735

@ellemouton
Copy link
Collaborator

thanks for the extra DBG logs @rkfg! Will look into this soon 👍

@rkfg
Copy link
Author

rkfg commented Nov 1, 2021

One more thing to add. I had a couple of cases when a peer (AcrossFIRE IIRC) got disconnected with the same EOF reason (so random connection reset I assume) and fell into the same unsuccessful reconnect loop never being able to stay connected. I waited for a bit and watched the logs, it looked like the above, no reasonable errors just connect/disconnect. Then I forced connection using lncli and it instantly connected and worked as it should.

I also noticed this sometimes happens during forwarding a payment so the HTLC gets stuck at my node until I reconnect one of the peers. IIRC, it was the incoming channel peer. Another time an outgoing channel peer was disconnected during rebalance, after I reconnected it manually the attempt resolved quickly and proceeded as usual.

Connection reset during forward/payment isn't very rare but it usually reestablishes quickly and the process resumes. Sometimes, however, it gets stuck like this. As I understand there's already in-band pings that should prevent TCP connections from timing out. Otherwise that would explain why connection reset only happens when there's actual data to send/receive to/from the peer in question.

Maybe it's important but I use OpenVPN to my VPS and forward ports to my home node using iptables. I guess if that were a problem I'd see more of these issues but 90% of the peers work fine and only a few of my peers (always the same) have this problem.

@ellemouton
Copy link
Collaborator

ok cool, i have managed to reproduce this. Seems like it has something to do with the combo of peers having both clearnet and onion addresses advertised plus using hybrid mode. Diving into this today

@ellemouton
Copy link
Collaborator

ellemouton commented Nov 2, 2021

ok i think I found the issue (still need to figure out the best fix):

  • so this happens when a peer has multiple working addresses advertised (say 1 clearnet and 1 tor)
  • on start up, in establishPersistentConnections, all the peers known addresses are loaded and we attempt to connect to each one to see which is successful.
  • we try both and if one is successful, we close the other connection request.
  • but on the peer's side: they get the first request, they accept it, then they get the second one and assume that the previous one is stale and so they close the first one. Mean while we have already closed the second one. See that logic here

Before this PR, the above would also happen during the initial establishPersistentConnections run since it would always try all addresses at once but we would then kickoff peerTerminationWatcher which would only attempt to reconnect to a single address of the peer.
Now with that PR we also attempt to connect to the multiple addresses in peerTerminationWatcher which is why the problem just continues.

The reason that manual connection works is that we also just attempt the one address during that.

So, I see 2 possible solutions:

  1. Either the peer's logic (so their InboundPeerConnected function) needs to be updated to not close the first connection if the second one came very soon after the first. Or:
  2. the reconnection logic on our side needs to be updated to not attempt 2 connection requests at once. We should instead have a queue of addresses we want to try and then only try the next one once we know the first has failed. OR:
  3. a temporary solution would be to only try connect to clearnet addresses if hybrid tor mode is on

I think option 2 seems cleaner. But the tricky thing is that we let connMgr handle the reconnections so would need to add something that lets connMgr try for a bit and then if still no successful conn & there is a diff address to try: make connMgr cancel the current connReq and add new one. Rather than making connMgr try them all at the same time

@rkfg
Copy link
Author

rkfg commented Nov 5, 2021

For the record, can confirm that this issue is fixed for me. After running the version from master all alive peers connect successfully and stay connected. Thank you for your great work 🙏

@ellemouton
Copy link
Collaborator

yay! Glad to hear @rkfg! 🚀 thanks for the feedback!

@ZoltanAB
Copy link

I have a similar issue, it looks like sometimes my node can't reconnect to some peers. After a while, the channel gets force closed.
I am using lncli version 0.14.2-beta commit=v0.14.2-beta.

2022-05-29 23:04:10.077 [INF] PEER: unable to read message from 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@167.235.3.234:9735: EOF
2022-05-29 23:04:10.078 [INF] PEER: disconnecting 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@167.235.3.234:9735, reason: read handler closed
2022-05-29 23:04:10.077 [DBG] DISC: Processing network message: peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@167.235.3.234:9735, source=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5, msg=ChannelAnnouncement, is_remote=true
2022-05-29 23:04:10.180 [DBG] SRVR: Peer 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@167.235.3.234:9735 has been disconnected
2022-05-29 23:04:10.180 [DBG] FNDG: Cancelling all reservations for peer 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-05-29 23:04:10.180 [DBG] FNDG: No active reservations for node: 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-05-29 23:04:10.180 [INF] DISC: Removing GossipSyncer for peer=038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5
2022-05-29 23:04:10.180 [DBG] SRVR: Notifying that peer 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@167.235.3.234:9735 is offline
2022-05-29 23:04:10.180 [DBG] SRVR: removing peer 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@167.235.3.234:9735
2022-05-29 23:04:10.180 [DBG] PRNF: PeerNotifier notifying peer: 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5 offline
2022-05-29 23:04:10.180 [DBG] BTCN: Disconnected from 038fe1bd966b5cb0545963490c631eaa1924e2c4c0ea4e7dcb5d4582a1e7f2f1a5@167.235.3.234:9735 (reqid 129)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
networking p2p Code related to the peer-to-peer behaviour
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants