Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug]: "edge not found" for local channel #7261

Closed
C-Otto opened this issue Dec 18, 2022 · 21 comments · Fixed by #8805
Closed

[bug]: "edge not found" for local channel #7261

C-Otto opened this issue Dec 18, 2022 · 21 comments · Fixed by #8805
Labels
bug Unintended code behaviour gossip graph P2 should be fixed if one has time

Comments

@C-Otto
Copy link
Contributor

C-Otto commented Dec 18, 2022

Background

A peer opened a channel to my node. This channel is listed in lncli listchannels. However, lncli getchaninfo X shows an error (with X being the channel ID):

[lncli] rpc error: code = Unknown desc = edge not found

According to lncli listchannels, this appears to be a regular (inactive) channel. It's not private, it has "remote_constraints", there's an ID in "alias_scids". It's not a zero-conf channel. Strangely, there are two incoming HTLCs already, where "forwarding_channel" is set to 0. Currently, the channel is inactive, but it has been active before.

When reconnecting to the peer, I get:

[INF] PEER: Peer(xxx): loading ChannelPoint(xxx:1)
[WRN] PEER: Peer(xxx): Unable to find our forwarding policy for channel xxx:1, using default values

In my logs I also see:

[INF] RPCS: channel: xxx:1 not found by channel event store

I'd be happy to provide further information.

Your environment

  • lnd v0.15.5-beta
  • Linux server 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
  • bitcoind v24.0.1
@C-Otto C-Otto added bug Unintended code behaviour needs triage labels Dec 18, 2022
@C-Otto
Copy link
Contributor Author

C-Otto commented Dec 18, 2022

Something weird happened during channel open. It looks like the second open attempt worked?

[INF] FNDG: Recv'd fundingRequest(amt=0.xxx BTC, push=0 mSAT, delay=144, pendingId=xxx) from peer(xxx)
[INF] FNDG: Requiring 4 confirmations for pendingChan(xxx): amt=0.xxx BTC, push_amt=0 mSAT, committype=tweakless, upfrontShutdown=
[INF] FNDG: Cancelling funding reservation for node_key=xxx, chan_id=xxx
[INF] PEER: Peer(xxx): unable to read message from peer: EOF
[ERR] FNDG: received funding error from xxx: chan_id=xxx, err=Channel open canceled by us

a few seconds later:
[INF] FNDG: Recv'd fundingRequest(amt=0.xxx BTC, push=0 mSAT, delay=144, pendingId=xxx) from peer(xxx)
[INF] FNDG: Requiring 4 confirmations for pendingChan(xxx): amt=0.xxx BTC, push_amt=0 mSAT, committype=tweakless, upfrontShutdown=
[INF] FNDG: Sending fundingResp for pending_id(xxx)
[INF] FNDG: completing pending_id(xxx) with ChannelPoint(xxx:1)
[INF] FNDG: sending FundingSigned for pending_id(xxx) over ChannelPoint(xxx:1)

The open transaction is confirmed and, according to mempool.space, opened two other channels to other peers.

@C-Otto
Copy link
Contributor Author

C-Otto commented Dec 18, 2022

I also get this:

[WRN] HSWC: ChannelLink(xxx:1): received warning message from peer: chan_id=0000000000000000000000000000000000000000000000000000000000000000, err=Malformed node_announcement [...]

The node_announcement is extremely long, filling roughly one screen.

@yyforyongyu
Copy link
Member

Discussed offline, will post more details here after proper investigation.

@Crypt-iQ
Copy link
Collaborator

Crypt-iQ commented Jan 5, 2023

I think this is distinct from #7229 since this was not a zero-conf channel

C-Otto added a commit to C-Otto/lnd-manageJ that referenced this issue Feb 26, 2023
@dannydeezy
Copy link

i get this issue too for a few channels

@guggero
Copy link
Collaborator

guggero commented Apr 20, 2023

@dannydeezy with what version of lnd? @C-Otto does this still happen with lnd 0.16.1?

@C-Otto
Copy link
Contributor Author

C-Otto commented Apr 20, 2023

I'm running lnd 0.16.0 with some WT related commits, not 0.16.1. Currently, I don't see this issue. The last time the error message appeared in my logs was 2023-04-09. I updated to lnd 0.16.0 a few days before, so I think this is not resolved, yet.

@litch
Copy link

litch commented Apr 20, 2023

We're running into this also, on nodes running 0.16.0

@ellemouton
Copy link
Collaborator

For those who recently experienced this bug, pls can you try updated to 0.16.1-beta.rc1 to see if that fixes your issue?

It might be that your issue is the same as this user's issue which is distinct from this issue even though it throws the same error message. That user's issue was fixed in 0.16.1-beta.rc1

@theborakompanioni
Copy link
Contributor

Same here after upgrading to 0.16.1-beta.rc3:

[INF] DISC: Creating new GossipSyncer for peer=$peer_id
[ERR] NANN: Unable to retrieve chan status for Channel($tx_id:1): edge not found
[INF] DISC: GossipSyncer($peer_id): applying new update horizon: start=2106-02-07 03:11:07 +1100 LHDT, end=2242-03-16 13:56:30 +1100 LHDT, backlog_size=0
[INF] PEER: Peer($peer_id): unable to read message from peer: EOF
[INF] PEER: Peer($peer_id): disconnecting $peer_id@$address, reason: read handler closed
[INF] NTFN: Cancelling epoch notification, epoch_id=39
[INF] DISC: Removing GossipSyncer for peer=$peer_id
[INF] HSWC: ChannelLink($tx_id:1): stopping
[WRN] HSWC: ChannelLink($tx_id:1): error when syncing channel states: link shutting down
[INF] HSWC: ChannelLink($tx_id:1): exited
[INF] HSWC: Removing channel link with ChannelID(x)

Edit: After restarting only LND node the problem did not go away. However, after restarting the other node (CLN) everything started to work as expected.

@ellemouton
Copy link
Collaborator

Thanks for the info @theborakompanioni! Question to everyone else experiencing this issue - is your channel peer for the problem channel a CLN node?

@litch
Copy link

litch commented Apr 25, 2023

No, both sides of the channel were LND in our case (we saw this a few times)

@SunnySarahNode
Copy link

SunnySarahNode commented Apr 25, 2023

I have same case with one of my peers.

My node: LND 0.16.0-beta commit=v0.16.0-beta
Peer node: LND 0.16.0

We both had "[lncli] rpc error: code = Unknown desc = edge not found"

(!) The issue was resolved on my peer's side by upgrading LND 0.16.0 -> 0.16.1-beta. Immediately after that the problem disappeared and the channel became active.

On my side the issue is currently active, obviously I need to upgrade my LND to 0.16.1 too, but I haven't done it yet.

Hope this information is helpful.

Addition logs from my side:


2023-04-22 12:42:57.884 [WRN] NANN: Unable to find channel policies for xxx:1, skipping. This is typical if the channel is in the process of closing.
...
2023-04-22 12:42:58.114 [INF] PEER: Peer(xxx): loading ChannelPoint(xxx:1)
2023-04-22 12:42:58.114 [WRN] PEER: Peer(xxx): Unable to find our forwarding policy for channel xxx:1, using default values
...
2023-04-22 12:42:58.344 [INF] HSWC: ChannelLink(1863a7ca2bb10229e44b158223aa04e0e3a185b056f55263e3e5b0d4291a5235:1): starting
...
2023-04-22 12:42:58.358 [INF] CNCT: Attempting to update ContractSignals for ChannelPoint(xxx:1)
2023-04-22 12:42:58.358 [INF] HSWC: ChannelLink(xxx:1): HTLC manager started, bandwidth=0 mSAT
2023-04-22 12:42:58.358 [INF] HSWC: ChannelLink(xxx:1): attempting to re-synchronize
2023-04-22 12:42:58.358 [WRN] HSWC: ChannelLink(xxx:1): error when syncing channel states: unable to send chan sync message for ChannelPoint(xxx:1): peer exiting
2023-04-22 12:42:58.359 [ERR] HSWC: ChannelLink(xxx:1): failing link: unable to synchronize channel states: unable to send chan sync message for ChannelPoint(xxx:1): peer exiting with error: unable to resume channel, recovery required
2023-04-22 12:42:58.359 [ERR] FNDG: Unable to advance state(xxx:1): failed adding to router graph: error generating channel announcement: unable to generate channel update announcement: channel not found
2023-04-22 12:42:58.359 [INF] HSWC: ChannelLink(xxx:1): exited

@ellemouton
Copy link
Collaborator

hi @SunnySarahNode - i think your specific case was solved in 0.16.1-beta (in #7613) . Pls can you update & see if that works? your peer managing to solve the problem on their side by upgrading makes me think it is the same as this user: #6987 (comment)

@saubyk saubyk added graph and removed needs triage labels Aug 30, 2023
@saubyk
Copy link
Collaborator

saubyk commented Aug 30, 2023

Closing this issue, as the fix has been provided with 0.16.1. Feel free to reopen if the problem persists.

@saubyk saubyk closed this as completed Aug 30, 2023
@JssDWt
Copy link
Contributor

JssDWt commented Aug 30, 2023

@saubyk
The fix in 0.16.1 fixes this for new channels, but does not retroactively fix the old ones. This is still an issue if you ever ended up in the situation where you ended up in this situation. Note that it affects forwarding of htlcs as well, because the channel policy cannot be found for the edge, which makes it use the default channel policy.

@saubyk saubyk reopened this Aug 30, 2023
@saubyk saubyk added the P2 should be fixed if one has time label Aug 30, 2023
@miephos
Copy link

miephos commented Feb 21, 2024

Hello,
I have the same issue. I recently opened a channel to a peer, but I get the same error mentioned. I restarted LND, disconnected and connected again to the peer. Didn't help.

If I run, I get the following error

lncli getchaninfo 9133577121506xxxxx
[lncli] rpc error: code = Unknown desc = edge not found

Version: 0.17.4-beta

I have attached the log trace of the opening transaction till I received the error.
boltz-open-channel.log

Maybe you can make some sense out of it. Would be nice to be able to use the channel.

Big thanks!!

@yyforyongyu
Copy link
Member

From the logs, the policy was not found even in lnd v0.17.4,

2024-02-16 08:51:13.826 [WRN] PEER: Peer(02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018): Unable to find our forwarding policy for channel c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1, using default values

which is originated here when the edge cannot be found:

lnd/peer/brontide.go

Lines 1024 to 1026 in 98c52df

p.log.Warnf("Unable to find our forwarding policy "+
"for channel %v, using default values",
chanPoint)

These four lines are also weird,

2024-02-16 08:51:37.856 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found
2024-02-16 08:52:37.843 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found
2024-02-16 08:53:37.842 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found
2024-02-16 08:54:37.841 [ERR] NANN: Unable to retrieve chan status for Channel(c33ee3205ba326e66570078ddfd004be41307718e337e499c8ca2e9158c44bf6:1): edge not found

As we'd expect markPendingInactiveChannels to be called every 30s (default) here, but it's called 4 times instead,

case <-m.statusSampleTicker.C:
// First, do a sweep and mark any ChanStatusEnabled
// channels that are not active within the htlcswitch as
// ChanStatusPendingDisabled. The channel will then be
// disabled if no request to enable is received before
// the ChanDisableTimeout expires.
m.markPendingInactiveChannels()

There's also a disconnect/connect attempt here, I assume it's triggered manually?

2024-02-16 08:51:12.148 [INF] SRVR: Disconnecting from 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736
2024-02-16 08:51:12.148 [INF] PEER: Peer(02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018): disconnecting 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736, reason: server: DisconnectPeer called
2024-02-16 08:51:12.148 [INF] PEER: Peer(02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018): unable to read message from peer: read next header: read tcp 10.10.21.21:55308->45.86.229.190:9736: use of closed network connection
2024-02-16 08:51:12.157 [ERR] RPCS: [connectpeer]: error connecting to peer: already connected to peer: 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736
2024-02-16 08:51:12.158 [ERR] RPCS: [/lnrpc.Lightning/ConnectPeer]: already connected to peer: 02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018@45.86.229.190:9736
2024-02-16 08:51:12.248 [INF] DISC: Removing GossipSyncer for peer=02d96eadea3d780104449aca5c93461ce67c1564e2e1d73225fa67dd3b997a6018

@miephos I've noticed the chan was forced closed - do you by chance still have the logs? What about the channel type, like is it a public or private chan?

@miephos
Copy link

miephos commented Jun 22, 2024

@miephos I've noticed the chan was forced closed - do you by chance still have the logs? What about the channel type, like is it a public or private chan?

The channel was a public channel. I did a manual disconnect and connect to peer. It still seemed stuck. So I force closed the the channel.

I still have the logs. Let me know how to provide them, if needed.

@yyforyongyu
Copy link
Member

@miephos it'd be great to provide some logs! You can find me via the handle yyforyongyu on slack/twitter/keybase, thanks!

@ziggie1984
Copy link
Collaborator

Yeah really strange behaviour would it make sense to bring this log to the Error level, so that we at least know when the edge failed to be added to the graph, otherwise there are some situations where we basically shadow the err which makes it really hard to tell what happened if node runners don't run the gossiper in the debug level ?

lnd/discovery/gossiper.go

Lines 2532 to 2533 in 71ba355

log.Debugf("Router rejected edge for short_chan_id(%v): %v",
ann.ShortChannelID.ToUint64(), err)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Unintended code behaviour gossip graph P2 should be fixed if one has time
Projects
None yet