
Some gossip never seems to sync in laboratory setup #6531

Closed
joostjager opened this issue May 12, 2022 · 8 comments · Fixed by #7239
Labels
bug Unintended code behaviour gossip graph p2p Code related to the peer-to-peer behaviour

Comments

@joostjager
Contributor

joostjager commented May 12, 2022

Background

I am experimenting with a pathfinding benchmark setup that spins up a bunch of nodes inside docker: https://github.com/bottlepay/pathfinding-benchmark/tree/lnd-6531

One essential step before starting the test is to make sure that all channels are opened and the latest channel policies are gossiped to the test node that will make the test payments.

To detect whether the test node is fully synced, I set the default lnd base fee to 999 everywhere. What I'd want to see in the DescribeGraph output of the test node is every channel represented with policies that don't have a 999 base fee. (Unfortunately it seems impossible to override the policy directly when a channel is created with lnd, meaning that an extra channel_update is always needed.)
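For reference, this is roughly how that readiness check could look against the standard lnrpc API (a minimal sketch; the function name, the client variable, and the error handling are illustrative assumptions, not the actual test-runner code):

    package gossipcheck

    import (
        "context"

        "github.com/lightningnetwork/lnd/lnrpc"
    )

    // countUpdatedPolicies counts the channel policies in the node's graph
    // view whose base fee differs from the 999 sentinel default. The test
    // node is considered fully synced once this reaches the expected total
    // (94 for this test graph).
    func countUpdatedPolicies(ctx context.Context, client lnrpc.LightningClient) (int, error) {
        graph, err := client.DescribeGraph(ctx, &lnrpc.ChannelGraphRequest{})
        if err != nil {
            return 0, err
        }

        count := 0
        for _, edge := range graph.Edges {
            policies := []*lnrpc.RoutingPolicy{edge.Node1Policy, edge.Node2Policy}
            for _, policy := range policies {
                // A nil policy means no channel_update has been received
                // for that direction yet.
                if policy != nil && policy.FeeBaseMsat != 999 {
                    count++
                }
            }
        }
        return count, nil
    }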

Your environment

  • lnd v0.14.3-beta
  • bitcoind

Steps to reproduce

Run run.sh

Note: make sure that there's sufficient memory, otherwise you may run into the death loop that is described here: #6210 (comment)

Expected behaviour

At some point during the setup stage, the test runner waits for all channel policies to be received. In the test graph used, there should be 94 non-999-base-fee policies. Only then does it proceed with making the pathfinding test payments.

Actual behaviour

The test runner gets stuck on

testrunner_1               | 2022-05-12T14:24:12.807Z	DEBUG	Gossiped edges	{"count": 90, "expected": 94}

It keeps waiting for the last 4 (the number varies) up-to-date policies, but never receives them.

To try to get propagation working, I'm using the following lnd.conf settings:

trickledelay=100
historicalsyncinterval=10s

The test code also waits two minutes before changing the channel policies from the default, to avoid hitting the gossip rate limit.
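For context, the delayed policy change itself can be issued through the standard lnrpc API along these lines (a sketch under assumptions: the two-minute constant, the concrete fee values, and the client variable are illustrative, not the benchmark's actual code):

    package gossipcheck

    import (
        "context"
        "time"

        "github.com/lightningnetwork/lnd/lnrpc"
    )

    // updatePolicies waits out the gossip rate limit and then pushes a new
    // policy for all of the node's channels, which produces the
    // channel_update messages the test runner is waiting for.
    func updatePolicies(ctx context.Context, client lnrpc.LightningClient) error {
        // Waiting here avoids hitting the gossiper's rate limit for
        // updates sent shortly after channel creation.
        time.Sleep(2 * time.Minute)

        _, err := client.UpdateChannelPolicy(ctx, &lnrpc.PolicyUpdateRequest{
            // Apply the new policy to every channel of this node.
            Scope:         &lnrpc.PolicyUpdateRequest_Global{Global: true},
            BaseFeeMsat:   1000, // anything other than the 999 default
            FeeRate:       0.000001,
            TimeLockDelta: 40,
        })
        return err
    }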

The repro isn't 100% reliable; sometimes the graph does get fully synced.

I've occasionally encountered propagation issues on mainnet too. Of course, there could be many lnd and non-lnd related reasons for that, but perhaps this setup could help uncover a latent bug somewhere in the gossip chain.

@Roasbeef Roasbeef added bug Unintended code behaviour p2p Code related to the peer-to-peer behaviour gossip labels May 12, 2022
@joostjager
Contributor Author

joostjager commented May 13, 2022

Some things that I've found out so far:

  • The missing channel updates never even left the node that set the policy. The problem isn't propagation across the network, but the update never being broadcast in the first place.
  • The reason it isn't broadcast in AuthenticatedGossiper.sendBatch is that the active syncers don't have an update horizon set (the if g.remoteUpdateHorizon == nil check).
  • Looking at the logs, GossipTimestampRange, which should set the horizon, is sent and received, but only later. In other cases it doesn't seem to be sent at all.

Is there a race condition with a peer being a syncer but without a horizon yet?

I am not familiar enough with this sub-system to determine what this means exactly, but perhaps it gives a clue?
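To make the suspected race easier to see, here is a self-contained toy model of the behaviour described above (simplified stand-in types, not lnd's actual gossiper/syncer code): a peer can already be tracked as an active syncer before its GossipTimestampRange arrives, so the filter silently drops the message for that peer, while the broadcast logic still records the peer as handled and never retries.

    package main

    import "fmt"

    // horizon stands in for the update horizon a peer installs by sending
    // GossipTimestampRange; nil means no filter has been received yet.
    type horizon struct {
        firstTimestamp uint32
        timestampRange uint32
    }

    type syncer struct {
        peer                string
        remoteUpdateHorizon *horizon
    }

    // filterGossipMsg models the early return referenced above: without a
    // remote update horizon, nothing is sent to this peer.
    func (s *syncer) filterGossipMsg(msg string) bool {
        if s.remoteUpdateHorizon == nil {
            return false // message dropped for this peer
        }
        return true // message would be sent
    }

    func main() {
        // Peer A has already sent GossipTimestampRange, peer B has not
        // (yet), but both are already tracked as active syncers.
        syncers := []*syncer{
            {peer: "A", remoteUpdateHorizon: &horizon{0, ^uint32(0)}},
            {peer: "B"}, // the race: horizon still nil at broadcast time
        }

        for _, s := range syncers {
            sent := s.filterGossipMsg("channel_update")
            // Either way the peer ends up in the "already handled" set, so
            // the dropped update is never re-broadcast to peer B.
            fmt.Printf("peer %s: sent=%v, marked as handled\n", s.peer, sent)
        }
    }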

@joostjager
Contributor Author

@saubyk is this issue worth prioritizing? There are quite a few open issues/bugs around gossip that may be hard to reproduce. But this one is easy to repro reliably.

I know from experience that this bug has a big impact. Without a patch applied, I couldn't get my node announcement to broadcast at all.

@saubyk
Collaborator

saubyk commented Dec 6, 2022

@joostjager yes, we are prioritizing the fix for the gossip propagation issue. I believe #7186 is the PR for this.

@joostjager
Contributor Author

It seems that one is not the fix: #7186 (comment)

@yyforyongyu
Member

Nope it's not. Meanwhile what's the topology looking like? I'll try to run the experiment to see what went wrong.

@joostjager
Contributor Author

joostjager commented Dec 6, 2022

The topology is defined here. There are a lot of nodes, and part of the test setup is to verify that the designated sender node has received all channel updates.

For this test, I use a modified lnd version: https://github.com/bottlepay/lnd/commits/pathfinding-benchmark-mod-mpp. Docker Compose automatically pulls in this branch and builds it.

It contains three changes:

  • Reduced scrypt parameters to lower the memory footprint
  • Reduced cache pre-allocation to lower the memory footprint
  • The gossip patch.

Without the patch, the sender node isn't going to get a full picture of the graph.

@Roasbeef
Member

Roasbeef commented Dec 7, 2022

Can you try out #7239 in your test bed?

Roasbeef added a commit to Roasbeef/lnd that referenced this issue Dec 7, 2022
In this commit, we modify our gossip broadcast logic to ensure that we will always send out our own gossip messages regardless of the filtering/feature policies of the peer.

Before this commit, it was possible that when we went to broadcast an announcement, none of our peers actually had us as a syncer peer (lnd terminology). In this case, the FilterGossipMsg function wouldn't do anything, as they don't have an active timestamp filter set. When we then went to merge the syncer map, we'd add all these peers we didn't send to, meaning we would skip them when it came time to broadcast.

In this commit, we now split things into two phases: we'll broadcast
_our_ own announcements to all our peers, but then do the normal
filtering and chunking for the announcements we got from a remote peer.

Fixes lightningnetwork#6531
Fixes lightningnetwork#7223
Fixes lightningnetwork#7073
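For readers following along, a rough sketch of the two-phase split the commit message describes (simplified stand-in types, not the actual lnd implementation): locally originated announcements bypass the per-peer timestamp filter entirely, while remotely learned announcements keep the existing filter-and-chunk path.

    package gossipsketch

    // msg, gossipSyncer, and peer are simplified stand-ins used only to
    // illustrate the control flow of the fix.
    type msg struct{ name string }

    type gossipSyncer struct{ hasHorizon bool }

    // filterMsgs models the pre-existing path: nothing is relayed until the
    // peer has installed an update horizon via GossipTimestampRange.
    func (s *gossipSyncer) filterMsgs(msgs []msg) []msg {
        if !s.hasHorizon {
            return nil
        }
        return msgs
    }

    type peer struct {
        syncer gossipSyncer
        sent   []msg
    }

    func (p *peer) send(m msg) { p.sent = append(p.sent, m) }

    // broadcast splits announcements by origin. Our own announcements go to
    // every peer unconditionally; announcements learned from remote peers
    // still go through the per-peer filter (and, in lnd, chunked sending).
    func broadcast(local, remote []msg, peers []*peer) {
        // Phase 1: our own announcements bypass the timestamp filter, so a
        // peer without a horizon can no longer swallow them.
        for _, p := range peers {
            for _, m := range local {
                p.send(m)
            }
        }

        // Phase 2: remote announcements keep the old filtering behaviour.
        for _, p := range peers {
            for _, m := range p.syncer.filterMsgs(remote) {
                p.send(m)
            }
        }
    }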
@joostjager
Contributor Author

@Roasbeef I ran it three times to be sure, and the problem seems to be solved with #7239

Roasbeef added a commit to Roasbeef/lnd that referenced this issue Dec 15, 2022