-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some gossip never seems to sync in laboratory setup #6531
Comments
Some things that I've found out so far:
Is there a race condition with a peer being a syncer but without a horizon yet? I am not familiar enough with this sub-system to determine what this means exactly, but perhaps it gives a clue? |
@saubyk is this issue worth prioritizing? There are quite a few open issues/bugs around gossip that may be hard to reproduce. But this one is easy to repro reliably. I know from experience that this bug has a big impact. Without a patch applied, I couldn't get my node announcement to broadcast at all. |
@joostjager yes we are prioritizing the fix for the gossip propagation issue |
It seems that one is not the fix: #7186 (comment) |
Nope it's not. Meanwhile what's the topology looking like? I'll try to run the experiment to see what went wrong. |
The topology is defined here. It are a lot of nodes, and part of the test setup is to verify that the designated sender node has received all channel updates. For this test, I use a modified lnd version: https://github.com/bottlepay/lnd/commits/pathfinding-benchmark-mod-mpp. Docker-compose is automatically pulling in this branch and building it. It contains three changes:
Without the patch, the sender node isn't going to get a full picture of the graph. |
Can you try out #7239 in your test bed? |
In this commit, we modify our gossip broadcast logic to ensure that we always will send out our own gossip messages regardless of the filtering/feature policies of the peer. Before this commit, it was possible that when we went to broadcast an announcement, none of our peers actually had us as a syncer peer (lnd terminology). In this case, the FilterGossipMsg function wouldn't do anything, as they don't have an active timestamp filter set. When we go to them merge the syncer map, we'd add all these peers we didn't send to, meaning we would skip them when it came to broadcast time. In this commit, we now split things into two phases: we'll broadcast _our_ own announcements to all our peers, but then do the normal filtering and chunking for the announcements we got from a remote peer. Fixes lightningnetwork#6531 Fixes lightningnetwork#7223 Fixes lightningnetwork#7073
In this commit, we modify our gossip broadcast logic to ensure that we always will send out our own gossip messages regardless of the filtering/feature policies of the peer. Before this commit, it was possible that when we went to broadcast an announcement, none of our peers actually had us as a syncer peer (lnd terminology). In this case, the FilterGossipMsg function wouldn't do anything, as they don't have an active timestamp filter set. When we go to them merge the syncer map, we'd add all these peers we didn't send to, meaning we would skip them when it came to broadcast time. In this commit, we now split things into two phases: we'll broadcast _our_ own announcements to all our peers, but then do the normal filtering and chunking for the announcements we got from a remote peer. Fixes lightningnetwork#6531 Fixes lightningnetwork#7223 Fixes lightningnetwork#7073
Background
I am experimenting with a pathfinding benchmark setup that spins up a bunch of nodes inside docker: https://github.com/bottlepay/pathfinding-benchmark/tree/lnd-6531
One essential step before starting the test is to make sure that all channels are opened and the latest channel policies gossiped to the test node that is going to make the test payments.
To detect whether the test node is fully synced, I set the default lnd base fee to 999 everywhere. What I'd want to see in the
DescribeGraph
output of the test node is every channel represented with policies that don't have a 999 base fee. (Unfortunately it seems impossible to override the policy directly when a channel is created with lnd, meaning that an extra channel_update is always needed.)Your environment
Steps to reproduce
Run
run.sh
Note: make sure that there's sufficient memory, otherwise you may run into the death loop that is described here: #6210 (comment)
Expected behaviour
At some point during the setup stage, the test runner will wait for all channel policies to be received. In the used test graph, there should be 94 non-999-basefee policies. Then it proceeds with making the pathfinding test payments.
Actual behaviour
The test runner gets stuck on
It keeps waiting for the last 4 (number varies) up to date policies, but isn't getting them.
To try to get propagation working, I'm using the following
lnd.conf
settings:Also the code waits for 2 minutes before changing the channel policies from the default, to avoid hitting the gossip rate limit.
Repro isn't 100%, sometimes it does get fully synced.
I've encountered propagation issues on mainnet too occasionally. Of course there could be many lnd and non-lnd related reasons for that. But perhaps this set up could help uncover a latent bug somewhere in the gossip chain.
The text was updated successfully, but these errors were encountered: