p2p/discover: improved node revalidation #29572

fjl · 2024-04-18T07:59:33Z

Node discovery periodically revalidates the nodes in its table by sending PING, checking if they are still alive. I recently noticed some issues with the implementation of this process, which can cause strange results such as nodes dropping unexpectedly, certain nodes not getting revalidated often enough, and bad results being returned to incoming FINDNODE queries.

Let me first describe how the revalidation process worked previously:

We set a randomized timer < 10s. When that timer expires, a random bucket is chosen, and within that bucket the last node will be validated.
The idea of revalidating the last node was taken from the original Kademlia paper. Certain contacts, such as incoming PING, will move nodes to the first position of the bucket. Other events (e.g. adding nodes from NODES responses) will put them in the back of the bucket. This is supposed to play out such that we always pick the node that requires revalidation the most, because any successful contact moves it back to the front. The bucket behaves like a queue, basically.
We first send a PING message to the chosen node. If it responds, we increase its livenessChecks value by one. Since PONG also has the node's ENR sequence number, we request the node's new ENR when it has changed.
If the node does not respond to PING, we immediately remove it from the bucket. In place of the old node, we put a random one from the bucket's replacement cache (a list of recently-encountered nodes). However, this only happens if the node is still the last node after revalidation. This condition exists because another event may have updated the node's position, in which case it shouldn't be removed.
Finally, note there are some edge cases to consider. When we fetch an updated ENR from the node, it can have an updated endpoint, which might not fit into the bucket/table IP limits anymore. In that case, we can't apply the update and just stick with the older ENR. We could also drop the node at that point, but it will be dropped later anyway if the node really isn't reachable on the old endpoint anymore.

Now on to issues with the above process:

Revalidation is too slow. We check one node every 5s on average, and the table's top 10 buckets of 16 nodes are expected to be full at all times. Assuming an even distribution across all table members, each node is checked once every 160 * 5s = 800s ≈ 13.3 min. Note that this interval applies to all nodes, even the ones freshly added to the table from a query. That is too slow to maintain a healthy table.
And the distribution isn't even. The concept of moving nodes around within the bucket made less sense the longer I looked at it, because it just complicates things in the implementation. Also, since the process chooses a random bucket and only then picks the node, nodes in deeper buckets will be revalidated more often simply because those buckets are usually less full. The distribution of revalidation requests across table nodes should be even because they may all go offline with an equal chance.
Node replacement edge cases are mostly handled correctly by the current implementation, but it's really hard to follow the code, and I had a lot of trouble seeing through it. That part about not replacing the node if it's not last anymore is just useless. There is also at least one code path where nodes were deleted without choosing a replacement.
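
The timing estimate and the uneven distribution described above can be reproduced with a little arithmetic. The following is a minimal sketch, not code from the PR: `expectedInterval` and `oldTick` are hypothetical names, and the model assumes that each tick picks one non-empty bucket uniformly at random and that, over time, every node in a bucket receives an equal share of that bucket's checks.

```go
package main

import (
	"fmt"
	"time"
)

// oldTick is the average delay between two checks in the old process
// (a randomized timer below 10s, so roughly 5s on average).
const oldTick = 5 * time.Second

// expectedInterval estimates how often a single node is revalidated when each
// tick picks one of nonEmptyBuckets uniformly at random and then checks one
// node in it. Assumption: over time every node in a bucket gets an equal
// share of that bucket's checks.
func expectedInterval(nonEmptyBuckets, nodesInBucket int, tick time.Duration) time.Duration {
	return time.Duration(nonEmptyBuckets*nodesInBucket) * tick
}

func main() {
	// Ten full buckets of 16 nodes, as in the estimate above.
	fmt.Println(expectedInterval(10, 16, oldTick)) // 13m20s
	// Hypothetical table with 12 non-empty buckets: a node in a full bucket
	// versus a node in a deep bucket that holds only 2 nodes.
	fmt.Println(expectedInterval(12, 16, oldTick)) // 16m0s
	fmt.Println(expectedInterval(12, 2, oldTick))  // 2m0s
}
```

Under these assumptions, a node in a sparsely filled bucket is checked several times more often than a node in a full bucket, which is the skew the new design is meant to remove.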

Here is my proposed design for the new revalidation process:

We maintain two 'revalidation lists' containing the table nodes. The lists could be named 'fast' and 'slow'.
The process chooses random nodes from each list on a randomized interval, the interval being faster for the 'fast' list, and performs revalidation for the chosen node.
Whenever a node is newly inserted into the table, it goes into the 'fast' list. Once validation passes, it transfers to the 'slow' list. If a request fails, or the node changes endpoint, it transfers back into 'fast'.
livenessChecks is incremented by one for successful checks. Unlike the old implementation, we will not drop the node on the first failing check. We instead quickly decay livenessChecks, dividing it by 5 or so, to give the node another chance.
Order of nodes in bucket doesn't matter anymore.
I intend to write the implementation in a way that makes it easy to dynamically adjust the rate of revalidation requests if needed. This is important because the implementation also uses revalidation requests as an input to the endpoint predictor. We could increase activity if the predictor doesn't have enough statements, for example.
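
The list transitions proposed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in p2p/discover/table_reval.go: all type and function names are hypothetical, the randomized scheduling and random node selection are left out, the divisor is 3 (the proposal mentions 5, later commits in this PR change it to 3), and dropping a node once its counter reaches zero is an assumption.

```go
package main

import "fmt"

// decayDivisor is applied to livenessChecks after a failed check.
const decayDivisor = 3

// node is a hypothetical stand-in for a table entry.
type node struct {
	id             string
	livenessChecks uint
}

// revalList is one of the two revalidation lists.
type revalList struct {
	name  string
	nodes []*node
}

func (l *revalList) contains(n *node) bool {
	for _, m := range l.nodes {
		if m == n {
			return true
		}
	}
	return false
}

func (l *revalList) push(n *node) { l.nodes = append(l.nodes, n) }

// remove deletes n from the list if it is present.
func (l *revalList) remove(n *node) {
	for i, m := range l.nodes {
		if m == n {
			l.nodes = append(l.nodes[:i], l.nodes[i+1:]...)
			return
		}
	}
}

type revalidation struct{ fast, slow revalList }

// nodeAdded puts a freshly inserted table node into the fast list.
func (tr *revalidation) nodeAdded(n *node) { tr.fast.push(n) }

// moveTo takes n out of whichever list holds it and puts it into dest.
func (tr *revalidation) moveTo(n *node, dest *revalList) {
	tr.fast.remove(n)
	tr.slow.remove(n)
	dest.push(n)
}

// handleResponse applies the outcome of one PING. It returns false when the
// node should be dropped from the table.
func (tr *revalidation) handleResponse(n *node, didRespond, endpointChanged bool) bool {
	if !didRespond {
		n.livenessChecks /= decayDivisor // decay instead of dropping right away
		if n.livenessChecks == 0 {
			// Assumption: a node whose counter decays to zero is dropped.
			tr.fast.remove(n)
			tr.slow.remove(n)
			return false
		}
		tr.moveTo(n, &tr.fast)
		return true
	}
	n.livenessChecks++
	if endpointChanged {
		tr.moveTo(n, &tr.fast)
	} else {
		tr.moveTo(n, &tr.slow)
	}
	return true
}

func main() {
	tr := &revalidation{fast: revalList{name: "fast"}, slow: revalList{name: "slow"}}
	n := &node{id: "a"}
	tr.nodeAdded(n)
	for i := 0; i < 4; i++ {
		tr.handleResponse(n, true, false)
	}
	fmt.Println(tr.slow.contains(n), n.livenessChecks) // true 4
	tr.handleResponse(n, false, false)
	fmt.Println(tr.fast.contains(n), n.livenessChecks) // true 1
}
```

In this sketch a node with four successful checks survives one failure and returns to the fast list with a counter of 1, while a second consecutive failure removes it.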

p2p/discover/table_reval.go

fjl · 2024-04-24T14:20:30Z

+
+	if !resp.didRespond {
+		// Revalidation failed.
+		n.livenessChecks /= 5


maybe use / 3

p2p/discover/table.go

lightclient

LGTM, should probably merge and see how it looks live.

p2p/discover/table_reval.go

In #29572, I assumed the revalidation list that the node is contained in could only ever be changed by the outcome of a revalidation request. But it turns out that's not true: if the node gets removed due to FINDNODE failure, it will also be removed from the list it is in. This causes a crash. The invariant is: while a node is in the table, it is always in exactly one of the two lists. So it seems best to store a pointer to the current list within the node itself.
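
One way to picture the invariant described above is to let every table node carry a pointer to the list that currently holds it, so that any removal path can find the right list. The sketch below is illustrative only: `tableNode`, `revalList`, `nodeRemoved`, and the `revalList` field are hypothetical names, not necessarily those used in the fix.

```go
package main

import "fmt"

// revalList is one of the two revalidation lists.
type revalList struct {
	name  string
	nodes []*tableNode
}

// tableNode is a hypothetical table entry. revalList points at the list that
// currently holds the node and is nil while the node is not in the table.
type tableNode struct {
	id        string
	revalList *revalList
}

// push adds n to the list and records the list in the node.
func (l *revalList) push(n *tableNode) {
	if n.revalList != nil {
		panic("node is already in list " + n.revalList.name)
	}
	l.nodes = append(l.nodes, n)
	n.revalList = l
}

// remove deletes n from the list and clears the node's list pointer.
func (l *revalList) remove(n *tableNode) {
	for i, m := range l.nodes {
		if m == n {
			l.nodes = append(l.nodes[:i], l.nodes[i+1:]...)
			break
		}
	}
	n.revalList = nil
}

// nodeRemoved is called whenever the table drops a node, whatever the cause
// (failed revalidation, FINDNODE failure, ...). The caller does not need to
// know which list the node was in.
func nodeRemoved(n *tableNode) {
	if n.revalList == nil {
		return // not in any list, nothing to do
	}
	n.revalList.remove(n)
}

func main() {
	fast, slow := &revalList{name: "fast"}, &revalList{name: "slow"}
	n := &tableNode{id: "a"}
	slow.push(n)
	nodeRemoved(n) // e.g. dropped after a FINDNODE failure
	nodeRemoved(n) // a second removal is a harmless no-op
	fast.push(n)   // the node is re-added to the table later
	fmt.Println(n.revalList.name, len(fast.nodes), len(slow.nodes)) // fast 1 0
}
```

Because `push` refuses a node that is already listed and `remove` clears the pointer, a node in this sketch is in at most one list at any time, and removal never has to guess which one.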

It seems the semantic differences between addFoundNode and addInboundNode were lost in #29572. My understanding is that addFoundNode is for a node you have not contacted directly (and are unsure whether it is available), whereas addInboundNode is for adding nodes that have contacted the local node, so we can verify they are active. handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in the bucket (updating its IP address) even if the node was not inbound. This PR fixes this. It wasn't originally caught in tests like TestTable_addSeenNode because the manipulation of the node object actually modified the node value used by the test. New logic is added to reject non-inbound updates unless the sequence number of the (signed) ENR increases. Inbound updates, which are published by the updated node itself, are always accepted. If an inbound update changes the endpoint, the node will be revalidated on an expedited schedule.
Co-authored-by: Felix Lange <fjl@twurst.com>
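
The acceptance rule described above can be summarized in a few lines. This is a hedged sketch with hypothetical names (`record`, `entry`, `applyUpdate`, `expedite`), not the actual handleAddNode code. The text does not say how an accepted non-inbound update with a changed endpoint is scheduled, so the sketch leaves the flag untouched in that case.

```go
package main

import "fmt"

// record is a hypothetical, simplified view of a signed ENR.
type record struct {
	seq      uint64
	endpoint string
}

// entry is a hypothetical table entry. expedite marks the node for
// revalidation on an expedited schedule.
type entry struct {
	rec      record
	expedite bool
}

// applyUpdate decides whether newRec replaces the stored record.
// Inbound updates come from the node itself and are always accepted.
// Non-inbound updates are accepted only if the sequence number increases.
func applyUpdate(e *entry, newRec record, isInbound bool) bool {
	if !isInbound && newRec.seq <= e.rec.seq {
		return false
	}
	endpointChanged := newRec.endpoint != e.rec.endpoint
	e.rec = newRec
	if isInbound && endpointChanged {
		e.expedite = true
	}
	return true
}

func main() {
	e := &entry{rec: record{seq: 5, endpoint: "10.0.0.1:30303"}}
	// Found via a NODES response, same sequence number: rejected.
	fmt.Println(applyUpdate(e, record{seq: 5, endpoint: "10.0.0.2:30303"}, false)) // false
	// Found via a NODES response, higher sequence number: accepted.
	fmt.Println(applyUpdate(e, record{seq: 6, endpoint: "10.0.0.1:30303"}, false)) // true
	// Inbound contact with a new endpoint: accepted and expedited.
	fmt.Println(applyUpdate(e, record{seq: 6, endpoint: "10.0.0.3:30303"}, true), e.expedite) // true true
}
```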

Node discovery periodically revalidates the nodes in its table by sending PING, checking if they are still alive. I recently noticed some issues with the implementation of this process, which can cause strange results such as nodes dropping unexpectedly, certain nodes not getting revalidated often enough, and bad results being returned to incoming FINDNODE queries. In this change, the revalidation process is improved with the following logic:
- We maintain two 'revalidation lists' containing the table nodes, named 'fast' and 'slow'.
- The process chooses random nodes from each list on a randomized interval, the interval being faster for the 'fast' list, and performs revalidation for the chosen node.
- Whenever a node is newly inserted into the table, it goes into the 'fast' list. Once validation passes, it transfers to the 'slow' list. If a request fails, or the node changes endpoint, it transfers back into 'fast'.
- livenessChecks is incremented by one for successful checks. Unlike the old implementation, we will not drop the node on the first failing check. We instead quickly decay the livenessChecks to give it another chance.
- Order of nodes in bucket doesn't matter anymore.
I am also adding a debug API endpoint to dump the node table content.
Co-authored-by: Martin HS <martin@swende.se>

internal/testlog: fix level matching

bc62df2

fjl requested a review from zsfelfoldi as a code owner April 18, 2024 07:59

fjl changed the title from "p2p/discover: new node revalidation logic" to "p2p/discover: improved node revalidation" Apr 18, 2024

fjl added 20 commits April 20, 2024 11:58

p2p/discover: add debug API

8f4c72c

cmd/devp2p: add discv4 RPC server

37149d0

cmd/devp2p: add listen command

1a6a64b

cmd/devp2p: fix import

f8c7448

p2p/discover: new revalidation

df8e793

p2p/discover: fix build

70b8918

p2p/discover: update

0a4b314

p2p/discover: fix some tests

fad3fe7

p2p/discover: fix shutdown hang

f2ac692

p2p/discover: fix test

00e71f5

p2p/discover: fix spin condition

1d896e8

p2p/discover: add live flag in debug node

dc3f78b

p2p/discover: return nextTime from run()

c1afb3c

p2p/discover: simplify rand timer

7667ebf

p2p/discover: move back to fast list when endpoint changed or dead

b44673f

p2p/discover: fix crash in move

822b059

p2p/discover: improve list moving

c2ad0a7

p2p/discover: faster

0b44fc4

p2p/discover: remove logging checks in for dead nodes (it's always zero)

30eb2bf

p2p/discover: faster decay

81e553f

fjl force-pushed the discover-reval-new branch from 623b811 to 81e553f Compare April 22, 2024 08:01

holiman reviewed Apr 24, 2024

p2p/discover/table_reval.go (outdated)

holiman reviewed Apr 24, 2024

p2p/discover/table_reval.go (outdated)

fjl and others added 3 commits April 24, 2024 16:20

Update p2p/discover/table_reval.go

0048a0f

Co-authored-by: Martin HS <martin@swende.se>

p2p/discover: fix addedAt time

041ce1b

p2p/discover: rename add node methods

2c2c7f7

This is to better reflect their purpose. The previous naming of 'seen' and 'verified' was kind of arbitrary, especially since 'verified' was the stricter one.

p2p/discover: change to / 3

6e37d52

fjl force-pushed the discover-reval-new branch from f26076b to 6e37d52 Compare May 21, 2024 12:29

p2p/discover: explain

9ec2553

fjl commented May 22, 2024

p2p/discover/table.go

p2p/discover: shuffle in appendLiveNodes

de7e1c7

lightclient approved these changes May 22, 2024

p2p/discover/table_reval.go

fjl added 4 commits May 23, 2024 11:36

p2p/discover: fix flaky test

4ace554

p2p/discover: add documentation

16b4386

p2p: add accessors for discovery instances

4386e8a

node: add p2p debug API

6101ec3

fjl merged commit 6a9158b into ethereum:master May 23, 2024
2 of 3 checks passed

fjl added this to the 1.14.4 milestone May 23, 2024

lightclient mentioned this pull request May 24, 2024

p2p/discover: fix update logic in handleAddNode #29836

Merged

This was referenced Jun 5, 2024

ethereum 1.14.4 Homebrew/homebrew-core#173743

Merged

ethereum 1.14.5 Homebrew/homebrew-core#173874

Merged

pratikspatil024 mentioned this pull request Jun 13, 2024

p2p: cherry-pick commits from geth for peering issues maticnetwork/bor#1267

Merged

18 tasks

buddh0 mentioned this pull request Nov 8, 2024

upstream: Prague code merge [v1.13.15, v1.14.11] bnb-chain/bsc#2753

Closed

buddh0 mentioned this pull request Nov 22, 2024

upstream: Prague code merge [v1.13.15, v1.14.11] bnb-chain/bsc#2761

Merged

This was referenced Dec 9, 2024

release: prepare for release v1.5.1-alpha bnb-chain/bsc#2789

Merged

release: prepare for release v1.5.1-alpha (#2789) bnb-chain/bsc#2790

Merged
