
Expunge any added "invalid" nodes from k-buckets #226

Closed
joshuakarp opened this issue Aug 16, 2021 · 7 comments · Fixed by #378
Labels
design (Requires design), development (Standard development), r&d:polykey:core activity 3 (Peer to Peer Federated Hierarchy)

Comments

@joshuakarp
Contributor

Specification

Any node that is added but turns out to be invalid should be expunged from any state in the nodes domain.

We need a means of handling any nodes that are found to be invalid.

I make the initial assumption that an "invalid" keynode can only be introduced by a manual insertion into the buckets database. Therefore, an "invalid" node can only be inserted via the nodes add CLI call, or from the specified seed nodes when we instantiate a keynode. I'm of the opinion that nodes added from discovery (through Kademlia) should be expected to be correct. However, note that any nodes added by nodes add can themselves be discovered through Kademlia, so we're currently in a state where "invalid" nodes could be discovered.

From my perspective, there are a couple of ways a node can be deemed "invalid":

  • invalid host/port: any malformed IP is caught at the CLI level, but an invalid yet well-formed IP + port can't be ascertained easily (a connection timeout may simply mean the node is offline).
  • invalid node ID: malformed node IDs are caught at the CLI level, and we also check the node ID against the certificate on connection.

As such, we don't currently have an easy way of determining an "invalid" node.

I believe the best approach to this is making appropriate restrictions on how/what we add to the buckets database. See additional commentary in the below section regarding this.
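As a rough sketch of those restrictions, the cheapest checks are purely syntactic, applied before anything reaches the buckets database. The helpers below are hypothetical (not actual js-polykey APIs), and the node ID format (64 hex characters) is an assumption for illustration only; these checks catch malformed input, not unreachable or impersonating nodes:

```typescript
// Hypothetical syntactic validation before inserting into the node graph.
// These only reject malformed input; liveness and certificate checks
// still have to happen at connection time.

function isValidNodeId(nodeId: string): boolean {
  // Assumption: node IDs are 32-byte values encoded as 64 hex characters.
  return /^[0-9a-f]{64}$/.test(nodeId);
}

function isValidHostPort(host: string, port: number): boolean {
  // Well-formedness only: a syntactically valid IPv4 address and a sane port.
  const ipv4 = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
  const m = host.match(ipv4);
  if (m == null) return false;
  if (!m.slice(1).every((octet) => Number(octet) <= 255)) return false;
  return Number.isInteger(port) && port > 0 && port <= 65535;
}
```

A well-formed but unreachable address still passes these checks, which is exactly the gap described above.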

Additional context

This was originally brought up in review of the nodes CLI refactoring.

When discussing the nodes add command, I made the initial assumption that we would need to create a connection to the other node before we could successfully add it to the database. This would ensure that the node details are correct.

However, this is not the case https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/198#note_641488589:

I'm not sure about this constraint. PK nodes should be able to be offline at random times, and it's a fairly lazy network.
I reckon you should be able to add nodes into the graph without initial connection.
The Kademlia problem you mentioned is known in the literature as Kademlia poisoning; variants include Sybil attacks.
We haven't investigated mitigations against it yet, and the Kademlia system can be susceptible to DDoS attacks as well.
The problem you describe can also occur by someone subverting their own Kademlia database, so the mitigation cannot rest on the assumption that all nodes are friendly. It requires a different approach, but we can address this after the release.
For now, you just want to ensure that during discovery you avoid adding nodes into your search if a node is offline or cannot be accessed (you may try to discard them early), so you don't keep dead nodes around in your node graph.
Of course, you don't want to delete dead nodes if they are part of your gestalt though.
We will need to go through a proper design overview to work through these issues.
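The pruning policy quoted above (discard unreachable nodes during discovery, but never gestalt members) could be sketched as follows. `pruneDeadNodes`, `pingNode`, and `inGestalt` are hypothetical stand-ins for the real NodeConnectionManager / GestaltGraph calls:

```typescript
// Hypothetical discovery-time pruning: keep a candidate node if it is
// part of our gestalt (even while offline) or if it responds to a ping;
// otherwise discard it so dead nodes don't accumulate in the node graph.

type NodeId = string;

async function pruneDeadNodes(
  candidates: NodeId[],
  pingNode: (nodeId: NodeId) => Promise<boolean>,
  inGestalt: (nodeId: NodeId) => boolean,
): Promise<NodeId[]> {
  const alive: NodeId[] = [];
  for (const nodeId of candidates) {
    // Gestalt members are kept unconditionally; others must answer a ping.
    if (inGestalt(nodeId) || (await pingNode(nodeId))) {
      alive.push(nodeId);
    }
  }
  return alive;
}
```

Note this is lazy removal at discovery time, not a background garbage collector; the design overview would still need to decide between the two.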

Tasks

  1. ...
  2. ...
  3. ...
@joshuakarp joshuakarp added the development Standard development label Aug 16, 2021
@CMCDragonkai
Member

@tegefaulkes can you review this issue in relation to #310 as well, given you're touching that work? Please write a comment on how this may be implemented, and whether it should go into #310 or a later PR.

@CMCDragonkai
Member

Apparently there's a method in NodeGraph to remove nodes, but it just isn't used yet.

We have to develop a policy for when the nodegraph's bad nodes get garbage collected.

@joshuakarp can you check how other usages of Kademlia deal with this as well? For example, https://hypercore-protocol.org/protocol/#hyperswarm — just search through the src or issues; Sourcegraph can help, like how I used it with fsck-objects.c when working out the GC algorithm for dangling commit objects for VaultInternal. Once you've done this, flesh out the spec above.

@CMCDragonkai
Member

This relates to the kademlia spec about expiring old nodes in a bucket. In kademlia, we actually need to ping the old node when we hit the bucket limit instead of just dropping the old node. We should do this by connecting NodeGraph with NodeManager.

To do this, we need to move some logic out of NodeGraph into NodeManager, such as checking the bucket limit, and performing the ping on the old node.
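The bucket policy from the Kademlia paper described above can be sketched like this. The `Bucket` shape and `setNode` signature are illustrative, not the actual NodeManager API:

```typescript
// Hypothetical full-bucket policy per the Kademlia spec: when a bucket
// is at its limit, ping the oldest node; only evict it if the ping
// fails, otherwise discard the new node and keep the old one.

type NodeId = string;

interface Bucket {
  nodes: NodeId[]; // ordered oldest-first
  limit: number;   // k, the bucket size limit
}

async function setNode(
  bucket: Bucket,
  newNode: NodeId,
  ping: (nodeId: NodeId) => Promise<boolean>,
): Promise<void> {
  if (bucket.nodes.length < bucket.limit) {
    bucket.nodes.push(newNode);
    return;
  }
  const oldest = bucket.nodes[0];
  if (await ping(oldest)) {
    // Oldest responded: Kademlia prefers long-lived nodes, so the new
    // node is dropped and the bucket is unchanged.
    return;
  }
  // Oldest is dead: evict it and admit the new node.
  bucket.nodes.shift();
  bucket.nodes.push(newNode);
}
```

The ping requires a connection attempt, which is why this logic belongs in NodeManager rather than in NodeGraph itself.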

Additionally, when we try to connect to a node ID and this fails for a variety of reasons:

  1. Certificates are invalid and thus node ids don't match
  2. Or that we cannot connect to it at all

Then we should also just delete the record from the NodeGraph. Not sure if it requires any kind of retry here.
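That removal rule could be sketched as a wrapper around the connection attempt. `connectOrForget` and its parameters are hypothetical stand-ins, with the node graph modelled as a plain `Map` for illustration:

```typescript
// Hypothetical removal-on-failure: if a connection attempt throws
// (certificate/node ID mismatch, or unreachable), expunge the stale
// record from the node graph. No retry logic is attempted here.

type NodeId = string;

async function connectOrForget(
  nodeId: NodeId,
  connect: (nodeId: NodeId) => Promise<void>,
  nodeGraph: Map<NodeId, unknown>,
): Promise<boolean> {
  try {
    await connect(nodeId);
    return true;
  } catch {
    // Whether the certificate failed verification or the node was
    // simply unreachable, delete the record.
    nodeGraph.delete(nodeId);
    return false;
  }
}
```

A real implementation would likely distinguish "certificate invalid" (delete immediately) from "unreachable" (possibly retry first), which is the open question noted above.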

@CMCDragonkai
Member

Since this has some deep interplay with DB transactions, as the NodeGraph is potentially mutated by both NodeConnectionManager and NodeManager, I'll be working on this.

@CMCDragonkai
Member

We can think of this issue as figuring out the NodeId removal policies.

Whereas #344 is about figuring out NodeId adding policies.

@CMCDragonkai
Member

Invalid node IDs could arise on reconnection, when the certificate no longer verifies. This should integrate with node ID changes and any natural forgetting/revoking process. We may want to be aggressive or passive with pruning the NodeGraph to reduce the amount of Kademlia poisoning.

@CMCDragonkai
Member

As with #150 (comment), we are leaving this to the natural invalidation process of the node graph. Further p2p simulations are needed before we can see whether an explicit garbage collection process is needed.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy label Jul 24, 2022