DHT provider records: adopt sloppiness, rejection and backtracking from Coral #163

Open
raulk opened this issue Apr 10, 2019 · 7 comments

raulk (Member) commented Apr 10, 2019

Context

Provider records in our Kad DHT implementation are equivalent to the concept of pointers as proposed in the Coral DHT [0].

Unfortunately, they are only implemented halfway in go-libp2p, and probably in other language implementations too, which makes them scale poorly.

Problem statement

Nodes close to a popular hash H get swamped with provider records. Every time a new provider appears for the same hash H, it advertises to exactly the same nodes: we currently advertise to the KValue=20 nodes closest to the hash H. Nodes that download hash H in turn become providers of that hash, compounding the problem as hashes become very popular.
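
To make the current behaviour concrete, here is an illustrative-only Go sketch of the provide path. The Router interface and its methods are stand-ins for the purpose of this issue, not the actual go-libp2p-kad-dht API:

```go
// Illustrative sketch of today's provide strategy: every provider walks to
// the same KValue closest peers and stores its record there unconditionally.
package coral

import (
	"context"
	"log"
)

const KValue = 20 // Kademlia replication parameter

type Peer string

type Record struct {
	Key      string // the hash H being provided
	Provider Peer
}

// Router abstracts the DHT walk and the store RPC (hypothetical interface).
type Router interface {
	GetClosestPeers(ctx context.Context, key string) ([]Peer, error) // KValue closest, by XOR distance
	PutProvider(ctx context.Context, p Peer, rec Record) error
}

// provide mirrors the current strategy: no limits, no spill-over, so every
// provider of a popular hash hits exactly the same set of peers.
func provide(ctx context.Context, r Router, self Peer, key string) error {
	closest, err := r.GetClosestPeers(ctx, key)
	if err != nil {
		return err
	}
	for _, p := range closest {
		if err := r.PutProvider(ctx, p, Record{Key: key, Provider: self}); err != nil {
			log.Printf("put to %s failed: %v", p, err)
		}
	}
	return nil
}
```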

There is no concept of backpressure or spill-over for provider records. No limits, throttling policies or validations are enforced. Peers swallow any provider record they receive. This ends up causing all kinds of local negative effects.

Proposal

Adopt these features of Coral:

  1. rejection/throttling,
  2. sloppy hashing / spill-over,
  3. backtracking.

Rejection/throttling: nodes restrict the number of provider records stored per hash, and the rate of registration. The limit can be enforced locally and need not be synchronised across the network. Attacks by injecting uncollaborative nodes (with low or zero local limits) are nullified by spill-over.
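
A hedged sketch of what local rejection/throttling could look like, assuming a per-hash cap and a minimum interval between accepted registrations for the same hash. The ProviderStore type and the concrete limits are illustrative, not existing go-libp2p code:

```go
// Local rejection/throttling sketch: the limits are purely local and cheap to
// enforce; nothing needs to be synchronised across the network.
package coral

import (
	"errors"
	"sync"
	"time"
)

// ErrAtCapacity signals the requester to spill over to an earlier node.
var ErrAtCapacity = errors.New("provider record rejected: hash at local capacity")

type Record struct {
	Key      string // the hash H
	Provider string // peer ID of the provider
}

type ProviderStore struct {
	mu          sync.Mutex
	maxPerKey   int           // e.g. 32 records per hash, chosen locally
	minInterval time.Duration // throttle: minimum spacing between accepted registrations
	records     map[string][]Record
	lastAccept  map[string]time.Time
}

func NewProviderStore(maxPerKey int, minInterval time.Duration) *ProviderStore {
	return &ProviderStore{
		maxPerKey:   maxPerKey,
		minInterval: minInterval,
		records:     make(map[string][]Record),
		lastAccept:  make(map[string]time.Time),
	}
}

// AddProvider enforces the local limits and rejects anything over them.
func (s *ProviderStore) AddProvider(rec Record) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	if len(s.records[rec.Key]) >= s.maxPerKey {
		return ErrAtCapacity
	}
	if last, ok := s.lastAccept[rec.Key]; ok && time.Since(last) < s.minInterval {
		return ErrAtCapacity // too many registrations for this hash in a short window
	}
	s.records[rec.Key] = append(s.records[rec.Key], rec)
	s.lastAccept[rec.Key] = time.Now()
	return nil
}
```

The exact numbers matter less than the fact that rejection is cheap, purely local, and gives the requester an unambiguous signal to spill over.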

Sloppy hashing/spill-over: on a rejection, the requester backtracks along the traversal path and tries to store the record on earlier nodes.

This has the effect of expanding the radius of advertisement of popular hashes, such that subsequent lookups for that content will result in faster resolutions.

However, depending on how lookups are implemented, this could end up overloading new providers, and weaning traffic off elder providers. I feel this is an oversight in Coral.

Backtracking: the process by which we walk back our traversal trail, hitting earlier nodes along the way and attempting to register our record farther from the desired target, thus expanding the radius where the hash is advertised.
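
Putting spill-over and backtracking together, a minimal sketch under the assumption that the requester remembers the traversal path it took towards the key (ordered nearest-to-farthest) and that rejections are signalled by a sentinel error; the Storer interface is hypothetical:

```go
// Coral-style placement: try the node nearest the key first and, on each
// capacity rejection, backtrack to the next node farther from the target,
// expanding the advertisement radius only as far as rejections force it to.
package coral

import (
	"context"
	"errors"
	"fmt"
)

var ErrAtCapacity = errors.New("provider record rejected: hash at local capacity")

type Peer string

type Record struct {
	Key      string
	Provider Peer
}

// Storer abstracts the RPC that asks a remote peer to hold a provider record.
type Storer interface {
	PutProvider(ctx context.Context, p Peer, rec Record) error
}

func placeWithBacktracking(ctx context.Context, s Storer, pathNearestFirst []Peer, rec Record) (Peer, error) {
	for _, p := range pathNearestFirst {
		err := s.PutProvider(ctx, p, rec)
		switch {
		case err == nil:
			return p, nil // accepted: stop here
		case errors.Is(err, ErrAtCapacity):
			continue // sloppy spill-over: retry one hop farther from the key
		default:
			return "", fmt.Errorf("put to %s failed: %w", p, err)
		}
	}
	return "", errors.New("every node along the path rejected the record")
}
```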

Coral's approach is insecure, insofar as a single adversary along the path rejecting your record will derail you and cause you to backtrack, even if peers closer to the key had spare capacity.

To resist such attacks, we can continue traversing forward a few more steps to confirm the rejection, or we can traverse disjoint paths.
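
As an illustration of the first mitigation, a hedged sketch that probes a few more peers towards the key before trusting a rejection; Storer, ErrAtCapacity and confirmWindow are assumptions for illustration, not libp2p API or protocol parameters:

```go
// Confirming a rejection before backtracking: a single adversarial rejection
// no longer forces a spill-over; we only give up on the neighbourhood after
// several additional closer peers also reject.
package coral

import (
	"context"
	"errors"
)

const confirmWindow = 3 // extra closer peers probed before trusting a rejection

var ErrAtCapacity = errors.New("provider record rejected: hash at local capacity")

type Peer string

type Record struct {
	Key      string
	Provider Peer
}

type Storer interface {
	PutProvider(ctx context.Context, p Peer, rec Record) error
}

// placeWithConfirmation walks towards the key (peers ordered farthest-first).
// It returns the accepting peer, or ok=false once the rejection is confirmed,
// at which point the caller would backtrack as described above.
func placeWithConfirmation(ctx context.Context, s Storer, towardsKey []Peer, rec Record) (accepted Peer, ok bool) {
	rejections := 0
	for _, p := range towardsKey {
		err := s.PutProvider(ctx, p, rec)
		if err == nil {
			return p, true
		}
		if errors.Is(err, ErrAtCapacity) {
			rejections++
			if rejections > confirmWindow {
				return "", false // rejection confirmed by several peers: backtrack instead
			}
		}
		// other failures (timeouts, unreachable peers) are skipped, not counted
	}
	return "", false
}
```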

Additionally, it does not play well with TTLs and disappearing content. If a hash has spilled over several layers, and early registrants fail to reprovide (because they're offline or they pruned the hash), farther nodes holding recent registrations that happen to be over capacity could preclude us from finding closer nodes that now have vacancies due to expired and unrenewed earlier records. This could result in progressive expansion of the radius, with a hollow centre.


[0] Freedman, Michael J., and David Mazières (2003). Sloppy Hashing and Self-Organizing Clusters. Lecture Notes in Computer Science, vol. 2735. doi:10.1007/978-3-540-45172-3_4.

raulk changed the title from "DHT provider records: adopt sloppiness and backtracking from Coral" to "DHT provider records: adopt sloppiness, rejection and backtracking from Coral" on Apr 10, 2019

anacrolix commented Apr 11, 2019

@raulk It's my belief that a lot of the benefits of this approach are provided by improved traversal code, which I was alluding to in our talk about plans for Q2. In particular, with better control over termination and concurrency in traversals, it will be easier to adopt some of these approaches. I'm not sure if an issue exists for it (better traversal logic).

raulk (Member Author) commented Apr 11, 2019

@anacrolix there's an issue for termination/cutoff: libp2p/go-libp2p-kad-dht#290. Could you file an issue in kad-dht for your proposals on concurrency?

As much as we need to fix the query traversal logic in go-libp2p, I'm afraid it won't alleviate the provider overload problem described herein.

yusefnapora (Contributor) commented

Could you elaborate a bit on

This could result in progressive expansion of the radius, with a hollow centre.

Is this because the spill-over always backs further away from the query target when a provider is over capacity? And a more optimal pattern would be for an overburdened node to ask nodes closer to the target if they can re-provide?

raulk (Member Author) commented Apr 16, 2019

@yusefnapora

Is this because the spill-over always backs further away from the query target when a provider is over capacity?

Yes, eventually the radius where a key is advertised expands the more popular it gets (and the more providers are willing to serve it). If older providers fail to reprovide, and the logic that selects the placement for an advertisement backtracks as soon as we encounter the first rejection, this will end up creating a hollow centre with an ever-growing radius around it. (Note that my reasoning is based on intuition, not formal modelling.)

anacrolix commented

Note that the most robust and long-lived nodes should have preference in routing tables, so that generally your search for nodes to store data on will favour those stronger nodes. i.e. you're most likely to store data on the best nodes, right from the get-go, with a correct implementation. That may be obscured at the moment by libp2p/go-libp2p-kad-dht#283 and libp2p/go-libp2p-kad-dht#254, which allow the property to be broken by transient failures in nodes or networks.

raulk (Member Author) commented Apr 17, 2019

@anacrolix I believe you're mixing things up. This is about solving provider record overload affecting the neighbours of popular hashes. It has no relation to the routing table whatsoever. Your routing table is a means to get you to places, not a prescription of where data is stored.

Stebalien (Member) commented

However, depending on how lookups are implemented, this could end up overloading new providers, and weaning traffic off elder providers. I feel this is an oversight in Coral.

Peers generally take different paths to find content, so this should be fine. However, we may need to add some randomness into how we put peers into our routing table (or maybe add some randomness into how we tell peers about the next N closer peers).
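
A tiny sketch of the second idea, assuming we control the handler that returns the next N closer peers: sample the N peers to return from a slightly wider candidate window, so successive lookups for a popular key are steered down slightly different paths. The function and window size are illustrative.

```go
// Hypothetical sketch: return N peers sampled from ~1.5*N closest candidates
// instead of always returning the exact N closest.
package coral

import "math/rand"

type Peer string

// closerPeers is assumed to be sorted by XOR distance to the key, closest first.
func sampleCloserPeers(closerPeers []Peer, n int) []Peer {
	window := n + n/2
	if window > len(closerPeers) {
		window = len(closerPeers)
	}
	candidates := append([]Peer(nil), closerPeers[:window]...)
	rand.Shuffle(len(candidates), func(i, j int) {
		candidates[i], candidates[j] = candidates[j], candidates[i]
	})
	if n > len(candidates) {
		n = len(candidates)
	}
	return candidates[:n]
}
```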

To resist such attacks, we can continue traversing forward a few more steps to confirm the rejection, or we can traverse disjoint paths.

We have to use disjoint paths here. Traversing further will only yield Sybils.

Additionally, it does not play well with TTLs and disappearing content. If a hash has spilled over several layers, and early registrants fail to reprovide (because they're offline or they pruned the hash), farther nodes holding recent registrations that happen to be over capacity could preclude us from finding closer nodes that now have vacancies due to expired and unrenewed earlier records. This could result in progressive expansion of the radius, with a hollow centre.

Won't they just tell us about closer nodes? That is, when putting a record, we'll first find the 20 closest nodes, then see that these nodes are overloaded and start backtracking.
