DHT provider records: adopt sloppiness, rejection and backtracking from Coral #163
Comments
@raulk It's my belief that a lot of the benefits of this approach are provided by improved traversal code, which I was alluding to in our talk about plans for Q2. In particular, with better control over termination and concurrency in traversals, it will be easier to adopt some of these approaches. I'm not sure if an issue exists for it (better traversal logic).
@anacrolix there's an issue for termination/cutoff: libp2p/go-libp2p-kad-dht#290. Could you file an issue in kad-dht for your proposals on concurrency? As much as we need to fix the query traversal logic in go-libp2p, I'm afraid it won't alleviate the provider overload problem described herein.
could you elaborate a bit on
Is this because the spill-over always backs off further from the query target when a provider is over capacity? And a more optimal pattern would be for an overburdened node to ask nodes closer to the target if they can re-provide?
Yes, eventually the radius where a key is advertised expands the more popular it gets (and the more providers are willing to serve it). If older providers fail to reprovide, and the logic that selects the placement for an advertisement backs off as soon as we encounter the first rejection, this will end up creating a hollow centre with an ever-growing radius around it. (Note that my reasoning comes from intuition, not formal modelling.)
Note that the most robust and longest-lived nodes should have preference in routing tables, so that generally your search for nodes to store data on will favour those stronger nodes; i.e. you're most likely to store data on the best nodes right from the get-go, with a correct implementation. That may be obscured at the moment by libp2p/go-libp2p-kad-dht#283 and libp2p/go-libp2p-kad-dht#254, which allow the property to be broken by transient failures in nodes or networks.
@anacrolix I believe you're mixing things up. This is about solving provider record overload affecting the neighbours of popular hashes. It has no relation to the routing table whatsoever. Your routing table is a means to get you to places, not a prescription of where data is stored.
Peers generally take different paths to find content so this should be fine. However, we may need to add some randomness into how we put peers into our routing table (or maybe add some randomness into how we tell peers about the next N closer peers).
We have to use disjoint paths here. Traversing further will only yield Sybils.
Won't they just tell us about closer nodes? That is, when putting a record, we'll first find the 20 closest nodes, then see that these nodes are overloaded and start backtracking.
Context
Provider records in our Kad DHT implementation are equivalent to the concept of pointers as proposed in the Coral DHT [0].
Unfortunately, they are implemented half-way in go-libp2p, and probably in other implementations too, which makes them scale poorly.
Problem statement
Nodes close to a popular hash H get swamped with provider records. Every time a new provider appears for the same hash H, it advertises on exactly the same nodes. We currently advertise at the KValue=20 nodes closest to the hash H. Nodes that download hash H become providers of that hash in turn, compounding the problem when hashes become very popular.
There is no concept of backpressure or spill-over for provider records. No limits, throttling policies or validations are enforced. Peers swallow any provider record they receive. This ends up causing all kinds of negative local effects.
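To make the current behaviour concrete, here is a minimal Go sketch of the provide flow described above. The helper names (closestPeers, sendProviderRecord) and the package are hypothetical stand-ins, not the actual go-libp2p-kad-dht API; only KValue=20 comes from the text.

```go
package providesketch

import "context"

const KValue = 20 // replication factor: number of closest peers we advertise on

// PeerID stands in for a libp2p peer identifier.
type PeerID string

// closestPeers stands in for the Kademlia lookup that returns the k peers
// whose IDs are nearest (by XOR distance) to the target hash.
func closestPeers(ctx context.Context, hash []byte, k int) []PeerID {
	// ... network traversal elided ...
	return nil
}

// sendProviderRecord stands in for the ADD_PROVIDER RPC. Today the remote
// peer has no way to say "no": records are always swallowed.
func sendProviderRecord(ctx context.Context, p PeerID, hash []byte) error {
	return nil
}

// provide shows why popular hashes overload their neighbourhood: every
// provider of the same hash converges on exactly the same KValue peers.
func provide(ctx context.Context, hash []byte) {
	for _, p := range closestPeers(ctx, hash, KValue) {
		_ = sendProviderRecord(ctx, p, hash) // no backpressure, no rejection
	}
}
```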
Proposal
Adopt these features of Coral:
Rejection/throttling: nodes restrict the number of provider records stored per hash, and the rate of registration. The limit can be locally enforced and need not be synchronised across the network. Attacks that inject uncollaborative nodes (with low or zero local limits) are nullified by spill-over. (A sketch of such a local policy is given after this list.)
Sloppy hashing/spill over: on a rejection, the requester backtracks along the traversal path and tries to store the record on earlier nodes.
This has the effect of expanding the radius of advertisement of popular hashes, such that subsequent lookups for that content will result in faster resolutions.
However, depending on how lookups are implemented, this could end up overloading the newer providers while weaning traffic off the older ones. I feel this is an oversight in Coral.
Backtracking: the process by which we walk back along our traversal trail, hitting earlier nodes along the way and attempting to register our record further from the desired target, thus expanding the radius where the hash is advertised.
Coral's approach is insecure, insofar as a single adversary along the path rejecting your record will derail you and cause you to backtrack, even if peers closer to the key had spare capacity.
To resist such attacks, we can continue traversing forward a few more steps to confirm the rejection, or we can traverse disjoint paths.
Additionally, it does not play well with TTLs and disappearing content. If a hash has spilled over several layers and early registrants fail to reprovide (because they're offline or they pruned the hash), farther nodes holding recent registrations that happen to be over capacity could preclude us from finding closer nodes that now have vacancies due to expired, unrenewed earlier records. This could result in progressive expansion of the radius, with a hollow centre.
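To illustrate the rejection/throttling item above, here is a hedged sketch of a node-side provider store that enforces a per-hash cap and a registration rate limit. The limits (64 records per hash, one new registration per second) and all names are illustrative assumptions, not a proposed spec.

```go
package providerlimits

import (
	"sync"
	"time"
)

// Locally chosen limits: no network-wide agreement is required, since
// spill-over routes around uncooperative nodes that set their limits to zero.
const (
	maxRecordsPerHash  = 64          // cap on provider records stored per hash
	minRegistrationGap = time.Second // crude per-hash rate limit on new registrations
)

type providerSet struct {
	peers        map[string]time.Time // provider peer ID -> last registration time
	lastAccepted time.Time            // time of the last accepted new registration
}

// ProviderStore holds provider records and applies the local policy.
type ProviderStore struct {
	mu     sync.Mutex
	byHash map[string]*providerSet
}

func NewProviderStore() *ProviderStore {
	return &ProviderStore{byHash: make(map[string]*providerSet)}
}

// Accept reports whether a provider record for hash from peer p is stored.
// Returning false corresponds to an explicit rejection in the ADD_PROVIDER
// response, which is what allows the requester to back off and spill over.
func (s *ProviderStore) Accept(hash, p string, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	set := s.byHash[hash]
	if set == nil {
		set = &providerSet{peers: make(map[string]time.Time)}
		s.byHash[hash] = set
	}
	if _, known := set.peers[p]; !known {
		if len(set.peers) >= maxRecordsPerHash {
			return false // over capacity for this hash: reject
		}
		if now.Sub(set.lastAccepted) < minRegistrationGap {
			return false // registrations arriving too fast: throttle
		}
		set.lastAccepted = now
	}
	set.peers[p] = now // reprovides from known peers are always refreshed
	return true
}
```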
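And a companion sketch of the requester side: walking the traversal trail closest-first and spilling over to peers farther from the target whenever a closer peer rejects the record. Again, all names and the replication target are assumptions; a hardened variant would confirm rejections with extra forward steps or disjoint paths, as discussed in the comments above.

```go
package spillover

import "context"

const kReplicas = 20 // how many successful placements we aim for

// addProvider stands in for the ADD_PROVIDER RPC; accepted is false when
// the remote peer rejects the record under its local limits.
func addProvider(ctx context.Context, peer string, hash []byte) (accepted bool, err error) {
	return true, nil
}

// provideWithSpillOver walks the traversal trail closest-first. Each
// rejection pushes the record one step farther from the target, which is
// the radius expansion described above.
func provideWithSpillOver(ctx context.Context, trail []string, hash []byte) (stored []string) {
	for _, p := range trail {
		if len(stored) >= kReplicas {
			break // enough replicas placed
		}
		ok, err := addProvider(ctx, p, hash)
		if err != nil || !ok {
			continue // unreachable or rejected: spill over to the next peer out
		}
		stored = append(stored, p)
	}
	return stored
}
```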
See these instructional videos for reference:
[0] Freedman, Michael J. & Mazières, David. (2003). Sloppy Hashing and Self-Organizing Clusters. In Peer-to-Peer Systems II (IPTPS 2003), Lecture Notes in Computer Science, vol. 2735. doi:10.1007/978-3-540-45172-3_4.