Limit scope of queries to k closest peers #97

dirkmc · 2019-04-10T21:57:52Z

Stop querying once we have successfully queried >= K peers, and the only remaining peers waiting to be queried are further from the key than those K peers.

I also fixed a bug where we would return peers that we hadn't successfully connected to and refactored the Query class.

dirkmc · 2019-04-10T22:00:29Z

There are four places we perform queries:

getClosestPeers()
findPeer()
findProviders()
getMany()

I think it makes sense to implement this behaviour for the first two, but I'm not sure about findProviders() and getMany(). @jhiesey @vasco-santos @jacobheun I'd like to hear your thoughts.

vasco-santos

I totally agree on moving in this direction! If the other folks agree, I will have a deeper look to the PR codebase

jacobheun · 2019-04-11T11:28:09Z

I think the same logic applies for findProviders() and getMany(). Is your concern around these two not going deep enough into the network to find potentially closer peers?

dirkmc · 2019-04-11T12:57:00Z

findProviders() and getMany() use disjoint paths whereas getClosestPeers() and findPeer() do not, so I wanted to consider what effect that might have.

For example when running getMany(), we first work out the number of disjoint paths, which will be equal to alpha (ie 3). Results are divided among the paths, such that each path can return at most Math.ceil(valueCount / numPaths). So if we attempt to fetch 16 values with getMany() (as IPNS does by default) the number of values per path will be Math.ceil(16 / 3) = 6. If K=20, the value is stored at 20 peers. Consider a topology where there are 8 values along one path, 9 values along the second path and 3 along the third path. In the best case we will return 6, 6 and 3 = 15 values (because the maximum per path is 6).

Limiting the scope of the query makes it run faster, because we examine less nodes, but it also makes it less likely that we will discover all possible paths to the values, meaning getMany() is more likely to fail.

jacobheun · 2019-04-11T13:45:19Z

Limiting the scope of the query makes it run faster, because we examine less nodes, but it also makes it less likely that we will discover all possible paths to the values, meaning getMany() is more likely to fail.

It sounds like this is where properly streaming responses would make it easier to achieve both. Until we add the streaming responses, we'll likely have to have the limits on those two be higher, because that seems likely to fail a lot. The math is not in IPNS's favor.

dirkmc · 2019-04-11T20:22:03Z

Streaming responses should help. For now we could also decrease the required number of values for IPNS to be less than 16. There can only be one source crypto key for IPNS records, which in practice usually means only one source node, so IPNS records will almost always all have the same value for the same key (they would only differ when an IPNS publish hasn't reached all the nodes that a previous IPNS publish reached).

I implemented a streaming version of the DHT in this PR: #82. It's waiting on async/await PRs for a few dependencies to be approved.

test/kad-dht.spec.js

vasco-santos · 2019-04-12T11:38:05Z

I agree with @dirkmc and starting by targetting getClosestPeers() and findPeer()!

After that, once we introduce the stream we figure out what limits we can add to the remaining two.

dirkmc · 2019-04-13T10:30:36Z

I think I've found a reason that the find peer test is often failing.

I noticed that by default we limit the number of peers returned from the FIND_NODE RPC to 6, whereas the Kademlia spec (and go-libp2p-kad-dht) specify that it should be KValue (20 by default).

This will cause a problem in networks where many nodes have accurate knowledge of the closest peers to a particular key (eg if that key is requested frequently). For example in the following scenario:

alpha (3) peers are chosen to initiate a getClosestPeers() request
each of the 3 peers returns the 6 closest peers to the key
each of those 6 closest peers return each other
the total number of peers queried will end up being low, potentially as low as 10

This will cause a problem for some DHT operations, for example when doing a PUT to the DHT, this will cause a problem because it needs to store the PUT value on at least KValue (20) nodes.

Does anyone know why this limit of 6 was chosen?

Here's what the Kademlia spec says:

FIND NODE takes a 160-bit ID as an argument. The recipient of the RPC returns <IP address, UDP port, Node ID> triples for the k nodes it knows about closest to the target ID. These triples can come from a single k-bucket, or they may come from multiple k-buckets if the closest k-bucket is not full. In any case, the RPC recipient must return k items (unless there are fewer than k nodes in all its k-buckets combined, in which case it returns every node it knows about).

dirkmc · 2019-04-13T16:08:38Z

I fixed a bug in the queueing logic, and changed the test so that the topology is more realistic (nodes know about peers close to them), so the tests now pass consistently

jacobheun · 2019-04-15T10:58:14Z

Does anyone know why this limit of 6 was chosen?

No, and it looks like it's been there since day 1, #1. We should use what's in the spec unless we have clear, documented reasoning for deviating.

It looks like the bundlesize is failing, with the added code this is understandable. We should bump https://github.com/libp2p/js-libp2p-kad-dht/blob/v0.14.12/.aegir.js#L2 to 192kb.

vasco-santos · 2019-04-15T13:29:20Z

@dirkmc did you check if in go land, they also use 6 ? If they do not, I am totally in favor of following the spec

jacobheun · 2019-04-15T13:42:24Z

whereas the Kademlia spec (and go-libp2p-kad-dht) specify that it should be KValue (20 by default)

While go is following the spec on this item, we really should avoid doing things here based on the go implementation. The go implementation has known issues and regularly references needing to follow the Kademlia spec properly. I think we should focus energy in making sure we're abiding by original Kademlia spec and pushing the libp2p spec for it, libp2p/specs#108, forward, rather than attempting to mimic the go implementation.

As long as we're not breaking interop, and we document where and why we're deviating from go (and informing them if they're not aware), I think that's the approach we should be taking.

test/query.spec.js

dirkmc · 2019-04-16T03:26:32Z

I changed the number of closer peers to be returned to K (@vasco-santos the go DHT does use the k value also: https://github.com/libp2p/go-libp2p-kad-dht/blob/master/handlers.go#L23)

src/query.js

jacobheun

LGTM

dirkmc · 2019-04-16T10:10:00Z

@jacobheun do you think we should just enable this for getClosestPeers() and findPeer(), or for all four queries including findProviders() and getMany()?

jacobheun · 2019-04-16T10:22:23Z

Perhaps we should start with getClosestPeers() and findPeer(), and then make a determination about the other two? Outlining the pros and cons for them would be helpful.

It might be valuable to have the logic in findProviders() moreso than getMany(), but it we can't yet stream, having outright failures is not good.

One of the current unfortunate things about findProviders() having that threshold, is that getting new content can be potentially impossible to get. For example, in ipfs-bitswap, when I try to get new content you've created, if you're the only provider when I search, I'll never get the content because 1 is less than my minimum threshold. It will eventually timeout because I can't find 20 peers. For DHT performance in IPFS we have to fix that.

Maybe it's worth adding findProviders until we can properly support streaming?

vasco-santos · 2019-04-16T10:44:04Z

@jacobheun totally agree that we should follow the kademlia spec first and then try to iterate on improving it. In that time, we may check the go implementation looking for possible optimizations they did to the paper. We just need to make this visible for everyone being on the same page.

Regarding the 4 functions that perform queries, I think we should do this by steps. I would go with these two now. However, I agree that it would be worth exploring this for findProviders as well. My proposal is to get the first two merged and then work on adding the findProviders in a separate PR. That way, we can easily test the differences and revert findProviders if needed. What do you think? @dirkmc @jacobheun

dirkmc · 2019-04-16T10:48:48Z

Thinking about it a bit more, we may need to decide for each case in which we call any of those four functions. For example when doing a DHT PUT we call getClosestPeers() and store the value at each of those peers. We want to be sure that those really are the closest peers or it will decrease the chance of getMany() finding enough values.

vasco-santos · 2019-04-16T14:06:39Z

@dirkmc do you think it is worth putting together a list of all the possible cases and discuss for each of them which is the best solution. We should also get @jhiesey involved on these decisions

dirkmc · 2019-04-17T01:27:34Z

Yes I think that makes sense, I will put together a list

dirkmc · 2019-04-17T01:42:36Z

The paths through the code that create a Query are:

getMany()
findProviders()
findPeer()
- As a public API method
- Called by random walk to discover peers
getClosestPeers()
- As a public API method
- Called by provide() to add self as a provider for a key
- Called by put() to find nodes on which to store a key / value in the DHT

jacobheun · 2019-04-17T08:50:34Z

So getClosestPeers() belongs only to write actions to the network. This makes sense because we want to be accurate when we're storing values. The downside of this is that getClosestPeers should, by design, be "slow", moreso for newer peers with sparse routing tables.

For the fetching functions getMany, findPeer, findProviders we should be as quick as possible. As mentioned, I think this may warrant reducing the tolerance on number of peers required for a success. Finding 16 records when there are potentially only 20 nodes who have values feels too high to me, especially if the query should be relatively quick. With a network that is still in flux, with so many new nodes coming and going, I think lowering the threshold at least initially is a reasonable approach.

For eventually getting the tolerance back up to a higher level (16), there's discussion about adopting sloppiness at libp2p/specs#163. While it's not the same concept of sloppiness, I was thinking about how being sloppy with the number of writes, combined with eventual accuracy, might be helpful in alleviating some of this. For example, we discussed concerns about spending enough time finding closest peers when performing a PUT, so maybe exiting early isn't ideal, there could be closer peers. If we operated "sloppily" we could perform a PUT with the closest kValue (20) we have when we would normally end the query, and then continue searching looking for eventual accuracy. Once the query was "done", we could PUT to any closer peers we hadn't already. This breaks the Spec a bit because we're potentially writing to more than kValue peers, but I think there is merit in the fault tolerance it provides to the network.

dirkmc · 2019-04-17T13:14:57Z

That's an interesting idea. I wonder in practice how often we won't find the actual closest 20 peers. Maybe it's worth merging this in and trying it out in the real network?

jacobheun · 2019-04-17T13:31:27Z

Maybe it's worth merging this in and trying it out in the real network?

Yes, I think getting this out so we can test with it and iterate is a good path forward.

vasco-santos · 2019-04-17T13:56:33Z

Thanks for the list @dirkmc

I liked your proposal @jacobheun

I will merge this as it looks good and we need to test it, but I believe we will want to iterate on this.
There is important information that we will need for the next iterations, so I will create an issue for tracking the good ideas from this PR and we can keep discussing there.

ghost assigned dirkmc Apr 10, 2019

ghost added the status/in-progress In progress label Apr 10, 2019

vasco-santos reviewed Apr 11, 2019

View reviewed changes

dirkmc force-pushed the fix/find-peer-limit-scope branch 3 times, most recently from c955bc5 to c59a738 Compare April 11, 2019 22:59

feat: limit scope of queries to k closest peers

57ce00a

dirkmc force-pushed the fix/find-peer-limit-scope branch from c59a738 to 57ce00a Compare April 12, 2019 00:14

dirkmc commented Apr 12, 2019

View reviewed changes

test/kad-dht.spec.js Outdated Show resolved Hide resolved

dirkmc marked this pull request as ready for review April 12, 2019 00:16

dirkmc added 2 commits April 14, 2019 00:05

fix: queueing logic

67e46b9

test: fix find peer test

6fabae8

jacobheun reviewed Apr 15, 2019

View reviewed changes

test/query.spec.js Outdated Show resolved Hide resolved

dirkmc added 3 commits April 16, 2019 11:00

chore: increase maximum allowed bundle size

abc0df9

fix: find peer should return k peers

89c3123

test: make scoped query test more specific

6710c2f

jacobheun reviewed Apr 16, 2019

View reviewed changes

src/query.js Outdated Show resolved Hide resolved

fix: ensure queue callback always called

08296cf

jacobheun approved these changes Apr 16, 2019

View reviewed changes

vasco-santos approved these changes Apr 17, 2019

View reviewed changes

vasco-santos merged commit f03619e into libp2p:master Apr 17, 2019

ghost removed the status/in-progress In progress label Apr 17, 2019

vasco-santos mentioned this pull request Apr 18, 2019

Scope of queries #103

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Limit scope of queries to k closest peers #97

Limit scope of queries to k closest peers #97

dirkmc commented Apr 10, 2019

dirkmc commented Apr 10, 2019

vasco-santos left a comment

jacobheun commented Apr 11, 2019

dirkmc commented Apr 11, 2019

jacobheun commented Apr 11, 2019

dirkmc commented Apr 11, 2019

vasco-santos commented Apr 12, 2019

dirkmc commented Apr 13, 2019

dirkmc commented Apr 13, 2019

jacobheun commented Apr 15, 2019

vasco-santos commented Apr 15, 2019

jacobheun commented Apr 15, 2019

dirkmc commented Apr 16, 2019

jacobheun left a comment

dirkmc commented Apr 16, 2019

jacobheun commented Apr 16, 2019

vasco-santos commented Apr 16, 2019

dirkmc commented Apr 16, 2019

vasco-santos commented Apr 16, 2019

dirkmc commented Apr 17, 2019

dirkmc commented Apr 17, 2019

jacobheun commented Apr 17, 2019

dirkmc commented Apr 17, 2019

jacobheun commented Apr 17, 2019

vasco-santos commented Apr 17, 2019

Limit scope of queries to k closest peers #97

Limit scope of queries to k closest peers #97

Conversation

dirkmc commented Apr 10, 2019

dirkmc commented Apr 10, 2019

vasco-santos left a comment

Choose a reason for hiding this comment

jacobheun commented Apr 11, 2019

dirkmc commented Apr 11, 2019

jacobheun commented Apr 11, 2019

dirkmc commented Apr 11, 2019

vasco-santos commented Apr 12, 2019

dirkmc commented Apr 13, 2019

dirkmc commented Apr 13, 2019

jacobheun commented Apr 15, 2019

vasco-santos commented Apr 15, 2019

jacobheun commented Apr 15, 2019

dirkmc commented Apr 16, 2019

jacobheun left a comment

Choose a reason for hiding this comment

dirkmc commented Apr 16, 2019

jacobheun commented Apr 16, 2019

vasco-santos commented Apr 16, 2019

dirkmc commented Apr 16, 2019

vasco-santos commented Apr 16, 2019

dirkmc commented Apr 17, 2019

dirkmc commented Apr 17, 2019

jacobheun commented Apr 17, 2019

dirkmc commented Apr 17, 2019

jacobheun commented Apr 17, 2019

vasco-santos commented Apr 17, 2019