network: Suggest peer by address space gap (#2065)

* network/kademlia: proposed solution for peer suggestion in Kademlia by using address space gaps. A thorough description can be found here: ethersphere/SWIPs#32 Co-authored-by: Álvaro <kortatu@gmail.com>
ethersphere · Jan 13, 2020 · e7e98cf · e7e98cf
1 parent 2c9e315
commit e7e98cf

Showing 6 changed files with 466 additions and 17 deletions.
diff --git a/network/README.md b/network/README.md
@@ -11,8 +11,8 @@ the latter on the downstream peer.
 
 Subscribe on StreamerPeer launches an incoming streamer that sends
 a subscribe msg upstream. The streamer on the upstream peer
-handles the subscribe msg by installing the relevant outgoing streamer
-. The modules now engage in a process of upstream sending a sequence of hashes of
+handles the subscribe msg by installing the relevant outgoing streamer.
+The modules now engage in a process of upstream sending a sequence of hashes of
 chunks downstream (OfferedHashesMsg). The downstream peer evaluates which hashes are needed
 and get it delivered by sending back a msg (WantedHashesMsg).
 
@@ -121,7 +121,7 @@ the constructor is the Run function itself. which takes a streamerpeer as argume
 ### provable streams
 
 The swarm  hash over the hash stream has many advantages. It implements a provable data transfer
-and provide efficient storage for receipts in the form of inclusion proofs useable for finger pointing litigation.
+and provide efficient storage for receipts in the form of inclusion proofs usable for finger pointing litigation.
 When challenged on a missing chunk, upstream peer will provide an inclusion proof of a chunk hash against the state of the
 sync stream. In order to be able to generate such an inclusion proof, upstream peer needs to store the hash index (counting consecutive hash-size segments) alongside the chunk data and preserve it even when the chunk data is deleted until the chunk is no longer insured.
 if there is no valid insurance on the files the entry may be deleted.
@@ -150,3 +150,89 @@ and simply iterate on index per bin when syncing with a peer.
 priority queues are used for sending chunks so that user triggered requests should be responded to first, session syncing second, and historical with lower priority.
 The request on chunks remains implemented as a dataless entry in the memory store.
 The lifecycle of this object should be more carefully thought through, ie., when it fails to retrieve it should be removed.
+
+## Address space gaps
+In order to optimize Kademlia load balancing, performance and peer suggestion, we define the concept of `address space gap`
+or simply `gap`. 
+A `gap` is a portion of the overlay address space in which the current node does not know any peer. It can be represented
+as a range of addresses: for example, `0xxx` means `0000-0111`.
+
+The `proximity order of a gap` or `gap po` is the proximity order of that address space with respect to the nearest peer(s)
+in the kademlia connected table (also taking the current node address into account). For example, if the node address is `0000`,
+the gap of addresses `1xxx` has proximity order 0, whereas the gap `01xx` has proximity order 1.
+
+The `size of a gap` is defined as the number of addresses that could fit in it. If the area of the whole address space is 1,
+the `size of a gap` could be defined from the `gap po` as `1 / 2 ^ (po + 1)`. For example, our previous `1xxx` gap has a size of
+`1 / (2 ^ 1) = 1/2`. The size of `01xx` is `1 / (2 ^ 2) = 1/4`. 
+
+In order to improve the performance of content retrieval and delivery, the node should minimize the size of its gaps, because this
+means that it knows peers near almost all addresses. If the minimum `gap po` in the kademlia table is 4, it means that for any
+lookup or forwarding the next hop will have a proximity order of at least 4 with respect to the target. On the other hand, if the node has a po 0 `gap`, it means that
+for half the addresses, the next hop will still be only po 0 with respect to the target.
+
+### Gaps for peer suggestion
+The current kademlia bootstrap algorithm tries to fill in the bins (or po spaces) until some level of saturation is reached.
+In the process of doing that, the `gaps` will diminish, but not in the optimal way.
+
+For example, suppose the node address is `00000000`, it is connected to only one peer in bin 0, `10000000`, and the known
+addresses for bin 0 are `10000001` and `11000000`. The current algorithm will take the first `callable` one, so
+it may, for example, suggest `10000001` as the next peer. This is not optimal, as the biggest `gap` in bin 0 will still be
+po 1 => `11xxxxxx`. If, however, the algorithm is improved to search for a peer which covers a bigger `gap`, `11000000` would
+be selected and the biggest `gaps` would then be po 2 => `111xxxxx` and `101xxxxx`.
+
+Additionally, even if the node does not know an address in a particular `gap`, it can still select the address furthest away
+from the current peers so that it covers a bigger `gap`. In the previous example with node `00000000` and one peer already connected,
+`10000000`, if the known addresses are `10000001` and `10010000`, the best suggestion would be the last one, because it is po 3
+from the nearest peer, whereas `10000001` is po 7 from it (much closer). The best case will cover a `gap` of po 3 size
+(1/16 of the area, or 16 addresses) and the other one just po 7 size (1/256 of the area, or 1 address).
+
+### Gaps and load balancing
+One additional benefit in considering `gaps` is load balancing. If the target addresses are distributed randomly 
+(although address popularity is another problem that can also be studied from the `gap` perspective), the request will
+be automatically load balanced if we try to connect to peers covering the bigger `gaps`. Continuing with our example,
+if in bin 0 we have peers `10000000` and `10000001` (Fig. 1), almost all addresses in space `1xxxxxxx`, that is, half of all
+addresses, will be at the same proximity order from both peers. If we need to send to some of those addresses we will need to use
+one of those peers. The choice could be made randomly, by always taking the first, or with some load balancing accounting that picks the least
+used one. 
+![Fig. 1](https://raw.githubusercontent.com/kortatu/swarm_doc/master/address_space_gaps-lb-1.png)
+Fig. 1 - Closer peers need an external load balancing mechanism
+
+This last method will still be useful, but if the `gap` filling strategy is used, most probably both peers will
+be separated enough that they never compete for an address and a natural load balancing will be made among them (for example,
+`10000000` and `11000000` will be used each for half the addresses in bin 0 (Fig. 2)).
+![Fig. 2](https://raw.githubusercontent.com/kortatu/swarm_doc/master/address_space_gaps-lb-2.png)
+Fig. 2 - Peers chosen by address space gap have natural load balancing
+### Implementation
+The search for gaps can be done easily using a proximity order tree or `pot`. Traversing the bins of a node, a `gap` is
+found if any po is missing, starting from the furthest (left). At each level the starting po to search for is the
+parent po (not 0, because in the second level, under a node of po=0, the minimum po that could be found is 1).
+
+The implementation of the function that looks for the biggest gap in a `pot` can be seen in
+`pot.BiggestAddressGap`. That function returns the biggest gap in the form of a po and
+the node under which the gap can be found.
+
+This function is used in `kademlia.suggestPeerInBinByGap`, which returns a BzzAddress in a particular bin that fills
+up the biggest address gap. This function is not used in `SuggestPeer`, but it will be enough to replace the call to
+`suggestPeerInBin` with the new one.
+
+### Further improvements
+Instead of the size of a gap, it could be more interesting to look at the ratio between the size and the number of current
+peers serving that gap. If we have `n` current peers that are equidistant to a particular gap of size `s`,
+the load of each of these peers will be on average `s/n`. 
+We can define a gap's `temperature` as that number `s/n`. When looking for new peers to connect to, instead of looking for
+bigger gaps we could look for `hotter` gaps.
+For example, if in our first example we can't find a peer in `11xxxxxx` and instead use the best available peer, we could end up
+with the configuration in Fig. 3.
+![Fig. 3](https://raw.githubusercontent.com/kortatu/swarm_doc/master/address_space_gaps-lb-3.png)
+Fig. 3 - Comparing gap temperatures
+
+Here we still have `11xxxxxx` as the biggest gap (po=1, size 1/4), the same size as `01xxxxxx`. But if we consider temperature,
+`01xxxxxx` is hotter because it is served only by our node `00000000`, so its temperature is `(1/4) / 1 = 1/4`. In contrast,
+`11xxxxxx` is now served by two peers, so its temperature is `(1/4) / 2 = 1/8`, which means that we will select
+`01xxxxxx` as the hotter one.
+
+Temperature calculation can be implemented so that its cost is the same as looking for the biggest gap: the temperature
+can be calculated on the fly as the gap is found using a `pot`.
+
+Other metrics could be considered in the temperature, such as the recent number of requests per address space, the performance of
+current peers...
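
Note on the README hunk above: the size and temperature definitions can be checked with a few lines of Go. This is only a minimal sketch of the arithmetic. The names `gapSize` and `gapTemperature` are hypothetical and do not exist in this commit, and treating fewer than one serving peer as one (the node itself) is an assumption made here.

```go
package main

import (
	"fmt"
	"math"
)

// gapSize returns the fraction of the whole address space covered by a gap
// of the given proximity order, using the README definition 1 / 2^(po+1).
func gapSize(po int) float64 {
	return math.Ldexp(1, -(po + 1)) // 1 * 2^-(po+1)
}

// gapTemperature returns size/n for a gap served by n equidistant peers.
// Assumption: a gap is always served by at least one peer (the node itself),
// so values below 1 are treated as 1.
func gapTemperature(po, peers int) float64 {
	if peers < 1 {
		peers = 1
	}
	return gapSize(po) / float64(peers)
}

func main() {
	// Gaps 1xxx (po 0) and 01xx (po 1) from the README examples.
	fmt.Println(gapSize(0), gapSize(1))
	// Fig. 3 comparison: 01xxxxxx served by 1 peer, 11xxxxxx served by 2.
	fmt.Println(gapTemperature(1, 1), gapTemperature(1, 2))
}
```

With these definitions the po 1 gap served by one peer is hotter than the po 1 gap served by two, which matches the Fig. 3 discussion.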
diff --git a/network/kademlia.go b/network/kademlia.go
@@ -427,28 +427,77 @@ func (k *Kademlia) SuggestPeer() (suggestedPeer *BzzAddr, saturationDepth int, c
 					return false
 				}
 			}
-			// curPO found
-			// find a callable peer out of the addresses in the unsaturated bin
-			// stop if found
-			bin.ValIterator(func(val pot.Val) bool {
-				e := val.(*entry)
-				if k.callable(e) {
-					suggestedPeer = e.BzzAddr
-					return false
-				}
-
-				return true
-			})
+			suggestedPeer = k.suggestPeerInBin(bin)
 			return cur < len(bins) && suggestedPeer == nil
 		}, true)
 	}
+
 	if uint8(saturationDepth) < k.saturationDepth {
 		k.saturationDepth = uint8(saturationDepth)
 		return suggestedPeer, saturationDepth, true
 	}
 	return suggestedPeer, 0, false
 }
 
+func (k *Kademlia) suggestPeerInBin(bin *pot.Bin) *BzzAddr {
+	var foundPeer *BzzAddr
+	// curPO found
+	// find a callable peer out of the addresses in the unsaturated bin
+	// stop if found
+	bin.ValIterator(func(val pot.Val) bool {
+		e := val.(*entry)
+		if k.callable(e) {
+			foundPeer = e.BzzAddr
+			return false
+		}
+		return true
+	})
+	return foundPeer
+}
+
+// suggestPeerInBinByGap tries to find the best peer to connect to in a particular bin by looking for the biggest
+// address gap in the current connections bin of the same proximity order, instead of using the first address that is
+// callable. If there is no connections bin with po = bin.ProximityOrder, or it is empty, the usual suggestPeerInBin
+// algorithm is used.
+// The bin parameter is the address bin from which to select a BzzAddr.
+// The return value is the selected BzzAddr.
+func (k *Kademlia) suggestPeerInBinByGap(bin *pot.Bin) *BzzAddr {
+	connBin := k.defaultIndex.conns.PotWithPo(k.base, bin.ProximityOrder, Pof)
+	if connBin == nil {
+		return k.suggestPeerInBin(bin)
+	}
+	gapPo, gapVal := connBin.BiggestAddressGap()
+	// We need an address in the missing gapPo space with respect to gapVal.
+	// The lower the gapPo, the bigger the address space gap.
+	var foundPeer *BzzAddr
+	var candidatePeer *BzzAddr
+	furthestPo := 256
+	// iterate over the callable addresses in the unsaturated bin and
+	// stop as soon as one is found at exactly gapPo from gapVal
+	bin.ValIterator(func(val pot.Val) bool {
+		e := val.(*entry)
+		addrPo, _ := Pof(gapVal, e.BzzAddr, bin.ProximityOrder)
+		if k.callable(e) {
+			if addrPo == gapPo {
+				foundPeer = e.BzzAddr
+				return false
+			}
+			if addrPo < furthestPo {
+				furthestPo = addrPo
+				candidatePeer = e.BzzAddr
+			}
+			return true
+		}
+		return true
+	})
+	if foundPeer != nil {
+		return foundPeer
+	} else {
+		// No callable peer at exactly gapPo from gapVal was found, so we return the farthest candidate
+		return candidatePeer
+	}
+}
+
 // On inserts the peer as a kademlia peer into the live peers
 func (k *Kademlia) On(p *Peer) (uint8, bool) {
 	k.lock.Lock()
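
Note on the hunk above: the selection rule inside `suggestPeerInBinByGap` (take the first address exactly at the gap po from the reference value, otherwise the one with the lowest po) can be tried in isolation on 8-bit addresses. The sketch below is a simplified illustration, not the code of this commit: `po` and `pickByGap` are hypothetical names, the `callable` check is left out, and plain `uint8` values stand in for `BzzAddr`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// po returns the proximity order of two 8-bit addresses: the number of
// leading bits they share (8 when the addresses are equal).
func po(a, b uint8) int {
	return bits.LeadingZeros8(a ^ b)
}

// pickByGap returns the first candidate whose proximity order to ref equals
// gapPo. If there is none, it returns the candidate with the lowest po,
// that is, the one farthest from ref. ok is false when candidates is empty.
func pickByGap(ref uint8, gapPo int, candidates []uint8) (best uint8, ok bool) {
	furthest := 9 // larger than any po possible for 8-bit addresses
	for _, c := range candidates {
		p := po(ref, c)
		if p == gapPo {
			return c, true
		}
		if p < furthest {
			furthest, best, ok = p, c, true
		}
	}
	return best, ok
}

func main() {
	// Same setup as TestSuggestPeerInBinByGapCandidate: the gap is po 2
	// under 10000000, but only po 3 and po 4 addresses are known.
	got, _ := pickByGap(0b10000000, 2, []uint8{0b10010000, 0b10001000})
	fmt.Printf("%08b\n", got)
}
```

The fallback branch is what makes the function useful when no known address lies exactly in the biggest gap.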

diff --git a/network/kademlia_test.go b/network/kademlia_test.go
@@ -1069,3 +1069,78 @@ func TestCapabilityNeighbourhoodDepth(t *testing.T) {
 		t.Fatalf("cap 'one' expected depth 2, was %d", depth)
 	}
 }
+
+//TestSuggestPeerInBinByGap will check that when several addresses are available for register in the same bin, the
+//one suggested is the one that fills the biggest gap of address in that bin.
+func TestSuggestPeerInBinByGap(t *testing.T) {
+	tk := newTestKademlia(t, "11111111")
+	tk.Register("00000000", "00000001")
+	bin0 := tk.getAddressBin(0)
+	if bin0 == nil {
+		t.Errorf("Expected bin 0 in addresses to be found but is nil")
+	}
+
+	// Adding 00000000 for example, it doesn't really matter which of the first two
+	tk.On("00000000")
+	tk.Register("01000000")
+	suggestedByGapPeer := tk.suggestPeerInBinByGap(tk.getAddressBin(0))
+	binaryString := bzzAddrToBinary(suggestedByGapPeer)
+	// Expected suggestion is 01000000 because it covers bigger part of the address space in bin 0.
+	if binaryString != "01000000" {
+		t.Errorf("Expected suggestion by gap to be 01000000 because is in po=1 gap, but got %v", binaryString)
+	}
+	// Adding 01000000
+	tk.On(binaryString)
+	// Now we will try to fill in po 1
+	tk.Register("10000000", "11110000")
+	bin1 := tk.getAddressBin(1)
+	// Of the two peers, the first one (10000000) covers more of the gap than the other one in our kademlia table (it is farther from
+	// our base 11111111)
+	suggestedByGapPeer = tk.suggestPeerInBinByGap(bin1)
+	binaryString = bzzAddrToBinary(suggestedByGapPeer)
+	if binaryString != "10000000" {
+		t.Errorf("Expected suggestion by gap to be 10000000 because is in po=1 gap, but got %v", binaryString)
+	}
+}
+
+// TestSuggestPeerInBinByGapCandidate checks that when suggesting addresses, if an address in the desired gap can't be
+// found, the furthest away from the reference peer will be chosen (the one with the lowest po, so it will fill up a bigger
+// part of the gap)
+func TestSuggestPeerInBinByGapCandidate(t *testing.T) {
+	tk := newTestKademlia(t, "11111111")
+	tk.On("00000000", "10000000")
+	// Connecting address (10000100) po=5 from 10000000 to leave a big gap [2..4]
+	tk.On("10000100")
+	//Now we are going to suggest a biggest gap that doesn't match with any of the available addresses. The algorithm
+	//should take the furthest from the reference address (parent of the gap, so 10000000)
+	//Now we have a gap po=2 under 10000000 in bin1. We are not going to register an address po=2 (f.ex. 10100000) but
+	//two addresses at po=3 and po=4 from it. Algorithm should return the farthest candidate(po=3).
+	//10010000 => po=3 from 10000000
+	//10001000 => po=4 from 10000000
+	tk.Register("10010000", "10001000")
+	suggestedCandidate := tk.suggestPeerInBinByGap(tk.getAddressBin(1))
+	binaryString := bzzAddrToBinary(suggestedCandidate)
+	if binaryString != "10010000" {
+		t.Errorf("Expected furthest candidate to be 10010000 at po=3, but got %v", binaryString)
+	}
+}
+
+// getAddressBin is a utility function to obtain a Bin by po
+func (tk *testKademlia) getAddressBin(po int) *pot.Bin {
+	var theBin *pot.Bin
+	tk.defaultIndex.addrs.EachBin(tk.base, Pof, po, func(bin *pot.Bin) bool {
+		if bin.ProximityOrder == po {
+			theBin = bin
+			return false
+		} else if bin.ProximityOrder > po {
+			return false
+		} else {
+			return true
+		}
+	}, true)
+	return theBin
+}
+
+func bzzAddrToBinary(bzzAddress *BzzAddr) string {
+	return byteToBitString(bzzAddress.OAddr[0])
+}
diff --git a/network_test.go b/network_test.go
@@ -352,13 +352,14 @@ func testSwarmNetwork(t *testing.T, o *testSwarmNetworkOptions, steps ...testSwa
 
 			for syncing := true; syncing; {
 				syncing = false
+				time.Sleep(1 * time.Second)
+
 				for _, id := range nodeIDs {
 					if sim.MustNodeItem(id, bucketKeyInspector).(*api.Inspector).IsPullSyncing() {
 						syncing = true
+						break
 					}
 				}
-
-				time.Sleep(1 * time.Second)
 			}
 
 			for {

diff --git a/pot/pot.go b/pot/pot.go
@@ -925,3 +925,86 @@ func (t *Pot) sstring(indent string) string {
 	}
 	return s
 }
+
+// PotWithPo returns a Pot with all elements with proximity order desiredPo w.r.t. pivotVal.
+// It is similar to obtaining a bin, but in a tree structure that helps in some calculations.
+func (t *Pot) PotWithPo(pivotVal Val, desiredPo int, pof Pof) *Pot {
+	if t == nil || t.size == 0 {
+		return nil
+	}
+	pivotProximityOrder, _ := pof(t.pin, pivotVal, 0)
+	pivotPot, pivotBinIndex := t.getPos(pivotProximityOrder)
+	if pivotProximityOrder < desiredPo {
+		if pivotPot != nil && pivotPot.po == pivotProximityOrder {
+			return pivotPot.PotWithPo(pivotVal, desiredPo, pof)
+		} else { //There is no bin with the desired po
+			return nil
+		}
+	}
+	if pivotProximityOrder == desiredPo {
+		prunedPot := t.clone()
+		prunedPot.po = desiredPo
+		actualPivotPlace := pivotBinIndex
+		if pivotPot == nil {
+			actualPivotPlace--
+		}
+		var removedBinsSize int
+		for i := 0; i < len(prunedPot.bins) && i <= actualPivotPlace; i++ {
+			removedBinsSize += prunedPot.bins[i].size
+		}
+		prunedPot.size = prunedPot.size - removedBinsSize
+		if prunedPot.bins != nil {
+			prunedPot.bins = prunedPot.bins[actualPivotPlace+1:]
+		}
+		return prunedPot
+	}
+	// if pivotProximityOrder > desiredPo
+	for i := 0; i < len(t.bins); i++ {
+		n := t.bins[i]
+		if n.po == desiredPo {
+			return n
+		}
+	}
+	return nil
+}
+
+//BiggestAddressGap tries to find the biggest address gap, that is, the biggest part of the address space not covered by any element.
+//Biggest gaps tend to be top left of the tree (if the pot is rendered root top and bins with po = 0 left).
+//As the bins progress to the right or down (higher proximity order) the address space gap left is smaller.
+//An address gap is defined as a missing proximity order without any value. So for example, a root value with two
+//bins, one with po 0 and one with po 2 has a gap in po=1. Of course it also has a gap in po>=3 but that gap is smaller
+//in number of addresses contained. If the total space area is 1, the space covered by a bin of proximity order n can
+//be defined as 1/2^(n+1). So po=0 will occupy half of the area, po=5 1/64 of the area and so on.
+//When a gap is found there is no need to go further on that level because advancing (horizontally or vertically) will
+//decrease the maximum gap space by half.
+//The function returns the proximity order of the gap and the reference value where the gap has been found (so the
+//exact address set can be calculated)
+func (t *Pot) BiggestAddressGap() (po int, val Val) {
+	if t == nil || t.size == 0 {
+		return 0, nil
+	}
+
+	if len(t.bins) == 0 {
+		return t.po + 1, t.pin
+	}
+
+	wrt := t.pin
+	biggest := 256
+	last := t.po
+	for _, subPot := range t.bins {
+		if subPot.po > last+1 && last+1 <= biggest {
+			wrt = t.pin
+			biggest = last + 1
+			break
+		} else {
+			last = subPot.po
+			subBiggest, aVal := subPot.BiggestAddressGap()
+			if subBiggest < biggest {
+				biggest = subBiggest
+				wrt = aVal
+			}
+		}
+	}
+
+	return biggest, wrt
+}
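
Note on `BiggestAddressGap`: to see which result to expect for the README examples, the biggest gap inside a bin can also be found by brute force on 8-bit addresses. The sketch below is not the pot traversal of this commit. It scans prefixes from shortest to longest and reports the first one that contains no peer. `hasPrefix` and `biggestGap` are hypothetical names, and the bin is described by an assumed (prefix, length) pair instead of a `*Pot`.

```go
package main

import "fmt"

// hasPrefix reports whether the top n bits of addr equal the top n bits of prefix.
func hasPrefix(addr, prefix uint8, n int) bool {
	shift := uint(8 - n)
	return addr>>shift == prefix>>shift
}

// biggestGap looks inside the subtree given by the top binLen bits of
// binPrefix and returns the shortest prefix that contains no peer.
// The gap po is the prefix length minus one, so its size is 1/2^(po+1).
func biggestGap(peers []uint8, binPrefix uint8, binLen int) (po int, gap uint8, found bool) {
	base := (binPrefix >> uint(8-binLen)) << uint(8-binLen) // keep only the bin bits
	for n := binLen + 1; n <= 8; n++ {
		free := uint(n - binLen) // bits enumerated below the bin prefix
		for v := 0; v < 1<<free; v++ {
			candidate := base | uint8(v)<<uint(8-n)
			occupied := false
			for _, p := range peers {
				if hasPrefix(p, candidate, n) {
					occupied = true
					break
				}
			}
			if !occupied {
				return n - 1, candidate, true
			}
		}
	}
	return 0, 0, false // every address in the bin is taken
}

func main() {
	// Bin 0 of node 00000000 is the subtree 1xxxxxxx.
	po, gap, _ := biggestGap([]uint8{0b10000000, 0b10000001}, 0b10000000, 1)
	fmt.Printf("po=%d gap=%08b\n", po, gap)
	po, gap, _ = biggestGap([]uint8{0b10000000, 0b11000000}, 0b10000000, 1)
	fmt.Printf("po=%d gap=%08b\n", po, gap)
}
```

The brute-force version visits every prefix, whereas the pot traversal in the commit can stop at the first missing po on each level. It is meant only as a way to cross-check expected values in small examples.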