This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

network: Kademlia Load Balancing #1774

Merged 36 commits on Nov 12, 2019
Changes from 23 commits
0f31b5a
Proposed solution for peers load balancing in Kademlia
kortatu Sep 18, 2019
c1c9ac5
network: created global capabilityIndex
kortatu Sep 19, 2019
fc65d5a
typo
kortatu Sep 19, 2019
7dc9568
renamed globalIndex to defaultIndex
kortatu Sep 19, 2019
ca11c52
Load balancing capability test
kortatu Sep 20, 2019
97ab47e
Removed color balancing and using a KademliaLoadBalancer
kortatu Sep 25, 2019
f0fc99d
Missing file in commit
kortatu Sep 25, 2019
0941717
Merge branch 'master' into issue-1757
kortatu Sep 26, 2019
e9263d7
Fixed lint and test when mergin master
kortatu Sep 26, 2019
ed1f9a9
Subscription to peer changed closed only by writer, not by subscriptors
kortatu Sep 26, 2019
904e204
Added an alternative method for initializing a new peer count
kortatu Sep 27, 2019
29303f1
Merge branch 'master' into issue-1757
kortatu Sep 27, 2019
e53ae25
go fmt
kortatu Sep 27, 2019
f82db7f
extracted pubsub channel to a package
kortatu Sep 30, 2019
0e346e6
network: fixed pr comments
kortatu Oct 1, 2019
5665ea9
Merge branch 'master' into issue-1757
kortatu Oct 3, 2019
4c81a24
network/kademlia: Pr comments, tests commented and fixed
kortatu Oct 14, 2019
5b44ab3
network/kademlia: better naming for pub/sub channels in kademlialoadb…
kortatu Oct 14, 2019
52dacb0
Fixed wrong test in kademlia load balancer. Also fixed waiting methods.
kortatu Oct 15, 2019
3301920
Merge branch 'master' into issue-1757
kortatu Oct 15, 2019
49e7d09
Debug functions moved out of kademlialoadbalancer.go
kortatu Oct 15, 2019
ad5eac5
More comments, fixed EachConn po for peers in the same bin as the piv…
kortatu Oct 16, 2019
e78e2b3
resourceUseStats moved to a diffrente file. Use() renamed to AddUseCo…
kortatu Oct 16, 2019
21a0d2f
added gopubsub unit tests
kortatu Oct 29, 2019
c615cdd
Moved resource_use_stats to its own package. Renamed gopubsub to pubs…
kortatu Oct 29, 2019
340ce15
Pubsub now closes all go routines when closing. Removed commented code
kortatu Oct 29, 2019
96a8245
Merge branch 'master' into issue-1757
kortatu Oct 29, 2019
697b049
fix closing channel
kortatu Oct 29, 2019
5cb6d46
Added close channel for stopping blocked publishing goroutines
kortatu Oct 30, 2019
abc51ab
Added unit tests for pubsubchannel to check ongoing goroutines
kortatu Oct 30, 2019
9f951d1
Better accounting of pending messages/goroutines
kortatu Oct 30, 2019
d8f8059
Reverted metrics to regular int counters
kortatu Oct 30, 2019
c34062c
Removed testKademliaBackend, using testKademlia instead. Minor PR fixes
kortatu Oct 31, 2019
e7eefaf
PubSubChannel now publish messages semi-asynchronously filling first …
kortatu Nov 4, 2019
3019822
Exit publishing goroutine on psc.quitC close.
kortatu Nov 6, 2019
2879509
Avoid data race in getAllUseCounts
kortatu Nov 12, 2019
13 changes: 13 additions & 0 deletions network/discovery.go
@@ -21,6 +21,7 @@ import (
	"fmt"
	"sync"

	"github.com/ethereum/go-ethereum/common/hexutil"
	"github.com/ethereum/go-ethereum/rlp"
	"github.com/ethersphere/swarm/pot"
)
@@ -37,6 +38,7 @@ type Peer struct {
	mtx   sync.RWMutex    //
	peers map[string]bool // tracks node records sent to the peer
	depth uint8           // the proximity order advertised by remote as depth of saturation
	key   string          // peer key. Hex form of Address()
}

// NewPeer constructs a discovery peer
@@ -45,6 +47,7 @@ func NewPeer(p *BzzPeer, kad *Kademlia) *Peer {
		kad:     kad,
		BzzPeer: p,
		peers:   make(map[string]bool),
		key:     hexutil.Encode(p.Address()),
	}
	// record remote as seen so we never send a peer its own record
	d.seen(p.BzzAddr)
@@ -66,6 +69,16 @@ func (d *Peer) HandleMsg(ctx context.Context, msg interface{}) error {
	}
}

// Key returns a string representation of this peer to be used in maps.
func (d *Peer) Key() string {
	return d.key
}

// Label returns a short string representation for debugging purposes
func (d *Peer) Label() string {
	return d.key[:4]
}

// NotifyDepth sends a message to all connections if depth of saturation is changed
func NotifyDepth(depth uint8, kad *Kademlia) {
	f := func(val *Peer, po int) bool {
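A note on Label() in the hunk above: hexutil.Encode returns a string with a "0x" prefix, so key[:4] yields that prefix plus the first two hex digits (one byte) of the address, and it would panic on a key shorter than four characters. The sketch below mimics this with the standard library only. The names encodeHex and label are hypothetical stand-ins for hexutil.Encode and Peer.Label, and the length guard is an assumption that is not part of the diff.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// encodeHex is a hypothetical stand-in for hexutil.Encode:
// lowercase hex with a "0x" prefix.
func encodeHex(b []byte) string {
	return "0x" + hex.EncodeToString(b)
}

// label mirrors the slicing done in Peer.Label, with an added guard
// (not in the diff) for keys shorter than four characters.
func label(key string) string {
	if len(key) < 4 {
		return key
	}
	return key[:4]
}

func main() {
	key := encodeHex([]byte{0xab, 0xcd, 0xef})
	fmt.Println(key)        // 0xabcdef
	fmt.Println(label(key)) // 0xab
}
```

If a longer debugging label is wanted, slicing after the prefix (for example key[2:6]) would show two address bytes instead of one; that is a design choice, not something the PR does.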
126 changes: 126 additions & 0 deletions network/gopubsub/pubsub.go
@@ -0,0 +1,126 @@
package gopubsub

import (
	"fmt"
	"strconv"
	"sync"
)

//PubSubChannel represents a pubsub system where subscriber can .Subscribe() and publishers can .Publish() or .Close().
Member

@kortatu Please have a look at @janos's implementation of a similar mechanism: the SubscribePull function in storage/localstore/subscribe_pull.go and SubscribeToNeighbourhoodDepthChange in kademlia.go.
If you use a similar approach, far less state needs to be maintained, and the subscription logic can be included directly in the kademlia file (i.e. if the Subscribe function returns a notification channel and a stop function).

Contributor Author

Great, I will take a look at it.

Contributor Author

I have read SubscribeToNeighbourhoodDepthChange and SubscribePull. That is the approach I used initially, but it has two problems: first, it is not extracted to a reusable file; second, the subscriber cannot stop the subscription from their side.
In PubSubChannel, whenever the subscriber wants to stop the subscription, the channel is marked for closing but not actually closed (because the only one closing a channel should be its writer).
I think those two pieces of code should be migrated to PubSubChannel for code sanity.

Member

@acud, Oct 19, 2019

> not extracted to a reusable file

I'm not sure this small amount of code justifies being generalized into a component of its own.

> the subscriber can not stop the subscription from their side

I'm not really following... SubscribePull returns c <-chan Descriptor, stop func(), where stop func() is a function to deregister the returned channel from the list of subscription channels that should be notified. Calling stop stops the subscription from the side of the subscriber. Or maybe I'm missing something here?
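The pattern described in this comment (a subscribe call that returns a receive-only channel plus a stop function) can be sketched roughly as follows. This is not the code from subscribe_pull.go or kademlia.go; notifier, subscribe and notify are hypothetical names, and the buffered, drop-on-full notification is an assumption chosen to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier is a hypothetical publisher that keeps the set of subscriber channels.
type notifier struct {
	mu   sync.Mutex
	subs map[chan struct{}]struct{}
}

func newNotifier() *notifier {
	return &notifier{subs: make(map[chan struct{}]struct{})}
}

// subscribe returns a notification channel and a stop function.
// stop deregisters the channel and closes it while holding the same
// lock that notify takes, so a send cannot race with the close.
func (n *notifier) subscribe() (<-chan struct{}, func()) {
	c := make(chan struct{}, 1)
	n.mu.Lock()
	n.subs[c] = struct{}{}
	n.mu.Unlock()

	var once sync.Once
	stop := func() {
		once.Do(func() {
			n.mu.Lock()
			defer n.mu.Unlock()
			delete(n.subs, c)
			close(c)
		})
	}
	return c, stop
}

// notify signals every subscriber without blocking. One pending
// notification in the buffer is enough, so extra signals are dropped.
func (n *notifier) notify() {
	n.mu.Lock()
	defer n.mu.Unlock()
	for c := range n.subs {
		select {
		case c <- struct{}{}:
		default:
		}
	}
}

func main() {
	n := newNotifier()
	c, stop := n.subscribe()
	n.notify()
	<-c
	fmt.Println("notified")
	stop()
	stop()     // safe: sync.Once makes stop idempotent
	n.notify() // no subscribers left, nothing is sent
	_, open := <-c
	fmt.Println("channel open:", open) // channel open: false
}
```

The safety of closing from the subscriber side here depends entirely on stop and notify sharing one lock; without that, a send could hit the closed channel.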

Contributor Author

@kortatu, Oct 21, 2019

If you close a channel from the reader side and the writer then tries to write, the send panics. A channel should always be closed by the writer.
Anyway, I still see the main reason to create PubSubChannel: this amount of code is already used in 3 places and should be extracted. If you don't see that, maybe I have a different threshold for code reuse.
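For reference, what Go does when a writer sends on a channel that has already been closed is raise a run-time panic, while a receive from a closed channel returns the zero value immediately. A minimal sketch; sendPanics is a hypothetical helper written only for this illustration.

```go
package main

import "fmt"

// sendPanics reports whether sending a value on c panics.
func sendPanics(c chan int) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	c <- 1
	return false
}

func main() {
	open := make(chan int, 1)
	fmt.Println(sendPanics(open)) // false

	closed := make(chan int, 1)
	close(closed)
	fmt.Println(sendPanics(closed)) // true

	// Receiving from a closed channel never panics.
	v, ok := <-closed
	fmt.Println(v, ok) // 0 false
}
```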

Member

> The pubsub implementation has its own tests, that currently the other code doesn't.

Which tests are testing this pubsub implementation? I see no tests in the gopubsub package.

And I think that we have tests for SubscribePull and SubscribeToNeighbourhoodDepthChange.

> But anyway I don't think I need to justify the refactor of some common functionality to an external file.

I am sorry if you feel offended. But PR reviews exist because of code changes, and I think it is OK to question any code or comment in a PR review. Besides, my questions are mainly about problems that you said you had found in other places.

I do not think anybody objects to abstracting common functionality; I think it is the implementation of the abstraction that is in question here.

Contributor Author

I forgot to commit the test file.
I found the problem with closing the subscription channel while writing the Kademlia load balancing tests. As peers are added asynchronously, the Load Balancer can unsubscribe at the very moment a peer is added to Kademlia. Maybe it is just a test issue, but it is more correct not to close the channel from the reader side.

Member

I see what you are referring to for SubscribeToNeighbourhoodDepthChange, but since the channel is removed in the same function and a sync.Once protects the close, neither a send to the closed channel nor a second close of it should be possible. The same could be achieved with an additional channel, so that the returned channel is not closed from the client side, but with more complexity. I found that this approach solved both problems in a simpler way, since there already is Kademlia.lock.

In localstore SubscribePush a different approach is taken and the returned channel is not closed on the client side, but that subscription has different functionality than SubscribeToNeighbourhoodDepthChange.
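The "additional channel" alternative mentioned in this comment might look roughly like the sketch below: the subscriber closes a separate quit channel and never touches the message channel, and the writer selects on both. This is an illustration under assumptions, not code from the PR or from localstore; subscription, publish and stop are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// subscription is a hypothetical type: msgs is written only by the
// publisher, quit is closed only by the subscriber.
type subscription struct {
	msgs chan int
	quit chan struct{}
	once sync.Once
}

func newSubscription() *subscription {
	return &subscription{msgs: make(chan int), quit: make(chan struct{})}
}

// stop is called by the subscriber; it never closes msgs.
func (s *subscription) stop() {
	s.once.Do(func() { close(s.quit) })
}

// publish is called by the writer. It returns false once the
// subscriber has stopped, instead of blocking forever or panicking.
func (s *subscription) publish(v int) bool {
	select {
	case s.msgs <- v:
		return true
	case <-s.quit:
		return false
	}
}

func main() {
	s := newSubscription()
	done := make(chan struct{})
	go func() {
		defer close(done)
		fmt.Println("received", <-s.msgs)
		s.stop()
	}()
	fmt.Println("delivered:", s.publish(1)) // delivered: true
	<-done
	fmt.Println("delivered:", s.publish(2)) // delivered: false
}
```

The cost is one more channel per subscription and a select on every send, which is the extra complexity the comment refers to.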

Contributor Author

So it is working because the user of the API (the code subscribing) is doing things "kindly". I think an interface should not rely on that. As a generic publish/subscriber channel, I think that my implementation is safer.

Member

I have nothing to add. Thanks.

type PubSubChannel struct {
	subscriptions []*Subscription
	subsMutex     sync.RWMutex
	nextId        int
}

//Subscription is created in PubSubChannel using pubSub.Subscribe(). Subscribers can receive using .ReceiveChannel().
// or .Unsubscribe()
type Subscription struct {
	closed    bool
	removeSub func()
	signal    chan interface{}
	closeOnce sync.Once
	id        string
}

//New creates a new PubSubChannel.
func New() *PubSubChannel {
	return &PubSubChannel{
		subscriptions: make([]*Subscription, 0),
	}
}

//Subscribe creates a subscription to a channel, each subscriber should keep its own Subscription instance.
func (psc *PubSubChannel) Subscribe() *Subscription {
	psc.subsMutex.Lock()
	defer psc.subsMutex.Unlock()
	newSubscription := newSubscription(strconv.Itoa(psc.nextId))
	psc.nextId++
	psc.subscriptions = append(psc.subscriptions, &newSubscription)
	newSubscription.removeSub = func() {
		psc.subsMutex.Lock()
		defer psc.subsMutex.Unlock()

		for i, subscription := range psc.subscriptions {
			if subscription.signal == newSubscription.signal {
				fmt.Println("Unsubscribing", "id", subscription.id)
				subscription.closed = true
				psc.subscriptions = append(psc.subscriptions[:i], psc.subscriptions[i+1:]...)
			}
		}
	}
	return &newSubscription
}

//Publish broadcasts a message asynchronously to each subscriber.
//If some of the subscriptions(channels) has been marked as closeable, it does it now.
func (psc *PubSubChannel) Publish(msg interface{}) {
	psc.subsMutex.RLock()
	defer psc.subsMutex.RUnlock()
	for i, sub := range psc.subscriptions {
		if sub.closed {
			fmt.Println("Subscription was closed", "id", sub.id)
			sub.closeChannel()
		} else {
			go func(sub *Subscription, index int) {
				sub.signal <- msg
			}(sub, i)

		}
	}
}

//NumSubscriptions returns how many subscriptions are currently active.
func (psc *PubSubChannel) NumSubscriptions() int {
	psc.subsMutex.RLock()
	defer psc.subsMutex.RUnlock()
	return len(psc.subscriptions)
}

//Close cancels all subscriptions closing the channels associated with them.
//Usually the publisher is in charge of calling Close().
func (psc *PubSubChannel) Close() {
	psc.subsMutex.Lock()
	defer psc.subsMutex.Unlock()
	for _, sub := range psc.subscriptions {
		sub.closed = true
		sub.closeChannel()
	}
}

//Unsubscribe cancels subscription from the subscriber side. Channel is marked as closed but only writer should close it.
func (sub *Subscription) Unsubscribe() {
	sub.closed = true
	sub.removeSub()
}

//ReceiveChannel returns the channel where the subscriber will receive messages.
func (sub *Subscription) ReceiveChannel() <-chan interface{} {
	return sub.signal
}

//IsClosed returns if the subscription is closed via Unsubscribe() or Close() in the pubSub that creates it.
func (sub *Subscription) IsClosed() bool {
	return sub.closed
}

//ID returns a unique id in the PubSubChannel of this subscription. Useful for debugging.
func (sub *Subscription) ID() string {
	return sub.id
}

func (sub *Subscription) closeChannel() {
	sub.closeOnce.Do(func() {
		close(sub.signal)
	})
}

func newSubscription(id string) Subscription {
	return Subscription{
		closed:    false,
		removeSub: nil,
		signal:    make(chan interface{}),
		closeOnce: sync.Once{},
Member

No need to initialise value, it is implicit.

Contributor Author

Yes, but it doesn't hurt to make it explicit; it is clearer for the reader. Swarm code is full of false boolean initialisations.

Member

I disagree that it is better, it is obvious from the struct declaration.

		id:        id,
	}
}
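On the zero-value point raised in the review thread above: fields omitted from a Go composite literal take their zero value, and the zero value of sync.Once is ready to use. The sketch below uses a hypothetical struct sub that mirrors the shape of Subscription. Returning a pointer is an assumption made here so the sync.Once is not copied; the diff returns the struct by value.

```go
package main

import (
	"fmt"
	"sync"
)

// sub is a hypothetical struct with the same fields as Subscription.
type sub struct {
	closed    bool
	removeSub func()
	signal    chan interface{}
	closeOnce sync.Once
	id        string
}

// newSub sets only the fields that need a non-zero value; closed,
// removeSub and closeOnce are left to their zero values.
func newSub(id string) *sub {
	return &sub{
		signal: make(chan interface{}),
		id:     id,
	}
}

func main() {
	s := newSub("0")
	fmt.Println(s.closed, s.removeSub == nil) // false true

	calls := 0
	s.closeOnce.Do(func() { calls++ })
	s.closeOnce.Do(func() { calls++ })
	fmt.Println(calls) // 1
}
```

Whether to spell out the zero values is a readability preference, as the thread shows; behaviour is the same either way.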