perf(pullstorage): share common iterators in pullstorage #1683
Conversation
Force-pushed from 255f3e4 to 01e604d.
Force-pushed from 01e604d to 19be370.
Very nice! While reading, I got an impression that singleflight could have been used for the same purpose, but the implementation in this PR is fine.
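For reference, a minimal sketch of what the singleflight variant could look like, assuming golang.org/x/sync/singleflight and a stand-in intervalChunks callback. This is an illustration of the alternative, not what the PR implements:

```go
package pullstorage

import (
	"context"
	"fmt"

	"golang.org/x/sync/singleflight"
)

// result is a hypothetical container for one iterator outcome.
type result struct {
	chs     []string // chunk addresses (swarm.Address in the real code)
	topmost uint64
}

var group singleflight.Group

// sharedIntervalChunks collapses concurrent identical requests: callers that
// arrive with the same (bin, from, to) key while one call is in flight share
// the single underlying iteration and all receive the same result.
// Note: the context of the first caller governs the shared call.
func sharedIntervalChunks(
	ctx context.Context, bin uint8, from, to uint64,
	intervalChunks func(context.Context, uint8, uint64, uint64) ([]string, uint64, error),
) ([]string, uint64, error) {
	key := fmt.Sprintf("%d/%d/%d", bin, from, to)
	v, err, _ := group.Do(key, func() (interface{}, error) {
		chs, topmost, err := intervalChunks(ctx, bin, from, to)
		if err != nil {
			return nil, err
		}
		return result{chs: chs, topmost: topmost}, nil
	})
	if err != nil {
		return nil, 0, err
	}
	r := v.(result)
	return r.chs, r.topmost, nil
}
```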
Reviewed 4 of 4 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @acud, @aloknerurkar, @anatollupacescu, @esadakar, and @mrekucci)
pkg/pullsync/pullstorage/pullstorage.go, line 109 at r1 (raw file):
```go
s.openSubsMu.Lock()
for _, c := range s.openSubs[k] {
	c <- intervalChunks{chs: chs, topmost: topmost, err: err}
```
there must be a select with a default case here, or the channel must be buffered; otherwise this leaks
Force-pushed from 19be370 to 9c26456.
Reviewable status: 3 of 4 files reviewed, 1 unresolved discussion (waiting on @aloknerurkar, @anatollupacescu, @esadakar, @mrekucci, and @zelig)
pkg/pullsync/pullstorage/pullstorage.go, line 109 at r1 (raw file):
Previously, zelig (Viktor Trón) wrote…
there must be a select with a default case here, or the channel must be buffered; otherwise this leaks
actually it will deadlock completely. I added a select default case here and some comments about why this is needed 👍
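For readers following along, a sketch of the fix described here, mirroring the snippet quoted above (the surrounding names come from that snippet; the cleanup line is an assumption):

```go
// Fan out one result to every subscriber without ever blocking: if a
// subscriber has already gone away (e.g. its context was cancelled) and
// nobody reads from its channel, the default case skips it instead of
// deadlocking the originator goroutine.
s.openSubsMu.Lock()
for _, c := range s.openSubs[k] {
	select {
	case c <- intervalChunks{chs: chs, topmost: topmost, err: err}:
	default:
		// subscriber is gone; dropping the send avoids the
		// leak/deadlock pointed out in the review
	}
}
delete(s.openSubs, k) // assumed: the slot is cleared once results are fanned out
s.openSubsMu.Unlock()
```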
Reviewable status: 3 of 4 files reviewed, 1 unresolved discussion (waiting on @anatollupacescu, @esadakar, @mrekucci, and @zelig)
Reviewed 3 of 4 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @acud, @anatollupacescu, @esadakar, and @zelig)
pkg/pullsync/pullstorage/pullstorage_test.go, line 311 at r2 (raw file):
```go
go func() {
	<-time.After(200 * time.Millisecond)
```
Is there a reason why a timer is used instead of time.Sleep? If not, please consider using time.Sleep, since creating a new timer is more expensive in terms of resources. Also, please consider documenting why the 200ms delay is needed, if possible.
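A sketch of the suggested change (the goroutine body after the delay is elided in the quote and stays elided here):

```go
go func() {
	// was: <-time.After(200 * time.Millisecond)
	time.Sleep(200 * time.Millisecond) // plain sleep; no timer/channel allocation
	// ...
}()
```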
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @acud, @anatollupacescu, @esadakar, and @zelig)
Force-pushed from 9c26456 to 0a4cb62.
This PR optimizes the way we open pull sync subscriptions to the localstore in the pullstorage package. The assumptions are the following:
So, for example, two peers which are live-syncing from a third peer will both want anything from a given interval up to math.MaxUint64. Since both peers are expected to track the tip of the bin more or less, and since some debouncing is introduced by the flipflop package, it is expected that in most cases the From value will be shared between peers around the same timeframe. Through the pull sync protocol handler, these independent requests will yield two identical calls to IntervalChunks in makeOffer.
Once the notification about new chunks in a bin comes through from localstore, these identical iterators will both be triggered, resulting in duplicate work and possible CPU overhead from syncing open iterators. The aim of this PR is to reuse existing, open subscriptions and deliver the same response to multiple requesters at the same time, significantly lowering the amount of I/O and CPU work we do and moving the bottleneck to (hopefully) the transport level.
This is also crucial to leverage due to:
- changing swarm.MaxPO to 32 instead of 16. This may result in a lot more open subscriptions from peers on a lot more bins, which would amplify the problem significantly.

This is a proposed solution on how to reduce the number of goroutines by sharing the open subscriptions, with the originator goroutine communicating the results to the other goroutines once they come in; a minimal sketch follows.
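To make the mechanism concrete, here is a minimal, self-contained sketch of the sharing scheme described above. The names (ps, openSubs, subscribe) follow the snippets quoted in the review, but this is an illustration, not the PR's exact code:

```go
package pullstorage

import "sync"

// intervalChunks carries one iterator result to every waiting subscriber.
type intervalChunks struct {
	chs     []string // chunk addresses (swarm.Address in the real code)
	topmost uint64
	err     error
}

type ps struct {
	openSubsMu sync.Mutex
	openSubs   map[string][]chan intervalChunks
}

// subscribe returns a channel on which the result for key k will arrive.
// The first caller for a given key becomes the originator and must run the
// actual localstore iteration; later callers piggyback on the open slot.
func (s *ps) subscribe(k string) (c chan intervalChunks, originator bool) {
	s.openSubsMu.Lock()
	defer s.openSubsMu.Unlock()

	c = make(chan intervalChunks, 1) // buffered, so the fan-out never blocks
	if _, ok := s.openSubs[k]; ok {
		s.openSubs[k] = append(s.openSubs[k], c)
		return c, false
	}
	s.openSubs[k] = []chan intervalChunks{c}
	return c, true
}
```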
Since the other subscribed goroutines may terminate for whatever reason before the result is given, they must remove themselves from the result subscription if that happens.
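Continuing the sketch above (with "context" added to the imports), the waiter side could look like this: a subscriber that gives up early deregisters its channel so the originator's fan-out never targets a dead reader. Again an illustration, not the PR's code:

```go
// wait blocks until the shared result for key k arrives on c, or until the
// caller's context is cancelled, in which case the subscriber removes its
// own channel from the slot before returning.
func (s *ps) wait(ctx context.Context, k string, c chan intervalChunks) ([]string, uint64, error) {
	select {
	case res := <-c:
		return res.chs, res.topmost, res.err
	case <-ctx.Done():
		s.openSubsMu.Lock()
		subs := s.openSubs[k]
		for i, cc := range subs {
			if cc == c {
				s.openSubs[k] = append(subs[:i], subs[i+1:]...)
				break
			}
		}
		s.openSubsMu.Unlock()
		return nil, 0, ctx.Err()
	}
}
```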
A major concern with this strategy in the past was that one peer may slow another down: results may take time to actually reach the other peers that want them, delaying chunks for those peers. I believe this is not an issue. The other subscriptions wait for the results in separate goroutines, so fanning the results out is a very cheap operation for the main goroutine: all the other goroutines just read from the channel, and subsequent sends are not accounted for within the scope of this package.