
perf(pullstorage): share common iterators in pullstorage #1683

Merged
merged 1 commit into master from pullsync-share-iterators on May 11, 2021

Conversation

acud
Member

@acud acud commented May 10, 2021

This PR optimizes the way we open pull sync subscriptions to the localstore in the pullstorage package.

The assumptions are the following:

  • when starting to sync, different peers perform individual, one-time-cost historical syncing which is relevant only to them
  • when live-syncing, most peers will want the same intervals, since most of them are assumed to be tracking the tip of the binIDs they are interested in

So, for example, two peers that are live-syncing from a third peer will both want everything from a given starting value up to math.MaxUint64. Since both peers are expected to track the tip of the bin more or less closely, and since some debouncing is introduced by the flipflop package, it is expected that in most cases the From value will be shared between peers within the same timeframe.

Through the pull sync protocol handler, these independent requests will yield two identical calls to IntervalChunks in makeOffer:

// makeOffer tries to assemble an offer for a given requested interval.
func (s *Syncer) makeOffer(ctx context.Context, rn pb.GetRange) (o *pb.Offer, addrs []swarm.Address, err error) {
	chs, top, err := s.storage.IntervalChunks(ctx, uint8(rn.Bin), rn.From, rn.To, maxPage)
	if err != nil {
		return o, nil, err
	}

Once the notification about new chunks in a bin comes through from localstore, these identical iterators will all be triggered, resulting in duplicate work and unnecessary CPU and I/O overhead from the open iterators.
The aim of this PR is to reuse existing open subscriptions and deliver the same response to multiple requesters at the same time, significantly lowering the amount of I/O and CPU work we do and (hopefully) moving the bottleneck to the transport level.

This optimization is also crucial because of:

  1. Recent changes to the kademlia upper bound, which is set by the postage contract data. Under certain circumstances this might result in very large neighborhoods
  2. The change of swarm.MaxPO from 16 to 32. This may result in many more open subscriptions from peers on many more bins, which would amplify the problem significantly

This PR proposes to reduce the number of goroutines by sharing the open subscriptions: once the results come in, the originator goroutine communicates them to the other subscribed goroutines.
Since a subscribed goroutine may terminate for whatever reason before the result is delivered, it must remove itself from the result subscription in that case.

A major concern with this strategy in the past was that one peer might slow another down: actually sending the results to the other peers that want them may take time, delaying those peers from getting the chunks in time. I believe this is not an issue here. The other subscribers wait for the results in separate goroutines, so broadcasting the result is a very cheap operation for the main goroutine: each subscriber merely reads from a channel, and the subsequent sends to peers happen outside the scope of this package.



@acud acud force-pushed the pullsync-share-iterators branch 2 times, most recently from 255f3e4 to 01e604d Compare May 10, 2021 14:27
@acud acud self-assigned this May 10, 2021
@acud acud added the enhancement enhancement of existing functionality label May 10, 2021
@acud acud force-pushed the pullsync-share-iterators branch from 01e604d to 19be370 Compare May 10, 2021 14:37
@acud acud added the ready for review The PR is ready to be reviewed label May 10, 2021
Member

@janos janos left a comment


Very nice! While reading, I got an impression that singleflight could have been used for the same purpose, but the implementation in this PR is fine.

Member

@zelig zelig left a comment


Reviewed 4 of 4 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @acud, @aloknerurkar, @anatollupacescu, @esadakar, and @mrekucci)


pkg/pullsync/pullstorage/pullstorage.go, line 109 at r1 (raw file):

		s.openSubsMu.Lock()
		for _, c := range s.openSubs[k] {
			c <- intervalChunks{chs: chs, topmost: topmost, err: err}

there must be a select with a default case here, or the channel must be buffered, otherwise this leaks

@acud acud force-pushed the pullsync-share-iterators branch from 19be370 to 9c26456 Compare May 11, 2021 04:53
Member Author

@acud acud left a comment


Reviewable status: 3 of 4 files reviewed, 1 unresolved discussion (waiting on @aloknerurkar, @anatollupacescu, @esadakar, @mrekucci, and @zelig)


pkg/pullsync/pullstorage/pullstorage.go, line 109 at r1 (raw file):

Previously, zelig (Viktor Trón) wrote…

there must be a select with a default case here, or the channel must be buffered, otherwise this leaks

actually it will deadlock completely. I added a select default case here and some comments about why this is needed 👍

Contributor

@aloknerurkar aloknerurkar left a comment


:lgtm:

Reviewable status: 3 of 4 files reviewed, 1 unresolved discussion (waiting on @anatollupacescu, @esadakar, @mrekucci, and @zelig)

@mrekucci mrekucci requested a review from zelig May 11, 2021 15:23
Contributor

@mrekucci mrekucci left a comment


Reviewed 3 of 4 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @acud, @anatollupacescu, @esadakar, and @zelig)


pkg/pullsync/pullstorage/pullstorage_test.go, line 311 at r2 (raw file):

	go func() {
		<-time.After(200 * time.Millisecond)

Is there a reason why a timer is used instead of time.Sleep? If not, please consider using time.Sleep, since creating a new timer is more expensive in terms of resources. Also, please consider documenting the needed delay and why it is 200ms, if possible.

Contributor

@mrekucci mrekucci left a comment


:lgtm:

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @acud, @anatollupacescu, @esadakar, and @zelig)

@acud acud force-pushed the pullsync-share-iterators branch from 9c26456 to 0a4cb62 Compare May 11, 2021 17:46
@acud acud merged commit 553a1b5 into master May 11, 2021
@acud acud deleted the pullsync-share-iterators branch May 11, 2021 18:28
acud added a commit that referenced this pull request May 21, 2021