
perf(pullstorage): share common iterators in pullstorage #1683

Merged
merged 1 commit into master from pullsync-share-iterators on May 11, 2021

Conversation

acud
Member

@acud acud commented May 10, 2021

This PR optimizes the way we open pull sync subscriptions to the localstore in the pullstorage package.

The assumptions are the following:

  • when starting to sync, different peers perform individual, one-time-cost historical syncing which is relevant only to them
  • when live-syncing, most peers will want the same intervals, since most of them are assumed to be tracking the tip of the binIDs they are interested in

So, for example, two peers that are live-syncing from a third peer will both want everything from a given starting value up to math.MaxUint64. Since both peers are expected to track the tip of the bin more or less closely, and since some debouncing is introduced by the flipflop package, it is expected that in most cases the From value will be shared between peers within the same timeframe.

Through the pull sync protocol handler, these independent requests will yield two identical calls to IntervalChunks in makeOffer:

// makeOffer tries to assemble an offer for a given requested interval.
func (s *Syncer) makeOffer(ctx context.Context, rn pb.GetRange) (o *pb.Offer, addrs []swarm.Address, err error) {
	chs, top, err := s.storage.IntervalChunks(ctx, uint8(rn.Bin), rn.From, rn.To, maxPage)
	if err != nil {
		return o, nil, err
	}

Once the notification about new chunks in a bin comes through from localstore, these identical iterators will all be triggered, resulting in duplicate work and unnecessary CPU and I/O overhead from the open iterators.
The aim of this PR is to reuse existing open subscriptions and deliver the same response to multiple requesters at the same time, significantly lowering the amount of I/O and CPU work we do and (hopefully) moving the bottleneck to the transport level.

This optimization is also crucial because of:

  1. Recent changes to the kademlia upper bound, which is set by the postage contract data. Under certain circumstances this might result in very large neighborhoods
  2. The change of swarm.MaxPO from 16 to 32. This may result in many more open subscriptions from peers on many more bins, which would amplify the problem significantly

This PR proposes to reduce the number of goroutines by sharing the open subscriptions: once the results come in, the originator goroutine communicates them to the other subscribed goroutines.
Since a subscribed goroutine may terminate for whatever reason before the result is delivered, it must remove itself from the result subscription in that case.

A major concern with this strategy in the past was that one peer might slow another down: actually sending the results to the other peers that want them may take time, delaying those peers from getting the chunks in time. I believe this is not an issue here. The other subscribers wait for the results in separate goroutines, so broadcasting the result is a very cheap operation for the main goroutine: each subscriber merely reads from a channel, and the subsequent sends to peers happen outside the scope of this package.



@acud acud force-pushed the pullsync-share-iterators branch 2 times, most recently from 255f3e4 to 01e604d Compare May 10, 2021 14:27
@acud acud self-assigned this May 10, 2021
@acud acud added the enhancement enhancement of existing functionality label May 10, 2021
@acud acud force-pushed the pullsync-share-iterators branch from 01e604d to 19be370 Compare May 10, 2021 14:37
@acud acud added the ready for review The PR is ready to be reviewed label May 10, 2021
Member

@janos janos left a comment


Very nice! While reading, I got an impression that singleflight could have been used for the same purpose, but the implementation in this PR is fine.

Member

@zelig zelig left a comment


Reviewed 4 of 4 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @acud, @aloknerurkar, @anatollupacescu, @esadakar, and @mrekucci)


pkg/pullsync/pullstorage/pullstorage.go, line 109 at r1 (raw file):

		s.openSubsMu.Lock()
		for _, c := range s.openSubs[k] {
			c <- intervalChunks{chs: chs, topmost: topmost, err: err}

there must be a select with a default case here, or the channel must be buffered, otherwise this leaks

@acud acud force-pushed the pullsync-share-iterators branch from 19be370 to 9c26456 Compare May 11, 2021 04:53
Member Author

@acud acud left a comment


Reviewable status: 3 of 4 files reviewed, 1 unresolved discussion (waiting on @aloknerurkar, @anatollupacescu, @esadakar, @mrekucci, and @zelig)


pkg/pullsync/pullstorage/pullstorage.go, line 109 at r1 (raw file):

Previously, zelig (Viktor Trón) wrote…

there must be a select with a default case here, or the channel must be buffered, otherwise this leaks

actually it will deadlock completely. I added a select default case here and some comments about why this is needed 👍

Contributor

@aloknerurkar aloknerurkar left a comment


:lgtm:

Reviewable status: 3 of 4 files reviewed, 1 unresolved discussion (waiting on @anatollupacescu, @esadakar, @mrekucci, and @zelig)

@mrekucci mrekucci requested a review from zelig May 11, 2021 15:23
Contributor

@mrekucci mrekucci left a comment


Reviewed 3 of 4 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @acud, @anatollupacescu, @esadakar, and @zelig)


pkg/pullsync/pullstorage/pullstorage_test.go, line 311 at r2 (raw file):

	go func() {
		<-time.After(200 * time.Millisecond)

Is there a reason why a timer is used instead of time.Sleep? If not, please consider using time.Sleep, since creating a new timer is more expensive in terms of resources. Also, please consider documenting the needed delay and why it is 200ms, if possible.

Contributor

@mrekucci mrekucci left a comment


:lgtm:

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @acud, @anatollupacescu, @esadakar, and @zelig)

@acud acud force-pushed the pullsync-share-iterators branch from 9c26456 to 0a4cb62 Compare May 11, 2021 17:46
@acud acud merged commit 553a1b5 into master May 11, 2021
@acud acud deleted the pullsync-share-iterators branch May 11, 2021 18:28
acud added a commit that referenced this pull request May 21, 2021