KV scan batch size control #4875

Merged
merged 3 commits into from
Dec 28, 2022

@nopcoder (Contributor) commented Dec 27, 2022

Closes #4712

Pass the listing page size from the catalog level down to the KV driver level.
Catalog listing works with pagination, while KV works with iterators.
With this PR, the caller can pass a recommended batch size to the KV level, so the implementation can make better use of resources and perform faster for specific operations.

@nopcoder added the area/KV label (Improvements to the KV store implementation) Dec 27, 2022
@nopcoder requested a review from guy-har December 27, 2022 13:51
@nopcoder self-assigned this Dec 27, 2022
@nopcoder requested a review from a team December 27, 2022 13:55
@nopcoder force-pushed the feature/kv-batch-size branch from c7faa13 to fd09890 December 27, 2022 21:47

@guy-har (Contributor) left a comment

Great change, looks good.
I added some minor comments.
Requesting changes only because the DynamoDB scan doesn't choose the scan size correctly.

@@ -60,6 +60,11 @@ type ValueWithPredicate struct {
	Predicate Predicate
}

type ScanOptions struct {
	KeyStart  []byte
	BatchSize int

Contributor

Maybe we should add some documentation here explaining what BatchSize represents.

pkg/kv/store.go (outdated)
@@ -79,7 +84,9 @@ type Store interface {

 	// Scan returns entries that can be read by key order, starting at or after the `start` position

Contributor

Suggested change
-	// Scan returns entries that can be read by key order, starting at or after the `start` position
+	// Scan returns entries that can be read by key order

pkg/kv/store.go (outdated)
@@ -79,7 +84,9 @@ type Store interface {

 	// Scan returns entries that can be read by key order, starting at or after the `start` position
 	// partitionKey is optional, passing it might increase performance.
-	Scan(ctx context.Context, partitionKey, start []byte) (EntriesIterator, error)
+	// 'options' holds additional parameters to limit the number of records
+	// and set the prefix to scan.

Contributor

Does it set the prefix to scan or the start key?

Contributor Author

True, fixing the comment.

@@ -1872,7 +1873,7 @@ func (g *Graveler) addCommitNoLock(ctx context.Context, repository *RepositoryRe
 }
 
 func (g *Graveler) isStagingEmpty(ctx context.Context, repository *RepositoryRecord, branch *Branch) (bool, error) {
-	itr, err := g.listStagingArea(ctx, branch)
+	itr, err := g.listStagingArea(ctx, branch, 0)

Contributor

Suggested change
-	itr, err := g.listStagingArea(ctx, branch, 0)
+	itr, err := g.listStagingArea(ctx, branch, 1)

Contributor Author

Thanks, good call to optimize the empty checks too.
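Why `1` rather than `0` for the emptiness checks: assuming, as the diff suggests, that the new third argument is the batch-size hint and that `0` means "not set" (the driver default applies), an emptiness check that passes `0` may pull a full default batch just to look at the first entry. A self-contained sketch with hypothetical names (`stagingStore`, `firstBatch`, `isEmpty`, `defaultBatchSize`):

```go
package main

import "fmt"

const defaultBatchSize = 1000 // hypothetical driver default when no hint is given

// stagingStore is a hypothetical stand-in for a KV partition that counts entries read.
type stagingStore struct {
	entries []string
	read    int
}

// firstBatch simulates the first round trip of a scan with the given batch-size hint.
func (s *stagingStore) firstBatch(batchSize int) []string {
	if batchSize <= 0 {
		batchSize = defaultBatchSize
	}
	n := len(s.entries)
	if batchSize < n {
		n = batchSize
	}
	s.read += n
	return s.entries[:n]
}

// isEmpty only needs to know whether one entry exists, so it asks for a batch of 1.
func isEmpty(s *stagingStore) bool {
	return len(s.firstBatch(1)) == 0
}

func main() {
	s := &stagingStore{}
	for i := 0; i < 500; i++ {
		s.entries = append(s.entries, fmt.Sprintf("obj-%d", i))
	}
	fmt.Println(isEmpty(s), s.read) // false 1

	s.read = 0
	s.firstBatch(0)     // the same check without a hint reads a whole default batch
	fmt.Println(s.read) // 500
}
```

The same reasoning applies to the sealed-token check that follows.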

@@ -1904,7 +1905,7 @@ func (g *Graveler) isSealedEmpty(ctx context.Context, repository *RepositoryReco
 	if len(branch.SealedTokens) == 0 {
 		return true, nil
 	}
-	itrs, err := g.sealedTokensIterator(ctx, branch)
+	itrs, err := g.sealedTokensIterator(ctx, branch, 0)

Contributor

Suggested change
-	itrs, err := g.sealedTokensIterator(ctx, branch, 0)
+	itrs, err := g.sealedTokensIterator(ctx, branch, 1)

Contributor Author

@nopcoder, Dec 28, 2022

Thanks, good call to optimize the empty checks too.

@@ -94,8 +94,8 @@ func (m *Manager) DropKey(ctx context.Context, st graveler.StagingToken, key gra

// List TODO niro: Remove batchSize parameter post KV

Contributor

I think we can remove this TODO.

Comment on lines +204 to +206
	if opts.PrefetchSize > 0 {
		opts.PrefetchValues = true
	}

Contributor

Nice catch.
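The detail behind this "nice catch", assuming the snippet belongs to the local driver built on BadgerDB: Badger's `IteratorOptions` documents `PrefetchSize` as valid only when `PrefetchValues` is true, so setting a prefetch size without enabling prefetching would have no effect. The sketch below uses a local stand-in struct with those two field names so it compiles without the Badger module; `iteratorOptions` and `optionsForBatch` are hypothetical.

```go
package main

import "fmt"

// iteratorOptions is a local stand-in mirroring two fields of Badger's
// IteratorOptions (PrefetchValues, PrefetchSize); it is not the Badger type.
type iteratorOptions struct {
	PrefetchValues bool
	PrefetchSize   int
}

// optionsForBatch is a hypothetical helper that maps a caller's batch size onto
// prefetch settings. Because Badger ignores PrefetchSize unless PrefetchValues
// is true, any positive size also switches prefetching on.
func optionsForBatch(base iteratorOptions, batchSize int) iteratorOptions {
	opts := base
	if batchSize > 0 {
		opts.PrefetchSize = batchSize
	}
	if opts.PrefetchSize > 0 {
		opts.PrefetchValues = true
	}
	return opts
}

func main() {
	fmt.Printf("%+v\n", optionsForBatch(iteratorOptions{}, 100)) // {PrefetchValues:true PrefetchSize:100}
	fmt.Printf("%+v\n", optionsForBatch(iteratorOptions{}, 0))   // {PrefetchValues:false PrefetchSize:0}
}
```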

Comment on lines 357 to 362
	if s.params.ScanLimit != 0 {
		queryInput.SetLimit(s.params.ScanLimit)
	}
	if options.BatchSize != 0 {
		queryInput.Limit = aws.Int64(int64(options.BatchSize))
	}

Contributor

We should use the minimum of ScanLimit and BatchSize.

Contributor Author

Fixing.
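A sketch of the fix requested here, not the merged code: treat `0` as "not set" for both values (as the quoted snippet already does) and use the smaller of whichever values are set, so the caller's hint can lower the configured cap but never raise it. `effectiveLimit` is a hypothetical helper.

```go
package main

import "fmt"

// effectiveLimit picks the Query page limit from the configured cap (scanLimit)
// and the caller's hint (batchSize). Zero means "not set" for either value;
// when both are set, the smaller one wins.
func effectiveLimit(scanLimit int64, batchSize int) int64 {
	limit := scanLimit
	if batchSize > 0 && (limit == 0 || int64(batchSize) < limit) {
		limit = int64(batchSize)
	}
	return limit
}

func main() {
	fmt.Println(effectiveLimit(1000, 0))   // 1000: only the configured cap is set
	fmt.Println(effectiveLimit(1000, 1))   // 1: the caller needs a single entry
	fmt.Println(effectiveLimit(100, 5000)) // 100: never exceed the configured cap
	fmt.Println(effectiveLimit(0, 0))      // 0: leave Limit unset
}
```

A non-zero result would then be passed to `queryInput.SetLimit(...)`, and a zero result would leave `Limit` unset. DynamoDB's Query `Limit` caps the number of items evaluated per page rather than the number returned after filtering, so it bounds the read cost of each round trip.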

@nopcoder marked this pull request as ready for review December 28, 2022 12:14
@nopcoder added the include-changelog label (PR description should be included in next release changelog) Dec 28, 2022
@nopcoder requested a review from guy-har December 28, 2022 12:22
@nopcoder merged commit 6f8e7fc into master Dec 28, 2022
@nopcoder deleted the feature/kv-batch-size branch December 28, 2022 13:08

Labels: area/KV (Improvements to the KV store implementation), include-changelog (PR description should be included in next release changelog)
Projects: none
Development: successfully merging this pull request may close these issues:
KV listing with a limit parameter reads too many items
Participants: 2