
release-21.1: sql: hint scan batch size by expected row count #62365

Merged
merged 2 commits into cockroachdb:release-21.1 from backport21.1-62282 on Mar 24, 2021

Conversation

jordanlewis
Member

Backport 2/2 commits from #62282.

/cc @cockroachdb/release


Might close #62198.

Previously, the "dynamic batch size" strategy for the vectorized
engine's batch allocator worked the same in every situation: batches
would start at size 1, then double on each re-allocation, until they hit
their maximum size of 1024.

Now, to improve performance for scans that return a number of rows
somewhere in between 1 and 1024, we pass in the optimizer's best guess
of the number of rows that the scan will produce all the way down into
the TableReader. That guess is used as the initial size of the batch if
it's less than 1024.

Release note (performance improvement): improved the performance of the
vectorized engine when scanning fewer than 1024 rows at a time.
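The two-part sizing strategy described above — start at the optimizer's row-count estimate when one is available, otherwise start at 1 and double up to the 1024-row cap — can be sketched as follows. This is an illustrative sketch, not the actual allocator code; the function and constant names are hypothetical.

```go
package main

import "fmt"

// maxBatchSize mirrors the vectorized engine's 1024-row cap
// (coldata.BatchSize() in CockroachDB); hypothetical constant here.
const maxBatchSize = 1024

// initialBatchSize returns the starting batch size given the optimizer's
// estimated row count, where 0 means "no estimate available".
func initialBatchSize(estimatedRowCount int) int {
	if estimatedRowCount >= maxBatchSize {
		return maxBatchSize
	}
	if estimatedRowCount > 0 {
		// Use the optimizer's best guess directly, avoiding the
		// 1 -> 2 -> 4 -> ... ramp-up for mid-sized scans.
		return estimatedRowCount
	}
	return 1
}

// nextBatchSize implements the doubling strategy used on each
// re-allocation, capped at the maximum batch size.
func nextBatchSize(current int) int {
	if next := current * 2; next < maxBatchSize {
		return next
	}
	return maxBatchSize
}

func main() {
	// With no estimate, batches grow 1 -> 2 -> 4 -> ... -> 1024.
	fmt.Println(initialBatchSize(0)) // 1
	// With an estimate of 500 rows, the first batch is already 500-sized.
	fmt.Println(initialBatchSize(500)) // 500
	fmt.Println(nextBatchSize(500))    // 1000
}
```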

@jordanlewis jordanlewis requested a review from a team as a code owner March 22, 2021 17:06
@cockroach-teamcity
Member

This change is Reviewable

@jordanlewis jordanlewis requested a review from a team March 22, 2021 17:06
Previously, the colfetcher ignored limit hints: it always fetched data
from KV until its batch was full. This produces bad behavior if the
batch size is larger than the limit hint. For example, if the expected
row count was 500, causing us to create a 500-sized batch, but the limit
hint for whatever reason was only 20, we would still go ahead and fetch
500 rows.

In practice, this does not appear to surface easily: if the
optimizer is doing its job, the batch size should always equal the
limit hint for limited scans.

Release note: None
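The fix described in this commit amounts to capping the fetch size by the limit hint so the fetcher never reads more rows from KV than the query can consume. A minimal sketch of that capping logic, with a hypothetical helper name (the real colfetcher code differs in detail):

```go
package main

import "fmt"

// rowsToFetch caps the batch size by the KV limit hint, where a hint of 0
// means "no limit". Hypothetical helper for illustration only.
func rowsToFetch(batchSize, limitHint int) int {
	if limitHint > 0 && limitHint < batchSize {
		// Honor the limit hint instead of filling the whole batch.
		return limitHint
	}
	return batchSize
}

func main() {
	// An expected row count of 500 produces a 500-sized batch, but with
	// a limit hint of 20 only 20 rows should be fetched from KV.
	fmt.Println(rowsToFetch(500, 20)) // 20
	// With no hint, the full batch is filled.
	fmt.Println(rowsToFetch(500, 0)) // 500
}
```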
Member

@yuzefovich yuzefovich left a comment


I rebased and picked up the commits that were actually merged on master.

Reviewed 1 of 1 files at r1, 16 of 16 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)

@yuzefovich
Member

I'll go ahead and merge this too since it is fixing a GA release blocker.

@yuzefovich yuzefovich merged commit 23e7cb5 into cockroachdb:release-21.1 Mar 24, 2021
@jordanlewis jordanlewis deleted the backport21.1-62282 branch March 26, 2021 19:02