
release-21.1: sql: hint scan batch size by expected row count #62365

Merged
merged 2 commits into cockroachdb:release-21.1 from backport21.1-62282 on Mar 24, 2021

Conversation

jordanlewis
Member

Backport 2/2 commits from #62282.

/cc @cockroachdb/release


Might close #62198.

Previously, the "dynamic batch size" strategy for the vectorized
engine's batch allocator worked the same in every situation: batches
would start at size 1, then double on each re-allocation, until they hit
their maximum size of 1024.

Now, to improve performance for scans that return a number of rows
somewhere in between 1 and 1024, we pass in the optimizer's best guess
of the number of rows that the scan will produce all the way down into
the TableReader. That guess is used as the initial size of the batch if
it's less than 1024.

Release note (performance improvement): improved the performance of the
vectorized engine when scanning fewer than 1024 rows at a time.
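The two-part sizing strategy described above — start at the optimizer's row-count estimate when one is available, otherwise start at 1 and double up to the 1024-row cap — can be sketched as follows. This is an illustrative sketch, not the actual allocator code; the function and constant names are hypothetical.

```go
package main

import "fmt"

// maxBatchSize mirrors the vectorized engine's 1024-row cap
// (coldata.BatchSize() in CockroachDB); hypothetical constant here.
const maxBatchSize = 1024

// initialBatchSize returns the starting batch size given the optimizer's
// estimated row count, where 0 means "no estimate available".
func initialBatchSize(estimatedRowCount int) int {
	if estimatedRowCount >= maxBatchSize {
		return maxBatchSize
	}
	if estimatedRowCount > 0 {
		// Use the optimizer's best guess directly, avoiding the
		// 1 -> 2 -> 4 -> ... ramp-up for mid-sized scans.
		return estimatedRowCount
	}
	return 1
}

// nextBatchSize implements the doubling strategy used on each
// re-allocation, capped at the maximum batch size.
func nextBatchSize(current int) int {
	if next := current * 2; next < maxBatchSize {
		return next
	}
	return maxBatchSize
}

func main() {
	// With no estimate, batches grow 1 -> 2 -> 4 -> ... -> 1024.
	fmt.Println(initialBatchSize(0)) // 1
	// With an estimate of 500 rows, the first batch is already 500-sized.
	fmt.Println(initialBatchSize(500)) // 500
	fmt.Println(nextBatchSize(500))    // 1000
}
```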

@jordanlewis jordanlewis requested a review from a team as a code owner March 22, 2021 17:06
@cockroach-teamcity
Member

This change is Reviewable

@jordanlewis jordanlewis requested a review from a team March 22, 2021 17:06
Previously, the colfetcher ignored limit hints: it always fetched data
from KV until its batch was full. This produces bad behavior if the
batch size is larger than the limit hint. For example, if the expected
row count was 500, causing us to create a 500-sized batch, but the limit
hint for whatever reason was only 20, we would still go ahead and fetch
500 rows.

In practice, this does not appear to surface easily: if the
optimizer is doing its job, the batch size should always equal the
limit hint for limited scans.

Release note: None
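The fix described in this commit amounts to capping the fetch size by the limit hint so the fetcher never reads more rows from KV than the query can consume. A minimal sketch of that capping logic, with a hypothetical helper name (the real colfetcher code differs in detail):

```go
package main

import "fmt"

// rowsToFetch caps the batch size by the KV limit hint, where a hint of 0
// means "no limit". Hypothetical helper for illustration only.
func rowsToFetch(batchSize, limitHint int) int {
	if limitHint > 0 && limitHint < batchSize {
		// Honor the limit hint instead of filling the whole batch.
		return limitHint
	}
	return batchSize
}

func main() {
	// An expected row count of 500 produces a 500-sized batch, but with
	// a limit hint of 20 only 20 rows should be fetched from KV.
	fmt.Println(rowsToFetch(500, 20)) // 20
	// With no hint, the full batch is filled.
	fmt.Println(rowsToFetch(500, 0)) // 500
}
```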
Member

@yuzefovich yuzefovich left a comment


I rebased and picked up the commits that were actually merged on master.

Reviewed 1 of 1 files at r1, 16 of 16 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)

@yuzefovich
Member

I'll go ahead and merge this too since it is fixing a GA release blocker.

@yuzefovich yuzefovich merged commit 23e7cb5 into cockroachdb:release-21.1 Mar 24, 2021
@jordanlewis jordanlewis deleted the backport21.1-62282 branch March 26, 2021 19:02