
colexec: remove vectorize_row_count_threshold session variable #53893

Closed
asubiotto opened this issue Sep 3, 2020 · 14 comments · Fixed by #62164

Comments

@asubiotto
Contributor

By default, the vectorized execution engine is only used to execute a query if the query's row count estimate is larger than `vectorize_row_count_threshold`. The threshold existed because the engine originally carried a non-negligible allocation overhead.

Since dynamic batches were introduced, that allocation overhead has been minimized, and the vectorized execution engine demonstrates a 10% speedup on a point-lookup workload (kv95), which is a worst-case scenario for the engine.

We should:

  1. Remove `vectorize_row_count_threshold`, since it is no longer useful.
  2. Update the optimizer cost model to account for the fact that vectorized execution will now be used unconditionally whenever it is enabled (which will be the default). This is much simpler than our previous plan, which was to make the optimizer decide whether vectorized execution would be beneficial.
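For concreteness, here is a minimal sketch of how the session variable is exercised; the statements are illustrative, using standard session-variable syntax:

```sql
-- With the old default of 1000, a query whose row count estimate is
-- below the threshold runs on the row-by-row engine instead.
SET vectorize_row_count_threshold = 1000;
SHOW vectorize_row_count_threshold;

-- A threshold of 0 makes the vectorized engine unconditional for all
-- supported queries (unless vectorize=off is set).
SET vectorize_row_count_threshold = 0;
```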
@asubiotto asubiotto added A-sql-execution Relating to SQL execution. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Sep 3, 2020
@asubiotto
Contributor Author

Point 1 is likely out of scope for the immediate 21.1 release.

I think the main work item here is benchmarking other execution operations with a small number of rows (e.g. hash joins, aggregations) to offer more data points on the performance difference in the worst case for the vectorized engine.

@yuzefovich
Member

On a related note, do we want to remove the `vectorize=201auto` setting as well? As a reminder, it means that only "streaming" queries run via the vectorized engine, and only when the threshold is met. We're removing the threshold condition, and I think the "streaming" concept is no longer a useful differentiator either, because we now have disk spilling in all cases where the row engine has it. Thoughts? cc @jordanlewis
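For reference, a hypothetical illustration of the two settings under discussion (plain SET statements; the value names are the ones given above):

```sql
-- 20.2-era compromise: only "streaming" queries use the vectorized
-- engine, and only once the row-count threshold is met.
SET vectorize = '201auto';

-- The proposed default: always use the vectorized engine where supported.
SET vectorize = 'on';
```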

@asubiotto
Contributor Author

Yes, I think 201auto only makes sense in 20.2.

craig bot pushed a commit that referenced this issue Nov 5, 2020
56206: sql: make SupportsVectorized check light-weight r=yuzefovich a=yuzefovich

**colexec: pool allocations of some objects in the read path**

This commit pools the allocations of some objects that are created on
the simplest read path in the vectorized engine - when we have
a ColBatchScan connected with a Materializer.

Release note: None

**colbuilder: fix casting behavior for actual types mismatch**

We have recently merged a change that enforces that the colbuilder
produces an operator chain that outputs batches with the desired type
schema. This is enforced by planning casts when there is a mismatch.
Previously, we would only try planning a vectorized cast, because the
assumption was that only integers of different widths would ever need
to be cast; but as it turns out, types of the string family might also
need to be cast (e.g. `string` and `"char"` aren't identical). This is
now fixed by falling back to row-execution casting when a vectorized
cast isn't supported.

Release note: None

**sql: make SupportsVectorized check light-weight**

Previously, in order to determine whether we supported the vectorization
of a flow, we would run a fairly expensive SupportsVectorized check that
performed a "fake flow setup" by actually creating all of the components
without running them. This had a non-negligible performance impact on
KV-like workloads, so it has been optimized away in favor of a more
light-weight check that simply inspects the processor specs to determine
whether each processor core can be vectorized (either natively or by
wrapping a row-execution processor). All processor cores have been
audited to separate out those that we currently cannot wrap (usually
because they don't implement the RowSource interface). Note that if a new
processor core is introduced and the `canWrap` check is not updated, we
defensively assume that it cannot be wrapped and emit an assertion
failure error that should, hopefully, surface the fact that we need to
update the check.

Addresses: #53893.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
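As an aside on the casting commit above: the `string` vs `"char"` mismatch it describes can be observed directly from SQL. A minimal illustration (the exact output rendering is an assumption):

```sql
-- string and "char" are distinct types even though both hold text, so
-- a cast must be planned whenever the two meet in one type schema.
SELECT pg_typeof('a'::string), pg_typeof('a'::"char");
-- expected along the lines of: text | "char"
```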
craig bot pushed a commit that referenced this issue Nov 11, 2020
56540: colexec: propagate the set of needed columns in table reader spec r=yuzefovich a=yuzefovich

This commit adds the propagation of the set of needed columns via the
table reader spec, and that information is now used when setting up
ColBatchScans. The row-by-row engine is not affected, since it still
needs to set up the ProcOutputHelpers, but that work is no longer needed
in the vectorized engine, which gives us a couple percent improvement on
the KV microbenchmarks.

Addresses: #53893

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Dec 3, 2020
55713: sql: decrease vectorize_row_count_threshold to 0 r=yuzefovich a=yuzefovich

**colexec: add context to errors from builtin functions**

This commit wraps the errors that occur during builtin functions
evaluations to provide more context.

Release note: None

**sql: decrease vectorize_row_count_threshold to 0**

This commit decreases the default value of the
`vectorize_row_count_threshold` setting to 0, which means that we will
be using the vectorized engine for all supported queries. We intend to
remove the setting entirely in the 21.1 release, but for now we choose
the option of effectively disabling it, just in case.

The benchmarks have shown the following:
- -1.5% on KV95
- similar performance on TPCC
- -3% on movr
- -10% on miscellaneous operations (joins, aggregations) on small tables.

We think that such a gap is small enough to merge this change, and we
intend to optimize the vectorized engine further before making the final
call on the default value for the 21.1 release.

Additionally, this commit collects the trace metadata on the outboxes.

Informs: #53893.

Release note (sql change): The default value of the
`vectorize_row_count_threshold` setting has been decreased from 1000 to
0, meaning that from now on the vectorized engine will be used for all
supported queries regardless of the row estimate (unless
`vectorize=off` is set).

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@yuzefovich
Member

Posting the last results we observed in #55713 for convenience:


KV95

  • 1 min ramp, 5 min duration
    • GCE: +3.68%
    • AWS: -3.84%
  • 2 min ramp, 10 min duration
    • GCE: -2.84%
    • AWS: -1.10%
  • 3 min ramp, 15 min duration
    • GCE: +3.58%
    • AWS: -3.70%
  • 5 min ramp, 30 min duration
    • GCE: -2.51%
    • AWS: -0.66%
  • 10 min ramp, 60 min duration
    • GCE: +1.37%
    • AWS: -1.87%

MOVR (with these parameters: --num-histories 10000 --num-promo-codes 10000 --num-rides 5000 --num-users 500 --num-vehicles 150; the raw "old"/"new" rows below are workload summary lines: elapsed, errors, total ops, ops/sec, then avg/p50/p95/p99/max latency in ms)

  • 1 min ramp, 5 min duration
    • GCE:
      • old: 300.0s 0 88306 294.4 3.4 1.2 12.1 44.0 96.5
      • new: 300.0s 0 86214 287.4 3.5 1.4 11.5 41.9 60.8
      • -2.37% in throughput
    • AWS:
      • old: 300.0s 0 153160 510.5 2.0 0.9 6.8 22.0 226.5
      • new: 300.0s 0 147820 492.7 2.0 0.9 6.8 22.0 226.5
      • -3.49% in throughput
  • 3 min ramp, 15 min duration
    • GCE:
      • old: 900.0s 0 260376 289.3 3.5 1.4 11.5 41.9 67.1
      • new: 900.0s 0 254317 282.6 3.5 1.4 13.1 41.9 60.8
      • -2.33% in throughput
    • AWS:
      • old: 900.0s 0 435143 483.5 2.1 1.0 6.6 22.0 226.5
      • new: 900.0s 0 417249 463.6 2.2 1.0 7.1 22.0 469.8
      • -4.11% in throughput
  • 10 min ramp, 60 min duration
    • GCE:
      • old: 3600.0s 0 1077971 299.4 3.3 1.6 10.0 37.7 209.7
      • new: 3600.0s 0 1055800 293.3 3.4 1.7 10.0 37.7 58.7
      • -2.06% in throughput
    • AWS:
      • old: 3600.0s 0 1488654 413.5 2.4 1.4 6.8 22.0 436.2
      • new: 3600.0s 0 1418218 393.9 2.5 1.4 7.6 23.1 436.2
      • -4.73% in throughput

I'll rerun KV95 and MOVR on 4989436. I'll also quickly confirm that there is still no regression on TPCC.

@yuzefovich
Member

yuzefovich commented Mar 5, 2021

New numbers (on 4989436).

KV95

  • 1 min ramp, 5 min duration
    • GCE: -2.1%
    • AWS: -2.3%
  • 2 min ramp, 10 min duration
    • GCE: -2.8%
    • AWS: -2.9%
  • 3 min ramp, 15 min duration
    • GCE: -2.2%
    • AWS: -3.0%

MOVR (with these parameters --num-histories 10000 --num-promo-codes 10000 --num-rides 5000 --num-users 500 --num-vehicles 150)

  • 1 min ramp, 5 min duration
    • GCE: -3.7%
    • AWS: -4.8%
  • 2 min ramp, 10 min duration
    • GCE: -0.0%
    • AWS: -4.4%
  • 3 min ramp, 15 min duration
    • GCE: -5.2%
    • AWS: -6.5%

TPCC still shows no significant difference.

@asubiotto
Contributor Author

Action items: get the difference with 20.2, and confirm that the roachperf graph didn't show a noticeable drop when we previously set this value to 0.

@yuzefovich
Copy link
Member

Comparison of 20.2.5 against 21.1.0.alpha3 with all default settings (`vectorize_row_count_threshold=1000` for the former and 0 for the latter). All tests were run once on a 3-node roachprod cluster with the load coming from a fourth node.


KV95

  • 1 min ramp, 5 min duration
    • GCE: +1.5%
    • AWS: +1.3%
  • 2 min ramp, 10 min duration
    • GCE: +2.1%
    • AWS: -0.4%
  • 3 min ramp, 15 min duration
    • GCE: +2.1%
    • AWS: -2.7%

TPCC 100 warehouses

  • 1 min ramp, 5 min duration
    • GCE: +2.9%
    • AWS: -0.2%
  • 2 min ramp, 10 min duration
    • GCE: -0.2%
    • AWS: -0.1%
  • 3 min ramp, 15 min duration
    • GCE: +0.0%
    • AWS: -0.3%

MOVR (with these parameters --num-histories 10000 --num-promo-codes 10000 --num-rides 5000 --num-users 500 --num-vehicles 150)

  • 1 min ramp, 5 min duration
    • GCE: -0.0%
    • AWS: -0.7%
  • 2 min ramp, 10 min duration
    • GCE: -3.2%
    • AWS: -4.2%
  • 3 min ramp, 15 min duration
    • GCE: -4.4%
    • AWS: -1.0%

@asubiotto
Contributor Author

OK, these look good to me, cc @jordanlewis @awoods187. Let's socialize it in your email thread?

@awoods187
Contributor

Why is there such a large variance across clouds? Is this a small-sample-size problem? Could we take one test, say the hypothesized worst case, and run it 10 times to reduce randomness?

@yuzefovich
Member

Yes, I'm pretty sure the variance is due to having only a single run. I'll kick off 10 runs of KV95 (3 min ramp, 15 min duration) to reduce the noise.

@yuzefovich
Member

Here is the info from 10 runs of KV95:

  • 3 min ramp, 15 min duration
    • GCE:
      • 20.2.5: avg throughput 9685.81
      • 21.1.0.alpha3 threshold=1000: avg throughput 10017.62
      • 21.1.0.alpha3 threshold=0: avg throughput 9746.22
      • 20.2.5 vs 21.1.0.alpha3 threshold=0: +0.6%
      • 21.1.0.alpha3 (threshold 1000) vs 21.1.0.alpha3 (threshold 0): -2.7%
    • AWS:
      • 20.2.5: avg throughput 10260.21
      • 21.1.0.alpha3 threshold=1000: avg throughput 10589.77
      • 21.1.0.alpha3 threshold=0: avg throughput 10349.99
      • 20.2.5 vs 21.1.0.alpha3 threshold=0: +0.9%
      • 21.1.0.alpha3 (threshold 1000) vs 21.1.0.alpha3 (threshold 0): -2.3%
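(These are simple percent changes over the average throughputs; e.g. 20.2.5 vs threshold=0 on GCE is (9746.22 − 9685.81) / 9685.81 ≈ +0.6%.)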

@yuzefovich
Member

yuzefovich commented Mar 17, 2021

It looks like there wasn't any pushback on our decision, so I think the issue of "choosing 0 as the default" can be closed.

Now the question is: do we want to remove the setting entirely in the 21.1 release? I think we should probably lean toward the safer side, keeping the setting and removing it in the 21.2 release. This would give us a more fine-grained escape hatch than `vectorize=off`. Thoughts?

Another question is whether we want to force all clusters to use the new default once they upgrade to 21.1. For some context: if the user has previously set the cluster setting explicitly to any value (including the previous default), that value will be kept after an upgrade.
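To make the upgrade concern concrete, assuming the session default is driven by a cluster setting named `sql.defaults.vectorize_row_count_threshold` (the name follows the usual `sql.defaults.*` pattern and is an assumption here):

```sql
-- Hypothetical: a cluster that explicitly pinned the old default keeps
-- that value across the upgrade to 21.1...
SET CLUSTER SETTING sql.defaults.vectorize_row_count_threshold = 1000;

-- ...so picking up the new default of 0 requires an explicit reset.
RESET CLUSTER SETTING sql.defaults.vectorize_row_count_threshold;
```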

@asubiotto
Contributor Author

Yeah, let's be safe and remove it in the 21.2 release. I think we shouldn't force clusters to use the new vectorized setting if they set it explicitly.

@yuzefovich
Member

Cool, that was my thinking as well. I'll open up a PR to remove the setting on master and will not backport it to 21.1.
