colexec: remove vectorize_row_count_threshold session variable #53893
I think the main work item here is benchmarking other execution operations with a small number of rows (e.g. hash joins, aggregations) to offer more data points on the performance difference in the worst case for the vectorized engine.
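For a sense of what such data points could look like, here is a minimal Go sketch using the standard `testing` benchmark harness; the operator and all names are stand-ins for whichever engine is being measured, not CockroachDB's actual benchmarks (the file would need a `_test.go` suffix):

```go
package smallrows

import "testing"

// sumRows stands in for an execution operator (e.g. an aggregation)
// running over a small input, the regime where per-query setup costs
// dominate.
func sumRows(rows []int64) int64 {
	var s int64
	for _, r := range rows {
		s += r
	}
	return s
}

func BenchmarkSmallAggregation(b *testing.B) {
	rows := make([]int64, 16) // deliberately tiny row count
	for i := range rows {
		rows[i] = int64(i)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = sumRows(rows)
	}
}
```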
On a related note, do we want to remove …
Yes, I think …
56206: sql: make SupportsVectorized check light-weight r=yuzefovich a=yuzefovich

**colexec: pool allocations of some objects in the read path**

This commit pools the allocations of some objects that are created on the simplest read path in the vectorized engine, when we have a ColBatchScan connected with a Materializer.

Release note: None

**colbuilder: fix casting behavior for actual types mismatch**

We have recently merged a change that enforces that the colbuilder produces an operator chain that outputs batches with the desired type schema. This is enforced by planning casts when there is a mismatch. Previously, we would only try planning a vectorized cast, because the assumption was that only integers of different widths would need to be cast in some cases; as it turns out, types in the string family might also need to be cast (e.g. `string` and `"char"` aren't identical). This is now fixed by falling back to row-execution casting when a vectorized cast isn't supported.

Release note: None

**sql: make SupportsVectorized check light-weight**

Previously, in order to determine whether we supported the vectorization of a flow, we would run a pretty expensive SupportsVectorized check that performs a "fake flow setup" by actually creating all of the components without running them. This has a non-negligible performance impact on KV-like workloads, so it has been optimized away in favor of a more light-weight check that simply inspects the processor specs to determine whether each processor core can be vectorized (either natively or by wrapping a row-execution processor). All processor cores have been audited to separate out those that we currently cannot wrap (usually because they don't implement the RowSource interface). Note that if a new processor core is introduced and the `canWrap` check is not updated, we defensively assume that it cannot be wrapped and emit an assertion-failed error that should, hopefully, surface the fact that we need to update the check.

Addresses: #53893.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
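To make the new check concrete, here is a simplified, hypothetical sketch in Go of the spec-inspection approach: a static table records which processor cores can be vectorized or wrapped, and an unknown core is rejected with an assertion-style error. None of these names come from the actual CockroachDB code.

```go
package vecsupport

import "fmt"

type coreKind int

const (
	coreTableReader coreKind = iota // natively vectorized
	coreHashJoiner                  // natively vectorized
	coreSampler                     // assume: no RowSource implementation
)

// canVectorize records, for each known core, whether it can run in the
// vectorized engine, either natively or by wrapping a row-execution
// processor.
var canVectorize = map[coreKind]bool{
	coreTableReader: true,
	coreHashJoiner:  true,
	coreSampler:     false,
}

// supportsVectorized only inspects the specs; unlike a "fake flow setup",
// it allocates no operators.
func supportsVectorized(specs []coreKind) error {
	for _, core := range specs {
		ok, known := canVectorize[core]
		if !known {
			// A new core was added without updating the table; fail
			// loudly so the omission is surfaced in testing.
			return fmt.Errorf("assertion failed: unhandled processor core %d", core)
		}
		if !ok {
			return fmt.Errorf("processor core %d cannot be vectorized or wrapped", core)
		}
	}
	return nil
}
```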
56540: colexec: propagate the set of needed columns in table reader spec r=yuzefovich a=yuzefovich

This commit propagates the set of needed columns via the table reader spec; that information is now used when setting up ColBatchScans. The row-by-row engine is not affected since it still needs to set up ProcOutputHelpers, but that step is no longer needed in the vectorized engine, which gives us a couple percent improvement on the KV microbenchmark.

Addresses: #53893

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
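As a rough illustration of the needed-columns idea (all field and function names here are made up, not the real TableReaderSpec): the spec carries the column ordinals the plan needs, and the scan emits only those, so no separate output-projection pass is required downstream.

```go
package neededcols

// tableReaderSpec carries the ordinals (into the table's full column
// list) of the columns the plan actually needs.
type tableReaderSpec struct {
	neededColumns []int
}

// scanRow projects a decoded row down to the needed columns at the scan
// itself, standing in for what a ProcOutputHelper would otherwise do
// per row further downstream.
func scanRow(spec tableReaderSpec, fullRow []interface{}) []interface{} {
	out := make([]interface{}, 0, len(spec.neededColumns))
	for _, ord := range spec.neededColumns {
		out = append(out, fullRow[ord])
	}
	return out
}
```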
55713: sql: decrease vectorize_row_count_threshold to 0 r=yuzefovich a=yuzefovich

**colexec: add context to errors from builtin functions**

This commit wraps the errors that occur during builtin function evaluation to provide more context.

Release note: None

**sql: decrease vectorize_row_count_threshold to 0**

This commit decreases the default value for the `vectorize_row_count_threshold` setting to 0, which means that we will be using the vectorized engine for all supported queries. We intend to remove the setting entirely in the 21.1 release, but for now we choose the option of effectively disabling it, just in case. The benchmarks have shown the following:

- -1.5% on KV95
- similar performance on TPCC
- -3% on movr
- -10% on miscellaneous operations (joins, aggregations) on small tables.

We think that such a gap is small enough to merge this change, and we intend to optimize the vectorized engine further before making the final call on the default value for the 21.1 release. Additionally, this commit collects the trace metadata on the outboxes.

Informs: #53893.

Release note (sql change): The default value for the `vectorize_row_count_threshold` setting has been decreased from 1000 to 0, meaning that from now on we will always use the vectorized engine for all supported queries regardless of the row estimate (unless `vectorize=off` is set).

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
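The first commit above follows a standard Go error-wrapping pattern; here is a minimal self-contained sketch with hypothetical names (the real code wraps errors from builtin function evaluation):

```go
package builtins

import (
	"errors"
	"fmt"
	"strconv"
)

// evalBuiltin stands in for evaluating a builtin function; on failure it
// wraps the cause with the builtin's name for context.
func evalBuiltin(name, arg string) (int, error) {
	n, err := strconv.Atoi(arg)
	if err != nil {
		return 0, fmt.Errorf("error evaluating builtin %q: %w", name, err)
	}
	return n, nil
}

func example() {
	if _, err := evalBuiltin("parse_int", "not-a-number"); err != nil {
		fmt.Println(err)                               // names the failing builtin
		fmt.Println(errors.Is(err, strconv.ErrSyntax)) // true: cause preserved
	}
}
```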
Posting the last results we observed in #55713 for convenience:

KV95: [results table omitted]

MOVR (with these parameters …): [results table omitted]
I'll rerun KV95 and MOVR on 4989436. I'll also quickly confirm that there is still no regression on TPCC.
New numbers (on 4989436):

KV95: [results table omitted]

MOVR (with these parameters …): [results table omitted]
TPCC still shows no significant difference.
Action items: get the difference with 20.2, and confirm that the roachperf graph doesn't show a noticeable drop from when we previously set this value to 0.
Comparison of 20.2.5 against 21.1.0.alpha3 with all default settings (…):

KV95: [results table omitted]

TPCC 100 warehouses: [results table omitted]

MOVR (with these parameters …): [results table omitted]
OK. These look good to me, cc @jordanlewis @awoods187. Let's socialize it in your email thread?
Why is there such a large variance across clouds? Is this a small sample size problem? Could we take one test, say the hypothesized worst case, and run it 10 times to reduce randomness?
Yes, I'm pretty sure the variance is due to having only a single run. I'll kick off 10 runs of KV95 (3 min ramp, 15 min duration) to reduce the noise.
Here is the info from 10 runs of KV95:

[results table omitted]
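For reference, the usual way to summarize such repeated runs is a mean and standard deviation; here is a small self-contained Go sketch with placeholder throughput numbers (not the actual KV95 results):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Hypothetical per-run throughputs (ops/sec) from 10 KV95 runs.
	runs := []float64{31200, 30800, 31500, 30950, 31100,
		31300, 30700, 31050, 31400, 30900}

	var sum float64
	for _, r := range runs {
		sum += r
	}
	mean := sum / float64(len(runs))

	var ss float64
	for _, r := range runs {
		ss += (r - mean) * (r - mean)
	}
	stddev := math.Sqrt(ss / float64(len(runs)-1)) // sample std deviation

	fmt.Printf("mean=%.0f ops/sec, stddev=%.0f (%.1f%% of mean)\n",
		mean, stddev, 100*stddev/mean)
}
```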
It looks like there wasn't any pushback on our decision, so I think the issue of "choosing 0 as the default" can be closed. Now the question is: do we want to remove the setting entirely in the 21.1 release? I think we should probably lean on the safer side: keep the setting and remove it in the 21.2 release. This will allow us to have a more fine-grained escape hatch than `vectorize=off`.

Another question is: do we want to force all clusters to use the new default once they upgrade to 21.1? For some context, if the user has previously explicitly set the cluster setting to any value (including the previous default), that value will be kept after an upgrade.
Yeah, let's be safe and remove it in the 21.2 release. I think we shouldn't force clusters to use the new vectorized setting if they set it explicitly.
Cool, that was my thinking as well. I'll open up a PR to remove the setting on master and will not backport it to 21.1.
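On the upgrade behavior discussed above: an explicitly set value survives the upgrade, so an operator who wants the new default has to clear it. Here is a sketch of doing that from Go, assuming the session default is backed by the `sql.defaults.vectorize_row_count_threshold` cluster setting as in contemporary versions (`SHOW`/`RESET CLUSTER SETTING` are real CockroachDB statements; the connection string is a placeholder):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // PostgreSQL-wire driver; works with CockroachDB
)

func main() {
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Inspect the current cluster-wide default.
	var threshold string
	if err := db.QueryRow(
		"SHOW CLUSTER SETTING sql.defaults.vectorize_row_count_threshold",
	).Scan(&threshold); err != nil {
		log.Fatal(err)
	}
	fmt.Println("current threshold:", threshold)

	// Clear any explicitly set value, restoring the version's default.
	if _, err := db.Exec(
		"RESET CLUSTER SETTING sql.defaults.vectorize_row_count_threshold",
	); err != nil {
		log.Fatal(err)
	}
}
```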
By default, the vectorized execution engine is only used to execute a query if the query's row count estimate is larger than the `vectorize_row_count_threshold`. This was because there was a non-negligible allocation overhead. Since dynamic batches were introduced, the allocations have been minimized, and the vectorized execution engine demonstrates a 10% speedup on a point lookup workload (kv95), which is a worst-case scenario for the engine.
We should:

- remove `vectorize_row_count_threshold` since it is not useful anymore.
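To make the mechanic concrete, here is a minimal sketch (not CockroachDB's actual planning code) of the decision the threshold gates: with the old default of 1000, small scans stay on the row-by-row engine, while setting the threshold to 0, or removing it, makes the vectorized choice unconditional.

```go
package main

import "fmt"

const vectorizeRowCountThreshold = 1000 // old default; 0 disables the gate

// useVectorized mimics the planning decision: vectorize=off remains the
// hard escape hatch, and otherwise the row estimate must clear the
// threshold.
func useVectorized(estimatedRows uint64, vectorizeOn bool) bool {
	if !vectorizeOn {
		return false
	}
	return estimatedRows >= vectorizeRowCountThreshold
}

func main() {
	fmt.Println(useVectorized(10, true))   // false under the old default
	fmt.Println(useVectorized(5000, true)) // true
}
```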