
sql: decrease vectorize_row_count_threshold to 0 #55713

Merged: 2 commits, Dec 3, 2020

Conversation

yuzefovich
Member

@yuzefovich yuzefovich commented Oct 19, 2020

colexec: add context to errors from builtin functions

This commit wraps the errors that occur during builtin function
evaluations to provide more context.

Release note: None

sql: decrease vectorize_row_count_threshold to 0

This commit decreases the default value for the
vectorize_row_count_threshold setting to 0, which means that we will be
using the vectorized engine for all supported queries. We intend to
remove the setting entirely in the 21.1 release, but for now we choose
the option of effectively disabling it, just in case.

The benchmarks have shown the following:

  • -1.5% on KV95
  • similar performance on TPCC
  • -3% on movr
  • -10% on miscellaneous operations (joins, aggregations) on small tables.

We think that such a gap is small enough to merge this change, and we
intend to optimize the vectorized engine further before making the
final call on the default value for the 21.1 release.

Additionally, this commit collects the trace metadata on the outboxes.

Informs: #53893.

Release note (sql change): The default value for the
vectorize_row_count_threshold setting has been decreased from 1000 to
0, meaning that from now on the vectorized engine will always be used
for all supported queries regardless of the row estimate (unless
vectorize=off is set).
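
For illustration, here is a minimal sketch of what the threshold gating amounts to conceptually; the function and names below are hypothetical stand-ins, not the actual planner code:

```go
package main

import "fmt"

// chooseEngine sketches the gating the setting controls: with a threshold
// of 0 the row-count estimate can never fall below it, so every supported
// query runs through the vectorized engine. Illustrative names only.
func chooseEngine(supported bool, rowCountEstimate, threshold uint64) string {
	if !supported {
		return "row-by-row"
	}
	if rowCountEstimate < threshold {
		// Small scans used to fall back to the row-by-row engine.
		return "row-by-row"
	}
	return "vectorized"
}

func main() {
	fmt.Println(chooseEngine(true, 10, 1000)) // old default: row-by-row
	fmt.Println(chooseEngine(true, 10, 0))    // new default: vectorized
}
```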

@yuzefovich yuzefovich added the do-not-merge label (bors won't merge a PR with this label) Oct 19, 2020
@yuzefovich yuzefovich requested review from a team and dt and removed request for a team October 19, 2020 21:06
@cockroach-teamcity
Member

This change is Reviewable

@yuzefovich yuzefovich removed the request for review from dt October 19, 2020 21:07
@yuzefovich yuzefovich requested a review from a team as a code owner October 19, 2020 23:48
@yuzefovich yuzefovich removed the request for review from a team October 19, 2020 23:48
@yuzefovich yuzefovich changed the title from "WIP sql: remove vectorize_row_count_threshold" to "sql: remove vectorize_row_count_threshold" Oct 21, 2020
@yuzefovich
Member Author

yuzefovich commented Oct 22, 2020

The benchmarks.


3 node roachprod cluster

  1. TPCC, 100 warehouses, 1 min ramp, 5 min duration: -0.748% in tpmC
  • old
_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  300.0s     1203.2  93.6%     28.3     26.2     37.7     46.1     71.3    151.0
  • new
_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
  300.0s     1194.2  92.9%     29.9     28.3     39.8     48.2     65.0    117.4
  2. KV95, 1 min ramp, 5 min duration: -6.45% in throughput
  • old
Highest sequence written: 126430. Can be passed as --write-seq=R126430 to the next run.

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  300.0s        0        2142885         7143.0      1.0      1.0      1.8      3.0    159.4  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  300.0s        0         113060          376.9      2.5      2.4      4.1      7.3     30.4  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
  300.0s        0        2255945         7519.8      1.1      1.0      2.4      3.5    159.4  
  • new
Highest sequence written: 118557. Can be passed as --write-seq=R118557 to the next run.

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  300.0s        0        2004183         6680.6      1.1      1.0      2.0      3.5    151.0  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  300.0s        0         106178          353.9      2.6      2.5      4.1      6.6     23.1  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
  300.0s        0        2110361         7034.5      1.1      1.0      2.5      3.9    151.0  
  3. KV95, 5 min ramp, 30 min duration: -8.87% in throughput
  • old
Highest sequence written: 843826. Can be passed as --write-seq=R843826 to the next run.

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0       14648469         8138.0      0.9      0.9      1.6      2.4     50.3  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0         771129          428.4      2.1      2.0      3.1      4.7     56.6  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
 1800.0s        0       15419598         8566.4      0.9      0.9      1.9      2.8     56.6 
  • new
Highest sequence written: 774506. Can be passed as --write-seq=R774506 to the next run.

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0       13347899         7415.5      1.0      0.9      1.7      2.9    142.6  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
 1800.0s        0         703820          391.0      2.3      2.2      3.7      5.8     31.5  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
 1800.0s        0       14051719         7806.5      1.0      1.0      2.1      3.3    142.6 

I trust these numbers a lot more than the ones I got here because benchmarking on a laptop can have significant variance in performance. My current guess is that this difference is mostly caused by the fact that we're pooling tableReader objects as well as reusing some of the slices in tableInfo objects in the row-by-row engine. I looked briefly into how we could do the same in the vectorized engine, and in particular I wanted to remove a lot of duplication between cFetcher and row.fetcher (as well as cTableInfo and tableInfo), but it's not that easy. cc @cockroachdb/sql-execution
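
For reference, a minimal sketch of the sync.Pool-style object reuse described above; the tableReader type and helpers here are hypothetical stand-ins, not the actual row-engine code:

```go
package main

import (
	"fmt"
	"sync"
)

// tableReader stands in for the kind of per-query object discussed above.
type tableReader struct {
	rows []int64 // a reusable scratch slice
}

// readerPool illustrates the reuse pattern: instead of allocating a fresh
// reader (and its slices) for every query, finished readers are reset and
// returned to the pool.
var readerPool = sync.Pool{
	New: func() interface{} { return &tableReader{} },
}

func getReader() *tableReader {
	return readerPool.Get().(*tableReader)
}

func putReader(r *tableReader) {
	r.rows = r.rows[:0] // keep the backing array, drop the contents
	readerPool.Put(r)
}

func main() {
	r := getReader()
	r.rows = append(r.rows, 1, 2, 3)
	fmt.Println(len(r.rows)) // 3
	putReader(r) // the allocation can be reused by the next query
}
```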

@jordanlewis
Member

Thanks for running these benchmarks. What setup did you use? I guess we're not quite ready to merge this yet.

@yuzefovich
Member Author

It was a 3 node roachprod cluster with the default hardware options in GCE, and the load was coming from node 1. Is this what you mean by "setup"?

I took a look at profiles, and nothing really stood out, and the pooling behavior is the main difference I can see right now. I'll take a stab tomorrow at cleaning up the fetchers a bit.

@jordanlewis
Member

Yes, I meant hardware setup. Thanks!

@awoods187
Contributor

awoods187 commented Oct 22, 2020 via email

@yuzefovich
Member Author

yuzefovich commented Oct 23, 2020

One pretty obvious thing that I missed, and that is likely to explain the majority of the difference, is that without vectorize_row_count_threshold we are now always running the SupportsVectorized check (previously, we would do so only if the threshold was met). That check especially shows up on a KV-like workload because we're essentially instantiating the whole flow twice while only running it the second time (the flow runtime is comparable to the flow setup time, so doing the setup twice is too expensive), and in particular that means that we're creating a ColBatchScan and initializing a cFetcher twice too - cFetcher.Init is pretty heavy.

So I think that apart from the pooling of objects we'll need to do something about this check. I feel like the most viable option is to make it a lot more lightweight: instead of creating the whole fake flow with all of the components, we could probably just inspect the processor specs and see whether there are any that we refuse to wrap; if all of the cores are either supported "natively" or can be wrapped, then we'll assume that we can successfully set up the whole flow. I think when the SupportsVectorized check was introduced, we had a lot more edge cases that needed to be handled, so we went with the fake flow setup, but now that might no longer be necessary. It does feel a bit scary to remove that check though.
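
Roughly, a spec-inspection-only check could look something like the sketch below; the core names and the two sets are hypothetical, for illustration only, not the actual SupportsVectorized code:

```go
package main

import "fmt"

// coreKind stands in for a processor core in a flow spec.
type coreKind string

// Cores the vectorized engine handles natively vs. ones it can only run
// by wrapping the row-by-row processor (illustrative sets).
var (
	nativeCores    = map[coreKind]bool{"tableReader": true, "hashJoiner": true, "aggregator": true}
	wrappableCores = map[coreKind]bool{"sampler": true}
)

// supportsVectorized walks the processor specs and refuses only if some
// core can neither run natively nor be wrapped, instead of setting up a
// throwaway flow.
func supportsVectorized(specs []coreKind) bool {
	for _, core := range specs {
		if nativeCores[core] || wrappableCores[core] {
			continue
		}
		return false
	}
	return true
}

func main() {
	fmt.Println(supportsVectorized([]coreKind{"tableReader", "aggregator"})) // true
	fmt.Println(supportsVectorized([]coreKind{"tableReader", "backfiller"})) // false
}
```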

@yuzefovich
Member Author

yuzefovich commented Oct 23, 2020

Hm, to my surprise a prototype with some pooling and a more lightweight support check (#55883) shows a similar big drop in performance between vectorize_row_count_threshold=1000 and vectorize_row_count_threshold=0. I guess we'll have to abandon this PR for now :/

Contributor

@knz knz left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/cli/cli_test.go, line 531 at r1 (raw file):

	c.RunWithArgs([]string{`sql`, `--set`, `unknownoption`, `-e`, `select 123 as "123"`})
	// Check that partial results + error get reported together.
	c.RunWithArgs([]string{`sql`, `-e`, `select 1/(@1-2) from generate_series(1,3)`})

Out of curiosity, why did you need to change this?

Member Author

@yuzefovich yuzefovich left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/cli/cli_test.go, line 531 at r1 (raw file):

Previously, knz (kena) wrote…

Out of curiosity, why did you need to change this?

Currently the test will fail if run via the vectorized engine because a division by zero error will occur sooner (it occurs on the third row, but we have batches of growing capacity starting at 1, so the batch sizes will be 1, 2, 4, ..., 1024, 1024, ..., and the error occurs when processing the second batch at which point only 1 row has been returned to the client whereas the test expects 2 rows).
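
For illustration, a small sketch of how the growing batch capacities interact with an error; the helper and errRow parameter below are hypothetical, just to show the arithmetic:

```go
package main

import "fmt"

// rowsReturnedBeforeError sketches the effect described above: batches
// grow as 1, 2, 4, ... up to 1024 rows, and a batch is emitted only after
// every row in it has been evaluated, so an error inside a batch hides all
// rows of that batch. errRow is 1-based.
func rowsReturnedBeforeError(errRow int) int {
	returned := 0
	batchSize := 1
	row := 1
	for {
		// If the failing row falls inside the current batch, nothing from
		// this batch reaches the client.
		if errRow < row+batchSize {
			return returned
		}
		returned += batchSize
		row += batchSize
		if batchSize < 1024 {
			batchSize *= 2
		}
	}
}

func main() {
	// With the error on the third row, row-at-a-time execution returns 2
	// rows first, but batched execution returns only the 1-row first batch
	// (the error hits while evaluating the 2-row second batch).
	fmt.Println(rowsReturnedBeforeError(3)) // 1
}
```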

Contributor

@knz knz left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/cli/cli_test.go, line 531 at r1 (raw file):

Previously, yuzefovich wrote…

Currently the test will fail if run via the vectorized engine because a division by zero error will occur sooner (it occurs on the third row, but we have batches of growing capacity starting at 1, so the batch sizes will be 1, 2, 4, ..., 1024, 1024, ..., and the error occurs when processing the second batch at which point only 1 row has been returned to the client whereas the test expects 2 rows).

I think it's worth explaining this in a comment here.

@asubiotto
Contributor

Let's discuss this and how to move forward during the next meeting.

@yuzefovich
Member Author

A newer prototype (that is almost production ready) shows an improvement: the performance hit on KV95 if we remove the SupportsVectorized check in favor of a more lightweight one decreases from 6-8% to 3-4%.

@yuzefovich yuzefovich changed the title from "sql: remove vectorize_row_count_threshold" to "sql: decrease vectorize_row_count_threshold to 0" Oct 28, 2020
@yuzefovich
Member Author

More KV95 numbers:

  • 1 min ramp, 5 min duration
    • GCE: +3.68%
    • AWS: -3.84%
  • 2 min ramp, 10 min duration
    • GCE: -2.84%
    • AWS: -1.10%
  • 3 min ramp, 15 min duration
    • GCE: +3.58%
    • AWS: -3.70%
  • 5 min ramp, 30 min duration
    • GCE: -2.51%
    • AWS: -0.66%
  • 10 min ramp, 60 min duration
    • GCE: +1.37%
    • AWS: -1.87%

@jordanlewis
Member

LGTM! Ship it!

@jordanlewis
Member

Err, I guess we decided to also see TPCC numbers, right? I'm getting eager.

@yuzefovich
Member Author

Yes, I'll be rerunning TPCC today as well as MovR to see what's up, and if those look good, we'll merge tomorrow I think.

@yuzefovich
Member Author

yuzefovich commented Dec 2, 2020

TPCC100 numbers again show no significant difference (columns are elapsed, tpmC, efc, then avg / p50 / p90 / p95 / p99 / pMax latencies in ms):

  • 1 min ramp, 5 min duration
    • GCE:
      • old: 300.0s 1193.2 92.8% 24.1 23.1 30.4 37.7 54.5 92.3
      • new: 300.0s 1194.8 92.9% 23.9 23.1 30.4 37.7 52.4 96.5
    • AWS:
      • old: 300.0s 1204.8 93.7% 16.4 15.7 19.9 26.2 39.8 104.9
      • new: 300.0s 1198.6 93.2% 17.3 16.3 22.0 27.3 41.9 226.5
  • 3 min ramp, 15 min duration
    • GCE:
      • old: 900.0s 1233.3 95.9% 21.9 22.0 26.2 30.4 46.1 121.6
      • new: 900.0s 1228.7 95.5% 21.9 21.0 27.3 31.5 48.2 88.1
    • AWS:
      • old: 900.0s 1237.5 96.2% 15.9 15.2 18.9 22.0 35.7 243.3
      • new: 900.0s 1235.9 96.1% 15.1 14.7 17.8 21.0 33.6 100.7
  • 10 min ramp, 60 min duration
    • GCE:
      • old: 3600.0s 1251.7 97.3% 23.1 23.1 27.3 31.5 48.2 130.0
      • new: 3600.0s 1252.0 97.4% 22.5 22.0 26.2 30.4 48.2 151.0
    • AWS:
      • old: 3600.0s 1255.1 97.6% 15.3 14.7 17.8 19.9 33.6 436.2
      • new: 3600.0s 1253.5 97.5% 15.7 15.2 17.8 19.9 33.6 226.5

The only interesting detail is that during the 60 min run with the old config on AWS one node apparently died and "new order" transactions were returning an error. I didn't notice it before wiping the cluster, so I'm not sure what happened.


And here are the numbers for the movr workload (columns are elapsed, errors, ops(total), ops/sec(cum), then avg / p50 / p95 / p99 / pMax latencies in ms; run with these parameters: --num-histories 10000 --num-promo-codes 10000 --num-rides 5000 --num-users 500 --num-vehicles 150):

  • 1 min ramp, 5 min duration
    • GCE:
      • old: 300.0s 0 88306 294.4 3.4 1.2 12.1 44.0 96.5
      • new: 300.0s 0 86214 287.4 3.5 1.4 11.5 41.9 60.8
      • -2.37% in throughput
    • AWS:
      • old: 300.0s 0 153160 510.5 2.0 0.9 6.8 22.0 226.5
      • new: 300.0s 0 147820 492.7 2.0 0.9 6.8 22.0 226.5
      • -3.49% in throughput
  • 3 min ramp, 15 min duration
    • GCE:
      • old: 900.0s 0 260376 289.3 3.5 1.4 11.5 41.9 67.1
      • new: 900.0s 0 254317 282.6 3.5 1.4 13.1 41.9 60.8
      • -2.33% in throughput
    • AWS:
      • old: 900.0s 0 435143 483.5 2.1 1.0 6.6 22.0 226.5
      • new: 900.0s 0 417249 463.6 2.2 1.0 7.1 22.0 469.8
      • -4.11% in throughput
  • 10 min ramp, 60 min duration
    • GCE:
      • old: 3600.0s 0 1077971 299.4 3.3 1.6 10.0 37.7 209.7
      • new: 3600.0s 0 1055800 293.3 3.4 1.7 10.0 37.7 58.7
      • -2.06% in throughput
    • AWS:
      • old: 3600.0s 0 1488654 413.5 2.4 1.4 6.8 22.0 436.2
      • new: 3600.0s 0 1418218 393.9 2.5 1.4 7.6 23.1 436.2
      • -4.73% in throughput

@yuzefovich yuzefovich removed the do-not-merge label (bors won't merge a PR with this label) Dec 2, 2020
@yuzefovich
Member Author

Updated all the benchmark numbers and the commit message. PTAL, @asubiotto @jordanlewis

@yuzefovich yuzefovich force-pushed the vec-threshold branch 2 times, most recently from 220b010 to a138cea on December 2, 2020 21:25
Contributor

@asubiotto asubiotto left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @knz)

@yuzefovich
Member Author

TFTR!

bors r+

@craig
Contributor

craig bot commented Dec 3, 2020

Build failed:

@yuzefovich
Member Author

The test target timed out, and the testrace target failure is a flake that will be fixed shortly. I'll amend the commit and force-push in order to run CI again before merging, to see whether the timeout is related or not.

@yuzefovich
Member Author

Ok, test target passed, testrace target should be fixed.

bors r+

@craig
Contributor

craig bot commented Dec 3, 2020

Build succeeded:
