sql: investigate performance regressions on some queries #47058

yuzefovich · 2020-04-05T22:20:36Z

Our friends from the Apollo team at Georgia Tech have found multiple queries on which our 19.1 version performs better than 19.2, which indicates some performance regressions. I reran all reported queries on a single-node setup of 19.1.8, 19.2.5, and 20.1.0.beta-4-dirty and observed the regressions on most of the queries (all times are in seconds).

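| Case # | 19.1.8 (s) | 19.2.5 (s) | 20.1.0 (s) |
|---|---|---|---|
| 0 | 0.875 | 1.922 | 1.815 |
| 1 | 1.110 | 2.896 | 3.118 |
| 3 | 0.759 | 1.856 | 1.810 |
| 5 | 0.057 | 0.299 | 0.304 |
| 6 | 0.196 | 0.301 | 0.314 |
| 9 | 0.283 | 0.665 | 0.630 |
| 10 | 1.201 | 3.279 | 3.598 |
| 11 | 1.097 | 3.326 | 3.391 |
| 12 | 0.310 | 0.686 | 0.627 |
| 14 | 0.747 | 3.181 | 2.997 |
| 16 | 0.965 | 1.957 | 1.966 |
| 18 | 2.037 | 2.346 | 2.323 |
| 19 | 0.758 | 3.190 | 2.959 |
| 22 | 1.019 | 1.923 | 2.018 |
| 24 | 2.031 | 2.303 | 2.375 |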
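To compare the versions in relative terms, the following sketch computes each case's slowdown factor against 19.1.8 from the timings in the table above. The numbers are copied verbatim from the table; the script itself is only an illustration.

```python
# Timings in seconds, copied from the table above.
# Case number -> (19.1.8, 19.2.5, 20.1.0).
TIMINGS = {
    0: (0.875, 1.922, 1.815),
    1: (1.110, 2.896, 3.118),
    3: (0.759, 1.856, 1.810),
    5: (0.057, 0.299, 0.304),
    6: (0.196, 0.301, 0.314),
    9: (0.283, 0.665, 0.630),
    10: (1.201, 3.279, 3.598),
    11: (1.097, 3.326, 3.391),
    12: (0.310, 0.686, 0.627),
    14: (0.747, 3.181, 2.997),
    16: (0.965, 1.957, 1.966),
    18: (2.037, 2.346, 2.323),
    19: (0.758, 3.190, 2.959),
    22: (1.019, 1.923, 2.018),
    24: (2.031, 2.303, 2.375),
}


def slowdowns(timings):
    """Map each case to (19.2.5 / 19.1.8, 20.1.0 / 19.1.8); values above 1 mean slower."""
    return {
        case: (t192 / t191, t201 / t191)
        for case, (t191, t192, t201) in timings.items()
    }


if __name__ == "__main__":
    ratios = slowdowns(TIMINGS)
    # Print the cases from the largest 19.2.5 slowdown to the smallest.
    for case, (r192, r201) in sorted(ratios.items(), key=lambda kv: kv[1][0], reverse=True):
        print(f"case {case:2d}: 19.2.5 {r192:.2f}x, 20.1.0 {r201:.2f}x (vs 19.1.8)")
```

Sorted this way, case 5 comes first: its absolute time is small, but it is roughly 5.2x slower in 19.2.5 than in 19.1.8.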

~~Note that query 6 runs in about the same time in 20.1.0 as in 19.2.5 if we set `distsql=on`. Further investigation is needed into why the behavior of `distsql=auto` has changed between 19.2.5 and 20.1.~~ This has been addressed by #47365, and I have updated the run times.

I have not looked into any other regressions.

The repro instructions can be found in our Apollo slack channel.


awoods187 · 2020-04-06T13:25:24Z

cc @RaduBerinde

47365: sql: ignore soft limits on scan nodes for distsql planning r=yuzefovich a=yuzefovich We added propagation of soft limits in the 20.1 release, and this causes some of the queries that used to run via DistSQL with `distsql=auto` to get a "should not distribute" recommendation during DistSQL physical planning. However, this can cause an egregious performance regression on some queries compared to the 19.2 version. In order to keep the decision of whether to distribute scans the same as before, we will ignore soft limits on scan nodes for now. Addresses: #47058. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
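A toy model of the planning decision described in this PR, under loudly stated assumptions: the `Scan` fields, the `recommend` rule (any limit on a scan means "should not distribute"), and `distribute_in_auto_mode` (distribute only when every scan recommends it) are all hypothetical simplifications, not CockroachDB's actual DistSQL planner. It only shows how a soft limit propagated onto a scan can flip the `distsql=auto` decision, and how ignoring soft limits on scans restores the earlier behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Rec(Enum):
    # Hypothetical, simplified planner recommendations.
    SHOULD_DISTRIBUTE = "should distribute"
    SHOULD_NOT_DISTRIBUTE = "should not distribute"


@dataclass
class Scan:
    hard_limit: int = 0  # LIMIT the scan must honor; 0 means none
    soft_limit: int = 0  # estimated rows the consumer needs; 0 means none


def recommend(scan: Scan, ignore_soft_limits: bool) -> Rec:
    # Toy rule: a limited scan reads few rows, so distributing it is not worthwhile.
    if scan.hard_limit:
        return Rec.SHOULD_NOT_DISTRIBUTE
    if scan.soft_limit and not ignore_soft_limits:
        return Rec.SHOULD_NOT_DISTRIBUTE
    return Rec.SHOULD_DISTRIBUTE


def distribute_in_auto_mode(scans, ignore_soft_limits: bool) -> bool:
    # Toy stand-in for distsql=auto: distribute only if every scan recommends it.
    return all(recommend(s, ignore_soft_limits) is Rec.SHOULD_DISTRIBUTE for s in scans)


if __name__ == "__main__":
    scans = [Scan(soft_limit=100)]  # soft limit propagated from a parent operator
    print(distribute_in_auto_mode(scans, ignore_soft_limits=False))  # False
    print(distribute_in_auto_mode(scans, ignore_soft_limits=True))   # True
```

Hard limits still block distribution in this toy, matching the PR's scope of ignoring only soft limits.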

asubiotto · 2020-04-16T16:21:30Z

I wrote a script to check for plan differences between versions. Here is the output for these queries. For each query, the top plan is from 19.1.8 and the bottom one is from 20.1.0:
diffoutput.txt

Some differences are only due to the EXPLAIN output format changing between versions. I haven't taken a close look at them yet, but I wanted to upload them in case anyone sees anything interesting. ~~In query 1, for example, there seems to be an extra ORDER BY in the window function as well as an extra render stage~~.

asubiotto · 2020-04-17T10:32:05Z

And here is the explain analyze output:
diffoutput_analyze.txt

asubiotto · 2020-04-22T16:20:11Z

Looking at the EXPLAIN ANALYZE output of some queries that have window functions, it seems that many of these regressions can be explained by the fact that we previously did not respect the work memory limit in the windower processor. We seem to be properly spilling to disk in 20.1, so I would classify these as expected regressions. The queries here that include window functions are 1, 10, 11, 14, 18, 19, and 24, so we can rule those out.
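To illustrate why respecting a memory limit can make a query slower, here is a toy row buffer (the class `SpillingRowBuffer` is hypothetical and is not CockroachDB's windower or row container) that keeps rows in memory until an assumed byte budget is exceeded and then spills everything to a temporary file. Row sizes are approximated by their pickled length, an assumption made only for this sketch. Once spilled, every row pays for serialization and file I/O, which is the kind of extra work a limit-respecting windower incurs.

```python
import io
import pickle
import tempfile


class SpillingRowBuffer:
    """Buffer rows in memory up to mem_limit_bytes, then spill all rows to a temp file.

    Hypothetical sketch: row size is approximated by len(pickle.dumps(row)).
    """

    def __init__(self, mem_limit_bytes):
        self.mem_limit = mem_limit_bytes
        self.mem_rows = []
        self.mem_used = 0
        self.disk = None  # temporary file, created on first spill
        self.spilled_rows = 0

    @property
    def spilled(self):
        return self.disk is not None

    def add(self, row):
        if self.disk is None:
            size = len(pickle.dumps(row))
            if self.mem_used + size <= self.mem_limit:
                self.mem_rows.append(row)
                self.mem_used += size
                return
            self._spill()  # budget exceeded: move buffered rows to disk
        # Always append at the end, even if a previous iteration moved the file position.
        self.disk.seek(0, io.SEEK_END)
        pickle.dump(row, self.disk)
        self.spilled_rows += 1

    def _spill(self):
        self.disk = tempfile.TemporaryFile()
        for r in self.mem_rows:
            pickle.dump(r, self.disk)
        self.spilled_rows = len(self.mem_rows)
        self.mem_rows = []
        self.mem_used = 0

    def __iter__(self):
        # Yield rows in insertion order, from memory or from the spill file.
        if self.disk is None:
            yield from self.mem_rows
            return
        self.disk.seek(0)
        for _ in range(self.spilled_rows):
            yield pickle.load(self.disk)
```

A small budget forces the slower disk path; a generous one keeps all rows in memory, which loosely mirrors what happened when the memory limit was not enforced.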

One thing for the optimizer team (cc @RaduBerinde): query 5 seems to use a merge join instead of a lookup join.

Queries that still need further investigation: 0, 3, 6, 9, 12, 16, and 22. The EXPLAIN (VERBOSE) output doesn't seem to differ for these, and the EXPLAIN ANALYZE output doesn't display properly: for some reason, the encoded links all contain a long run of repeated "A"s (example: Query 6 explain analyze plan).

awoods187 · 2020-04-28T14:52:41Z

Thanks for taking a look at these, @asubiotto! @rytaft or @RaduBerinde, I'm curious about query 5.

rytaft · 2020-04-29T14:34:08Z

I've looked into this, and I'm pretty sure that query 5 changed because we've made lookup joins more expensive since 19.1. In particular, #40248 and #43003 probably had an impact. I haven't actually done a git bisect, but I'm pretty sure that's the cause.

Part of the problem here is that our row count estimate of the district table after filtering is incorrect, since we can't determine that the filter on district is actually false. We could do that pretty easily by creating a rule that converts exists(<subquery known to return 0 rows>) to false. I'll open an issue to create such a rule.
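The rewrite described above can be sketched on a toy relational expression tree. Everything here (`Values`, `Select`, `Exists`, `max_rows`, `eliminate_exists_zero_rows`) is hypothetical; CockroachDB's optimizer expresses such rules in its own rule language over its own logical properties, which this sketch does not reproduce. The idea is the same: if an upper bound on the subquery's row count is zero, `exists(...)` can be folded to the constant false.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Values:
    rows: tuple  # literal rows


@dataclass(frozen=True)
class Select:
    input: object
    filter: object  # the constant False, or an opaque predicate such as "d_id = 5"


@dataclass(frozen=True)
class Exists:
    input: object


def max_rows(rel) -> Optional[int]:
    """Toy cardinality property: an upper bound on output rows, or None if unknown."""
    if isinstance(rel, Values):
        return len(rel.rows)
    if isinstance(rel, Select):
        if rel.filter is False:
            return 0  # a contradictory filter returns nothing
        return max_rows(rel.input)  # filtering never adds rows
    return None


def eliminate_exists_zero_rows(expr):
    """Toy rewrite: Exists(input) => False when input is known to produce zero rows."""
    if isinstance(expr, Exists) and max_rows(expr.input) == 0:
        return False  # Python's False stands in for the optimizer's False scalar
    return expr
```

Expressions whose input might produce rows are returned unchanged, so the rewrite is safe to apply everywhere.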


rytaft · 2020-04-29T15:48:23Z

Actually, I just went ahead and submitted a PR since it was so trivial: #48162. I confirmed that the resulting plan for query 5 is much better.

46992: sql: Add Logical Column ID field to ColumnDescriptor r=rohany a=RichardJCai The LogicalColumnID field mimics the ColumnID field; however, LogicalColumnID may be swapped between two columns, whereas ColumnID cannot. LogicalColumnID is referenced for virtual tables (pg_catalog, information_schema) and most notably affects column ordering for SHOW COLUMNS. The LogicalColumnID field supports swapping the order of two columns; this is currently only used for ALTER COLUMN TYPE, when a shadow column is created and swapped with its original column. Does not affect existing behaviour. Release note: None

47449: cli: add --cert-principal-map to client commands r=petermattis a=petermattis Add support for the `--cert-principal-map` flag to the certs and client commands. Anywhere we were accepting the `--certs-dir` flag, we now also accept the `--cert-principal-map` flag. Fixes #47300 Release note (cli change): Support the `--cert-principal-map` flag in the `cert *` and client commands such as `sql`.

48138: keys: support splitting Ranges on tenant-id prefixed keys r=nvanbenschoten a=nvanbenschoten Fixes #48122. Relates to #47903. Relates to #48123. This PR contains a series of small commits that work towards the introduction of tenant-id prefixed keyspaces and begin the removal of some `keys.TODOSQLCodec` instances. This should be the only time we need to touch C++ throughout this work.

48160: storage,libroach: Check for MaxKeys when reading from intent history r=itsbilal a=itsbilal We weren't checking for MaxKeys (or TargetBytes) being reached in the case where we read from intent history in the MVCC scanner. All other cases go through addAndAdvance(), which had these checks. Almost certainly fixes #46652. I would be very surprised if it were something else. Release note (bug fix): Fixes a bug where a read operation in a transaction would error out for exceeding the maximum count of results returned.

48162: opt: add rule to eliminate Exists when input has zero rows r=rytaft a=rytaft This commit adds a new rule, `EliminateExistsZeroRows`, which converts an `Exists` subquery to False when it's known that the input produces zero rows. Informs #47058 Release note (performance improvement): The optimizer can now detect when an Exists subquery can be eliminated because the input has zero rows. This leads to better plans in some cases.

Co-authored-by: richardjcai <caioftherichard@gmail.com> Co-authored-by: Peter Mattis <petermattis@gmail.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com> Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>

yuzefovich · 2021-04-07T16:57:13Z

I think we addressed the two main problems identified by this issue (#47365, #48162), and most of the remaining regressions were expected given the fix to the window functions, so I'm closing this issue.

yuzefovich added the C-performance label (Perf of queries or internals. Solution not expected to change functional behavior.) Apr 5, 2020

yuzefovich mentioned this issue Apr 11, 2020: sql: ignore soft limits on scan nodes for distsql planning #47365 (Merged)

yuzefovich mentioned this issue Apr 11, 2020: release-20.1: sql: ignore soft limits on scan nodes for distsql planning #47371 (Merged)

asubiotto self-assigned this Apr 14, 2020

rytaft mentioned this issue Apr 29, 2020: opt: add rule to eliminate Exists when input has zero rows #48162 (Merged)

yuzefovich closed this as completed Apr 7, 2021
