sql: performance regression on TPC-H Q17 between 20.1 and 20.2 #55787

yuzefovich · 2020-10-20T22:25:22Z

I reran all TPCH queries (via tpchvec/perf test, 35 times total) with the default configs on 20.1.7 and 20.2.0-rc.2, and we've improved significantly on many of them, but on query 17 there is a regression:

17:	20.1.7: 0.50s	20.2: 0.65s

Last time we did a similar experiment some time around July, and we also saw a similar regression

17 |   | 0.46 | 0.59 | -22.16%

The plan for the query hasn't changed this year (based on tpch_vec logic test), so my guess is that it might be due to the changes around join reader (#48058, #49669). cc @asubiotto

The text was updated successfully, but these errors were encountered:

yuzefovich · 2020-10-21T03:51:18Z

Here is what I got when running Q17 on a single node cluster on my laptop.

8fec3f4 (before rowexec: change lookup join batch size to be specified in bytes #48058)

I201021 03:26:11.428268 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.29 seconds
I201021 03:26:11.697669 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:11.968696 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:12.237008 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:12.523119 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.29 seconds
I201021 03:26:12.789059 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:13.058016 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:13.326211 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:13.588399 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.26 seconds
I201021 03:26:13.851252 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.26 seconds

d7486e7 (after rowexec: change lookup join batch size to be specified in bytes #48058)

I201021 03:26:58.243520 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 1.35 seconds
I201021 03:26:58.560694 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:26:58.868101 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:26:59.176833 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:26:59.488207 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:26:59.797702 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:27:00.106905 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:27:00.413490 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:27:00.717938 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.30 seconds
I201021 03:27:01.026726 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds

948ce76 (before rowcontainer,rowexec: switch joinReader to use DiskBackedNumberedRowC… #49669)

I201021 03:28:22.767530 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 1.23 seconds
I201021 03:28:23.091941 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:23.408893 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:23.726643 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:24.047218 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:24.368609 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:24.687644 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:25.006102 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:25.322248 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:25.638241 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds

f39ae24 (after rowcontainer,rowexec: switch joinReader to use DiskBackedNumberedRowC… #49669)

I201021 03:29:17.008461 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 1.38 seconds
I201021 03:29:17.334548 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:29:17.654997 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:17.973337 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:18.292295 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:18.612096 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:18.931065 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:19.248551 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:19.563860 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:19.883272 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds

2bf2f97 (before the third commit in colexec: introduce batches with dynamic capacity #52453)

I201021 03:44:46.151840 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.38 seconds
I201021 03:44:46.481265 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:44:46.812330 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:44:47.135636 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:44:47.461597 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:44:47.793987 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:44:48.122648 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:44:48.443652 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:44:48.766269 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:44:49.093058 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds

895125b (after colexec: introduce batches with dynamic capacity #52453)

I201021 03:48:04.726705 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:48:05.020483 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.29 seconds
I201021 03:48:05.324142 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.30 seconds
I201021 03:48:05.635863 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:48:05.949519 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:48:06.268810 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:48:06.584554 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:48:06.896186 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:48:07.205693 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:48:07.515525 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds

Based on these observations it seems that #48058 is the cause of the slowdown. In the local environment it increased the latency by 15% or so, and possibly in the distributed environment the increase could explain 20-30% performance degradation we're seeing when comparing against 20.1 version (or it could be some other changes). Interestingly, the dynamic batch size behavior appears to have a very minor positive impact.

yuzefovich · 2020-10-21T04:10:49Z

Reducing join reader batch size hint for "ordering" strategy from 2MB to 256KB seems to give us those 15% back. I'll kick off a run of tpchvec/bench with this prototype to see what's the optimal number on a subset of TPCH queries.

yuzefovich · 2020-10-21T05:40:56Z

5 runs of tpchvec/bench showed that the current default of 2MB is the best option on the 11 queries I picked (2, 4, 5, 7, 8, 9, 11, 12, 17, 20, 21):

Yahors-MacBook-Pro:cockroach yuzefovich$ tail -n 8 artifacts/tpchvec/bench/run_1/test.log 
05:35:45 tpchvec.go:494: score of 131072 is 13.59
05:35:45 tpchvec.go:494: score of 524288 is 11.64
05:35:45 tpchvec.go:494: score of 1048576 is 11.62
05:35:45 tpchvec.go:494: score of 2097152 is 11.41
05:35:45 tpchvec.go:494: score of 4194304 is 11.67
05:35:45 tpchvec.go:494: score of 8388608 is 12.20
05:35:45 test.go:189: test status: ----- best setup is 2097152 -----
05:35:45 test_runner.go:768: tearing down after success; see teardown.log
Yahors-MacBook-Pro:cockroach yuzefovich$ tail -n 8 artifacts/tpchvec/bench/run_2/test.log 
05:37:21 tpchvec.go:494: score of 131072 is 13.08
05:37:21 tpchvec.go:494: score of 524288 is 11.73
05:37:21 tpchvec.go:494: score of 1048576 is 11.54
05:37:21 tpchvec.go:494: score of 2097152 is 11.36
05:37:21 tpchvec.go:494: score of 4194304 is 11.42
05:37:21 tpchvec.go:494: score of 8388608 is 12.45
05:37:21 test.go:189: test status: ----- best setup is 2097152 -----
05:37:21 test_runner.go:768: tearing down after success; see teardown.log
Yahors-MacBook-Pro:cockroach yuzefovich$ tail -n 8 artifacts/tpchvec/bench/run_3/test.log 
05:35:45 tpchvec.go:494: score of 131072 is 13.18
05:35:45 tpchvec.go:494: score of 524288 is 11.98
05:35:45 tpchvec.go:494: score of 1048576 is 11.99
05:35:45 tpchvec.go:494: score of 2097152 is 12.20
05:35:45 tpchvec.go:494: score of 4194304 is 11.88
05:35:45 tpchvec.go:494: score of 8388608 is 13.03
05:35:45 test.go:189: test status: ----- best setup is 4194304 -----
05:35:45 test_runner.go:768: tearing down after success; see teardown.log
Yahors-MacBook-Pro:cockroach yuzefovich$ tail -n 8 artifacts/tpchvec/bench/run_4/test.log 
05:34:07 tpchvec.go:494: score of 131072 is 13.45
05:34:07 tpchvec.go:494: score of 524288 is 11.78
05:34:07 tpchvec.go:494: score of 1048576 is 11.83
05:34:07 tpchvec.go:494: score of 2097152 is 11.74
05:34:07 tpchvec.go:494: score of 4194304 is 11.76
05:34:07 tpchvec.go:494: score of 8388608 is 12.86
05:34:07 test.go:189: test status: ----- best setup is 2097152 -----
05:34:07 test_runner.go:768: tearing down after success; see teardown.log
Yahors-MacBook-Pro:cockroach yuzefovich$ tail -n 8 artifacts/tpchvec/bench/run_5/test.log 
05:36:02 tpchvec.go:494: score of 131072 is 13.05
05:36:02 tpchvec.go:494: score of 524288 is 11.46
05:36:02 tpchvec.go:494: score of 1048576 is 11.44
05:36:02 tpchvec.go:494: score of 2097152 is 11.33
05:36:02 tpchvec.go:494: score of 4194304 is 11.51
05:36:02 tpchvec.go:494: score of 8388608 is 13.70
05:36:02 test.go:189: test status: ----- best setup is 2097152 -----
05:36:02 test_runner.go:768: tearing down after success; see teardown.log

(the lower the score, the better).

Closing the issue as an unfortunate regression that occurred because of the change that is actually beneficial in most cases.

yuzefovich added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Oct 20, 2020

yuzefovich self-assigned this Oct 20, 2020

yuzefovich closed this as completed Oct 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: performance regression on TPC-H Q17 between 20.1 and 20.2 #55787

sql: performance regression on TPC-H Q17 between 20.1 and 20.2 #55787

yuzefovich commented Oct 20, 2020

yuzefovich commented Oct 21, 2020

yuzefovich commented Oct 21, 2020 •

edited

Loading

yuzefovich commented Oct 21, 2020 •

edited

Loading

sql: performance regression on TPC-H Q17 between 20.1 and 20.2 #55787

sql: performance regression on TPC-H Q17 between 20.1 and 20.2 #55787

Comments

yuzefovich commented Oct 20, 2020

yuzefovich commented Oct 21, 2020

yuzefovich commented Oct 21, 2020 • edited Loading

yuzefovich commented Oct 21, 2020 • edited Loading

yuzefovich commented Oct 21, 2020 •

edited

Loading

yuzefovich commented Oct 21, 2020 •

edited

Loading