Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: performance regression on TPC-H Q17 between 20.1 and 20.2 #55787

Closed
yuzefovich opened this issue Oct 20, 2020 · 3 comments
Closed

sql: performance regression on TPC-H Q17 between 20.1 and 20.2 #55787

yuzefovich opened this issue Oct 20, 2020 · 3 comments
Assignees
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@yuzefovich
Copy link
Member

I reran all TPCH queries (via tpchvec/perf test, 35 times total) with the default configs on 20.1.7 and 20.2.0-rc.2, and we've improved significantly on many of them, but on query 17 there is a regression:

17:	20.1.7: 0.50s	20.2: 0.65s

Last time we did a similar experiment some time around July, and we also saw a similar regression

17 |   | 0.46 | 0.59 | -22.16%

The plan for the query hasn't changed this year (based on tpch_vec logic test), so my guess is that it might be due to the changes around join reader (#48058, #49669). cc @asubiotto

@yuzefovich yuzefovich added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Oct 20, 2020
@yuzefovich yuzefovich self-assigned this Oct 20, 2020
@yuzefovich
Copy link
Member Author

Here is what I got when running Q17 on a single node cluster on my laptop.

I201021 03:26:11.428268 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.29 seconds
I201021 03:26:11.697669 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:11.968696 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:12.237008 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:12.523119 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.29 seconds
I201021 03:26:12.789059 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:13.058016 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:13.326211 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.27 seconds
I201021 03:26:13.588399 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.26 seconds
I201021 03:26:13.851252 146 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.26 seconds
I201021 03:26:58.243520 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 1.35 seconds
I201021 03:26:58.560694 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:26:58.868101 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:26:59.176833 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:26:59.488207 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:26:59.797702 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:27:00.106905 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:27:00.413490 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:27:00.717938 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.30 seconds
I201021 03:27:01.026726 82 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:28:22.767530 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 1.23 seconds
I201021 03:28:23.091941 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:23.408893 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:23.726643 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:24.047218 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:24.368609 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:24.687644 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:25.006102 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:25.322248 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:28:25.638241 83 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:17.008461 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 1.38 seconds
I201021 03:29:17.334548 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:29:17.654997 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:17.973337 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:18.292295 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:18.612096 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:18.931065 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:19.248551 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:19.563860 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:29:19.883272 53 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:44:46.151840 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.38 seconds
I201021 03:44:46.481265 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:44:46.812330 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:44:47.135636 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:44:47.461597 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:44:47.793987 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:44:48.122648 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:44:48.443652 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:44:48.766269 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:44:49.093058 42 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.33 seconds
I201021 03:48:04.726705 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:48:05.020483 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.29 seconds
I201021 03:48:05.324142 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.30 seconds
I201021 03:48:05.635863 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:48:05.949519 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:48:06.268810 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:48:06.584554 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.32 seconds
I201021 03:48:06.896186 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:48:07.205693 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds
I201021 03:48:07.515525 114 workload/tpch/tpch.go:479  [q17] returned 1 rows after 0.31 seconds

Based on these observations it seems that #48058 is the cause of the slowdown. In the local environment it increased the latency by 15% or so, and possibly in the distributed environment the increase could explain 20-30% performance degradation we're seeing when comparing against 20.1 version (or it could be some other changes). Interestingly, the dynamic batch size behavior appears to have a very minor positive impact.

@yuzefovich
Copy link
Member Author

yuzefovich commented Oct 21, 2020

Reducing join reader batch size hint for "ordering" strategy from 2MB to 256KB seems to give us those 15% back. I'll kick off a run of tpchvec/bench with this prototype to see what's the optimal number on a subset of TPCH queries.

@yuzefovich
Copy link
Member Author

yuzefovich commented Oct 21, 2020

5 runs of tpchvec/bench showed that the current default of 2MB is the best option on the 11 queries I picked (2, 4, 5, 7, 8, 9, 11, 12, 17, 20, 21):

Yahors-MacBook-Pro:cockroach yuzefovich$ tail -n 8 artifacts/tpchvec/bench/run_1/test.log 
05:35:45 tpchvec.go:494: score of 131072 is 13.59
05:35:45 tpchvec.go:494: score of 524288 is 11.64
05:35:45 tpchvec.go:494: score of 1048576 is 11.62
05:35:45 tpchvec.go:494: score of 2097152 is 11.41
05:35:45 tpchvec.go:494: score of 4194304 is 11.67
05:35:45 tpchvec.go:494: score of 8388608 is 12.20
05:35:45 test.go:189: test status: ----- best setup is 2097152 -----
05:35:45 test_runner.go:768: tearing down after success; see teardown.log
Yahors-MacBook-Pro:cockroach yuzefovich$ tail -n 8 artifacts/tpchvec/bench/run_2/test.log 
05:37:21 tpchvec.go:494: score of 131072 is 13.08
05:37:21 tpchvec.go:494: score of 524288 is 11.73
05:37:21 tpchvec.go:494: score of 1048576 is 11.54
05:37:21 tpchvec.go:494: score of 2097152 is 11.36
05:37:21 tpchvec.go:494: score of 4194304 is 11.42
05:37:21 tpchvec.go:494: score of 8388608 is 12.45
05:37:21 test.go:189: test status: ----- best setup is 2097152 -----
05:37:21 test_runner.go:768: tearing down after success; see teardown.log
Yahors-MacBook-Pro:cockroach yuzefovich$ tail -n 8 artifacts/tpchvec/bench/run_3/test.log 
05:35:45 tpchvec.go:494: score of 131072 is 13.18
05:35:45 tpchvec.go:494: score of 524288 is 11.98
05:35:45 tpchvec.go:494: score of 1048576 is 11.99
05:35:45 tpchvec.go:494: score of 2097152 is 12.20
05:35:45 tpchvec.go:494: score of 4194304 is 11.88
05:35:45 tpchvec.go:494: score of 8388608 is 13.03
05:35:45 test.go:189: test status: ----- best setup is 4194304 -----
05:35:45 test_runner.go:768: tearing down after success; see teardown.log
Yahors-MacBook-Pro:cockroach yuzefovich$ tail -n 8 artifacts/tpchvec/bench/run_4/test.log 
05:34:07 tpchvec.go:494: score of 131072 is 13.45
05:34:07 tpchvec.go:494: score of 524288 is 11.78
05:34:07 tpchvec.go:494: score of 1048576 is 11.83
05:34:07 tpchvec.go:494: score of 2097152 is 11.74
05:34:07 tpchvec.go:494: score of 4194304 is 11.76
05:34:07 tpchvec.go:494: score of 8388608 is 12.86
05:34:07 test.go:189: test status: ----- best setup is 2097152 -----
05:34:07 test_runner.go:768: tearing down after success; see teardown.log
Yahors-MacBook-Pro:cockroach yuzefovich$ tail -n 8 artifacts/tpchvec/bench/run_5/test.log 
05:36:02 tpchvec.go:494: score of 131072 is 13.05
05:36:02 tpchvec.go:494: score of 524288 is 11.46
05:36:02 tpchvec.go:494: score of 1048576 is 11.44
05:36:02 tpchvec.go:494: score of 2097152 is 11.33
05:36:02 tpchvec.go:494: score of 4194304 is 11.51
05:36:02 tpchvec.go:494: score of 8388608 is 13.70
05:36:02 test.go:189: test status: ----- best setup is 2097152 -----
05:36:02 test_runner.go:768: tearing down after success; see teardown.log

(the lower the score, the better).

Closing the issue as an unfortunate regression that occurred because of the change that is actually beneficial in most cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

No branches or pull requests

1 participant