Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: vectorize the merge join executor by vecGroupChecker #14458

Merged
merged 5 commits into from
Feb 4, 2020

Conversation

catror
Copy link
Contributor

@catror catror commented Jan 12, 2020

PCP #14216

What problem does this PR solve?

Vectorize the merge join executor by vecGroupChecker, then we can process it group by group.

What is changed and how it works?

  1. combine struct mergeJoinInnerTable and mergeJoinOuterTable to mergeJoinTable
  2. group rows by join keys by vecGroupChecker
  3. process outer table by group instead of by row

Check List

Tests

  • Unit test
  • Integration test

Benchmark

benchstat -delta-test none old.txt new.txt
name                                                                                                                                                old time/op    new time/op    delta
MergeJoinExec/merge_join_(outerRows:300000,_innerRows:300000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8     1.27s ± 0%     0.34s ± 0%  -72.87%
MergeJoinExec/merge_join_(outerRows:3000,_innerRows:300000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8       338ms ± 0%     172ms ± 0%  -48.96%
MergeJoinExec/merge_join_(outerRows:30,_innerRows:300000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8         330ms ± 0%      43ms ± 0%  -86.96%
MergeJoinExec/merge_join_(outerRows:300000,_innerRows:330000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8     958ms ± 0%     309ms ± 0%  -67.73%

name                                                                                                                                                old alloc/op   new alloc/op   delta
MergeJoinExec/merge_join_(outerRows:300000,_innerRows:300000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8    72.2MB ± 0%    58.5MB ± 0%  -18.98%
MergeJoinExec/merge_join_(outerRows:3000,_innerRows:300000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8      65.9MB ± 0%    58.5MB ± 0%  -11.25%
MergeJoinExec/merge_join_(outerRows:30,_innerRows:300000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8        64.5MB ± 0%    62.6MB ± 0%   -2.94%
MergeJoinExec/merge_join_(outerRows:300000,_innerRows:330000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8    72.1MB ± 0%    58.5MB ± 0%  -18.97%

name                                                                                                                                                old allocs/op  new allocs/op  delta
MergeJoinExec/merge_join_(outerRows:300000,_innerRows:300000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8      908k ± 0%        4k ± 0%  -99.59%
MergeJoinExec/merge_join_(outerRows:3000,_innerRows:300000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8       37.8k ± 0%      3.8k ± 0%  -90.08%
MergeJoinExec/merge_join_(outerRows:30,_innerRows:300000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8         8.72k ± 0%     3.71k ± 0%  -57.45%
MergeJoinExec/merge_join_(outerRows:300000,_innerRows:330000,_concurency:4,_outerJoinKeyIdx:_[0_1],_innerJoinKeyIdx:_[0_1],_NeedOuterSort:false)-8      908k ± 0%        4k ± 0%  -99.58%

@sre-bot
Copy link
Contributor

sre-bot commented Jan 12, 2020

Thanks for your contribution. If your PR get merged, you will be rewarded 3000 points.

@sre-bot sre-bot added the contribution This PR is from a community contributor. label Jan 12, 2020
@zz-jason zz-jason requested a review from qw4990 January 14, 2020 01:09
@zz-jason zz-jason added the sig/execution SIG execution label Jan 14, 2020
@qw4990
Copy link
Contributor

qw4990 commented Jan 17, 2020

/run-unit-test

executor/merge_join.go Outdated Show resolved Hide resolved
@qw4990
Copy link
Contributor

qw4990 commented Jan 19, 2020

Hi @catror , I add a new test for the vectorized merge join in this PR #14519, please merge it into your PR to check this PR's correctness at your convenience.
And also please fix this CI problem in the check_dev_2 test:

[2020-01-17T09:51:51.002Z] FAIL: physical_plan_test.go:784: testPlanSuite.TestAggToCopHint
[2020-01-17T09:51:51.002Z] 
[2020-01-17T09:51:51.002Z] physical_plan_test.go:824:
[2020-01-17T09:51:51.002Z]     c.Assert(planString, Equals, output[i].Best, comment)
[2020-01-17T09:51:51.002Z] ... obtained string = "TableReader(Table(t))"
[2020-01-17T09:51:51.002Z] ... expected string = "TableReader(Table(t)->HashAgg)->HashAgg->HashAgg"
[2020-01-17T09:51:51.002Z] ... case:2 sql:select /*+ AGG_TO_COP(), HASH_AGG(), USE_INDEX(t) */ distinct a from t group by a

@catror catror force-pushed the vectorized-merge-join branch 2 times, most recently from 183e593 to a371c4f Compare January 19, 2020 08:26
@bb7133 bb7133 added this to the v4.0.0-beta.1 milestone Jan 19, 2020
@zz-jason zz-jason requested a review from qw4990 January 20, 2020 06:59
executor/merge_join.go Outdated Show resolved Hide resolved
executor/merge_join.go Outdated Show resolved Hide resolved
executor/merge_join.go Outdated Show resolved Hide resolved
@qw4990
Copy link
Contributor

qw4990 commented Feb 3, 2020

@catror Friendly ping, please resolve conflicts.

@catror
Copy link
Contributor Author

catror commented Feb 3, 2020

@catror Friendly ping, please resolve conflicts.

Done.

@qw4990
Copy link
Contributor

qw4990 commented Feb 3, 2020

LGTM, NICE WORK, Thanks for your contribution.

@qw4990 qw4990 added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 3, 2020
Copy link
Contributor

@wshwsh12 wshwsh12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wshwsh12 wshwsh12 added status/LGT2 Indicates that a PR has LGTM 2. status/can-merge Indicates a PR has been approved by a committer. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 4, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Feb 4, 2020

/run-all-tests

@sre-bot sre-bot merged commit cfe0c20 into pingcap:master Feb 4, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Feb 4, 2020

@catror complete task #14216 and get 3000 score, currerent score 3500

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution This PR is from a community contributor. sig/execution SIG execution status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants