Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the performance of NestedLoopApplyExec #12939

Open
zz-jason opened this issue Oct 25, 2019 · 8 comments
Open

Improve the performance of NestedLoopApplyExec #12939

zz-jason opened this issue Oct 25, 2019 · 8 comments
Assignees
Labels
sig/execution SIG execution type/enhancement The issue or PR belongs to an enhancement.

Comments

@zz-jason
Copy link
Member

zz-jason commented Oct 25, 2019

Description

NestedLoopApplyExec is used to calculate the unnested subqueries. Taking this SQL as an example:

create table t1(a bigint, b bigint);
create table t2(a bigint, b bigint);
-- insert some records into table t1 and t2
select * from t1 where t1.b > (select min(b) from t2 where t1.a > t2.a);

The above select statement is converted to an execution plan which contains an Apply executor:

TiDB(root@127.0.0.1:test) > explain select * from t1 where t1.b > (select min(b) from t2 where t1.a > t2.a);
+----------------------------------+----------+-----------+-----------------------------------------------------------------------------+
| id                               | count    | task      | operator info                                                               |
+----------------------------------+----------+-----------+-----------------------------------------------------------------------------+
| Projection_15                    | 9990.00  | root      | Column#1, Column#2                                                          |
| └─Apply_17                       | 9990.00  | root      | CARTESIAN inner join, inner:Selection_21, other cond:gt(Column#2, Column#7) |
|   ├─TableReader_20               | 9990.00  | root      | data:Selection_19                                                           |
|   │ └─Selection_19               | 9990.00  | cop[tikv] | not(isnull(Column#2))                                                       |
|   │   └─TableScan_18             | 10000.00 | cop[tikv] | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo                 |
|   └─Selection_21                 | 0.80     | root      | not(isnull(Column#7))                                                       |
|     └─StreamAgg_26               | 1.00     | root      | funcs:min(Column#5)                                                         |
|       └─TopN_27                  | 1.00     | root      | Column#5:asc, offset:0, count:1                                             |
|         └─TableReader_36         | 1.00     | root      | data:TopN_35                                                                |
|           └─TopN_35              | 1.00     | cop[tikv] | Column#5:asc, offset:0, count:1                                             |
|             └─Selection_34       | 7992.00  | cop[tikv] | gt(Column#1, Column#4), not(isnull(Column#5))                               |
|               └─TableScan_33     | 10000.00 | cop[tikv] | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo                 |
+----------------------------------+----------+-----------+-----------------------------------------------------------------------------+
12 rows in set (0.00 sec)

The Apply_17 operator in the above execution plan is executed by the NestedLoopApplyExec physical operator. Your goal is to optimize this physical implementation to improve the overall sql execution, reduce at least 100% execution time.

Goals:

  • Optimize the performance of NestedLoopApplyExec

Non-Goals:

  • Improve the planner to unnest this subquery to avoid choosing NestedLoopApplyExec

Difficulty

  • Medium

Score

  • 3000

Mentor(s)

Recommended Skills

  • SQL Optimization
  • Golang Profiling
@zz-jason zz-jason added type/enhancement The issue or PR belongs to an enhancement. sig/execution SIG execution difficulty/medium labels Oct 25, 2019
@sre-bot sre-bot added the PCP-S1 label Nov 4, 2019
@miyuri-fox
Copy link

/pick-up-challenge

@sre-bot
Copy link
Contributor

sre-bot commented Nov 4, 2019

@miyuri-fox don't have enough score, pick up failed
Progress 0/400
You may pick up some easy issues first.

@TennyZhuang
Copy link
Contributor

reduce at least 100% execution time.

You mean no-op?

@ekalinin
Copy link
Contributor

/pick-up-challenge

@sre-bot
Copy link
Contributor

sre-bot commented Dec 14, 2019

@ekalinin pick up issue success

@ekalinin
Copy link
Contributor

ekalinin commented Jan 3, 2020

/give-up-challenge

@sre-bot
Copy link
Contributor

sre-bot commented Jan 3, 2020

@ekalinin give up issue success

@you06 you06 changed the title PCP-1: Improve the performance of NestedLoopApplyExec Improve the performance of NestedLoopApplyExec Feb 29, 2020
@qw4990
Copy link
Contributor

qw4990 commented Jun 8, 2020

image
This SQL can be converted to a HashJoin in CRDB.
Maybe we can learn something from this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/execution SIG execution type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

No branches or pull requests

8 participants