Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner: enhance the rule max_min_eliminate to support multiple agg #12083

Merged
merged 20 commits into from
Sep 12, 2019

Conversation

francis0407
Copy link
Member

What problem does this PR solve?

Fix #12040.
This PR enhances the max_min_elimination rule to support multiple max/min elimination in some circumstance.
Before this PR,

mysql> show create table t;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) NOT NULL,
  PRIMARY KEY (`b`),
  KEY `idx_a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> explain select max(a) - min(a) from t where a > 100;
+------------------------+-------+------+--------------------------------------------------------------------+
| id                     | count | task | operator info                                                      |
+------------------------+-------+------+--------------------------------------------------------------------+
| Projection_5           | 1.00  | root | minus(3_col_0, 3_col_1)                                            |
| └─StreamAgg_10         | 1.00  | root | funcs:max(test.t.a), min(test.t.a)                                 |
|   └─IndexReader_15     | 1.00  | root | index:IndexScan_14                                                 |
|     └─IndexScan_14     | 1.00  | cop  | table:t, index:a, range:(100,+inf], keep order:false, stats:pseudo |
+------------------------+-------+------+--------------------------------------------------------------------+
4 rows in set (0.00 sec)

After this PR:

mysql> explain select max(a) - min(a) from t where a > 100;
+--------------------------------+-------+------+-------------------------------------------------------------------------+
| id                             | count | task | operator info                                                           |
+--------------------------------+-------+------+-------------------------------------------------------------------------+
| Projection_18                  | 1.00  | root | minus(3_col_0, 3_col_1)                                                 |
| └─HashLeftJoin_19              | 1.00  | root | CARTESIAN inner join, inner:StreamAgg_46                                |
|   ├─StreamAgg_25               | 1.00  | root | funcs:max(test.t.a)                                                     |
|   │ └─Limit_29                 | 0.33  | root | offset:0, count:1                                                       |
|   │   └─IndexReader_39         | 1.00  | root | index:Limit_38                                                          |
|   │     └─Limit_38             | 1.00  | cop  | offset:0, count:1                                                       |
|   │       └─IndexScan_37       | 1.00  | cop  | table:t, index:a, range:(100,+inf], keep order:true, desc, stats:pseudo |
|   └─StreamAgg_46               | 1.00  | root | funcs:min(test.t.a)                                                     |
|     └─Limit_50                 | 0.33  | root | offset:0, count:1                                                       |
|       └─IndexReader_57         | 1.00  | root | index:Limit_56                                                          |
|         └─Limit_56             | 1.00  | cop  | offset:0, count:1                                                       |
|           └─IndexScan_55       | 1.00  | cop  | table:t, index:a, range:(100,+inf], keep order:true, stats:pseudo       |
+--------------------------------+-------+------+-------------------------------------------------------------------------+
12 rows in set (0.00 sec)

What is changed and how it works?

When there is only a single max/min, we still use the previous method. But when we get multiple max/min aggregations, we should firstly check:

  1. The argument of max/min must be a single column (reject max(a+b) or max(a+1))
  2. All of the aggregated columns must have index.
  3. Only support Agg->Sel->DS or Agg->DS patter.
  4. All of the filter condition must be able to be pushed down to the scan range. (reject a * 3 + 10 > 100)

If all of the conditions above are satisfied, we can rewrite such query by a cartesianJoin with several independent aggs. For example:

select max(a) - min(a) from t where a < 10

->

select max_a - min_a
from 
  (select max(a) as max_a from t where a < 10),
  (select min(a) as min_a from t where a < 10)

Each single agg will use the previous method to rewrite as Sort + Limit.

Check List

Tests

  • Unit test

Side effects

  • Increased code complexity

@francis0407 francis0407 added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner labels Sep 9, 2019
@codecov
Copy link

codecov bot commented Sep 9, 2019

Codecov Report

Merging #12083 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #12083   +/-   ##
===========================================
  Coverage   81.3234%   81.3234%           
===========================================
  Files           453        453           
  Lines         97288      97288           
===========================================
  Hits          79118      79118           
  Misses        12500      12500           
  Partials       5670       5670

planner/core/rule_max_min_eliminate.go Outdated Show resolved Hide resolved
planner/core/rule_max_min_eliminate.go Outdated Show resolved Hide resolved
Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@francis0407
Copy link
Member Author

Simply move accesspath.idxCols initialization to buildDataSource causes some unexpected problems. I'm working on this.

@winoros
Copy link
Member

winoros commented Sep 10, 2019

@francis0407 You don't need move the codes about idxCols. idxCols should be initialized after columns are pruned. fullIdxCols can be initialized at the beginning when the path is initialized.

@francis0407
Copy link
Member Author

/rebuild

1 similar comment
@francis0407
Copy link
Member Author

/rebuild

planner/core/rule_max_min_eliminate.go Show resolved Hide resolved
planner/core/rule_max_min_eliminate.go Outdated Show resolved Hide resolved
planner/core/rule_max_min_eliminate.go Outdated Show resolved Hide resolved
planner/core/rule_max_min_eliminate.go Outdated Show resolved Hide resolved
planner/core/rule_max_min_eliminate.go Outdated Show resolved Hide resolved
planner/core/rule_max_min_eliminate.go Outdated Show resolved Hide resolved
planner/core/rule_max_min_eliminate.go Outdated Show resolved Hide resolved
planner/core/rule_max_min_eliminate.go Outdated Show resolved Hide resolved
planner/core/rule_max_min_eliminate.go Outdated Show resolved Hide resolved
@francis0407
Copy link
Member Author

A new problem occurred.

mysql> create table ta (a int, b int, c int, d int, index idx_a(a), index idx_b(b), index idx_c(c), index idx_d(d));
Query OK, 0 rows affected (0.01 sec)

mysql> explain select max(a), max(b) from ta;
+--------------------------------+-------+------+---------------------------------------------------------------------------+
| id                             | count | task | operator info                                                             |
+--------------------------------+-------+------+---------------------------------------------------------------------------+
| HashLeftJoin_16                | 1.00  | root | CARTESIAN inner join, inner:StreamAgg_57                                  |
| ├─StreamAgg_22                 | 1.00  | root | funcs:max(test.ta.a)                                                      |
| │ └─Limit_27                   | 1.00  | root | offset:0, count:1                                                         |
| │   └─Projection_47            | 1.00  | root | test.ta.a, test.ta.b                                                      |
| │     └─IndexLookUp_46         | 1.00  | root |                                                                           |
| │       ├─IndexScan_43         | 1.00  | cop  | table:ta, index:a, range:[NULL,+inf], keep order:true, desc, stats:pseudo |
| │       └─Selection_45         | 1.00  | cop  | not(isnull(test.ta.b))                                                    |
| │         └─TableScan_44       | 1.00  | cop  | table:ta, keep order:false, stats:pseudo                                  |
| └─StreamAgg_57                 | 1.00  | root | funcs:max(test.ta.b)                                                      |
|   └─Limit_62                   | 1.00  | root | offset:0, count:1                                                         |
|     └─Projection_72            | 1.00  | root | test.ta.a, test.ta.b                                                      |
|       └─IndexLookUp_71         | 1.00  | root |                                                                           |
|         ├─Limit_70             | 1.00  | cop  | offset:0, count:1                                                         |
|         │ └─IndexScan_68       | 1.00  | cop  | table:ta, index:b, range:[-inf,+inf], keep order:true, desc, stats:pseudo |
|         └─TableScan_69         | 1.00  | cop  | table:ta, keep order:false, stats:pseudo                                  |
+--------------------------------+-------+------+---------------------------------------------------------------------------+
15 rows in set (0.00 sec)

We have to clone a new Columns for DataSource and do Column Prune for it. I'm working on this.

@francis0407
Copy link
Member Author

/run-common-tests

@francis0407
Copy link
Member Author

/run-all-tests

@francis0407
Copy link
Member Author

The issue has been fixed.

mysql> explain select min(a), min(b) from ta;
+------------------------------+-------+------+---------------------------------------------------------------------+
| id                           | count | task | operator info                                                       |
+------------------------------+-------+------+---------------------------------------------------------------------+
| HashLeftJoin_18              | 1.00  | root | CARTESIAN inner join, inner:StreamAgg_45                            |
| ├─StreamAgg_24               | 1.00  | root | funcs:min(test.ta.a)                                                |
| │ └─Limit_28                 | 1.00  | root | offset:0, count:1                                                   |
| │   └─IndexReader_38         | 1.00  | root | index:Limit_37                                                      |
| │     └─Limit_37             | 1.00  | cop  | offset:0, count:1                                                   |
| │       └─IndexScan_36       | 1.00  | cop  | table:ta, index:a, range:[-inf,+inf], keep order:true, stats:pseudo |
| └─StreamAgg_45               | 1.00  | root | funcs:min(test.ta.b)                                                |
|   └─Limit_49                 | 1.00  | root | offset:0, count:1                                                   |
|     └─IndexReader_59         | 1.00  | root | index:Limit_58                                                      |
|       └─Limit_58             | 1.00  | cop  | offset:0, count:1                                                   |
|         └─IndexScan_57       | 1.00  | cop  | table:ta, index:b, range:[-inf,+inf], keep order:true, stats:pseudo |
+------------------------------+-------+------+---------------------------------------------------------------------+
11 rows in set (0.01 sec)

The cause is the shallow copy of DataSource. For readonly fields, we can use a shallow copy. But for fields that will be rewritten, we must use a deep copy.

@francis0407 francis0407 added status/all tests passed and removed status/LGT2 Indicates that a PR has LGTM 2. labels Sep 12, 2019
Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason zz-jason added status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. labels Sep 12, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Sep 12, 2019

Your auto merge job has been accepted, waiting for 12110, 12164

@sre-bot
Copy link
Contributor

sre-bot commented Sep 12, 2019

/run-all-tests

@sre-bot sre-bot merged commit 2c65985 into pingcap:master Sep 12, 2019
@francis0407 francis0407 deleted the enhance_max_min_eli branch September 12, 2019 05:52
lonng pushed a commit that referenced this pull request Jan 9, 2020
…12083) (#14410)

* planner: enhance the rule max_min_eliminate to support multiple agg (#12083)

Co-authored-by: pingcap-github-bot <sre-bot@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

query both max and min very slow
6 participants