Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimizer lacks feature of OR expansion and predicate elimination #56005

Closed
mzhang77 opened this issue Sep 11, 2024 · 4 comments · Fixed by #56136
Closed

optimizer lacks feature of OR expansion and predicate elimination #56005

mzhang77 opened this issue Sep 11, 2024 · 4 comments · Fixed by #56136
Assignees
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@mzhang77
Copy link

mzhang77 commented Sep 11, 2024

Bug Report

1. Minimal reproduce step (Required)

CREATE TABLE `dt` (
  `a` bigint(20) unsigned NOT NULL,
  `pk` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `b` longblob DEFAULT NULL,
  `c` int(11) unsigned NOT NULL,
  PRIMARY KEY (`pk`)
);

CREATE TABLE `it` (
  `pk` bigint(20) unsigned NOT NULL,
  `a` varbinary(767) DEFAULT NULL,
  `b` varbinary(767) DEFAULT NULL,
  `c` varbinary(767) DEFAULT NULL,
  `d` bigint(20) DEFAULT NULL,
  `e` varbinary(767) DEFAULT NULL,
  `f` int(11) unsigned NOT NULL,
  PRIMARY KEY (`pk`) ,
  KEY `a` (`a`,`d`,`c`,`pk`),
  KEY `b` (`b`,`pk`),
  KEY `c` (`d`,`pk`),
  KEY `d` (`e`,`pk`),
  KEY `e` (`c`,`pk`),
  KEY `f` (`a`,`pk`)
);

explain SELECT
  dt.*
FROM
  it 
  LEFT JOIN dt ON it.pk = dt.pk
WHERE
  it.a = "a"
  AND (
    (
      it.a > "a"
    )
    OR (
      it.a = "a" AND it.pk > 1
    )
  )
ORDER BY
  it.pk
LIMIT
  240;

2. What did you expect to see? (Required)

There is an index KEY f (a,pk), and there is where predicate it.a = "a" AND it.pk > 1. We expect to see range:("a" 1,"a" +inf] in execution plan

3. What did you see instead (Required)

This is the full execution plan

+---------------------------------------+---------+-----------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                                    | estRows | task      | access object            | operator info                                                                                                                                                                                                                                                                                      |
+---------------------------------------+---------+-----------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_12                         | 4.19    | root      |                          | test.dt.a, test.dt.pk, test.dt.b, test.dt.c                                                                                                                                                                                                    |
| └─Limit_19                            | 4.19    | root      |                          | offset:0, count:240                                                                                                                                                                                                                                                                                |
|   └─IndexHashJoin_80                  | 4.19    | root      |                          | left outer join, inner:TableReader_75, outer key:test.it.pk, inner key:test.dt.pk, equal cond:eq(test.it.pk, test.dt.pk), other cond:or(gt(test.it.a, "a"), and(eq(test.it.a, "a"), gt(test.dt.pk, 1))) |
|     ├─Limit_36(Build)                 | 3.36    | root      |                          | offset:0, count:240                                                                                                                                                                                                                                                                                |
|     │ └─IndexReader_54                | 3.37    | root      |                          | index:Limit_53                                                                                                                                                                                                                                                                                     |
|     │   └─Limit_53                    | 3.37    | cop[tikv] |                          | offset:0, count:240                                                                                                                                                                                                                                                                                |
|     │     └─Selection_52              | 3.37    | cop[tikv] |                          | or(gt(test.it.a, "a"), and(eq(test.it.a, "a"), gt(test.it.pk, 1)))                                                                                                                                                                                          |
|     │       └─IndexRangeScan_51       | 10.00   | cop[tikv] | table:it, index:f(a, pk) | range:["a","a"], keep order:true, stats:pseudo                                                                                                                                                                                                                                                     |
|     └─TableReader_75(Probe)           | 3.36    | root      |                          | data:TableRangeScan_74                                                                                                                                                                                                                                                                             |
|       └─TableRangeScan_74             | 3.36    | cop[tikv] | table:dt                 | range: decided by [test.it.pk], keep order:false, stats:pseudo                                                                                                                                                                                                                        |
+---------------------------------------+---------+-----------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

We see range:["a","a"]. This is causing unnecessary additional data access and poor performance.

4. What is your TiDB version? (Required)

mysql> select @@version;
+--------------------+
| @@version          |
+--------------------+
| 8.0.11-TiDB-v7.5.1 |
+--------------------+
1 row in set (0.00 sec)
@mzhang77 mzhang77 added the type/bug The issue is confirmed as a bug. label Sep 11, 2024
@mzhang77
Copy link
Author

mzhang77 commented Sep 11, 2024

To process the above query. Optimizer need to first do an OR expansion:

explain SELECT * FROM
 ( SELECT
  dt.*
FROM
  it 
  LEFT JOIN dt ON it.pk = dt.pk
WHERE
  it.a = "a" AND it.a > "a"
UNION
SELECT
  dt.*
FROM
  it 
  LEFT JOIN dt ON it.pk = dt.pk
WHERE it.a = "a"
      AND it.a = "a"
      AND it.pk > 1
) tb
ORDER BY
  tb.pk
LIMIT
  240;

Then optimizer should eliminate the first branch of the OR expansion, because it.a = "a" AND it.a > "a" is always false. So this should be the final SQL and plan:

explain SELECT * FROM
 ( 
SELECT
  dt.*
FROM
  it 
  LEFT JOIN dt ON it.pk = dt.pk
WHERE it.a = "a"
      AND it.a = "a"
      AND it.pk > 1
) tb
ORDER BY
  tb.pk
LIMIT
  240;

+-------------------------------+---------+-----------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                            | estRows | task      | access object            | operator info                                                                                                                                                                |
+-------------------------------+---------+-----------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TopN_14                       | 41.67   | root      |                          | test.dt.pk, offset:0, count:240                                                                                                                                 |
| └─IndexJoin_27                | 41.67   | root      |                          | left outer join, inner:TableReader_23, outer key:test.it.pk, inner key:test.dt.pk, equal cond:eq(test.it.pk, test.dt.pk) |
|   ├─IndexReader_39(Build)     | 33.33   | root      |                          | index:IndexRangeScan_38                                                                                                                                                      |
|   │ └─IndexRangeScan_38       | 33.33   | cop[tikv] | table:it, index:f(a, pk) | range:("a" 1,"a" +inf], keep order:false, stats:pseudo                                                                                                                       |
|   └─TableReader_23(Probe)     | 11.11   | root      |                          | data:Selection_22                                                                                                                                                            |
|     └─Selection_22            | 11.11   | cop[tikv] |                          | gt(test.dt.pk, 1)                                                                                                                                               |
|       └─TableRangeScan_21     | 33.33   | cop[tikv] | table:dt                 | range: decided by [test.it.pk], keep order:false, stats:pseudo                                                                                                  |
+-------------------------------+---------+-----------+--------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

@hawkingrei hawkingrei self-assigned this Sep 11, 2024
@hawkingrei hawkingrei added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner type/bug The issue is confirmed as a bug. and removed type/bug The issue is confirmed as a bug. type/enhancement The issue or PR belongs to an enhancement. labels Sep 11, 2024
@ghazalfamilyusa
Copy link
Contributor

We have separate logic of predicate simplification and index range derivation. I think it can be enhancement for either one and we need to see what is the best way to do it.

@time-and-fate time-and-fate added type/enhancement The issue or PR belongs to an enhancement. and removed type/bug The issue is confirmed as a bug. labels Sep 12, 2024
@ghazalfamilyusa
Copy link
Contributor

The current logic actually does such simplification but the issue is the different type/collation between it.a (binary) and the constant "a" which is utf8mb4_0900_ai_ci. A workaround is to replace the constants with binary constants like b'......'.
Now, in general and regardless of type/collation it.a = "a" and it.a > "a" is false and that can be done as an enhancement which is not straightforward. See if the workaround works cc @terry1purcell , @mzhang77

@mzhang77
Copy link
Author

Yes it works

mysql> explain SELECT
    ->   dt.*
    -> FROM
    ->   it 
    ->   LEFT JOIN dt ON it.pk = dt.pk
    -> WHERE
    ->   it.a = 0xaa
    ->   AND (
    ->     (
    ->       it.a > 0xaa
    ->     )
    ->     OR (
    ->       it.a = 0xaa AND it.pk > 1
    ->     )
    ->   )
    -> ORDER BY
    ->   it.pk
    -> LIMIT
    ->   240;
+-------------------------------------+---------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                                  | estRows | task      | access object            | operator info                                                                                                                                                                                                 |
+-------------------------------------+---------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_12                       | 41.67   | root      |                          | test.dt.a, test.dt.pk, test.dt.b, test.dt.c                                                                                                                                                                   |
| └─Limit_19                          | 41.67   | root      |                          | offset:0, count:240                                                                                                                                                                                           |
|   └─IndexHashJoin_71                | 41.67   | root      |                          | left outer join, inner:TableReader_66, outer key:test.it.pk, inner key:test.dt.pk, equal cond:eq(test.it.pk, test.dt.pk), other cond:or(gt(test.it.a, "0xaa"), and(eq(test.it.a, "0xaa"), gt(test.dt.pk, 1))) |
|     ├─Limit_36(Build)               | 33.33   | root      |                          | offset:0, count:240                                                                                                                                                                                           |
|     │ └─IndexReader_46              | 33.33   | root      |                          | index:Limit_45                                                                                                                                                                                                |
|     │   └─Limit_45                  | 33.33   | cop[tikv] |                          | offset:0, count:240                                                                                                                                                                                           |
|     │     └─IndexRangeScan_44       | 33.33   | cop[tikv] | table:it, index:f(a, pk) | range:("\xaa" 1,"\xaa" +inf], keep order:true, stats:pseudo                                                                                                                                                   |
|     └─TableReader_66(Probe)         | 33.33   | root      |                          | data:TableRangeScan_65                                                                                                                                                                                        |
|       └─TableRangeScan_65           | 33.33   | cop[tikv] | table:dt                 | range: decided by [test.it.pk], keep order:false, stats:pseudo                                                                                                                                                |
+-------------------------------------+---------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants