ordering optimization: adopt functional dependency into the current optimizer #14580

zz-jason · 2020-01-29T12:25:08Z

Description

Is your feature request related to a problem? Please describe:

Let's say we have this table:

drop table if exists t;
create table t(a bigint, b bigint, index idx_a(a));

Suppose the best plan for the following query is to use index idx_a, which avoids a sorting:

TiDB(root@127.0.0.1:test) > desc select * from t use index(idx_a) where a = b order by a;
+------------------------+----------+-----------+--------------------------------------------------------------------+
| id                     | count    | task      | operator info                                                      |
+------------------------+----------+-----------+--------------------------------------------------------------------+
| Projection_17          | 8000.00  | root      | test.t.a, test.t.b                                                 |
| └─IndexLookUp_16       | 8000.00  | root      |                                                                    |
|   ├─IndexScan_13       | 10000.00 | cop[tikv] | table:t, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
|   └─Selection_15       | 8000.00  | cop[tikv] | eq(test.t.a, test.t.b)                                             |
|     └─TableScan_14     | 10000.00 | cop[tikv] | table:t, keep order:false, stats:pseudo                            |
+------------------------+----------+-----------+--------------------------------------------------------------------+
5 rows in set (0.00 sec)

If we change the query's order by column from a to b, even if we force to use the index idx_a, the execution plan still needs a sorting:

TiDB(root@127.0.0.1:test) > desc select * from t use index(idx_a) where a = b order by b;
+-----------------------+----------+-----------+---------------------------------------------------------------------+
| id                    | count    | task      | operator info                                                       |
+-----------------------+----------+-----------+---------------------------------------------------------------------+
| Sort_5                | 8000.00  | root      | test.t.b:asc                                                        |
| └─IndexLookUp_11      | 8000.00  | root      |                                                                     |
|   ├─IndexScan_8       | 10000.00 | cop[tikv] | table:t, index:a, range:[NULL,+inf], keep order:false, stats:pseudo |
|   └─Selection_10      | 8000.00  | cop[tikv] | eq(test.t.a, test.t.b)                                              |
|     └─TableScan_9     | 10000.00 | cop[tikv] | table:t, keep order:false, stats:pseudo                             |
+-----------------------+----------+-----------+---------------------------------------------------------------------+
5 rows in set (0.00 sec)

This is because we can not recognize the functional dependency a -> b and b -> a, or the equivalence class {a, b} after the filter a = b.

ideally, the best execution plan for the two queries should be the same:

select * from t use index(idx_a) where a = b order by a;
select * from t use index(idx_a) where a = b order by b;

Describe the feature you'd like:

Step 1: After a selection with predicate a = b has been applied, we can infer that the tuple stream is also ordered on the attribute sequence (a, b) and (b, a) or even on the single attribute b. We call the first ordering on (a) the physical ordering of the tuple stream. The inferred orderings are called logical orderings. As we have seen, algebraic operators (e.g. select operators) are able to change the set of logical orderings implied by a given physical ordering. Order optimization now involves not only bookkeeping of orderings, but also order inference. Simmen et al. introduced a powerful and flexible framework for order inference. See Fundamental Techniques for Order Optimization for details.
Step 2: Use the method described in An Efficient Framework for Order Optimization by Thomas Neumann to reduce the optimization time and resource consumption.

Document Collection

Proposal doc: To be added.
Weekly report: To be added.

Talent Challenge Program information

Mentor of this issue: @winoros
Recommended skills: Relational Algebra, Graph theory, Golang
Estimated Workloads: 2 Man month i.e. 2XL

Milestones and action items

Milestone 1: Clear the design proposal

Give the proposal

Milestone 2: (Maybe) The graph structure prototype

To be decided.

Milestone 3: (Maybe) Integrate the graph structure to the planner.

To be decided.

Milestone 4: (Maybe) Show the improvements.

To be decided.

The text was updated successfully, but these errors were encountered:

winoros · 2020-02-03T10:50:38Z

There's also another one describing a way to maintain the functional dependency. https://cs.uwaterloo.ca/research/tr/2000/11/CS-2000-11.thesis.pdf

zz-jason added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner difficulty/hard labels Jan 29, 2020

tisonkun removed the difficulty/hard label Sep 1, 2021

xuyifangreeneyes mentioned this issue Nov 15, 2021

Refine only-full-group-by checker #29766

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ordering optimization: adopt functional dependency into the current optimizer #14580

ordering optimization: adopt functional dependency into the current optimizer #14580

zz-jason commented Jan 29, 2020 •

edited by winoros

Loading

winoros commented Feb 3, 2020

ordering optimization: adopt functional dependency into the current optimizer #14580

ordering optimization: adopt functional dependency into the current optimizer #14580

Comments

zz-jason commented Jan 29, 2020 • edited by winoros Loading

Description

Document Collection

Talent Challenge Program information

Milestones and action items

winoros commented Feb 3, 2020

zz-jason commented Jan 29, 2020 •

edited by winoros

Loading