Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TiDB does not support loose index scan #14460

Open
wwar opened this issue Jan 12, 2020 · 2 comments
Open

TiDB does not support loose index scan #14460

wwar opened this issue Jan 12, 2020 · 2 comments
Labels
feature/accepted This feature request is accepted by product managers priority/P2 The issue has P2 priority. sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature. type/performance

Comments

@wwar
Copy link

wwar commented Jan 12, 2020

Feature Request

Is your feature request related to a problem? Please describe:

I have been looking into edge cases where TiDB could perform significantly worse than MySQL, which causes regressions for applications migrating. Consider the following test case:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a INT NOT NULL primary key auto_increment, b INT NOT NULL, INDEX (b));

INSERT INTO t1 SELECT NULL, -1 FROM dual;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;

UPDATE t1 SET b = FLOOR(a/100000) WHERE b = -1;

ANALYZE TABLE t1;
EXPLAIN ANALYZE SELECT DISTINCT b FROM t1;
EXPLAIN ANALYZE SELECT b, count(b) FROM t1 GROUP BY b;

Describe the feature you'd like:

In MySQL the select distinct is considerably faster. Using MySQL 8.0.18 with EXPLAIN ANALYZE:

mysql8018> EXPLAIN ANALYZE SELECT DISTINCT b FROM t1;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Group (computed in earlier step, no aggregates)  (actual time=22.514..22.565 rows=8 loops=1)
    -> Index range scan on t1 using index_for_group_by(b)  (cost=4.40 rows=8) (actual time=22.510..22.559 rows=8 loops=1)
 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

mysql8018> EXPLAIN ANALYZE SELECT b, count(b) FROM t1 GROUP BY b;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                       |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Group aggregate: count(t1.b)  (actual time=51.404..278.753 rows=8 loops=1)
    -> Index scan on t1 using b  (cost=52604.45 rows=523722) (actual time=16.118..175.800 rows=524288 loops=1)
 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.28 sec)

select distinct b from t1;
+---+
| b |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
+---+
8 rows in set (0.00 sec)

The optimization loose index scan is applying here. Because the key b is low cardinality, it scans it in a way that repositions the pointer after each row is read to find rows > that value. Thus, only 8 rows needed to be read from the table.

Executing the two queries in TiDB shows that there is very little difference between them:

tidb> EXPLAIN ANALYZE SELECT DISTINCT b FROM t1;
+------------------------+----------+-----------+----------------------------------------------------------+-----------------------------------------------------------------------------------------+------------+------+
| id                     | count    | task      | operator info                                            | execution info                                                                          | memory     | disk |
+------------------------+----------+-----------+----------------------------------------------------------+-----------------------------------------------------------------------------------------+------------+------+
| HashAgg_20             | 6.00     | root      | group by:test.t1.b, funcs:firstrow(test.t1.b)->test.t1.b | time:417.295577ms, loops:4, rows:6, PartialConcurrency:4, FinalConcurrency:4            | 2.90625 KB | N/A  |
| └─TableReader_21       | 6.00     | root      | data:HashAgg_22                                          | time:417.119384ms, loops:2, rows:6, rpc num: 1, rpc time:416.916371ms, proc keys:524288 | 990 Bytes  | N/A  |
|   └─HashAgg_22         | 6.00     | cop[tikv] | group by:test.t1.b,                                      | time:416ms, loops:513, rows:6                                                           | N/A        | N/A  |
|     └─TableScan_14     | 10000.00 | cop[tikv] | table:t1, range:[-inf,+inf], keep order:false            | time:412ms, loops:513, rows:524288                                                      | N/A        | N/A  |
+------------------------+----------+-----------+----------------------------------------------------------+-----------------------------------------------------------------------------------------+------------+------+
4 rows in set (0.42 sec)

tidb> EXPLAIN ANALYZE SELECT b, count(b) FROM t1 GROUP BY b;
+--------------------------+----------+-----------+-------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+----------------+------+
| id                       | count    | task      | operator info                                                                             | execution info                                                                          | memory         | disk |
+--------------------------+----------+-----------+-------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+----------------+------+
| Projection_12            | 6.00     | root      | test.t1.b, Column#3                                                                       | time:424.257051ms, loops:4, rows:6, Concurrency:OFF                                     | 744 Bytes      | N/A  |
| └─HashAgg_21             | 6.00     | root      | group by:test.t1.b, funcs:count(Column#4)->Column#3, funcs:firstrow(test.t1.b)->test.t1.b | time:424.248875ms, loops:4, rows:6, PartialConcurrency:4, FinalConcurrency:4            | 5.8125 KB      | N/A  |
|   └─TableReader_22       | 6.00     | root      | data:HashAgg_23                                                                           | time:424.122037ms, loops:2, rows:6, rpc num: 1, rpc time:423.889357ms, proc keys:524288 | 1.021484375 KB | N/A  |
|     └─HashAgg_23         | 6.00     | cop[tikv] | group by:test.t1.b, funcs:count(test.t1.b)->Column#4                                      | time:425ms, loops:513, rows:6                                                           | N/A            | N/A  |
|       └─TableScan_15     | 10000.00 | cop[tikv] | table:t1, range:[-inf,+inf], keep order:false                                             | time:409ms, loops:513, rows:524288                                                      | N/A            | N/A  |
+--------------------------+----------+-----------+-------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+----------------+------+
5 rows in set (0.44 sec)

Describe alternatives you've considered:

There was one previous mention of loose index scan in #5896

I do not have a specific use-case in mind yet, which may have alternative fixes to loose-index-scan. This was discovered in generic research into potential regressions.

Teachability, Documentation, Adoption, Migration Strategy:

I think it would help to document particular cases where TiDB may be much slower, this being one of them. With work being done on index merge.. there should not be very many of them.

@wwar wwar added the type/enhancement The issue or PR belongs to an enhancement. label Jan 12, 2020
@XuHuaiyu
Copy link
Contributor

I do not have a specific use-case in mind yet, which may have alternative fixes to loose-index-scan.

For similar queries mentioned in the description, TiDB can use the hint use index(b) to let it read data from index b. If this table has many columns, we may slightly benefit from this hint.

I think it would help to document particular cases where TiDB may be much slower, this being one of them.

Thanks for your advice here, we will consider documenting some particular cases.

@zz-jason zz-jason added sig/planner SIG: Planner type/new-feature and removed type/enhancement The issue or PR belongs to an enhancement. labels Mar 9, 2020
@zz-jason zz-jason added the type/feature-request Categorizes issue or PR as related to a new feature. label Jun 1, 2020
@zz-jason
Copy link
Member

zz-jason commented Jun 2, 2020

description

From the MySQL document about Loose Index Scan, there are some queries that can benefit from the Loose Index Scan. The document also lists the requirements and examples about queries that can utilize Loose Index Scan.

To support the Loose Index Scan on TiDB, we may need to modify:

  • TiKV Coprocessor: support skipping scanning the same prefix keys.
  • TiDB SQL Optimizer: regard the Loose Index Scan as another access path, calculate the cost for it.

IMO, it's not hard to implement.

value

But considered that we already have TiFlash to speed up OLAP queries. Maybe the use case for this optimization is a little narrow.

@scsldb scsldb added the priority/P2 The issue has P2 priority. label Jun 15, 2020
@zz-jason zz-jason added the feature/accepted This feature request is accepted by product managers label Jul 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/accepted This feature request is accepted by product managers priority/P2 The issue has P2 priority. sig/planner SIG: Planner type/feature-request Categorizes issue or PR as related to a new feature. type/performance
Projects
None yet
Development

No branches or pull requests

4 participants