TiDB does not support loose index scan #14460

wwar · 2020-01-12T22:23:43Z

Feature Request

Is your feature request related to a problem? Please describe:

I have been looking into edge cases where TiDB could perform significantly worse than MySQL, which causes regressions for applications migrating. Consider the following test case:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a INT NOT NULL primary key auto_increment, b INT NOT NULL, INDEX (b));

INSERT INTO t1 SELECT NULL, -1 FROM dual;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;
INSERT INTO t1 SELECT NULL, -1 FROM t1;

UPDATE t1 SET b = FLOOR(a/100000) WHERE b = -1;

ANALYZE TABLE t1;
EXPLAIN ANALYZE SELECT DISTINCT b FROM t1;
EXPLAIN ANALYZE SELECT b, count(b) FROM t1 GROUP BY b;

Describe the feature you'd like:

In MySQL the select distinct is considerably faster. Using MySQL 8.0.18 with EXPLAIN ANALYZE:

mysql8018> EXPLAIN ANALYZE SELECT DISTINCT b FROM t1;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Group (computed in earlier step, no aggregates)  (actual time=22.514..22.565 rows=8 loops=1)
    -> Index range scan on t1 using index_for_group_by(b)  (cost=4.40 rows=8) (actual time=22.510..22.559 rows=8 loops=1)
 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

mysql8018> EXPLAIN ANALYZE SELECT b, count(b) FROM t1 GROUP BY b;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                       |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Group aggregate: count(t1.b)  (actual time=51.404..278.753 rows=8 loops=1)
    -> Index scan on t1 using b  (cost=52604.45 rows=523722) (actual time=16.118..175.800 rows=524288 loops=1)
 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.28 sec)

select distinct b from t1;
+---+
| b |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
+---+
8 rows in set (0.00 sec)

The optimization loose index scan is applying here. Because the key b is low cardinality, it scans it in a way that repositions the pointer after each row is read to find rows > that value. Thus, only 8 rows needed to be read from the table.

Executing the two queries in TiDB shows that there is very little difference between them:

tidb> EXPLAIN ANALYZE SELECT DISTINCT b FROM t1;
+------------------------+----------+-----------+----------------------------------------------------------+-----------------------------------------------------------------------------------------+------------+------+
| id                     | count    | task      | operator info                                            | execution info                                                                          | memory     | disk |
+------------------------+----------+-----------+----------------------------------------------------------+-----------------------------------------------------------------------------------------+------------+------+
| HashAgg_20             | 6.00     | root      | group by:test.t1.b, funcs:firstrow(test.t1.b)->test.t1.b | time:417.295577ms, loops:4, rows:6, PartialConcurrency:4, FinalConcurrency:4            | 2.90625 KB | N/A  |
| └─TableReader_21       | 6.00     | root      | data:HashAgg_22                                          | time:417.119384ms, loops:2, rows:6, rpc num: 1, rpc time:416.916371ms, proc keys:524288 | 990 Bytes  | N/A  |
|   └─HashAgg_22         | 6.00     | cop[tikv] | group by:test.t1.b,                                      | time:416ms, loops:513, rows:6                                                           | N/A        | N/A  |
|     └─TableScan_14     | 10000.00 | cop[tikv] | table:t1, range:[-inf,+inf], keep order:false            | time:412ms, loops:513, rows:524288                                                      | N/A        | N/A  |
+------------------------+----------+-----------+----------------------------------------------------------+-----------------------------------------------------------------------------------------+------------+------+
4 rows in set (0.42 sec)

tidb> EXPLAIN ANALYZE SELECT b, count(b) FROM t1 GROUP BY b;
+--------------------------+----------+-----------+-------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+----------------+------+
| id                       | count    | task      | operator info                                                                             | execution info                                                                          | memory         | disk |
+--------------------------+----------+-----------+-------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+----------------+------+
| Projection_12            | 6.00     | root      | test.t1.b, Column#3                                                                       | time:424.257051ms, loops:4, rows:6, Concurrency:OFF                                     | 744 Bytes      | N/A  |
| └─HashAgg_21             | 6.00     | root      | group by:test.t1.b, funcs:count(Column#4)->Column#3, funcs:firstrow(test.t1.b)->test.t1.b | time:424.248875ms, loops:4, rows:6, PartialConcurrency:4, FinalConcurrency:4            | 5.8125 KB      | N/A  |
|   └─TableReader_22       | 6.00     | root      | data:HashAgg_23                                                                           | time:424.122037ms, loops:2, rows:6, rpc num: 1, rpc time:423.889357ms, proc keys:524288 | 1.021484375 KB | N/A  |
|     └─HashAgg_23         | 6.00     | cop[tikv] | group by:test.t1.b, funcs:count(test.t1.b)->Column#4                                      | time:425ms, loops:513, rows:6                                                           | N/A            | N/A  |
|       └─TableScan_15     | 10000.00 | cop[tikv] | table:t1, range:[-inf,+inf], keep order:false                                             | time:409ms, loops:513, rows:524288                                                      | N/A            | N/A  |
+--------------------------+----------+-----------+-------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+----------------+------+
5 rows in set (0.44 sec)

Describe alternatives you've considered:

There was one previous mention of loose index scan in #5896

I do not have a specific use-case in mind yet, which may have alternative fixes to loose-index-scan. This was discovered in generic research into potential regressions.

Teachability, Documentation, Adoption, Migration Strategy:

I think it would help to document particular cases where TiDB may be much slower, this being one of them. With work being done on index merge.. there should not be very many of them.

The text was updated successfully, but these errors were encountered:

XuHuaiyu · 2020-01-14T06:40:39Z

I do not have a specific use-case in mind yet, which may have alternative fixes to loose-index-scan.

For similar queries mentioned in the description, TiDB can use the hint use index(b) to let it read data from index b. If this table has many columns, we may slightly benefit from this hint.

I think it would help to document particular cases where TiDB may be much slower, this being one of them.

Thanks for your advice here, we will consider documenting some particular cases.

zz-jason · 2020-06-02T09:40:15Z

description

From the MySQL document about Loose Index Scan, there are some queries that can benefit from the Loose Index Scan. The document also lists the requirements and examples about queries that can utilize Loose Index Scan.

To support the Loose Index Scan on TiDB, we may need to modify:

TiKV Coprocessor: support skipping scanning the same prefix keys.
TiDB SQL Optimizer: regard the Loose Index Scan as another access path, calculate the cost for it.

IMO, it's not hard to implement.

value

But considered that we already have TiFlash to speed up OLAP queries. Maybe the use case for this optimization is a little narrow.

wwar added the type/enhancement The issue or PR belongs to an enhancement. label Jan 12, 2020

wwar mentioned this issue Jan 19, 2020

EXPLAIN does not show which columns co-processor needs to decode #14528

Open

zz-jason added sig/planner SIG: Planner type/new-feature and removed type/enhancement The issue or PR belongs to an enhancement. labels Mar 9, 2020

zz-jason added the type/performance label Mar 9, 2020

zz-jason removed the type/new-feature label Apr 3, 2020

zz-jason added the type/feature-request Categorizes issue or PR as related to a new feature. label Jun 1, 2020

scsldb added the priority/P2 The issue has P2 priority. label Jun 15, 2020

winoros mentioned this issue Jul 20, 2020

Loose index scan can be used to speed up DISTINCT upon JOIN #18693

Open

zz-jason added the feature/accepted This feature request is accepted by product managers label Jul 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TiDB does not support loose index scan #14460

TiDB does not support loose index scan #14460

wwar commented Jan 12, 2020

XuHuaiyu commented Jan 14, 2020

zz-jason commented Jun 2, 2020 •

edited

Loading

TiDB does not support loose index scan #14460

TiDB does not support loose index scan #14460

Comments

wwar commented Jan 12, 2020

Feature Request

XuHuaiyu commented Jan 14, 2020

zz-jason commented Jun 2, 2020 • edited Loading

description

value

zz-jason commented Jun 2, 2020 •

edited

Loading