Support Clustered Index #4841

hanfei1991 · 2017-10-20T05:25:37Z

Description

Currently, TiDB arranges table's row data by handle id, for example we have a table whose schema is

CRATE TABLE t (
  uid varchar(64),
  s int(10),
  data varchar(255),
  primary key (uid, s),
)

There will be two key-value pairs for each row data:

handle id -> (uid, s, data)
(uid, s) -> handle id

The handle id is a uint64 generated by TiDB. There are some disadvantages for this type of data arrangement:

we need to write two key-value pairs for each row data
when reading the data by primary key, TiDB will trigger two read operations, one is to fetch the related handle id by primary key, and the other is to fetch the data by the handle id. This will increase the latency of SELECT statements.

In the clustered index, we arrange row data by primary key, in the above example, there will be only one key-value pair for each row data, this will the disadvantages mentioned above:

(uid, s) -> (data)

Task list

P0: TiDB support clustered index
P0: TiFlash support clusered index
TiFlash is column-oriented storage, it only stores specified tables' row data, used to speedup AP queries on these tables. After TiDB supports the clustered-index feature, TiFlash also should recognize it.

Progress Tracking: https://github.com/pingcap/tidb/projects/45

Value

The clustered index can speed up some queries. Its usage can refer to:

Workload Estimation

600

Time

GanttStart: 2020-06-01
GanttDue: 2020-10-30
GanttProgress: 100%

Weekly Report

https://docs.google.com/document/d/1zeCni1g-BfGsi7B_qWSbSJGXFiWrDaUsqic14gXygk8/edit?usp=sharing

The text was updated successfully, but these errors were encountered:

morgo · 2019-03-22T19:38:47Z

I am trying to understand this feature correctly. Is it a change so that the handle_id may be some other key or column, other than the primary key?

gertvdijk · 2019-07-02T14:31:51Z

Clustering your data determines how the data is laid out on disk. Several databases use different terminology to describe the same thing. Let me explain it by example.

Suppose you have a schema with columns object_id (integer, incrementing & unique), user_id (integer), data (binary/varchar or whatever) and you're running MySQL with InnoDB, the following options have a huge impact on your query performance:

PRIMARY KEY: object_id:
Data on disk is ordered by this column. Results:
- will be fast to SELECT points or ranges of object_id
- INSERTs are very fast too, as they are effectively appended to the data files
- SELECT queries for specific user_id values are super slow and require heavy random I/O even with secondary indexes
PRIMARY KEY: user_id, object_id:
Data on disk is clustered by user_id, then by object_id. Results:
- will be slow to SELECT points or ranges of object_id
- INSERTs are very slow too, as they most likely spread out by user_id resulting in random writes
- SELECTs for specific user_id values are super fast and require only a few seeks to get thousands of rows.

You can have secondary indexes to help you know where to look for data on disk, but if you want 1000 rows for a single user_id value, and your secondary index points you to 990 different pages on disk, you still see poor performance (unless you can afford yourself to have all your data on Optane SSDs). If you would choose to cluster your data by user_id, you can understand all user data is mostly clustered together and SELECT queries on specific user_ids are much faster, but then the INSERTs are very expensive because they happen with a huge spread on user_ids - ie. heavy write amplification (for almost every row a whole page needs to be rewritten, scattered across the volume). For some workloads like non-critical INSERT-time (queuing externally) and a huge user_id spread this may be the difference in several orders of magnitude for query time on SELECTs for specific users_ids.

In MySQL with InnoDB, the PRIMARY KEY means it is the clustering key, and there's only one clustering key. In MySQL/derivatives with TokuDB as storage engine you can have multiple clustering keys. That would make TokuDB effectively duplicate the table for you in different orders for each additional clustering key. Both TokuDB and InnoDB do a little bit of trade-off to not hurt INSERT performance too much. Consolidation of data before it's actually written to the data files, not fully filled pages and a bit of fragmentation is the result, but in practice it's still a huge benefit in performance for the SELECTs for specific user_ids.

PostgreSQL has even another approach: it does not order data on disk in a specific order. It just appends new INSERTS at the end of the data file, until you tell it to cluster it (e.g. CLUSTER command or pg_repack to do that online). This means that clustering basically runs out of band and implies a full table rebuild. (And yes, this is also true when you mark an index and clustering index.)

Cassandra works similarly to MySQL with regard to the user/schema with a primary key, but under the hood more like PostgreSQL with later compaction in the background to cluster data efficiently (but that's in fact quite complex, and this sentence is not very accurate once you know the details).

To get back to TiDB case; it would be very much beneficial to be able to control how data is laid out on disk and to be able to make a trade-off in INSERT performance to gain certain SELECT performance with it.

jackysp · 2020-06-24T09:01:20Z

https://github.com/pingcap/tidb/projects/45

Workload Estimation

600

zanmato1984 · 2020-07-15T15:59:59Z

Need to add a sub-task for TiFlash Clustered Index.

hanfei1991 added the priority/P2 The issue has P2 priority. label Oct 20, 2017

hanfei1991 added this to the 1.1 milestone Oct 20, 2017

hanfei1991 assigned breezewish Oct 20, 2017

hanfei1991 assigned XuHuaiyu and unassigned breezewish Nov 16, 2017

ngaut removed this from the 2.0 milestone May 29, 2018

jackysp added this to the v5.0-alpha milestone Jun 24, 2020

jackysp added priority/P0 The issue has P0 priority. and removed priority/P0 The issue has P0 priority. labels Jun 24, 2020

jackysp added priority/P0 The issue has P0 priority. type/feature-request Categorizes issue or PR as related to a new feature. and removed priority/P2 The issue has P2 priority. labels Jun 24, 2020

scsldb mentioned this issue Jun 30, 2020

Reduce the absolute latency of TP statements #18006

Open

2 tasks

cfzjywxk assigned coocood and lysu Jul 3, 2020

zz-jason unassigned XuHuaiyu Jul 14, 2020

zz-jason changed the title ~~Support clustered index~~ Support Clustered Index Jul 14, 2020

zz-jason added the feature/accepted This feature request is accepted by product managers label Jul 29, 2020

zz-jason mentioned this issue Aug 24, 2020

support interleaved tables #19404

Open

5kbpers mentioned this issue Oct 15, 2020

Support compatibility check with TiDB and TiKV pingcap/tiflow#997

Closed

scsldb modified the milestones: v5.0.0-alpha, v5.0.0-rc Dec 14, 2020

jebter modified the milestones: v5.0.0-rc, v5.0.0 Jan 18, 2021

HunDunDM mentioned this issue Feb 1, 2021

keyviz: support obtaining table info with clustered index from tidb pingcap/tidb-dashboard#845

Merged

lysu closed this as completed Jul 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support Clustered Index #4841

Support Clustered Index #4841

hanfei1991 commented Oct 20, 2017 •

edited by scsldb

Loading

morgo commented Mar 22, 2019

gertvdijk commented Jul 2, 2019

jackysp commented Jun 24, 2020 •

edited by scsldb

Loading

zanmato1984 commented Jul 15, 2020

Support Clustered Index #4841

Support Clustered Index #4841

Comments

hanfei1991 commented Oct 20, 2017 • edited by scsldb Loading

Description

Task list

Category

Value

Workload Estimation

Time

Weekly Report

morgo commented Mar 22, 2019

gertvdijk commented Jul 2, 2019

jackysp commented Jun 24, 2020 • edited by scsldb Loading

Workload Estimation

zanmato1984 commented Jul 15, 2020

hanfei1991 commented Oct 20, 2017 •

edited by scsldb

Loading

jackysp commented Jun 24, 2020 •

edited by scsldb

Loading