Skip to content

Latest commit

 

History

History
2182 lines (1655 loc) · 90.8 KB

partitioned-table.md

File metadata and controls

2182 lines (1655 loc) · 90.8 KB
title summary aliases
Partitioning
Learn how to use partitioning in TiDB.
/docs/dev/partitioned-table/
/docs/dev/reference/sql/partitioning/

Partitioning

This document introduces TiDB's implementation of partitioning.

Partitioning types

This section introduces the types of partitioning in TiDB. Currently, TiDB supports Range partitioning, Range COLUMNS partitioning, List partitioning, List COLUMNS partitioning, Hash partitioning, and Key partitioning.

  • Range partitioning, Range COLUMNS partitioning, List partitioning, and List COLUMNS partitioning are used to resolve the performance issues caused by a large number of deletions in the application, and support dropping partitions quickly.
  • Hash partitioning and Key partitioning are used to distribute data in scenarios with a large number of writes. Compared with Hash partitioning, Key partitioning supports distributing data of multiple columns and partitioning by non-integer columns.

Range partitioning

When a table is partitioned by Range, each partition contains rows for which the partitioning expression value lies within a given Range. Ranges have to be contiguous but not overlapping. You can define it by using VALUES LESS THAN.

Assume you need to create a table that contains personnel records as follows:

{{< copyable "sql" >}}

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT NOT NULL
);

You can partition a table by Range in various ways as needed. For example, you can partition it by using the store_id column:

{{< copyable "sql" >}}

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT NOT NULL
)

PARTITION BY RANGE (store_id) (
    PARTITION p0 VALUES LESS THAN (6),
    PARTITION p1 VALUES LESS THAN (11),
    PARTITION p2 VALUES LESS THAN (16),
    PARTITION p3 VALUES LESS THAN (21)
);

In this partition scheme, all rows corresponding to employees whose store_id is 1 through 5 are stored in the p0 partition while all employees whose store_id is 6 through 10 are stored in p1. Range partitioning requires the partitions to be ordered, from lowest to highest.

If you insert a row of data (72, 'Tom', 'John', '2015-06-25', NULL, NULL, 15), it falls in the p2 partition. But if you insert a record whose store_id is larger than 20, an error is reported because TiDB cannot know which partition this record should be inserted into. In this case, you can use MAXVALUE when creating a table:

{{< copyable "sql" >}}

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT NOT NULL
)

PARTITION BY RANGE (store_id) (
    PARTITION p0 VALUES LESS THAN (6),
    PARTITION p1 VALUES LESS THAN (11),
    PARTITION p2 VALUES LESS THAN (16),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

MAXVALUE represents an integer value that is larger than all other integer values. Now, all records whose store_id is equal to or larger than 16 (the highest value defined) are stored in the p3 partition.

You can also partition a table by employees' job codes, which are the values of the job_code column. Assume that two-digit job codes stand for regular employees, three-digit codes stand for office and customer support personnel, and four-digit codes stand for managerial personnel. Then you can create a partitioned table like this:

{{< copyable "sql" >}}

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT NOT NULL
)

PARTITION BY RANGE (job_code) (
    PARTITION p0 VALUES LESS THAN (100),
    PARTITION p1 VALUES LESS THAN (1000),
    PARTITION p2 VALUES LESS THAN (10000)
);

In this example, all rows relating to regular employees are stored in the p0 partition, all office and customer support personnel in the p1 partition, and all managerial personnel in the p2 partition.

Besides splitting up the table by store_id, you can also partition a table by dates. For example, you can partition by employees' separation year:

{{< copyable "sql" >}}

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)

PARTITION BY RANGE ( YEAR(separated) ) (
    PARTITION p0 VALUES LESS THAN (1991),
    PARTITION p1 VALUES LESS THAN (1996),
    PARTITION p2 VALUES LESS THAN (2001),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

In Range partitioning, you can partition based on the values of the timestamp column and use the unix_timestamp() function, for example:

{{< copyable "sql" >}}

CREATE TABLE quarterly_report_status (
    report_id INT NOT NULL,
    report_status VARCHAR(20) NOT NULL,
    report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)

PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) (
    PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ),
    PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ),
    PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ),
    PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ),
    PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ),
    PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ),
    PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ),
    PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ),
    PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ),
    PARTITION p9 VALUES LESS THAN (MAXVALUE)
);

It is not allowed to use any other partitioning expression that contains the timestamp column.

Range partitioning is particularly useful when one or more of the following conditions are satisfied:

  • You want to delete the old data. If you use the employees table in the previous example, you can delete all records of employees who left this company before the year 1991 by simply using ALTER TABLE employees DROP PARTITION p0;. It is faster than executing the DELETE FROM employees WHERE YEAR(separated) <= 1990; operation.
  • You want to use a column that contains time or date values, or containing values arising from some other series.
  • You need to frequently run queries on the columns used for partitioning. For example, when executing a query like EXPLAIN SELECT COUNT(*) FROM employees WHERE separated BETWEEN '2000-01-01' AND '2000-12-31' GROUP BY store_id;, TiDB can quickly know that only the data in the p2 partition needs to be scanned, because the other partitions do not match the WHERE condition.

Range COLUMNS partitioning

Range COLUMNS partitioning is a variant of Range partitioning. You can use one or more columns as partitioning keys. The data types of partition columns can be integer, string (CHAR or VARCHAR), DATE, and DATETIME. Any expressions, such as non-COLUMNS partitioning, are not supported.

Like Range partitioning, Range COLUMNS partitioning also requires the partition ranges to be strictly increasing. The partition definition in the following example is not supported:

CREATE TABLE t(
    a int,
    b datetime,
    c varchar(8)
) PARTITION BY RANGE COLUMNS(`c`,`b`)
(PARTITION `p20240520A` VALUES LESS THAN ('A','2024-05-20 00:00:00'),
 PARTITION `p20240520Z` VALUES LESS THAN ('Z','2024-05-20 00:00:00'),
 PARTITION `p20240521A` VALUES LESS THAN ('A','2024-05-21 00:00:00'));
Error 1493 (HY000): VALUES LESS THAN value must be strictly increasing for each partition

Suppose that you want to partition by name, and drop old and invalid data, then you can create a table as follows:

CREATE TABLE t (
  valid_until datetime,
  name varchar(255) CHARACTER SET ascii,
  notes text
)
PARTITION BY RANGE COLUMNS(name, valid_until)
(PARTITION `p2022-g` VALUES LESS THAN ('G','2023-01-01 00:00:00'),
 PARTITION `p2023-g` VALUES LESS THAN ('G','2024-01-01 00:00:00'),
 PARTITION `p2022-m` VALUES LESS THAN ('M','2023-01-01 00:00:00'),
 PARTITION `p2023-m` VALUES LESS THAN ('M','2024-01-01 00:00:00'),
 PARTITION `p2022-s` VALUES LESS THAN ('S','2023-01-01 00:00:00'),
 PARTITION `p2023-s` VALUES LESS THAN ('S','2024-01-01 00:00:00'))

The preceding SQL statement will partition the data by year and by name in the ranges [ ('', ''), ('G', '2023-01-01 00:00:00') ), [ ('G', '2023-01-01 00:00:00'), ('G', '2024-01-01 00:00:00') ), [ ('G', '2024-01-01 00:00:00'), ('M', '2023-01-01 00:00:00') ), [ ('M', '2023-01-01 00:00:00'), ('M', '2024-01-01 00:00:00') ), [ ('M', '2024-01-01 00:00:00'), ('S', '2023-01-01 00:00:00') ), and [ ('S', '2023-01-01 00:00:00'), ('S', '2024-01-01 00:00:00') ). It allows you to easily drop invalid data while still benefit from partition pruning on both name and valid_until columns. In this example, [,) indicates a left-closed, right-open range. For example, [ ('G', '2023-01-01 00:00:00'), ('G', '2024-01-01 00:00:00') ) indicates a range of data whose name is 'G', the year contains 2023-01-01 00:00:00 and is greater than 2023-01-01 00:00:00 but less than 2024-01-01 00:00:00. It does not include (G, 2024-01-01 00:00:00).

Range INTERVAL partitioning

Range INTERVAL partitioning is an extension of Range partitioning, which allows you to create partitions of a specified interval easily. Starting from v6.3.0, INTERVAL partitioning is introduced in TiDB as syntactic sugar.

The syntax is as follows:

PARTITION BY RANGE [COLUMNS] (<partitioning expression>)
INTERVAL (<interval expression>)
FIRST PARTITION LESS THAN (<expression>)
LAST PARTITION LESS THAN (<expression>)
[NULL PARTITION]
[MAXVALUE PARTITION]

For example:

CREATE TABLE employees (
    id int unsigned NOT NULL,
    fname varchar(30),
    lname varchar(30),
    hired date NOT NULL DEFAULT '1970-01-01',
    separated date DEFAULT '9999-12-31',
    job_code int,
    store_id int NOT NULL
) PARTITION BY RANGE (id)
INTERVAL (100) FIRST PARTITION LESS THAN (100) LAST PARTITION LESS THAN (10000) MAXVALUE PARTITION

It creates the following table:

CREATE TABLE `employees` (
  `id` int unsigned NOT NULL,
  `fname` varchar(30) DEFAULT NULL,
  `lname` varchar(30) DEFAULT NULL,
  `hired` date NOT NULL DEFAULT '1970-01-01',
  `separated` date DEFAULT '9999-12-31',
  `job_code` int DEFAULT NULL,
  `store_id` int NOT NULL
)
PARTITION BY RANGE (`id`)
(PARTITION `P_LT_100` VALUES LESS THAN (100),
 PARTITION `P_LT_200` VALUES LESS THAN (200),
...
 PARTITION `P_LT_9900` VALUES LESS THAN (9900),
 PARTITION `P_LT_10000` VALUES LESS THAN (10000),
 PARTITION `P_MAXVALUE` VALUES LESS THAN (MAXVALUE))

Range INTERVAL partitioning also works with Range COLUMNS partitioning.

For example:

CREATE TABLE monthly_report_status (
    report_id int NOT NULL,
    report_status varchar(20) NOT NULL,
    report_date date NOT NULL
)
PARTITION BY RANGE COLUMNS (report_date)
INTERVAL (1 MONTH) FIRST PARTITION LESS THAN ('2000-01-01') LAST PARTITION LESS THAN ('2025-01-01')

It creates this table:

CREATE TABLE `monthly_report_status` (
  `report_id` int(11) NOT NULL,
  `report_status` varchar(20) NOT NULL,
  `report_date` date NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE COLUMNS(`report_date`)
(PARTITION `P_LT_2000-01-01` VALUES LESS THAN ('2000-01-01'),
 PARTITION `P_LT_2000-02-01` VALUES LESS THAN ('2000-02-01'),
...
 PARTITION `P_LT_2024-11-01` VALUES LESS THAN ('2024-11-01'),
 PARTITION `P_LT_2024-12-01` VALUES LESS THAN ('2024-12-01'),
 PARTITION `P_LT_2025-01-01` VALUES LESS THAN ('2025-01-01'))

The optional parameter NULL PARTITION creates a partition with the definition as PARTITION P_NULL VALUES LESS THAN (<minimum value of the column type>), only matching when the partitioning expression evaluates to NULL. See Handling of NULL with Range partitioning, which explains that NULL is considered to be less than any other value.

The optional parameter MAXVALUE PARTITION creates the last partition as PARTITION P_MAXVALUE VALUES LESS THAN (MAXVALUE).

ALTER INTERVAL partitioned tables

INTERVAL partitioning also adds simpler syntaxes for adding and dropping partitions.

The following statement changes the first partition. It drops all partitions whose values are less than the given expression, and makes the matched partition the new first partition. It does not affect a NULL PARTITION.

ALTER TABLE table_name FIRST PARTITION LESS THAN (<expression>)

The following statement changes the last partition, meaning adding more partitions with higher ranges and room for new data. It will add new partitions with the current INTERVAL up to and including the given expression. It does not work if a MAXVALUE PARTITION exists, because it needs data reorganization.

ALTER TABLE table_name LAST PARTITION LESS THAN (<expression>)

INTERVAL partitioning details and limitations

  • The INTERVAL partitioning feature only involves the CREATE/ALTER TABLE syntax. There is no change in metadata, so tables created or altered with the new syntax are still MySQL-compatible.
  • There is no change in the output format of SHOW CREATE TABLE to keep MySQL compatibility.
  • The new ALTER syntax applies to existing tables conforming to INTERVAL. You do not need to create these tables with the INTERVAL syntax.
  • To use the INTERVAL syntax for RANGE COLUMNS partitioning, you can only specify a single column in the INTEGER, DATE, or DATETIME type as the partitioning key.

List partitioning

List partitioning is similar to Range partitioning. Unlike Range partitioning, in List partitioning, the partitioning expression values for all rows in each partition are in a given value set. This value set defined for each partition can have any number of values but cannot have duplicate values. You can use the PARTITION ... VALUES IN (...) clause to define a value set.

Suppose that you want to create a personnel record table. You can create a table as follows:

{{< copyable "sql" >}}

CREATE TABLE employees (
    id INT NOT NULL,
    hired DATE NOT NULL DEFAULT '1970-01-01',
    store_id INT
);

Suppose that there are 20 stores distributed in 4 districts, as shown in the table below:

| Region  | Store ID Numbers     |
| ------- | -------------------- |
| North   | 1, 2, 3, 4, 5        |
| East    | 6, 7, 8, 9, 10       |
| West    | 11, 12, 13, 14, 15   |
| Central | 16, 17, 18, 19, 20   |

If you want to store the personnel data of employees of the same region in the same partition, you can create a List partitioned table based on store_id:

{{< copyable "sql" >}}

CREATE TABLE employees (
    id INT NOT NULL,
    hired DATE NOT NULL DEFAULT '1970-01-01',
    store_id INT
)
PARTITION BY LIST (store_id) (
    PARTITION pNorth VALUES IN (1, 2, 3, 4, 5),
    PARTITION pEast VALUES IN (6, 7, 8, 9, 10),
    PARTITION pWest VALUES IN (11, 12, 13, 14, 15),
    PARTITION pCentral VALUES IN (16, 17, 18, 19, 20)
);

After creating the partitions as above, you can easily add or delete records related to a specific region in the table. For example, suppose that all stores in the East region (East) are sold to another company. Then all the row data related to the store employees of this region can be deleted by executing ALTER TABLE employees TRUNCATE PARTITION pEast, which is much more efficient than the equivalent statement DELETE FROM employees WHERE store_id IN (6, 7, 8, 9, 10).

You can also execute ALTER TABLE employees DROP PARTITION pEast to delete all related rows, but this statement also deletes the pEast partition from the table definition. In this situation, you must execute the ALTER TABLE ... ADD PARTITION statement to recover the original partitioning scheme of the table.

Default List partition

Starting from v7.3.0, you can add a default partition to a List or List COLUMNS partitioned table. The default partition acts as a fallback partition, where rows that do not match value sets of any partitions can be placed.

Note:

This feature is a TiDB extension to MySQL syntax. For a List or List COLUMNS partitioned table with a default partition, the data in the table cannot be directly replicated to MySQL.

Take the following List partitioned table as an example:

CREATE TABLE t (
  a INT,
  b INT
)
PARTITION BY LIST (a) (
  PARTITION p0 VALUES IN (1, 2, 3),
  PARTITION p1 VALUES IN (4, 5, 6)
);
Query OK, 0 rows affected (0.11 sec)

You can add a default list partition named pDef to the table as follows:

ALTER TABLE t ADD PARTITION (PARTITION pDef DEFAULT);

or

ALTER TABLE t ADD PARTITION (PARTITION pDef VALUES IN (DEFAULT));

In this way, newly inserted values that do not match value sets of any partitions can automatically go into the default partition.

INSERT INTO t VALUES (7, 7);
Query OK, 1 row affected (0.01 sec)

You can also add a default partition when creating a List or List COLUMNS partitioned table. For example:

CREATE TABLE employees (
    id INT NOT NULL,
    hired DATE NOT NULL DEFAULT '1970-01-01',
    store_id INT
)
PARTITION BY LIST (store_id) (
    PARTITION pNorth VALUES IN (1, 2, 3, 4, 5),
    PARTITION pEast VALUES IN (6, 7, 8, 9, 10),
    PARTITION pWest VALUES IN (11, 12, 13, 14, 15),
    PARTITION pCentral VALUES IN (16, 17, 18, 19, 20),
    PARTITION pDefault DEFAULT
);

For a List or List COLUMNS partitioned table without a default partition, the values to be inserted using an INSERT statement must match value sets defined in the PARTITION ... VALUES IN (...) clauses of the table. If the values to be inserted do not match value sets of any partitions, the statement will fail and an error is returned, as shown in the following example:

CREATE TABLE t (
  a INT,
  b INT
)
PARTITION BY LIST (a) (
  PARTITION p0 VALUES IN (1, 2, 3),
  PARTITION p1 VALUES IN (4, 5, 6)
);
Query OK, 0 rows affected (0.11 sec)

INSERT INTO t VALUES (7, 7);
ERROR 1525 (HY000): Table has no partition for value 7

To ignore the preceding error, you can add the IGNORE keyword to the INSERT statement. After this keyword is added, the INSERT statement will only insert rows that match the partition value sets and will not insert unmatched rows, without returning an error:

test> TRUNCATE t;
Query OK, 1 row affected (0.00 sec)

test> INSERT IGNORE INTO t VALUES (1, 1), (7, 7), (8, 8), (3, 3), (5, 5);
Query OK, 3 rows affected, 2 warnings (0.01 sec)
Records: 5  Duplicates: 2  Warnings: 2

test> select * from t;
+------+------+
| a    | b    |
+------+------+
|    5 |    5 |
|    1 |    1 |
|    3 |    3 |
+------+------+
3 rows in set (0.01 sec)

List COLUMNS partitioning

List COLUMNS partitioning is a variant of List partitioning. You can use multiple columns as partition keys. Besides the integer data type, you can also use the columns in the string, DATE, and DATETIME data types as partition columns.

Suppose that you want to divide the store employees from the following 12 cities into 4 regions, as shown in the following table:

| Region | Cities                         |
| :----- | ------------------------------ |
| 1      | LosAngeles,Seattle, Houston    |
| 2      | Chicago, Columbus, Boston      |
| 3      | NewYork, LongIsland, Baltimore |
| 4      | Atlanta, Raleigh, Cincinnati   |

You can use List COLUMNS partitioning to create a table and store each row in the partition that corresponds to the employee's city, as shown below:

{{< copyable "sql" >}}

CREATE TABLE employees_1 (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT,
    city VARCHAR(15)
)
PARTITION BY LIST COLUMNS(city) (
    PARTITION pRegion_1 VALUES IN('LosAngeles', 'Seattle', 'Houston'),
    PARTITION pRegion_2 VALUES IN('Chicago', 'Columbus', 'Boston'),
    PARTITION pRegion_3 VALUES IN('NewYork', 'LongIsland', 'Baltimore'),
    PARTITION pRegion_4 VALUES IN('Atlanta', 'Raleigh', 'Cincinnati')
);

Unlike List partitioning, in List COLUMNS partitioning, you do not need to use the expression in the COLUMNS() clause to convert column values to integers.

List COLUMNS partitioning can also be implemented using columns of the DATE and DATETIME types, as shown in the following example. This example uses the same names and columns as the previous employees_1 table, but uses List COLUMNS partitioning based on the hired column:

{{< copyable "sql" >}}

CREATE TABLE employees_2 (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT,
    city VARCHAR(15)
)
PARTITION BY LIST COLUMNS(hired) (
    PARTITION pWeek_1 VALUES IN('2020-02-01', '2020-02-02', '2020-02-03',
        '2020-02-04', '2020-02-05', '2020-02-06', '2020-02-07'),
    PARTITION pWeek_2 VALUES IN('2020-02-08', '2020-02-09', '2020-02-10',
        '2020-02-11', '2020-02-12', '2020-02-13', '2020-02-14'),
    PARTITION pWeek_3 VALUES IN('2020-02-15', '2020-02-16', '2020-02-17',
        '2020-02-18', '2020-02-19', '2020-02-20', '2020-02-21'),
    PARTITION pWeek_4 VALUES IN('2020-02-22', '2020-02-23', '2020-02-24',
        '2020-02-25', '2020-02-26', '2020-02-27', '2020-02-28')
);

In addition, you can also add multiple columns in the COLUMNS() clause. For example:

{{< copyable "sql" >}}

CREATE TABLE t (
    id int,
    name varchar(10)
)
PARTITION BY LIST COLUMNS(id,name) (
     partition p0 values IN ((1,'a'),(2,'b')),
     partition p1 values IN ((3,'c'),(4,'d')),
     partition p3 values IN ((5,'e'),(null,null))
);

Hash partitioning

Hash partitioning is used to make sure that data is evenly scattered into a certain number of partitions. With Range partitioning, you must specify the range of the column values for each partition when you use Range partitioning, while you just need to specify the number of partitions when you use Hash partitioning.

To create a Hash partitioned table, you need to append a PARTITION BY HASH (expr) clause to the CREATE TABLE statement. expr is an expression that returns an integer. It can be a column name if the type of this column is integer. In addition, you might also need to append PARTITIONS num, where num is a positive integer indicating how many partitions a table is divided into.

The following operation creates a Hash partitioned table, which is divided into 4 partitions by store_id:

{{< copyable "sql" >}}

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)

PARTITION BY HASH(store_id)
PARTITIONS 4;

If PARTITIONS num is not specified, the default number of partitions is 1.

You can also use an SQL expression that returns an integer for expr. For example, you can partition a table by the hire year:

{{< copyable "sql" >}}

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)

PARTITION BY HASH( YEAR(hired) )
PARTITIONS 4;

The most efficient Hash function is one which operates upon a single table column, and whose value increases or decreases consistently with the column value.

For example, date_col is a column whose type is DATE, and the value of the TO_DAYS(date_col) expression varies with the value of date_col. YEAR(date_col) is different from TO_DAYS(date_col), because not every possible change in date_col produces an equivalent change in YEAR(date_col).

In contrast, assume that you have an int_col column whose type is INT. Now consider about the expression POW(5-int_col,3) + 6. It is not a good Hash function though, because as the value of int_col changes, the result of the expression does not change proportionally. A value change in int_col might result in a huge change in the expression result. For example, when int_col changes from 5 to 6, the change of the expression result is -1. But the result change might be -7 when int_col changes from 6 to 7.

In conclusion, when the expression has a form that is closer to y = cx, it is more suitable to be a Hash function. Because the more non-linear an expression is, the more unevenly scattered the data among the partitions tends to be.

In theory, pruning is also possible for expressions involving more than one column value, but determining which of such expressions are suitable can be quite difficult and time-consuming. For this reason, the use of hashing expressions involving multiple columns is not particularly recommended.

When using PARTITION BY HASH, TiDB decides which partition the data should fall into based on the modulus of the result of the expression. In other words, if a partitioning expression is expr and the number of partitions is num, MOD(expr, num) decides the partition in which the data is stored. Assume that t1 is defined as follows:

{{< copyable "sql" >}}

CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
    PARTITION BY HASH( YEAR(col3) )
    PARTITIONS 4;

When you insert a row of data into t1 and the value of col3 is '2005-09-15', then this row is inserted into partition 1:

MOD(YEAR('2005-09-01'),4)
=  MOD(2005,4)
=  1

Key partitioning

Starting from v7.0.0, TiDB supports Key partitioning. For TiDB versions earlier than v7.0.0, if you try creating a Key partitioned table, TiDB creates it as a non-partitioned table and returns a warning.

Both Key partitioning and Hash partitioning can evenly distribute data into a certain number of partitions. The difference is that Hash partitioning only supports distributing data based on a specified integer expression or an integer column, while Key partitioning supports distributing data based on a column list, and partitioning columns of Key partitioning are not limited to the integer type. The Hash algorithm of TiDB for Key partitioning is different from that of MySQL, so the table data distribution is also different.

To create a Key partitioned table, you need to append a PARTITION BY KEY (columnList) clause to the CREATE TABLE statement. columnList is a column list with one or more column names. The data type of each column in the list can be any type except BLOB, JSON, and GEOMETRY (Note that TiDB does not support GEOMETRY). In addition, you might also need to append PARTITIONS num (where num is a positive integer indicating how many partitions a table is divided into), or append the definition of the partition names. For example, adding (PARTITION p0, PARTITION p1) means dividing the table into two partitions named p0 and p1.

The following operation creates a Key partitioned table, which is divided into 4 partitions by store_id:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)

PARTITION BY KEY(store_id)
PARTITIONS 4;

If PARTITIONS num is not specified, the default number of partitions is 1.

You can also create a Key partitioned table based on non-integer columns such as VARCHAR. For example, you can partition a table by the fname column:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)

PARTITION BY KEY(fname)
PARTITIONS 4;

You can also create a Key partitioned table based on multiple columns. For example, you can divide a table into 4 partitions based on fname and store_id:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)

PARTITION BY KEY(fname, store_id)
PARTITIONS 4;

Similar to MySQL, TiDB supports creating Key partitioned tables with an empty partition column list specified in PARTITION BY KEY. For example, the following statement creates a partitioned table using the primary key id as the partitioning key:

CREATE TABLE employees (
    id INT NOT NULL PRIMARY KEY,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)

PARTITION BY KEY()
PARTITIONS 4;

If the table lacks a primary key but contains a unique key, the unique key is used as the partitioning key:

CREATE TABLE k1 (
    id INT NOT NULL,
    name VARCHAR(20),
    UNIQUE KEY (id)
)
PARTITION BY KEY()
PARTITIONS 2;

However, the previous statement will fail if the unique key column is not defined as NOT NULL.

How TiDB handles Linear Hash partitions

Before v6.4.0, if you execute DDL statements of MySQL Linear Hash partitions in TiDB, TiDB can only create non-partitioned tables. In this case, if you still want to use partitioned tables in TiDB, you need to modify the DDL statements.

Since v6.4.0, TiDB supports parsing the MySQL PARTITION BY LINEAR HASH syntax but ignores the LINEAR keyword in it. If you have some existing DDL and DML statements of MySQL Linear Hash partitions, you can execute them in TiDB without modification:

  • For a CREATE statement of MySQL Linear Hash partitions, TiDB will create a non-linear Hash partitioned table (note that there is no Linear Hash partitioned table in TiDB). If the number of partitions is a power of 2, the rows in the TiDB Hash partitioned table are distributed the same as that in the MySQL Linear Hash partitioned table. Otherwise, the distribution of these rows in TiDB is different from MySQL. This is because non-linear partitioned tables use a simple "modulus number of partition", while linear partitioned tables use "modulus next power of 2 and fold the values between the number of partitions and the next power of 2". For details, see #38450.

  • For all other statements of MySQL Linear Hash partitions, they work in TiDB the same as that in MySQL, except that the rows are distributed differently if the number of partitions is not a power of 2, which will give different results for partition selection, TRUNCATE PARTITION, and EXCHANGE PARTITION.

How TiDB handles Linear Key partitions

Starting from v7.0.0, TiDB supports parsing the MySQL PARTITION BY LINEAR KEY syntax for Key partitioning. However, TiDB ignores the LINEAR keyword and uses a non-linear hash algorithm instead.

Before v7.0.0, if you try creating a Key partitioned table, TiDB creates it as a non-partitioned table and returns a warning.

How TiDB partitioning handles NULL

It is allowed in TiDB to use NULL as the calculation result of a partitioning expression.

Note:

NULL is not an integer. TiDB's partitioning implementation treats NULL as being less than any other integer values, just as ORDER BY does.

Handling of NULL with Range partitioning

When you insert a row into a table partitioned by Range, and the column value used to determine the partition is NULL, then this row is inserted into the lowest partition.

{{< copyable "sql" >}}

CREATE TABLE t1 (
    c1 INT,
    c2 VARCHAR(20)
)

PARTITION BY RANGE(c1) (
    PARTITION p0 VALUES LESS THAN (0),
    PARTITION p1 VALUES LESS THAN (10),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);
Query OK, 0 rows affected (0.09 sec)

{{< copyable "sql" >}}

select * from t1 partition(p0);
+------|--------+
| c1   | c2     |
+------|--------+
| NULL | mothra |
+------|--------+
1 row in set (0.00 sec)

{{< copyable "sql" >}}

select * from t1 partition(p1);
Empty set (0.00 sec)

{{< copyable "sql" >}}

select * from t1 partition(p2);
Empty set (0.00 sec)

Drop the p0 partition and verify the result:

{{< copyable "sql" >}}

alter table t1 drop partition p0;
Query OK, 0 rows affected (0.08 sec)

{{< copyable "sql" >}}

select * from t1;
Empty set (0.00 sec)

Handling of NULL with Hash partitioning

When partitioning tables by Hash, there is a different way of handling NULL value - if the calculation result of the partitioning expression is NULL, it is considered as 0.

{{< copyable "sql" >}}

CREATE TABLE th (
    c1 INT,
    c2 VARCHAR(20)
)

PARTITION BY HASH(c1)
PARTITIONS 2;
Query OK, 0 rows affected (0.00 sec)

{{< copyable "sql" >}}

INSERT INTO th VALUES (NULL, 'mothra'), (0, 'gigan');
Query OK, 2 rows affected (0.04 sec)

{{< copyable "sql" >}}

select * from th partition (p0);
+------|--------+
| c1   | c2     |
+------|--------+
| NULL | mothra |
|    0 | gigan  |
+------|--------+
2 rows in set (0.00 sec)

{{< copyable "sql" >}}

select * from th partition (p1);
Empty set (0.00 sec)

You can see that the inserted record (NULL, 'mothra') falls into the same partition as (0, 'gigan').

Note:

NULL values by Hash partitions in TiDB are handled in the same way as described in How MySQL Partitioning Handles NULL, which, however, is not consistent with the actual behavior of MySQL. In other words, MySQL's implementation in this case is not consistent with its documentation.

In this case, the actual behavior of TiDB is in line with the description of this document.

Handling of NULL with Key partitioning

For Key partitioning, the way of handling NULL value is consistent with that of Hash partitioning. If the value of a partitioning field is NULL, it is treated as 0.

Partition management

For RANGE, RANGE COLUMNS, LIST, and LIST COLUMNS partitioned tables, you can manage the partitions as follows:

  • Add partitions using the ALTER TABLE <table name> ADD PARTITION (<partition specification>) statement.
  • Drop partitions using the ALTER TABLE <table name> DROP PARTITION <list of partitions> statement.
  • Remove all data from specified partitions using the ALTER TABLE <table name> TRUNCATE PARTITION <list of partitions> statement. The logic of TRUNCATE PARTITION is similar to TRUNCATE TABLE but it is for partitions.
  • Merge, split, or make other changes to the partitions using the ALTER TABLE <table name> REORGANIZE PARTITION <list of partitions> INTO (<new partition definitions>) statement.

For HASH and KEY partitioned tables, you can manage the partitions as follows:

  • Decrease the number of partitions using the ALTER TABLE <table name> COALESCE PARTITION <number of partitions to decrease by> statement. This operation reorganizes the partitions by copying the whole table to the new number of partitions online.
  • Increase the number of partitions using the ALTER TABLE <table name> ADD PARTITION <number of partitions to increase by | (additional partition definitions)> statement. This operation reorganizes the partitions by copying the whole table to the new number of partitions online.
  • Remove all data from specified partitions using the ALTER TABLE <table name> TRUNCATE PARTITION <list of partitions> statement. The logic of TRUNCATE PARTITION is similar to TRUNCATE TABLE but it is for partitions.

EXCHANGE PARTITION works by swapping a partition and a non-partitioned table, similar to how renaming a table like RENAME TABLE t1 TO t1_tmp, t2 TO t1, t1_tmp TO t2 works.

For example, ALTER TABLE partitioned_table EXCHANGE PARTITION p1 WITH TABLE non_partitioned_table swaps the partitioned_table table p1 partition with the non_partitioned_table table.

Ensure that all rows that you are exchanging into the partition match the partition definition; otherwise, the statement will fail.

Note that TiDB has some specific features that might affect EXCHANGE PARTITION. When the table structure contains such features, you need to ensure that EXCHANGE PARTITION meets the MySQL's EXCHANGE PARTITION condition. Meanwhile, ensure that these specific features are defined the same for both partitioned and non-partitioned tables. These specific features include the following:

  • TiFlash: the numbers of TiFlash replicas are the same.
  • Clustered Indexes: partitioned and non-partitioned tables are both CLUSTERED, or both NONCLUSTERED.

In addition, there are limitations on the compatibility of EXCHANGE PARTITION with other components. Both partitioned and non-partitioned tables must have the same definition.

  • TiFlash: when the TiFlash replica definitions in partitioned and non-partitioned tables are different, the EXCHANGE PARTITION operation cannot be performed.
  • TiCDC: TiCDC replicates the EXCHANGE PARTITION operation when both partitioned and non-partitioned tables have primary keys or unique keys. Otherwise, TiCDC will not replicate the operation.
  • TiDB Lightning and BR: do not perform the EXCHANGE PARTITION operation during import using TiDB Lightning or during restore using BR.

Manage Range, Range COLUMNS, List, and List COLUMNS partitions

This section uses the partitioned tables created by the following SQL statements as examples to show you how to manage Range and List partitions.

CREATE TABLE members (
    id int,
    fname varchar(255),
    lname varchar(255),
    dob date,
    data json
)
PARTITION BY RANGE (YEAR(dob)) (
 PARTITION pBefore1950 VALUES LESS THAN (1950),
 PARTITION p1950 VALUES LESS THAN (1960),
 PARTITION p1960 VALUES LESS THAN (1970),
 PARTITION p1970 VALUES LESS THAN (1980),
 PARTITION p1980 VALUES LESS THAN (1990),
 PARTITION p1990 VALUES LESS THAN (2000));

CREATE TABLE member_level (
 id int,
 level int,
 achievements json
)
PARTITION BY LIST (level) (
 PARTITION l1 VALUES IN (1),
 PARTITION l2 VALUES IN (2),
 PARTITION l3 VALUES IN (3),
 PARTITION l4 VALUES IN (4),
 PARTITION l5 VALUES IN (5));

Drop partitions

ALTER TABLE members DROP PARTITION p1990;

ALTER TABLE member_level DROP PARTITION l5;

Truncate partitions

ALTER TABLE members TRUNCATE PARTITION p1980;

ALTER TABLE member_level TRUNCATE PARTITION l4;

Add partitions

ALTER TABLE members ADD PARTITION (PARTITION `p1990to2010` VALUES LESS THAN (2010));

ALTER TABLE member_level ADD PARTITION (PARTITION l5_6 VALUES IN (5,6));

For a Range partitioned table, ADD PARTITION will append new partitions after the last existing partition. Compared with the existing partitions, the value defined in VALUES LESS THAN for new partitions must be greater. Otherwise, an error is reported:

ALTER TABLE members ADD PARTITION (PARTITION p1990 VALUES LESS THAN (2000));
ERROR 1493 (HY000): VALUES LESS THAN value must be strictly increasing for each partition

Reorganize partitions

Split a partition:

ALTER TABLE members REORGANIZE PARTITION `p1990to2010` INTO
(PARTITION p1990 VALUES LESS THAN (2000),
 PARTITION p2000 VALUES LESS THAN (2010),
 PARTITION p2010 VALUES LESS THAN (2020),
 PARTITION p2020 VALUES LESS THAN (2030),
 PARTITION pMax VALUES LESS THAN (MAXVALUE));

ALTER TABLE member_level REORGANIZE PARTITION l5_6 INTO
(PARTITION l5 VALUES IN (5),
 PARTITION l6 VALUES IN (6));

Merge partitions:

ALTER TABLE members REORGANIZE PARTITION pBefore1950,p1950 INTO (PARTITION pBefore1960 VALUES LESS THAN (1960));

ALTER TABLE member_level REORGANIZE PARTITION l1,l2 INTO (PARTITION l1_2 VALUES IN (1,2));

Change the partitioning scheme definition:

ALTER TABLE members REORGANIZE PARTITION pBefore1960,p1960,p1970,p1980,p1990,p2000,p2010,p2020,pMax INTO
(PARTITION p1800 VALUES LESS THAN (1900),
 PARTITION p1900 VALUES LESS THAN (2000),
 PARTITION p2000 VALUES LESS THAN (2100));

ALTER TABLE member_level REORGANIZE PARTITION l1_2,l3,l4,l5,l6 INTO
(PARTITION lOdd VALUES IN (1,3,5),
 PARTITION lEven VALUES IN (2,4,6));

When reorganizing partitions, you need to note the following key points:

  • Reorganizing partitions (including merging or splitting partitions) can change the listed partitions into a new set of partition definitions but cannot change the type of partitioning (for example, change the List type to the Range type, or change the Range COLUMNS type to the Range type).

  • For a Range partition table, you can reorganize only adjacent partitions in it.

    ALTER TABLE members REORGANIZE PARTITION p1800,p2000 INTO (PARTITION p2000 VALUES LESS THAN (2100));
    ERROR 8200 (HY000): Unsupported REORGANIZE PARTITION of RANGE; not adjacent partitions
    
  • For a Range partitioned table, to modify the end of the range, the new end defined in VALUES LESS THAN must cover the existing rows in the last partition. Otherwise, existing rows no longer fit and an error is reported:

    INSERT INTO members VALUES (313, "John", "Doe", "2022-11-22", NULL);
    ALTER TABLE members REORGANIZE PARTITION p2000 INTO (PARTITION p2000 VALUES LESS THAN (2050)); -- This statement will work as expected, because 2050 covers the existing rows.
    ALTER TABLE members REORGANIZE PARTITION p2000 INTO (PARTITION p2000 VALUES LESS THAN (2020)); -- This statement will fail with an error, because 2022 does not fit in the new range.
    ERROR 1526 (HY000): Table has no partition for value 2022
    
  • For a List partitioned table, to modify the set of values defined for a partition, the new definition must cover the existing values in that partition. Otherwise, an error is reported:

    INSERT INTO member_level (id, level) values (313, 6);
    ALTER TABLE member_level REORGANIZE PARTITION lEven INTO (PARTITION lEven VALUES IN (2,4));
    ERROR 1526 (HY000): Table has no partition for value 6
    
  • After partitions are reorganized, the statistics of the corresponding partitions are outdated, so you will get the following warning. In this case, you can use the ANALYZE TABLE statement to update the statistics.

    +---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Level   | Code | Message                                                                                                                                                |
    +---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Warning | 1105 | The statistics of related partitions will be outdated after reorganizing partitions. Please use 'ANALYZE TABLE' statement if you want to update it now |
    +---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (0.00 sec)

Manage Hash and Key partitions

This section uses the partitioned table created by the following SQL statement as examples to show you how to manage Hash partitions. For Key partitions, you can use the same management statements as well.

CREATE TABLE example (
  id INT PRIMARY KEY,
  data VARCHAR(1024)
)
PARTITION BY HASH(id)
PARTITIONS 2;

Increase the number of partitions

Increase the number of partitions for the example table by 1 (from 2 to 3):

ALTER TABLE example ADD PARTITION PARTITIONS 1;

You can also specify partition options by adding partition definitions. For example, you can use the following statement to increase the number of partitions from 3 to 5 and specify the names of the newly added partitions as pExample4 and pExample5:

ALTER TABLE example ADD PARTITION
(PARTITION pExample4 COMMENT = 'not p3, but pExample4 instead',
 PARTITION pExample5 COMMENT = 'not p4, but pExample5 instead');

Decrease the number of partitions

Unlike Range and List partitioning, DROP PARTITION is not supported for Hash and Key partitioning, but you can decrease the number of partitions with COALESCE PARTITION or delete all data from specific partitions with TRUNCATE PARTITION.

Decrease the number of partitions for the example table by 1 (from 5 to 4):

ALTER TABLE example COALESCE PARTITION 1;

Note:

The process of changing the number of partitions for Hash or Key partitioned tables reorganizes the partitions by copying all data to the new number of partitions. Therefore, after changing the number of partitions for a Hash or Key partitioned table, you will get the following warning about the outdated statistics. In this case, you can use the ANALYZE TABLE statement to update the statistics.

+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                                |
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1105 | The statistics of related partitions will be outdated after reorganizing partitions. Please use 'ANALYZE TABLE' statement if you want to update it now |
+---------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

To better understand how the example table is organized now, you can show the SQL statement that is used to recreate the example table as follows:

SHOW CREATE TABLE\G
*************************** 1. row ***************************
       Table: example
Create Table: CREATE TABLE `example` (
  `id` int(11) NOT NULL,
  `data` varchar(1024) DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY HASH (`id`)
(PARTITION `p0`,
 PARTITION `p1`,
 PARTITION `p2`,
 PARTITION `pExample4` COMMENT 'not p3, but pExample4 instead')
1 row in set (0.01 sec)

Truncate partitions

Delete all data from a partition:

ALTER TABLE example TRUNCATE PARTITION p0;
Query OK, 0 rows affected (0.03 sec)

Convert a partitioned table to a non-partitioned table

To convert a partitioned table to a non-partitioned table, you can use the following statement, which removes the partitioning, copies all rows of the table, and recreates the indexes online for the table:

ALTER TABLE <table_name> REMOVE PARTITIONING

For example, to convert the members partitioned table to the non-partitioned table, you can execute the following statement:

ALTER TABLE members REMOVE PARTITIONING

Partition an existing table

To partition an existing non-partitioned table or modify the partition type of an existing partitioned table, you can use the following statement, which copies all rows and recreates the indexes online according to the new partition definitions:

ALTER TABLE <table_name> PARTITION BY <new partition type and definitions>

Examples:

To convert the existing members table to a HASH partitioned table with 10 partitions, you can execute the following statement:

ALTER TABLE members PARTITION BY HASH(id) PARTITIONS 10;

To convert the existing member_level table to a RANGE partitioned table, you can execute the following statement:

ALTER TABLE member_level PARTITION BY RANGE(level)
(PARTITION pLow VALUES LESS THAN (1),
 PARTITION pMid VALUES LESS THAN (3),
 PARTITION pHigh VALUES LESS THAN (7)
 PARTITION pMax VALUES LESS THAN (MAXVALUE));

Partition pruning

Partition pruning is an optimization which is based on a very simple idea - do not scan the partitions that do not match.

Assume that you create a partitioned table t1:

{{< copyable "sql" >}}

CREATE TABLE t1 (
    fname VARCHAR(50) NOT NULL,
    lname VARCHAR(50) NOT NULL,
    region_code TINYINT UNSIGNED NOT NULL,
    dob DATE NOT NULL
)

PARTITION BY RANGE( region_code ) (
    PARTITION p0 VALUES LESS THAN (64),
    PARTITION p1 VALUES LESS THAN (128),
    PARTITION p2 VALUES LESS THAN (192),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

If you want to get the result of this SELECT statement:

{{< copyable "sql" >}}

SELECT fname, lname, region_code, dob
    FROM t1
    WHERE region_code > 125 AND region_code < 130;

It is evident that the result falls in either the p1 or the p2 partition, that is, you just need to search for the matching rows in p1 and p2. Excluding the unneeded partitions is so-called "pruning". If the optimizer is able to prune a part of partitions, the execution of the query in the partitioned table will be much faster than that in a non-partitioned table.

The optimizer can prune partitions through WHERE conditions in the following two scenarios:

  • partition_column = constant
  • partition_column IN (constant1, constant2, ..., constantN)

Currently, partition pruning does not work with LIKE conditions.

Some cases for partition pruning to take effect

  1. Partition pruning uses the query conditions on the partitioned table, so if the query conditions cannot be pushed down to the partitioned table according to the planner's optimization rules, partition pruning does not apply for this query.

    For example:

    {{< copyable "sql" >}}

    create table t1 (x int) partition by range (x) (
            partition p0 values less than (5),
            partition p1 values less than (10));
    create table t2 (x int);

    {{< copyable "sql" >}}

    explain select * from t1 left join t2 on t1.x = t2.x where t2.x > 5;

    In this query, the left out join is converted to the inner join, and then t1.x > 5 is derived from t1.x = t2.x and t2.x > 5, so it could be used in partition pruning and only the partition p1 remains.

    explain select * from t1 left join t2 on t1.x = t2.x and t2.x > 5;

    In this query, t2.x > 5 cannot be pushed down to the t1 partitioned table, so partition pruning would not take effect for this query.

  2. Since partition pruning is done during the plan optimizing phase, it does not apply for those cases that filter conditions are unknown until the execution phase.

    For example:

    {{< copyable "sql" >}}

    create table t1 (x int) partition by range (x) (
            partition p0 values less than (5),
            partition p1 values less than (10));

    {{< copyable "sql" >}}

    explain select * from t2 where x < (select * from t1 where t2.x < t1.x and t2.x < 2);

    This query reads a row from t2 and uses the result for the subquery on t1. Theoretically, partition pruning could benefit from t1.x > val expression in the subquery, but it does not take effect there as that happens in the execution phase.

  3. As a result of a limitation from current implementation, if a query condition cannot be pushed down to TiKV, it cannot be used by the partition pruning.

    Take the fn(col) expression as an example. If the TiKV coprocessor supports this fn function, fn(col) may be pushed down to the leaf node (that is, partitioned table) according to the predicate push-down rule during the plan optimizing phase, and partition pruning can use it.

    If the TiKV coprocessor does not support this fn function, fn(col) would not be pushed down to the leaf node. Instead, it becomes a Selection node above the leaf node. The current partition pruning implementation does not support this kind of plan tree.

  4. For Hash and Key partition types, the only query supported by partition pruning is the equal condition.

  5. For Range partition, for partition pruning to take effect, the partition expression must be in those forms: col or fn(col), and the query condition must be one of >, <, =, >=, and <=. If the partition expression is in the form of fn(col), the fn function must be monotonous.

    If the fn function is monotonous, for any x and y, if x > y, then fn(x) > fn(y). Then this fn function can be called strictly monotonous. For any x and y, if x > y, then fn(x) >= fn(y). In this case, fn could also be called "monotonous". In theory, all monotonous functions are supported by partition pruning.

    Currently, partition pruning in TiDB only support those monotonous functions:

    For example, the partition expression is a simple column:

    {{< copyable "sql" >}}

    create table t (id int) partition by range (id) (
            partition p0 values less than (5),
            partition p1 values less than (10));
    select * from t where id > 6;

    Or the partition expression is in the form of fn(col) where fn is to_days:

    {{< copyable "sql" >}}

    create table t (dt datetime) partition by range (to_days(id)) (
            partition p0 values less than (to_days('2020-04-01')),
            partition p1 values less than (to_days('2020-05-01')));
    select * from t where dt > '2020-04-18';

    An exception is floor(unix_timestamp()) as the partition expression. TiDB does some optimization for that case by case, so it is supported by partition pruning.

    {{< copyable "sql" >}}

    create table t (ts timestamp(3) not null default current_timestamp(3))
    partition by range (floor(unix_timestamp(ts))) (
            partition p0 values less than (unix_timestamp('2020-04-01 00:00:00')),
            partition p1 values less than (unix_timestamp('2020-05-01 00:00:00')));
    select * from t where ts > '2020-04-18 02:00:42.123';

Partition selection

SELECT statements support partition selection, which is implemented by using a PARTITION option.

{{< copyable "sql" >}}

SET @@sql_mode = '';

CREATE TABLE employees  (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    fname VARCHAR(25) NOT NULL,
    lname VARCHAR(25) NOT NULL,
    store_id INT NOT NULL,
    department_id INT NOT NULL
)

PARTITION BY RANGE(id)  (
    PARTITION p0 VALUES LESS THAN (5),
    PARTITION p1 VALUES LESS THAN (10),
    PARTITION p2 VALUES LESS THAN (15),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

INSERT INTO employees VALUES
    ('', 'Bob', 'Taylor', 3, 2), ('', 'Frank', 'Williams', 1, 2),
    ('', 'Ellen', 'Johnson', 3, 4), ('', 'Jim', 'Smith', 2, 4),
    ('', 'Mary', 'Jones', 1, 1), ('', 'Linda', 'Black', 2, 3),
    ('', 'Ed', 'Jones', 2, 1), ('', 'June', 'Wilson', 3, 1),
    ('', 'Andy', 'Smith', 1, 3), ('', 'Lou', 'Waters', 2, 4),
    ('', 'Jill', 'Stone', 1, 4), ('', 'Roger', 'White', 3, 2),
    ('', 'Howard', 'Andrews', 1, 2), ('', 'Fred', 'Goldberg', 3, 3),
    ('', 'Barbara', 'Brown', 2, 3), ('', 'Alice', 'Rogers', 2, 2),
    ('', 'Mark', 'Morgan', 3, 3), ('', 'Karen', 'Cole', 3, 2);

You can view the rows stored in the p1 partition:

{{< copyable "sql" >}}

SELECT * FROM employees PARTITION (p1);
+----|-------|--------|----------|---------------+
| id | fname | lname  | store_id | department_id |
+----|-------|--------|----------|---------------+
|  5 | Mary  | Jones  |        1 |             1 |
|  6 | Linda | Black  |        2 |             3 |
|  7 | Ed    | Jones  |        2 |             1 |
|  8 | June  | Wilson |        3 |             1 |
|  9 | Andy  | Smith  |        1 |             3 |
+----|-------|--------|----------|---------------+
5 rows in set (0.00 sec)

If you want to get the rows in multiple partitions, you can use a list of partition names which are separated by commas. For example, SELECT * FROM employees PARTITION (p1, p2) returns all rows in the p1 and p2 partitions.

When you use partition selection, you can still use WHERE conditions and options such as ORDER BY and LIMIT. It is also supported to use aggregation options such as HAVING and GROUP BY.

{{< copyable "sql" >}}

SELECT * FROM employees PARTITION (p0, p2)
    WHERE lname LIKE 'S%';
+----|-------|-------|----------|---------------+
| id | fname | lname | store_id | department_id |
+----|-------|-------|----------|---------------+
|  4 | Jim   | Smith |        2 |             4 |
| 11 | Jill  | Stone |        1 |             4 |
+----|-------|-------|----------|---------------+
2 rows in set (0.00 sec)

{{< copyable "sql" >}}

SELECT id, CONCAT(fname, ' ', lname) AS name
    FROM employees PARTITION (p0) ORDER BY lname;
+----|----------------+
| id | name           |
+----|----------------+
|  3 | Ellen Johnson  |
|  4 | Jim Smith      |
|  1 | Bob Taylor     |
|  2 | Frank Williams |
+----|----------------+
4 rows in set (0.06 sec)

{{< copyable "sql" >}}

SELECT store_id, COUNT(department_id) AS c
    FROM employees PARTITION (p1,p2,p3)
    GROUP BY store_id HAVING c > 4;
+---|----------+
| c | store_id |
+---|----------+
| 5 |        2 |
| 5 |        3 |
+---|----------+
2 rows in set (0.00 sec)

Partition selection is supported for all types of table partitioning, including Range partitioning and Hash partitioning. For Hash partitions, if partition names are not specified, p0, p1, p2,..., or pN-1 is automatically used as the partition name.

SELECT in INSERT ... SELECT can also use partition selection.

Restrictions and limitations on partitions

This section introduces some restrictions and limitations on partitioned tables in TiDB.

Partitioning keys, primary keys and unique keys

This section discusses the relationship of partitioning keys with primary keys and unique keys. The rule governing this relationship can be expressed as follows: Every unique key on the table must use every column in the table's partitioning expression. This also includes the table's primary key, because it is by definition a unique key.

Note:

You can ignore this rule when using global indexes.

For example, the following table creation statements are invalid:

{{< copyable "sql" >}}

CREATE TABLE t1 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1, col2)
)

PARTITION BY HASH(col3)
PARTITIONS 4;

CREATE TABLE t2 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1),
    UNIQUE KEY (col3)
)

PARTITION BY HASH(col1 + col3)
PARTITIONS 4;

In each case, the proposed table has at least one unique key that does not include all columns used in the partitioning expression.

The valid statements are as follows:

{{< copyable "sql" >}}

CREATE TABLE t1 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1, col2, col3)
)

PARTITION BY HASH(col3)
PARTITIONS 4;

CREATE TABLE t2 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1, col3)
)

PARTITION BY HASH(col1 + col3)
PARTITIONS 4;

The following example displays an error:

{{< copyable "sql" >}}

CREATE TABLE t3 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1, col2),
    UNIQUE KEY (col3)
)

PARTITION BY HASH(col1 + col3)
PARTITIONS 4;
ERROR 8264 (HY000): Global Index is needed for index 'col1', since the unique index is not including all partitioning columns, and GLOBAL is not given as IndexOption

The CREATE TABLE statement fails because both col1 and col3 are included in the proposed partitioning key, but neither of these columns is part of both of unique keys on the table. After the following modifications, the CREATE TABLE statement becomes valid:

{{< copyable "sql" >}}

CREATE TABLE t3 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1, col2, col3),
    UNIQUE KEY (col1, col3)
)
PARTITION BY HASH(col1 + col3)
    PARTITIONS 4;

The following table cannot be partitioned at all, because there is no way to include in a partitioning key any columns that belong to both unique keys:

{{< copyable "sql" >}}

CREATE TABLE t4 (
    col1 INT NOT NULL,
    col2 INT NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY (col1, col3),
    UNIQUE KEY (col2, col4)
);

Because every primary key is by definition a unique key, so the next two statements are invalid:

{{< copyable "sql" >}}

CREATE TABLE t5 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    PRIMARY KEY(col1, col2)
)

PARTITION BY HASH(col3)
PARTITIONS 4;

CREATE TABLE t6 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    PRIMARY KEY(col1, col3),
    UNIQUE KEY(col2)
)

PARTITION BY HASH( YEAR(col2) )
PARTITIONS 4;

In the above examples, the primary key does not include all columns referenced in the partitioning expression. After adding the missing column in the primary key, the CREATE TABLE statement becomes valid:

{{< copyable "sql" >}}

CREATE TABLE t5 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    PRIMARY KEY(col1, col2, col3)
)
PARTITION BY HASH(col3)
PARTITIONS 4;
CREATE TABLE t6 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    PRIMARY KEY(col1, col2, col3),
    UNIQUE KEY(col2)
)
PARTITION BY HASH( YEAR(col2) )
PARTITIONS 4;

If a table has neither unique keys nor primary keys, then this restriction does not apply.

When you change tables using DDL statements, you also need to consider this restriction when adding a unique index. For example, when you create a partitioned table as shown below:

{{< copyable "sql" >}}

CREATE TABLE t_no_pk (c1 INT, c2 INT)
    PARTITION BY RANGE(c1) (
        PARTITION p0 VALUES LESS THAN (10),
        PARTITION p1 VALUES LESS THAN (20),
        PARTITION p2 VALUES LESS THAN (30),
        PARTITION p3 VALUES LESS THAN (40)
    );
Query OK, 0 rows affected (0.12 sec)

You can add a non-unique index by using ALTER TABLE statements. But if you want to add a unique index, the c1 column must be included in the unique index.

When using a partitioned table, you cannot specify the prefix index as a unique attribute:

{{< copyable "sql" >}}

CREATE TABLE t (a varchar(20), b blob,
    UNIQUE INDEX (a(5)))
    PARTITION by range columns (a) (
    PARTITION p0 values less than ('aaaaa'),
    PARTITION p1 values less than ('bbbbb'),
    PARTITION p2 values less than ('ccccc'));
ERROR 8264 (HY000): Global Index is needed for index 'a', since the unique index is not including all partitioning columns, and GLOBAL is not given as IndexOption

Global indexes

Before the introduction of global indexes, TiDB created a local index for each partition, leading to a limitation that primary keys and unique keys had to include the partition key to ensure data uniqueness. Additionally, when querying data across multiple partitions, TiDB needed to scan the data of each partition to return results.

To address these issues, TiDB introduces the global indexes feature in v8.3.0. A global index covers the data of the entire table with a single index, allowing primary keys and unique keys to maintain global uniqueness without including all partition keys. Moreover, global indexes can access data across multiple partitions in a single operation, significantly improving query performance for non-partitioned keys.

To create a global index for a primary key or unique key that does not include all the columns used in the partition expressions, you can enable the tidb_enable_global_index system variable and add the GLOBAL keyword in the index definition.

Note:

Global indexes affect partition management. DROP, TRUNCATE, and REORGANIZE PARTITION operations also trigger updates to table-level global indexes, meaning that these DDL operations will only return results after the global indexes of the corresponding tables are fully updated.

CREATE TABLE t1 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    col3 INT NOT NULL,
    col4 INT NOT NULL,
    UNIQUE KEY uidx12(col1, col2) GLOBAL,
    UNIQUE KEY uidx3(col3)
)
PARTITION BY HASH(col3)
PARTITIONS 4;

In the preceding example, the unique index uidx12 is a global index, while uidx3 is a regular unique index.

Note that a clustered index cannot be a global index, as shown in the following example:

CREATE TABLE t2 (
    col1 INT NOT NULL,
    col2 DATE NOT NULL,
    PRIMARY KEY (col2) CLUSTERED GLOBAL
) PARTITION BY HASH(col1) PARTITIONS 5;
ERROR 1503 (HY000): A CLUSTERED INDEX must include all columns in the table's partitioning function

The reason is that if the clustered index is a global index, the table will no longer be partitioned. This is because the key of the clustered index is also the record key at the partition level, but the global index is at the table level, which causes a conflict. If you need to set the primary key as a global index, you must explicitly define it as a non-clustered index, for example, PRIMARY KEY(col1, col2) NONCLUSTERED GLOBAL.

You can identify a global index by the GLOBAL index option in the SHOW CREATE TABLE output.

SHOW CREATE TABLE t1\G
       Table: t1
Create Table: CREATE TABLE `t1` (
  `col1` int(11) NOT NULL,
  `col2` date NOT NULL,
  `col3` int(11) NOT NULL,
  `col4` int(11) NOT NULL,
  UNIQUE KEY `uidx12` (`col1`,`col2`) /*T![global_index] GLOBAL */,
  UNIQUE KEY `uidx3` (`col3`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY HASH (`col3`) PARTITIONS 4
1 row in set (0.00 sec)

Alternatively, you can query the INFORMATION_SCHEMA.TIDB_INDEXES table and check the IS_GLOBAL column in the output.

SELECT * FROM INFORMATION_SCHEMA.TIDB_INDEXES WHERE table_name='t1';
+--------------+------------+------------+----------+--------------+-------------+----------+---------------+------------+----------+------------+-----------+-----------+
| TABLE_SCHEMA | TABLE_NAME | NON_UNIQUE | KEY_NAME | SEQ_IN_INDEX | COLUMN_NAME | SUB_PART | INDEX_COMMENT | Expression | INDEX_ID | IS_VISIBLE | CLUSTERED | IS_GLOBAL |
+--------------+------------+------------+----------+--------------+-------------+----------+---------------+------------+----------+------------+-----------+-----------+
| test         | t1         |          0 | uidx12   |            1 | col1        |     NULL |               | NULL       |        1 | YES        | NO        |         1 |
| test         | t1         |          0 | uidx12   |            2 | col2        |     NULL |               | NULL       |        1 | YES        | NO        |         1 |
| test         | t1         |          0 | uidx3    |            1 | col3        |     NULL |               | NULL       |        2 | YES        | NO        |         0 |
+--------------+------------+------------+----------+--------------+-------------+----------+---------------+------------+----------+------------+-----------+-----------+
3 rows in set (0.00 sec)

When partitioning a non-partitioned table or repartitioning an already partitioned table, you can update the indexes to be global indexes or revert them to local indexes as needed:

ALTER TABLE t1 PARTITION BY HASH (col1) PARTITIONS 3 UPDATE INDEXES (uidx12 LOCAL, uidx3 GLOBAL);
Limitations of global indexes
  • If the GLOBAL keyword is not explicitly specified in the index definition, TiDB creates a local index by default.
  • The GLOBAL and LOCAL keywords only apply to partitioned tables and do not affect non-partitioned tables. In other words, there is no difference between a global index and a local index in non-partitioned tables.
  • DDL operations such as ADD PARTITION, DROP PARTITION, TRUNCATE PARTITION, REORGANIZE PARTITION, SPLIT PARTITION, and EXCHANGE PARTITION also trigger updates to global indexes. The results of these DDL operations will only be returned after the global indexes of the corresponding tables are fully updated. This can delay operations that usually require quick DDL completion, such as data archiving operations (EXCHANGE PARTITION, TRUNCATE PARTITION, and DROP PARTITION). In contrast, when global indexes are not involved, these DDL operations can be completed immediately.
  • By default, the primary key of a partitioned table is a clustered index and must include the partition key. If you require the primary key to exclude the partition key, you can explicitly specify the primary key as a non-clustered global index when creating the table, for example, PRIMARY KEY(col1, col2) NONCLUSTERED GLOBAL.
  • If a global index is added to an expression column, or a global index is also a prefix index (for example UNIQUE KEY idx_id_prefix (id(10)) GLOBAL), you need to collect statistics manually for this global index.

Partitioning limitations relating to functions

Only the functions shown in the following list are allowed in partitioning expressions:

ABS()
CEILING()
DATEDIFF()
DAY()
DAYOFMONTH()
DAYOFWEEK()
DAYOFYEAR()
EXTRACT() (see EXTRACT() function with WEEK specifier)
FLOOR()
HOUR()
MICROSECOND()
MINUTE()
MOD()
MONTH()
QUARTER()
SECOND()
TIME_TO_SEC()
TO_DAYS()
TO_SECONDS()
UNIX_TIMESTAMP() (with TIMESTAMP columns)
WEEKDAY()
YEAR()
YEARWEEK()

Compatibility with MySQL

Currently, TiDB supports Range partitioning, Range COLUMNS partitioning, List partitioning, List COLUMNS partitioning, Hash partitioning, and Key partitioning. Other partitioning types that are available in MySQL are not supported yet in TiDB.

With regard to partition management, any operation that requires moving data in the bottom implementation is not supported currently, including but not limited to: adjust the number of partitions in a Hash partitioned table, modify the Range of a Range partitioned table, and merge partitions.

For the unsupported partitioning types, when you create a table in TiDB, the partitioning information is ignored and the table is created in the regular form with a warning reported.

The LOAD DATA syntax does not support partition selection currently in TiDB.

{{< copyable "sql" >}}

create table t (id int, val int) partition by hash(id) partitions 4;

The regular LOAD DATA operation is supported:

{{< copyable "sql" >}}

load local data infile "xxx" into t ...

But Load Data does not support partition selection:

{{< copyable "sql" >}}

load local data infile "xxx" into t partition (p1)...

For a partitioned table, the result returned by select * from t is unordered between the partitions. This is different from the result in MySQL, which is ordered between the partitions but unordered inside the partitions.

{{< copyable "sql" >}}

create table t (id int, val int) partition by range (id) (
    partition p0 values less than (3),
    partition p1 values less than (7),
    partition p2 values less than (11));
Query OK, 0 rows affected (0.10 sec)

{{< copyable "sql" >}}

insert into t values (1, 2), (3, 4),(5, 6),(7,8),(9,10);
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

TiDB returns a different result every time, for example:

{{< copyable "sql" >}}

select * from t;
+------|------+
| id   | val  |
+------|------+
|    7 |    8 |
|    9 |   10 |
|    1 |    2 |
|    3 |    4 |
|    5 |    6 |
+------|------+
5 rows in set (0.00 sec)

The result returned in MySQL:

{{< copyable "sql" >}}

select * from t;
+------|------+
| id   | val  |
+------|------+
|    1 |    2 |
|    3 |    4 |
|    5 |    6 |
|    7 |    8 |
|    9 |   10 |
+------|------+
5 rows in set (0.00 sec)

Dynamic pruning mode

TiDB accesses partitioned tables in either dynamic or static mode. dynamic mode is used by default since v6.3.0. However, dynamic partitioning is effective only after the full table-level statistics, or GlobalStats, are collected. Before GlobalStats are collected, TiDB will use the static mode instead. For detailed information about GlobalStats, see Collect statistics of partitioned tables in dynamic pruning mode.

{{< copyable "sql" >}}

set @@session.tidb_partition_prune_mode = 'dynamic'

Manual ANALYZE and normal queries use the session-level tidb_partition_prune_mode setting. The auto-analyze operation in the background uses the global tidb_partition_prune_mode setting.

In static mode, partitioned tables use partition-level statistics. In dynamic mode, partitioned tables use table-level GlobalStats.

When switching from static mode to dynamic mode, you need to check and collect statistics manually. This is because after the switch to dynamic mode, partitioned tables have only partition-level statistics but no table-level statistics. GlobalStats are collected only upon the next auto-analyze operation.

{{< copyable "sql" >}}

set session tidb_partition_prune_mode = 'dynamic';
show stats_meta where table_name like "t";
+---------+------------+----------------+---------------------+--------------+-----------+
| Db_name | Table_name | Partition_name | Update_time         | Modify_count | Row_count |
+---------+------------+----------------+---------------------+--------------+-----------+
| test    | t          | p0             | 2022-05-27 20:23:34 |            1 |         2 |
| test    | t          | p1             | 2022-05-27 20:23:34 |            2 |         4 |
| test    | t          | p2             | 2022-05-27 20:23:34 |            2 |         4 |
+---------+------------+----------------+---------------------+--------------+-----------+
3 rows in set (0.01 sec)

To make sure that the statistics used by SQL statements are correct after you enable global dynamic pruning mode, you need to manually trigger analyze on the tables or on a partition of the table to obtain GlobalStats.

{{< copyable "sql" >}}

analyze table t partition p1;
show stats_meta where table_name like "t";
+---------+------------+----------------+---------------------+--------------+-----------+
| Db_name | Table_name | Partition_name | Update_time         | Modify_count | Row_count |
+---------+------------+----------------+---------------------+--------------+-----------+
| test    | t          | global         | 2022-05-27 20:50:53 |            0 |         5 |
| test    | t          | p0             | 2022-05-27 20:23:34 |            1 |         2 |
| test    | t          | p1             | 2022-05-27 20:50:52 |            0 |         2 |
| test    | t          | p2             | 2022-05-27 20:50:08 |            0 |         2 |
+---------+------------+----------------+---------------------+--------------+-----------+
4 rows in set (0.00 sec)

If the following warning is displayed during the analyze process, partition statistics are inconsistent, and you need to collect statistics of these partitions or the entire table again.

| Warning | 8244 | Build table: `t` column: `a` global-level stats failed due to missing partition-level column stats, please run analyze table to refresh columns of all partitions

You can also use scripts to update statistics of all partitioned tables. For details, see Update statistics of partitioned tables in dynamic pruning mode.

After table-level statistics are ready, you can enable the global dynamic pruning mode, which is effective to all SQL statements and auto-analyze operations.

{{< copyable "sql" >}}

set global tidb_partition_prune_mode = dynamic

In static mode, TiDB accesses each partition separately using multiple operators, and then merges the results using Union. The following example is a simple read operation where TiDB merges the results of two corresponding partitions using Union:

{{< copyable "sql" >}}

mysql> create table t1(id int, age int, key(id)) partition by range(id) (
        partition p0 values less than (100),
        partition p1 values less than (200),
        partition p2 values less than (300),
        partition p3 values less than (400));
Query OK, 0 rows affected (0.01 sec)

mysql> explain select * from t1 where id < 150;
+------------------------------+----------+-----------+------------------------+--------------------------------+
| id                           | estRows  | task      | access object          | operator info                  |
+------------------------------+----------+-----------+------------------------+--------------------------------+
| PartitionUnion_9             | 6646.67  | root      |                        |                                |
| ├─TableReader_12             | 3323.33  | root      |                        | data:Selection_11              |
| │ └─Selection_11             | 3323.33  | cop[tikv] |                        | lt(test.t1.id, 150)            |
| │   └─TableFullScan_10       | 10000.00 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo |
| └─TableReader_18             | 3323.33  | root      |                        | data:Selection_17              |
|   └─Selection_17             | 3323.33  | cop[tikv] |                        | lt(test.t1.id, 150)            |
|     └─TableFullScan_16       | 10000.00 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo |
+------------------------------+----------+-----------+------------------------+--------------------------------+
7 rows in set (0.00 sec)

In dynamic mode, each operator supports direct access to multiple partitions, so TiDB no longer uses Union.

{{< copyable "sql" >}}

mysql> set @@session.tidb_partition_prune_mode = 'dynamic';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from t1 where id < 150;
+-------------------------+----------+-----------+-----------------+--------------------------------+
| id                      | estRows  | task      | access object   | operator info                  |
+-------------------------+----------+-----------+-----------------+--------------------------------+
| TableReader_7           | 3323.33  | root      | partition:p0,p1 | data:Selection_6               |
| └─Selection_6           | 3323.33  | cop[tikv] |                 | lt(test.t1.id, 150)            |
|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t1        | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+-----------------+--------------------------------+
3 rows in set (0.00 sec)

From the above query results, you can see that the Union operator in the execution plan disappears while the partition pruning still takes effect and the execution plan only accesses p0 and p1.

dynamic mode makes execution plans simpler and clearer. Omitting the Union operation can improve the execution efficiency and avoid the problem of Union concurrent execution. In addition, dynamic mode also allows execution plans with IndexJoin which cannot be used in static mode. (See examples below)

Example 1: In the following example, a query is performed in static mode using the execution plan with IndexJoin:

{{< copyable "sql" >}}

mysql> create table t1 (id int, age int, key(id)) partition by range(id)
    (partition p0 values less than (100),
     partition p1 values less than (200),
     partition p2 values less than (300),
     partition p3 values less than (400));
Query OK, 0 rows affected (0,08 sec)

mysql> create table t2 (id int, code int);
Query OK, 0 rows affected (0.01 sec)

mysql> set @@tidb_partition_prune_mode = 'static';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select /*+ TIDB_INLJ(t1, t2) */ t1.* from t1, t2 where t2.code = 0 and t2.id = t1.id;
+--------------------------------+----------+-----------+------------------------+------------------------------------------------+
| id                             | estRows  | task      | access object          | operator info                                  |
+--------------------------------+----------+-----------+------------------------+------------------------------------------------+
| HashJoin_13                    | 12.49    | root      |                        | inner join, equal:[eq(test.t1.id, test.t2.id)] |
| ├─TableReader_42(Build)        | 9.99     | root      |                        | data:Selection_41                              |
| │ └─Selection_41               | 9.99     | cop[tikv] |                        | eq(test.t2.code, 0), not(isnull(test.t2.id))   |
| │   └─TableFullScan_40         | 10000.00 | cop[tikv] | table:t2               | keep order:false, stats:pseudo                 |
| └─PartitionUnion_15(Probe)     | 39960.00 | root      |                        |                                                |
|   ├─TableReader_18             | 9990.00  | root      |                        | data:Selection_17                              |
|   │ └─Selection_17             | 9990.00  | cop[tikv] |                        | not(isnull(test.t1.id))                        |
|   │   └─TableFullScan_16       | 10000.00 | cop[tikv] | table:t1, partition:p0 | keep order:false, stats:pseudo                 |
|   ├─TableReader_24             | 9990.00  | root      |                        | data:Selection_23                              |
|   │ └─Selection_23             | 9990.00  | cop[tikv] |                        | not(isnull(test.t1.id))                        |
|   │   └─TableFullScan_22       | 10000.00 | cop[tikv] | table:t1, partition:p1 | keep order:false, stats:pseudo                 |
|   ├─TableReader_30             | 9990.00  | root      |                        | data:Selection_29                              |
|   │ └─Selection_29             | 9990.00  | cop[tikv] |                        | not(isnull(test.t1.id))                        |
|   │   └─TableFullScan_28       | 10000.00 | cop[tikv] | table:t1, partition:p2 | keep order:false, stats:pseudo                 |
|   └─TableReader_36             | 9990.00  | root      |                        | data:Selection_35                              |
|     └─Selection_35             | 9990.00  | cop[tikv] |                        | not(isnull(test.t1.id))                        |
|       └─TableFullScan_34       | 10000.00 | cop[tikv] | table:t1, partition:p3 | keep order:false, stats:pseudo                 |
+--------------------------------+----------+-----------+------------------------+------------------------------------------------+
17 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                            |
+---------+------+------------------------------------------------------------------------------------+
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(t1, t2) */ or /*+ TIDB_INLJ(t1, t2) */ is inapplicable |
+---------+------+------------------------------------------------------------------------------------+
1 row in set (0,00 sec)

From example 1, you can see that even if the TIDB_INLJ hint is used, the query on the partitioned table cannot select the execution plan with IndexJoin.

Example 2: In the following example, the query is performed in dynamic mode using the execution plan with IndexJoin:

{{< copyable "sql" >}}

mysql> set @@tidb_partition_prune_mode = 'dynamic';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select /*+ TIDB_INLJ(t1, t2) */ t1.* from t1, t2 where t2.code = 0 and t2.id = t1.id;
+---------------------------------+----------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------------+
| id                              | estRows  | task      | access object          | operator info                                                                                                       |
+---------------------------------+----------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------------+
| IndexJoin_11                    | 12.49    | root      |                        | inner join, inner:IndexLookUp_10, outer key:test.t2.id, inner key:test.t1.id, equal cond:eq(test.t2.id, test.t1.id) |
| ├─TableReader_16(Build)         | 9.99     | root      |                        | data:Selection_15                                                                                                   |
| │ └─Selection_15                | 9.99     | cop[tikv] |                        | eq(test.t2.code, 0), not(isnull(test.t2.id))                                                                        |
| │   └─TableFullScan_14          | 10000.00 | cop[tikv] | table:t2               | keep order:false, stats:pseudo                                                                                      |
| └─IndexLookUp_10(Probe)         | 12.49    | root      | partition:all          |                                                                                                                     |
|   ├─Selection_9(Build)          | 12.49    | cop[tikv] |                        | not(isnull(test.t1.id))                                                                                             |
|   │ └─IndexRangeScan_7          | 12.50    | cop[tikv] | table:t1, index:id(id) | range: decided by [eq(test.t1.id, test.t2.id)], keep order:false, stats:pseudo                                      |
|   └─TableRowIDScan_8(Probe)     | 12.49    | cop[tikv] | table:t1               | keep order:false, stats:pseudo                                                                                      |
+---------------------------------+----------+-----------+------------------------+---------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)

From example 2, you can see that in dynamic mode, the execution plan with IndexJoin is selected when you execute the query.

Currently, static pruning mode does not support plan cache for both prepared and non-prepared statements.

Update statistics of partitioned tables in dynamic pruning mode

  1. Locate all partitioned tables:

    {{< copyable "sql" >}}

    SELECT DISTINCT CONCAT(TABLE_SCHEMA,'.', TABLE_NAME)
        FROM information_schema.PARTITIONS
        WHERE TIDB_PARTITION_ID IS NOT NULL
        AND TABLE_SCHEMA NOT IN ('INFORMATION_SCHEMA', 'mysql', 'sys', 'PERFORMANCE_SCHEMA', 'METRICS_SCHEMA');
    +-------------------------------------+
    | concat(TABLE_SCHEMA,'.',TABLE_NAME) |
    +-------------------------------------+
    | test.t                              |
    +-------------------------------------+
    1 row in set (0.02 sec)
    
  2. Generate the statements for updating the statistics of all partitioned tables:

    SELECT DISTINCT CONCAT('ANALYZE TABLE ',TABLE_SCHEMA,'.',TABLE_NAME,' ALL COLUMNS;')
        FROM information_schema.PARTITIONS
        WHERE TIDB_PARTITION_ID IS NOT NULL
        AND TABLE_SCHEMA NOT IN ('INFORMATION_SCHEMA','mysql','sys','PERFORMANCE_SCHEMA','METRICS_SCHEMA');
    +----------------------------------------------------------------------+
    | concat('ANALYZE TABLE ',TABLE_SCHEMA,'.',TABLE_NAME,' ALL COLUMNS;') |
    +----------------------------------------------------------------------+
    | ANALYZE TABLE test.t ALL COLUMNS;                                    |
    +----------------------------------------------------------------------+
    1 row in set (0.01 sec)
    

    You can change ALL COLUMNS to the columns you need.

  3. Export the batch update statements to a file:

    mysql --host xxxx --port xxxx -u root -p -e "SELECT DISTINCT CONCAT('ANALYZE TABLE ',TABLE_SCHEMA,'.',TABLE_NAME,' ALL COLUMNS;') \
        FROM information_schema.PARTITIONS \
        WHERE TIDB_PARTITION_ID IS NOT NULL \
        AND TABLE_SCHEMA NOT IN ('INFORMATION_SCHEMA','mysql','sys','PERFORMANCE_SCHEMA','METRICS_SCHEMA');" | tee gatherGlobalStats.sql
  4. Execute a batch update:

    Process SQL statements before executing the source command:

    sed -i "" '1d' gatherGlobalStats.sql --- mac
    sed -i '1d' gatherGlobalStats.sql --- linux
    

    {{< copyable "sql" >}}

    SET session tidb_partition_prune_mode = dynamic;
    source gatherGlobalStats.sql