Merged
76 changes: 71 additions & 5 deletions docs/advanced/partition/auto-partition.md
@@ -141,11 +141,77 @@ When building a table, use the following syntax to populate [CREATE-TABLE](../..

### Using constraints

1. The partition column for AUTO PARTITION must be a NOT NULL column;
2. In an AUTO LIST PARTITION, **the length of the partition name must not exceed 50**. This length is derived from the splicing and escaping of the contents of the partition columns on the corresponding rows of data, so the actual allowable length may be shorter;
3. In AUTO RANGE PARTITION, the partition function supports only `date_trunc` and the partition column supports only `DATE` or `DATETIME` type;
4. In AUTO LIST PARTITION, function calls are not supported. Partitioned columns support `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, `VARCHAR` data-types, and partitioned values are enum values;
5. In AUTO LIST PARTITION, a separate new PARTITION is created for each fetch of a partition column for which the corresponding partition does not currently exist.
1. In AUTO LIST PARTITION, **the length of the partition name must not exceed 50**. This length is derived from the splicing and escaping of the contents of the partition columns on the corresponding rows of data, so the actual allowable length may be shorter;
2. In AUTO RANGE PARTITION, the partition function supports only `date_trunc` and the partition column supports only `DATE` or `DATETIME` type;
3. In AUTO LIST PARTITION, function calls are not supported. Partition columns support the `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, `VARCHAR` data types, and partition values are enum values;
4. In AUTO LIST PARTITION, a separate new PARTITION is created for each value of the partition column that does not yet have a corresponding partition.

### NULL-valued partition

Both LIST and RANGE partitions support nullable columns as partition columns when the session variable `allow_partition_column_nullable` is turned on. When a NULL value is actually inserted into a partition column:

1. For an AUTO LIST PARTITION, the corresponding NULL-valued partition is automatically created:

```sql
mysql> create table auto_null_list(
-> k0 varchar null
-> )
-> auto partition by list (k0)
-> (
-> )
-> DISTRIBUTED BY HASH(`k0`) BUCKETS 1
-> properties("replication_num" = "1");
Query OK, 0 rows affected (0.10 sec)

mysql> insert into auto_null_list values (null);
Query OK, 1 row affected (0.28 sec)

mysql> select * from auto_null_list;
+------+
| k0 |
+------+
| NULL |
+------+
1 row in set (0.20 sec)

mysql> select * from auto_null_list partition(pX);
+------+
| k0 |
+------+
| NULL |
+------+
1 row in set (0.20 sec)
```

2. In Doris, NULL values fall into the minimum-value partition, whether or not the table is an AUTO PARTITION table. Therefore, a range partition created automatically for a NULL value starts at the minimum value:

```sql
mysql> CREATE TABLE `range_table_nullable` (
-> `k1` INT,
-> `k2` DATETIMEV2(3),
-> `k3` DATETIMEV2(6)
-> ) ENGINE=OLAP
-> DUPLICATE KEY(`k1`)
-> AUTO PARTITION BY RANGE date_trunc(`k2`, 'day')
-> (
-> )
-> DISTRIBUTED BY HASH(`k1`) BUCKETS 16
-> PROPERTIES (
-> "replication_allocation" = "tag.location.default: 1"
-> );
Query OK, 0 rows affected (0.09 sec)

mysql> insert into range_table_nullable values (0, null, null);
Query OK, 1 row affected (0.21 sec)

mysql> show partitions from range_table_nullable;
+-------------+-----------------+----------------+---------------------+--------+--------------+----------------------------------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables |
+-------------+-----------------+----------------+---------------------+--------+--------------+----------------------------------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| 457060 | p00000101000000 | 2 | 2024-03-25 03:01:38 | NORMAL | k2 | [types: [DATETIMEV2]; keys: [0000-01-01 00:00:00]; ..types: [DATETIMEV2]; keys: [0000-01-02 00:00:00]; ) | k1 | 16 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
+-------------+-----------------+----------------+---------------------+--------+--------------+----------------------------------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
1 row in set (0.09 sec)
```
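To confirm which partition was created for the NULL value in the list example above, and what it is named, one can list the table's partitions in the same way as in the range example. This is a minimal sketch; it assumes the session variable was enabled in the session that created the table.

```sql
-- Assumption: run in the session before creating the nullable-partition table.
SET allow_partition_column_nullable = true;

-- Lists every partition of the table, including the one auto-created for NULL.
SHOW PARTITIONS FROM auto_null_list;
```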

## Sample Scenarios

81 changes: 80 additions & 1 deletion docs/data-table/data-partition.md
@@ -78,7 +78,8 @@ PARTITION BY RANGE(`date`)
(
PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
PARTITION `p201703` VALUES LESS THAN ("2017-04-01")
PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
@@ -376,6 +377,84 @@ Compound partitioning is recommended for the following scenarios:

Users can also choose not to use compound partitioning, that is, to use single partitioning. In that case, data is only HASH distributed.

#### NULL-valued partition

By default, PARTITION columns must be NOT NULL columns. To use nullable columns, set the session variable `allow_partition_column_nullable = true`. For LIST PARTITION, true NULL partitions are supported. For RANGE PARTITION, NULL values are assigned to the **minimal LESS THAN partition**. The cases are listed below:

1. LIST PARTITION

```sql
mysql> create table null_list(
-> k0 varchar null
-> )
-> partition by list (k0)
-> (
-> PARTITION pX values in ((NULL))
-> )
-> DISTRIBUTED BY HASH(`k0`) BUCKETS 1
-> properties("replication_num" = "1");
Query OK, 0 rows affected (0.11 sec)

mysql> insert into null_list values (null);
Query OK, 1 row affected (0.19 sec)

mysql> select * from null_list;
+------+
| k0 |
+------+
| NULL |
+------+
1 row in set (0.18 sec)
```

2. RANGE partition: NULL values are assigned to the minimal LESS THAN partition

```sql
mysql> create table null_range(
-> k0 int null
-> )
-> partition by range (k0)
-> (
-> PARTITION p10 values less than (10),
-> PARTITION p100 values less than (100),
-> PARTITION pMAX values less than (maxvalue)
-> )
-> DISTRIBUTED BY HASH(`k0`) BUCKETS 1
-> properties("replication_num" = "1");
Query OK, 0 rows affected (0.12 sec)

mysql> insert into null_range values (null);
Query OK, 1 row affected (0.19 sec)

mysql> select * from null_range partition(p10);
+------+
| k0 |
+------+
| NULL |
+------+
1 row in set (0.18 sec)
```

3. RANGE partition: NULL values cannot be inserted when there is no LESS THAN partition

```sql
mysql> create table null_range2(
-> k0 int null
-> )
-> partition by range (k0)
-> (
-> PARTITION p200 values [("100"), ("200"))
-> )
-> DISTRIBUTED BY HASH(`k0`) BUCKETS 1
-> properties("replication_num" = "1");
Query OK, 0 rows affected (0.13 sec)

mysql> insert into null_range2 values (null);
ERROR 5025 (HY000): Insert has filtered data in strict mode, tracking_url=......
```
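If NULL rows need to land in a table like `null_range2`, one possible approach, given that NULL values go to the minimal LESS THAN partition, is to add such a partition. The following is only a sketch: the partition name `p100` and its bound are illustrative assumptions, and the behavior should be verified on your Doris version.

```sql
-- Assumption: the session variable was enabled before the table was created.
SET allow_partition_column_nullable = true;

-- Hypothetical partition name and bound, chosen so the new range sits below p200.
ALTER TABLE null_range2 ADD PARTITION p100 VALUES LESS THAN ("100");

-- Retry the insert that previously failed, then check where the row was placed.
INSERT INTO null_range2 VALUES (null);
SELECT * FROM null_range2 PARTITION(p100);
```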

Auto Partition's handling of NULL partition values is detailed in the [corresponding section](../advanced/partition/auto-partition/#null-valued-partition) of its documentation.

### PROPERTIES

In the `PROPERTIES` section at the end of the CREATE TABLE statement, you can set the relevant parameters. See [CREATE TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md) for a detailed introduction.
@@ -836,7 +836,11 @@ Therefore, it is recommended to confirm the usage method to build the table reas

#### Dynamic Partition

The dynamic partition function is mainly used to help users automatically manage partitions. By setting certain rules, the Doris system regularly adds new partitions or deletes historical partitions. Please refer to [Dynamic Partition](../../../../advanced/partition/dynamic-partition.md) document for more help.
The dynamic partition function is mainly used to help users automatically manage partitions. By setting certain rules, the Doris system regularly adds new partitions or deletes historical partitions. Please refer to [Dynamic Partition](../../../../advanced/partition/dynamic-partition) document for more help.

#### Auto Partition

See [Auto Partition](../../../../advanced/partition/auto-partition).

#### Materialized View

@@ -141,11 +141,77 @@ PROPERTIES (

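### Constraints

1. The partition column of an AUTO PARTITION must be a NOT NULL column.
2. In AUTO LIST PARTITION, **the length of the partition name must not exceed 50**. The name is built by concatenating and escaping the contents of the partition columns of the corresponding data row, so the actual allowable length may be shorter.
3. In AUTO RANGE PARTITION, the partition function supports only `date_trunc`, and the partition column supports only the `DATE` or `DATETIME` type;
4. In AUTO LIST PARTITION, function calls are not supported. Partition columns support the `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, `VARCHAR` data types, and partition values are enum values.
5. In AUTO LIST PARTITION, a separate new PARTITION is created for each value of the partition column that does not yet have a corresponding partition.
1. In AUTO LIST PARTITION, **the length of the partition name must not exceed 50**. The name is built by concatenating and escaping the contents of the partition columns of the corresponding data row, so the actual allowable length may be shorter.
2. In AUTO RANGE PARTITION, the partition function supports only `date_trunc`, and the partition column supports only the `DATE` or `DATETIME` type;
3. In AUTO LIST PARTITION, function calls are not supported. Partition columns support the `BOOLEAN`, `TINYINT`, `SMALLINT`, `INT`, `BIGINT`, `LARGEINT`, `DATE`, `DATETIME`, `CHAR`, `VARCHAR` data types, and partition values are enum values.
4. In AUTO LIST PARTITION, a separate new PARTITION is created for each value of the partition column that does not yet have a corresponding partition.

### NULL-valued partition

When the session variable `allow_partition_column_nullable` is turned on, both LIST and RANGE partitions support nullable columns as partition columns. When a NULL value is actually inserted into a partition column:

1. For an AUTO LIST PARTITION, the corresponding NULL-valued partition is created automatically: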

```sql
mysql> create table auto_null_list(
-> k0 varchar null
-> )
-> auto partition by list (k0)
-> (
-> )
-> DISTRIBUTED BY HASH(`k0`) BUCKETS 1
-> properties("replication_num" = "1");
Query OK, 0 rows affected (0.10 sec)

mysql> insert into auto_null_list values (null);
Query OK, 1 row affected (0.28 sec)

mysql> select * from auto_null_list;
+------+
| k0 |
+------+
| NULL |
+------+
1 row in set (0.20 sec)

mysql> select * from auto_null_list partition(pX);
+------+
| k0 |
+------+
| NULL |
+------+
1 row in set (0.20 sec)
```

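2. In Doris, NULL values fall into the minimum-value partition, whether or not the table is an AUTO PARTITION table. Therefore, a range partition created automatically for a NULL value starts at the minimum value: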

```sql
mysql> CREATE TABLE `range_table_nullable` (
-> `k1` INT,
-> `k2` DATETIMEV2(3),
-> `k3` DATETIMEV2(6)
-> ) ENGINE=OLAP
-> DUPLICATE KEY(`k1`)
-> AUTO PARTITION BY RANGE date_trunc(`k2`, 'day')
-> (
-> )
-> DISTRIBUTED BY HASH(`k1`) BUCKETS 16
-> PROPERTIES (
-> "replication_allocation" = "tag.location.default: 1"
-> );
Query OK, 0 rows affected (0.09 sec)

mysql> insert into range_table_nullable values (0, null, null);
Query OK, 1 row affected (0.21 sec)

mysql> show partitions from range_table_nullable;
+-------------+-----------------+----------------+---------------------+--------+--------------+----------------------------------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables |
+-------------+-----------------+----------------+---------------------+--------+--------------+----------------------------------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| 457060 | p00000101000000 | 2 | 2024-03-25 03:01:38 | NORMAL | k2 | [types: [DATETIMEV2]; keys: [0000-01-01 00:00:00]; ..types: [DATETIMEV2]; keys: [0000-01-02 00:00:00]; ) | k1 | 16 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
+-------------+-----------------+----------------+---------------------+--------+--------------+----------------------------------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
1 row in set (0.09 sec)
```

## Sample Scenarios

@@ -80,7 +80,8 @@ PARTITION BY RANGE(`date`)
(
PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
PARTITION `p201703` VALUES LESS THAN ("2017-04-01")
PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
@@ -382,6 +383,84 @@ Doris supports two levels of data partitioning. The first level is Partition, which supports Range and Li

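Users can also choose not to use compound partitioning, that is, to use single partitioning. In that case, data is only HASH distributed.

#### NULL-valued partition

By default, PARTITION columns must be NOT NULL columns. To use nullable columns, set the session variable `allow_partition_column_nullable = true`. For LIST PARTITION, true NULL partitions are supported. For RANGE PARTITION, NULL values are assigned to the **minimal LESS THAN partition**. The cases are listed below:

1. LIST partition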

```sql
mysql> create table null_list(
-> k0 varchar null
-> )
-> partition by list (k0)
-> (
-> PARTITION pX values in ((NULL))
-> )
-> DISTRIBUTED BY HASH(`k0`) BUCKETS 1
-> properties("replication_num" = "1");
Query OK, 0 rows affected (0.11 sec)

mysql> insert into null_list values (null);
Query OK, 1 row affected (0.19 sec)

mysql> select * from null_list;
+------+
| k0 |
+------+
| NULL |
+------+
1 row in set (0.18 sec)
```

2. RANGE partition: NULL values are assigned to the minimal LESS THAN partition

```sql
mysql> create table null_range(
-> k0 int null
-> )
-> partition by range (k0)
-> (
-> PARTITION p10 values less than (10),
-> PARTITION p100 values less than (100),
-> PARTITION pMAX values less than (maxvalue)
-> )
-> DISTRIBUTED BY HASH(`k0`) BUCKETS 1
-> properties("replication_num" = "1");
Query OK, 0 rows affected (0.12 sec)

mysql> insert into null_range values (null);
Query OK, 1 row affected (0.19 sec)

mysql> select * from null_range partition(p10);
+------+
| k0 |
+------+
| NULL |
+------+
1 row in set (0.18 sec)
```

3. RANGE partition: NULL values cannot be inserted when there is no LESS THAN partition

```sql
mysql> create table null_range2(
-> k0 int null
-> )
-> partition by range (k0)
-> (
-> PARTITION p200 values [("100"), ("200"))
-> )
-> DISTRIBUTED BY HASH(`k0`) BUCKETS 1
-> properties("replication_num" = "1");
Query OK, 0 rows affected (0.13 sec)

mysql> insert into null_range2 values (null);
ERROR 5025 (HY000): Insert has filtered data in strict mode, tracking_url=......
```

Auto Partition's handling of NULL partition values is detailed in the [corresponding section](../advanced/partition/auto-partition/#null-值分区) of its documentation.

### PROPERTIES

In the `PROPERTIES` section at the end of the CREATE TABLE statement, you can set the relevant parameters. See [CREATE TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md) for a detailed introduction.
@@ -817,7 +817,11 @@ Tables in Doris can be divided into partitioned tables and non-partitioned tables. This attribute at table creation

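#### Dynamic Partition

The dynamic partition feature mainly helps users manage partitions automatically. By setting certain rules, the Doris system periodically adds new partitions or deletes historical partitions. See the [Dynamic Partition](../../../../advanced/partition/dynamic-partition.md) document for more help.
The dynamic partition feature mainly helps users manage partitions automatically. By setting certain rules, the Doris system periodically adds new partitions or deletes historical partitions. See the [Dynamic Partition](../../../../advanced/partition/dynamic-partition) document for more help.

#### Auto Partition

For the auto partition feature, see the [Auto Partition](../../../../advanced/partition/auto-partition) document.

#### Materialized View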

@@ -80,7 +80,8 @@ PARTITION BY RANGE(`date`)
(
PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
PARTITION `p201703` VALUES LESS THAN ("2017-04-01")
PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES