docs: RFC for TTL tables #39264

Merged on Mar 17, 2023 (43 commits)
docs/design/2022-11-17-ttl-table.md (320 additions)
# Proposal: Support TTL Table
- Author(s): [lcwangchao](https://github.com/lcwangchao)
- Tracking Issue: https://github.com/pingcap/tidb/issues/39262

## Table of Contents

<!-- TOC -->
* [Proposal: Support TTL Table](#proposal--support-ttl-table)
* [Table of Contents](#table-of-contents)
* [Introduction](#introduction)
* [Detailed Design](#detailed-design)
* [Syntax](#syntax)
* [Create TTL Table](#create-ttl-table)
* [Alter a Table with TTL](#alter-a-table-with-ttl)
* [Alter to a non-TTL Table](#alter-to-a-non-ttl-table)
* [Constraints](#constraints)
* [TTL Job Management](#ttl-job-management)
* [TTL Job Details](#ttl-job-details)
* [Scan Tasks](#scan-tasks)
* [Delete Tasks](#delete-tasks)
* [New System Variables](#new-system-variables)
* [New Metrics](#new-metrics)
* [Known Issues](#known-issues)
* [Future Works](#future-works)
* [Alternative Solutions](#alternative-solutions)
<!-- TOC -->

## Introduction

The rows in a TTL table are deleted automatically once they expire. This is useful in scenarios such as deleting expired verification codes. A TTL table has a column of type DATE/DATETIME/TIMESTAMP that is compared with the current time; if the interval between them exceeds a configured threshold, the corresponding row is deleted.

## Detailed Design

### Syntax

#### Create TTL Table

The following example shows how to create a TTL table. The column `created_at` specifies the creation time of each row; rows will be deleted 3 months after that time.

```sql
CREATE TABLE t1 (
    id int PRIMARY KEY,
    created_at TIMESTAMP
) TTL = `created_at` + INTERVAL 3 MONTH;
```

We can use another `TTL_ENABLE` option to disable/enable the TTL job for the table. For example:

```sql
CREATE TABLE t1 (
    id int PRIMARY KEY,
    created_at TIMESTAMP
) TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF';
```

The above table will not delete expired rows automatically because `TTL_ENABLE` is set to `OFF`. When `TTL_ENABLE` is omitted, it defaults to `ON`.

To stay compatible with MySQL, the TTL options also support the comment syntax. For example:

```sql
CREATE TABLE t1 (
    id int PRIMARY KEY,
    created_at TIMESTAMP
) /*T![ttl] TTL = `created_at` + INTERVAL 3 MONTH */;
```


#### Alter a Table with TTL

We can alter an existing table to add TTL options, for example:

```sql
ALTER TABLE t1 TTL = `created_at` + INTERVAL 3 MONTH;
```

Updating existing TTL options should also be allowed. When the options change, the running background job for the table should stop or restart according to the new settings.

#### Alter to a non-TTL Table

If we want to remove a table's TTL options, we can just do:

```sql
ALTER TABLE t1 NO_TTL;
```

#### Constraints

- TTL does NOT work on a table that is referenced by a foreign key. For example, you cannot add TTL to a parent table, because it is referenced by a foreign key in a child table and deleting rows from the parent table could violate this constraint.

### TTL Job Management

We use a SQL-layer approach to delete expired rows: the background jobs use the SQL protocol to scan and delete rows. This is simple to implement and compatible with tools such as BR and TiCDC.

In the current design, we schedule a job for each TTL table when needed, and try to schedule jobs from different tables to different TiDB nodes to reduce the performance impact. A job for one physical table runs in one TiDB node; a partitioned table is treated as several physical tables, so multiple jobs for it can run in different TiDB nodes at the same time.

The TTL table status is recorded in a new system table `mysql.tidb_ttl_table_status` with the definition:

```sql
CREATE TABLE `tidb_ttl_table_status` (
    `table_id` bigint(64) PRIMARY KEY,
    `last_job_id` varchar(64) DEFAULT NULL,
    `last_job_start_time` timestamp NULL DEFAULT NULL,
    `last_job_finish_time` timestamp NULL DEFAULT NULL,
    `last_job_ttl_expire` timestamp NULL DEFAULT NULL,
    `current_job_id` varchar(64) DEFAULT NULL,
    `current_job_owner_id` varchar(64) DEFAULT NULL,
    `current_job_owner_addr` varchar(256) DEFAULT NULL,
    `current_job_owner_hb_time` timestamp,
    `current_job_start_time` timestamp NULL DEFAULT NULL,
    `current_job_ttl_expire` timestamp NULL DEFAULT NULL,
    `current_job_state` text DEFAULT NULL,
    `current_job_status` varchar(64) DEFAULT NULL
);
```

It stores some information for each TTL table. The fields with the prefix `last_job_` describe the last successfully executed job, and the fields with the prefix `current_job_` describe the current, not yet finished job.

The explanation of the fields:

- `table_id`: The id of the TTL table. If the table is a partitioned table, it stands for the physical table id of each partition.
- `last_job_id`: The id of the last success job.
- `last_job_start_time`: The start time of the last job.
- `last_job_finish_time`: The finish time of the last job.
- `last_job_ttl_expire`: The expired time used by the last job for TTL works.
- `current_job_id`: The id of the current unfinished job. This covers not only a running job, but also a job that failed or was cancelled by the user.
- `current_job_owner_id`: The id of the owner (a TiDB node) that runs the job.
- `current_job_owner_addr`: The network address of the owner that runs the job.
- `current_job_owner_hb_time`: The owner of the job updates this field with the current timestamp periodically. If it has not been updated for a long time, the previous owner is considered offline and the job should be taken over by another node.
- `current_job_start_time`: The start time of the current job.
- `current_job_ttl_expire`: The expired time used by the current job for TTL works.
- `current_job_state`: Some inner state for the current job. It can be used for the job's fail over.
- `current_job_status`: An enum with one of the values: running, cancelling, cancelled, error.

The TTL job for each table runs periodically according to the system variable `tidb_ttl_job_run_interval`. For example, with `set @@global.tidb_ttl_job_run_interval='1h'`, the cluster schedules a TTL job for each table every hour to delete expired rows.

If you want to cancel a running job, you can execute a statement like this:

```sql
ADMIN CANCEL TTL JOB 123456789
```

In the above example, the status of the TTL job with ID `123456789` will first become `cancelling` and then finally be updated to `cancelled`.

### TTL Job Details

TiDB schedules TTL jobs to delete expired rows. Each job is related to one TTL table and runs in one TiDB node. One TiDB node hosts multiple workers that run the tasks of TTL jobs.

A running job contains two kinds of tasks: scan tasks and delete tasks. Scan tasks filter out the expired rows of the table and send them to delete tasks, which perform the delete operations in batches. When all expired rows are deleted, the job finishes. Let's look at scan and delete tasks in detail.

#### Scan Tasks

When a job starts to run, it first splits the table into N (N >= 1) ranges according to the primary key. Each range is assigned to a scan task, and each scan task performs a range scan over its range. The pseudocode below shows how it works:

```
func doScanTask(tbl, rng, expire, ch) {
    var lastRow
    for {
        // Build a paginated SELECT for this key range, starting after
        // the last row read in the previous iteration.
        selectSQL := buildSelect(tbl, rng, lastRow, expire, LIMIT)
        rows := execute(selectSQL)
        ch <- deleteTask{tbl, expire, rows}
        if len(rows) < LIMIT {
            break // fewer rows than the limit: the range is exhausted
        }
        lastRow = rows[len(rows)-1]
    }
}
```

As we see above, it builds some select queries in a loop. The first query is built like this:

```sql
SELECT id FROM t1
WHERE create_time < '2022-01-01 00:00:00' AND id >= 12345 AND id < 45678
ORDER BY id ASC
LIMIT 500;
```

In the above example, the expiration time is '2022-01-01 00:00:00', computed when the job started, and the key range of the current scan task is `[12345, 45678)`. The maximum number of returned rows is limited to 500; this can be changed through the system variable `tidb_ttl_scan_batch_size`.

In most cases, we cannot get all the expired rows in one query. If the returned row count equals the limit, some rows have not been read yet. Suppose the latest id we just queried is `23456`; the next query should be:

```sql
SELECT id FROM t1
WHERE create_time < '2022-01-01 00:00:00' AND id >= 23456 AND id < 45678
ORDER BY id ASC
LIMIT 500;
```

The only difference from the first query is that the second one uses `23456` as the start of the read to skip the rows already read. This procedure continues until all expired records are read.
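The paginated queries above can be produced by a small SQL builder. The sketch below is illustrative only; the helper name and signature are assumptions, not the actual implementation:

```go
package main

import "fmt"

// buildScanSQL builds one paginated SELECT of a TTL scan task.
// startID is the lower bound of the current page (the range start for
// the first query, or the last id read afterwards); endID is the
// exclusive upper bound of the scan task's key range.
func buildScanSQL(table, idCol, timeCol, expire string, startID, endID, limit int) string {
	return fmt.Sprintf(
		"SELECT %s FROM %s WHERE %s < '%s' AND %s >= %d AND %s < %d ORDER BY %s ASC LIMIT %d",
		idCol, table, timeCol, expire, idCol, startID, idCol, endID, idCol, limit)
}

func main() {
	// The first query of the example above.
	fmt.Println(buildScanSQL("t1", "id", "create_time", "2022-01-01 00:00:00", 12345, 45678, 500))
}
```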

The expired rows are wrapped as `deleteTask`s and sent to the delete workers concurrently. Before we discuss the delete workers, a few things are worth mentioning:

- As shown above, the scan operation is heavy because it scans the whole table. For a large table, it is recommended to set the system variable `tidb_ttl_job_run_interval` to a larger value to reduce the daily resource cost.
- When a table has no primary key, or its primary key is not clustered, the hidden column `_tidb_rowid` is used as the row id instead.
- Although a generated column is supported as the TTL time column, it is not efficient: TiDB currently cannot push a generated column's condition down to TiKV, so TiDB has to do the filtering itself, which costs extra network traffic and CPU time.

#### Delete Tasks

There are several delete workers running in one TiDB node, and they consume tasks sent from the scan phase to delete expired rows. The following pseudocode shows how it works:

```
func doDelTask(ch) {
    for task := range ch {
        // Split the scanned rows into batches bounded by
        // tidb_ttl_delete_batch_size, then delete batch by batch.
        batches := splitRowsToDeleteBatches(task.rows)
        for _, batch := range batches {
            deleteBatch(task.tbl, batch, task.expire)
        }
    }
}
```

A delete worker receives tasks from a channel and splits the rows of each task into several batches according to the system variable `tidb_ttl_delete_batch_size`. Each batch is then deleted by its own `DELETE` query. For example:

```sql
DELETE FROM t
WHERE id in (1, 2, 3, ...) AND create_time < '2022-01-01 00:00:00';
```

Notice that the condition `create_time < '2022-01-01 00:00:00'` is still present, to avoid deleting rows that are no longer expired by mistake: for example, a row that was expired when scanned but updated to a non-expired value before being deleted.
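A sketch of building such a `DELETE` statement, including the repeated expiration check (the helper name is an assumption, not the actual implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// buildDeleteSQL builds one batched DELETE. The expiration condition
// is repeated so that a row updated between the scan and the delete
// is not removed by mistake.
func buildDeleteSQL(table, idCol, timeCol, expire string, ids []int) string {
	parts := make([]string, len(ids))
	for i, id := range ids {
		parts[i] = fmt.Sprintf("%d", id)
	}
	return fmt.Sprintf("DELETE FROM %s WHERE %s in (%s) AND %s < '%s'",
		table, idCol, strings.Join(parts, ", "), timeCol, expire)
}

func main() {
	fmt.Println(buildDeleteSQL("t", "id", "create_time", "2022-01-01 00:00:00", []int{1, 2, 3}))
}
```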

If a TTL table has no secondary index, we can assume that most delete operations can commit with 1PC, because the rows in one batch are usually in the same region. So it is recommended NOT to create any secondary index on a TTL table.
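The `splitRowsToDeleteBatches` step in the pseudocode above can be sketched like this (a simplified version that works on plain int row ids):

```go
package main

import "fmt"

// splitRowsToDeleteBatches splits scanned row ids into chunks of at
// most batchSize, one DELETE statement per chunk.
func splitRowsToDeleteBatches(rows []int, batchSize int) [][]int {
	var batches [][]int
	for len(rows) > 0 {
		n := batchSize
		if n > len(rows) {
			n = len(rows)
		}
		batches = append(batches, rows[:n])
		rows = rows[n:]
	}
	return batches
}

func main() {
	fmt.Println(splitRowsToDeleteBatches([]int{1, 2, 3, 4, 5}, 2)) // [[1 2] [3 4] [5]]
}
```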

### New System Variables

Some new system variables will be introduced:

- `tidb_ttl_job_pause`
  - When this variable is `ON`, the cluster stops scheduling TTL jobs and the running jobs are cancelled.
  - Scope: Global
  - Values: [ON, OFF]
  - Default: OFF

- `tidb_ttl_job_run_interval`
  - The schedule interval between two jobs for one TTL table
  - Scope: Global
  - Range: [10m0s, 8760h0m0s]
  - Default: 1h

- `tidb_ttl_scan_worker_count`
  - The count of scan workers in each TiDB node
  - Scope: Global
  - Range: [1, 1024]
  - Default: 4

- `tidb_ttl_scan_batch_size`
  - The LIMIT value of each SELECT query in a scan task
  - Scope: Global
  - Range: [1, 10240]
  - Default: 500

- `tidb_ttl_delete_worker_count`
  - The count of delete workers in each TiDB node
  - Scope: Global
  - Range: [1, 1024]
  - Default: 4

- `tidb_ttl_delete_batch_size`
  - The batch size of one DELETE query when deleting expired rows
  - Scope: Global
  - Range: [0, 10240]
  - Default: 100

- `tidb_ttl_delete_rate_limit`
  - The rate limit of delete operations in each TiDB node. 0 means no limit
  - Scope: Global
  - Range: [0, MaxInt64]
  - Default: 0
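`tidb_ttl_delete_rate_limit` can be enforced with a token bucket. The sketch below is a minimal illustration of the idea, not the actual implementation:

```go
package main

import "fmt"

// bucket is a minimal token bucket: refill() adds up to `rate` tokens
// per second of elapsed time, allow() consumes one token per delete.
type bucket struct {
	rate   float64 // tokens per second; 0 means unlimited
	tokens float64
}

func (b *bucket) refill(elapsedSeconds float64) {
	b.tokens += b.rate * elapsedSeconds
	if b.tokens > b.rate { // cap the burst at one second of tokens
		b.tokens = b.rate
	}
}

func (b *bucket) allow() bool {
	if b.rate == 0 {
		return true // 0 disables the limit, as the variable documents
	}
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	b := &bucket{rate: 2}
	b.refill(1) // one second elapsed, 2 tokens available
	fmt.Println(b.allow(), b.allow(), b.allow()) // true true false
}
```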

### New Metrics

We'll introduce some new metrics to monitor the TTL jobs:

- `ttl_select_queries`
  - The total count of select queries in TTL jobs
  - Type: Counter
  - Labels: table

- `ttl_select_expire_rows`
  - The total count of expired rows selected in TTL jobs
  - Type: Counter
  - Labels: table

- `ttl_select_duration`
  - The duration of the select queries in TTL jobs
  - Type: Histogram

- `ttl_delete_queries`
  - The total count of delete queries in TTL jobs
  - Type: Counter
  - Labels: table

- `ttl_delete_expire_rows`
  - The total count of expired rows deleted in TTL jobs
  - Type: Counter
  - Labels: table

- `ttl_delete_duration`
  - The duration of the delete queries in TTL jobs
  - Type: Histogram

## Known Issues

- Although TTL jobs from different tables are scheduled to different nodes, a single job runs in a single TiDB node. If a table is very large, this may become a bottleneck.
- Currently, a condition on a generated column cannot be pushed down to TiKV. If a table uses a generated column as its TTL time column, the filtering is performed on the TiDB side, which brings extra network traffic and makes the query slower.

## Future Works

- Schedule the scan tasks of one job to different nodes. This will take full advantage of the cluster resources, especially for big tables.
- If an index prefixed with the TTL time column exists, use it to query expired rows instead of scanning the full table. This will reduce the execution time of the scan tasks.
- Support a TTL table as a parent table referenced by a child table with `ON DELETE CASCADE`. When rows in the TTL table are deleted, the related rows in the child table will be deleted too.
- Support pushing generated column conditions down to TiKV, or use the definition of the generated column to construct the condition directly.
- Scan the table from TiFlash (if the table has a TiFlash replica) instead of TiKV to reduce the performance impact on TP workloads.

## Alternative Solutions

TiKV supports TTL on RawKV, so a natural question is whether TiDB TTL could be implemented with the same mechanism. IMO, this method has some downsides:
- Not SQL-aware.
  - Users may want to use any column as the TTL column, and may want to use a generated column to convert another column type (JSON, varchar) into a DATETIME column as the TTL column.
  - No future foreign key support.
- Not table-aware: TTL on RawKV is kv-level, so TTL cannot easily be turned on/off per table.
- Not compatible with CDC, backup, or secondary indexes.