From 1c327a1716ea2ee72a7e8276086cfbc5e676148d Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Fri, 18 Nov 2022 19:12:41 +0800 Subject: [PATCH 01/37] ttl: add rfc for ttl --- docs/design/2022-11-17-ttl-table.md | 325 ++++++++++++++++++++++++++++ 1 file changed, 325 insertions(+) create mode 100644 docs/design/2022-11-17-ttl-table.md diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md new file mode 100644 index 0000000000000..e42927c2d2245 --- /dev/null +++ b/docs/design/2022-11-17-ttl-table.md @@ -0,0 +1,325 @@ +# Proposal: Support TTL Table +- Author(s): [lcwangchao](https://github.com/lcwangchao) + +## Table of Contents + + +* [Proposal: Support TTL Table](#proposal--support-ttl-table) + * [Table of Contents](#table-of-contents) + * [Introduction](#introduction) + * [Detailed Design](#detailed-design) + * [Syntax](#syntax) + * [Create TTL Table](#create-ttl-table) + * [Alter a Table with TTL](#alter-a-table-with-ttl) + * [Alter to a non-TTL Table](#alter-to-a-non-ttl-table) + * [Constraints](#constraints) + * [TTL Job Management](#ttl-job-management) + * [TTL Job Details](#ttl-job-details) + * [Scan Task](#scan-task) + * [Delete Tasks](#delete-tasks) + * [New System Variables](#new-system-variables) + * [New Metrics](#new-metrics) + * [Known Issues](#known-issues) + * [Future Works](#future-works) + * [Alternative Solutions](#alternative-solutions) + + +## Introduction + +The rows in a TTL table will be deleted automatically when they are expired. It is useful for some scenes, for example, delete the expired verification codes which are used for mobile verifications. A TTL table will have a column with the type DATE/DATETIME/TIMESTAMP, and it will be compared with the current time, if the interval between them exceeds some threshold, the corresponding row will be deleted. + +## Detailed Design + +### Syntax + +#### Create TTL Table + +The following example shows how to create a TTL table. 
The column `created_at` is used by TTL to identify the creation time of the rows, which will be deleted 3 months after creation.

```sql
CREATE TABLE t1 (
    id int PRIMARY KEY,
    created_at TIMESTAMP
) TTL = `created_at` + INTERVAL 3 MONTH;
```

We can use another option, `TTL_ENABLE`, to disable/enable the TTL feature for the table. For example:

```sql
CREATE TABLE t1 (
    id int PRIMARY KEY,
    created_at TIMESTAMP
) TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF';
```

The above table will not delete expired rows automatically because `TTL_ENABLE` is set to `OFF`. When `TTL_ENABLE` is omitted, it uses the `ON` value by default.

To make it compatible with MySQL, TTL options also support the comment format. For example:

```sql
CREATE TABLE t1 (
    id int PRIMARY KEY,
    created_at TIMESTAMP
) /*T![ttl] TTL = `created_at` + INTERVAL 3 MONTH */;
```

#### Alter a Table with TTL

We can alter an existing table with TTL options, for example:

```sql
ALTER TABLE t1 TTL = `created_at` + INTERVAL 3 MONTH;
```

OR

```sql
ALTER TABLE t1 TTL_ENABLE = 'OFF';
```

We should allow altering a TTL table with new options. When a table's TTL options change, the running background job for this table should stop or restart according to the newest settings.

#### Alter to a non-TTL Table

If we want to remove the TTL options from a table, we can just do:

```sql
ALTER TABLE t1 NO_TTL;
```

#### Constraints

- TTL does NOT work on a table that is referenced by a foreign key. For example, you cannot add TTL to a parent table, because it is referenced by a foreign key in the child table and deleting rows from the parent table could violate this constraint.

### TTL Job Management

We use a SQL-layer approach to delete expired rows. The "SQL-layer approach" means that the background jobs use the SQL protocol to scan or delete rows. It is simple to implement and should be compatible with tools such as BR and TiCDC. 
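To make the SQL-layer approach concrete, the sketch below shows the kind of statements such a background job could issue. This is a minimal illustration only, not the actual TiDB implementation; the helper names `buildScanSQL` and `buildDeleteSQL` are invented for this example:

```go
package main

import (
	"fmt"
	"strings"
)

// buildScanSQL assembles the SELECT a TTL scan round could issue to find
// expired row ids. Hypothetical helper for illustration only.
func buildScanSQL(tbl, pk, timeCol, expire string, limit int) string {
	return fmt.Sprintf(
		"SELECT %s FROM %s WHERE %s < '%s' ORDER BY %s ASC LIMIT %d",
		pk, tbl, timeCol, expire, pk, limit,
	)
}

// buildDeleteSQL assembles the DELETE for one batch of expired row ids.
// The expire condition is repeated so rows updated after the scan survive.
func buildDeleteSQL(tbl, pk, timeCol, expire string, ids []string) string {
	return fmt.Sprintf(
		"DELETE FROM %s WHERE %s IN (%s) AND %s < '%s'",
		tbl, pk, strings.Join(ids, ", "), timeCol, expire,
	)
}

func main() {
	fmt.Println(buildScanSQL("t1", "id", "created_at", "2022-01-01 00:00:00", 500))
	fmt.Println(buildDeleteSQL("t1", "id", "created_at", "2022-01-01 00:00:00", []string{"1", "2", "3"}))
}
```

Because both steps are ordinary SQL statements, they flow through the normal transaction path, which is why tools such as BR and TiCDC observe the deletions like any other workload.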
+

In the current design, we'll schedule a job for each TTL table when needed. We will try to schedule the jobs from different tables to different TiDB nodes to reduce the performance impact, and for one table, the job will run in a single TiDB node. A partitioned table will be recognized as several physical tables, so there can be multiple jobs running for it in different TiDB nodes.

The TTL table status will be recorded in a new system table `mysql.tidb_ttl_table_status` with the definition:

```sql
CREATE TABLE `tidb_ttl_table_status` (
    `table_id` bigint(64) PRIMARY KEY,
    `last_job_id` varchar(64) DEFAULT NULL,
    `last_job_start_time` timestamp NULL DEFAULT NULL,
    `last_job_finish_time` timestamp NULL DEFAULT NULL,
    `last_job_ttl_expire` timestamp NULL DEFAULT NULL,
    `current_job_id` varchar(64) DEFAULT NULL,
    `current_job_owner_id` varchar(64) DEFAULT NULL,
    `current_job_owner_hb_time` timestamp,
    `current_job_start_time` timestamp NULL DEFAULT NULL,
    `current_job_ttl_expire` timestamp NULL DEFAULT NULL,
    `current_job_state` text DEFAULT NULL,
    `current_job_status` varchar(64) DEFAULT NULL
);
```

It stores some TTL job information for each TTL table. The fields with the prefix `last_job_` represent the information of the last successfully executed job, and the fields with the prefix `current_job_` represent the current job which has not finished yet.

The explanation of the fields:

- `table_id`: The id of the TTL table. If the table is a partitioned table, it stands for the physical table id of each partition.
- `last_job_id`: The job id of the last successful job.
- `last_job_start_time`: The start time of the last job.
- `last_job_finish_time`: The finish time of the last job.
- `last_job_ttl_expire`: The expiration time used by the last job.
- `current_job_id`: The id of the current unfinished job. It includes not only the running job, but also a job that failed or was cancelled by the user. 
- `current_job_owner_id`: The id of the owner (a TiDB node) that runs the job. When it is NULL, it means the previous owner yielded this job on its own and the job has not been taken over by another TiDB node yet.
- `current_job_owner_hb_time`: The owner of the job updates this field with the current timestamp periodically. If it is not updated for a long time, it means the previous owner is offline and this job should be taken over by another node later.
- `current_job_start_time`: The start time of the current job.
- `current_job_ttl_expire`: The expiration time used by the current job.
- `current_job_state`: Some inner state for the current job. It can be used for the job's failover.
- `current_job_status`: An enum with one of the values: running, cancelling, cancelled, error

The TTL job for each table runs periodically according to the system variable `tidb_ttl_job_run_interval`. For example, if we configure `set @@global.tidb_ttl_job_run_interval='1h'`, the cluster will schedule a TTL job for each table every hour to delete expired rows.

If you want to cancel a running job, you can execute a statement like this:

```sql
ADMIN CANCEL TTL JOB 123456789
```

In the above example, the status of the TTL job with ID `123456789` will first become `cancelling` and then finally be updated to `cancelled`.

### TTL Job Details

TiDB schedules TTL jobs to delete expired rows. One job is related to one TTL table and runs in one TiDB node. One TiDB node serves multiple workers where tasks from the TTL jobs are running.

A running job contains two kinds of tasks: scan tasks and delete tasks. Currently, we have one scan task and several delete tasks in one job.

#### Scan Task

The scan task runs in a scan worker. It scans the full table to find out all expired rows. 
The pseudocode below shows how it works:

```
func doScanTask(tbl, expire, ch) {
    var lastRow
    for {
        selectSQL := buildSelect(tbl, lastRow, expire, LIMIT)
        rows := execute(selectSQL)
        ch <- deleteTask{tbl, expire, rows}
        if len(rows) < LIMIT {
            break
        }
        lastRow = rows[len(rows)-1]
    }
}
```

As we see above, it builds select queries in a loop. The first query is built like below:

```sql
SELECT id FROM t1
WHERE create_time < '2022-01-01 00:00:00'
ORDER BY id ASC
LIMIT 500;
```

The `id` in the above example is the primary key of the table. We use the condition `create_time < '2022-01-01 00:00:00'` to filter out expired rows. The value `2022-01-01 00:00:00` is computed before the scan task starts, and all rows created before it are seen as 'expired'. We also use `LIMIT 500` to limit the maximum number of rows returned in one query; the limit value is read from the system variable `tidb_ttl_scan_batch_size`.

If the row count from the above query equals the limit, we should schedule the next query to read the following rows. For example:

```sql
SELECT id FROM t1
WHERE create_time < '2022-01-01 00:00:00' AND id > 123456
ORDER BY id ASC
LIMIT 500;
```

Different from the first query, the second example adds an extra condition `id > 123456`, where `123456` is the largest row id fetched by the last query. Using this condition, we can resume the scan from the position where the last query ended instead of scanning the table from the beginning. The scan task executes such select queries continuously until all the expired rows are fetched.

Once a query returns, a delete task including the expired rows will be sent to the delete workers.

Some other descriptions:

- As we see, the scan task is heavy because it performs a full table scan. For a large table, it is recommended to set `tidb_ttl_job_run_interval` to a longer interval. 
- When a table has no primary key or its primary key is not clustered, we'll use the hidden column `_tidb_rowid` instead of the primary key as the row id.
- Though we support a generated column as the TTL time column, it is not efficient, because TiDB currently cannot push down a generated column's condition to the TiKV side; TiDB has to do the filtering itself, which requires more network traffic and CPU time.

#### Delete Tasks

A delete worker consumes the delete tasks from the channel. The following pseudocode shows how it works:

```
func doDelTask(ch) {
    for _, task := range ch {
        batches := splitRowsToDeleteBatches(task.rows)
        for _, batch := range batches {
            deleteBatch(task.tbl, batch, task.expire)
        }
    }
}
```

The delete worker splits the received rows into several batches according to the system variable `tidb_ttl_delete_batch_size` and then deletes the batches one by one with a `DELETE` query. For example:

```sql
DELETE FROM t
WHERE id in (1, 2, 3, ...) AND create_time < '2022-01-01 00:00:00';
```

Notice that we still use the condition `create_time < '2022-01-01 00:00:00'` to avoid deleting "not expired" rows by mistake, for example, a row that was expired when scanning but was updated to a not-expired value before being deleted.

Some other descriptions:

- If there is no secondary index in a TTL table, we can assume that most of the delete operations can commit with 1PC. That is because the incoming rows are in the same region in most cases. So it is recommended NOT to create any secondary index in a TTL table.

### New System Variables

Some new system variables will be introduced:

- `tidb_ttl_job_pause`
  - When this variable is `ON`, the cluster will stop scheduling TTL jobs and the running jobs will be cancelled. 
+ - Scope: Global + - Values: [ON, OFF] + - Default: OFF + +- `tidb_ttl_job_run_interval` + - The schedule interval between two jobs for one TTL table + - Scope: Global + - Range: [10m0s, 8760h0m0s] + - Default: 1h + +- `tidb_ttl_scan_worker_count` + - The worker count for the scan tasks in each TiDB node + - Scope: Global + - Range: [1, 1024] + - Default: 1 + +- `tidb_ttl_scan_batch_size` + - The limit value of each SELECT query in scan task + - Scope: Global + - Range: [1, 10240] + - Default: 500 + +- `tidb_ttl_delete_worker_count` + - The worker count for the delete tasks in each TiDB node + - Scope: Global + - Range: [1, 1024] + - Default: 4 + +- `tidb_ttl_delete_batch_size` + - The batch size in one delete query when deleting expired rows + - Scope: Global + - Range: [1, 10240] + - Default: 500 + +- `tidb_ttl_delete_rate_limit` + - The rate limit of the delete operations in each TiDB node. 0 is for no limit + - Scope: Global + - Range: [0, MaxInt64] + - Default: 0 + +### New Metrics + +We'll introduce some new metrics to monitor the TTL jobs: + +- `ttl_select_queries` + - The total count of select queries in TTL jobs + - Type: Counter + - Labels: table + +- `ttl_select_expire_rows` + - The total count of expired rows selected in TTL jobs + - Type: Counter + - Labels: table + +- `ttl_select_duration` + - The duration of the select queries in TTL jobs + - Type: Histogram + +- `ttl_delete_queries` + - The total count of delete queries in TTL jobs + - Type: Counter + - Labels: table + +- `ttl_delete_expire_rows` + - The total count of expired rows deleted in TTL jobs + - Type: Counter + - Labels: table + +- `ttl_delete_duration` + - The duration of the delete queries in TTL jobs + - Type: Histogram + +## Known Issues + +- The TTL works for one table stays in a single TiDB node. It will cause a hotspot if the table is very large. +- Currently, the condition of generated column cannot be pushed down to TiKV. 
If a table uses a generated column as the TTL time column, the filter will be performed on the TiDB side. It brings some unnecessary network traffic and makes the query slow.

## Future Works

- Split the scan task by table ranges and schedule them to different nodes in the cluster. This work will take full advantage of the cluster resource, especially for big tables.
- If an index with the prefix of the TTL time column exists, we can use it to query expire rows instead of scanning the full table. It will reduce the execution time of the scan tasks.
- Support TTL table as a parent table referred by a child table by 'ON DELETE CASCADE'. When rows in a TTL table are deleted, the related rows in child table will be deleted too.

## Alternative Solutions

TiKV supports TTL on RawKV, so we may be asked whether we can implement TiDB TTL with the same concept. IMO, this method has some downsides:
 - Not SQL aware.
   - Customers may want to use any column as the TTL column, and may want to use a generated column to convert another column type (JSON, varchar) to a DATETIME column as the TTL column.
   - No future foreign key support.
 - Not table aware: TTL on RawKV is kv-level, so TTL can't easily be turned on/off per table.
 - Not compatible with TiCDC, backup, or secondary indexes.

From 6cb2c91e6aae442905c3f9b824ccb20b25031b76 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Sun, 20 Nov 2022 10:24:05 +0800 Subject: [PATCH 02/37] update --- docs/design/2022-11-17-ttl-table.md | 91 +++++++++++++---------------- 1 file changed, 41 insertions(+), 50 deletions(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index e42927c2d2245..16b4b15884a4d 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -26,7 +26,7 @@ ## Introduction -The rows in a TTL table will be deleted automatically when they are expired. 
It is useful for some scenes, for example, delete the expired verification codes which are used for mobile verifications. A TTL table will have a column with the type DATE/DATETIME/TIMESTAMP, and it will be compared with the current time, if the interval between them exceeds some threshold, the corresponding row will be deleted.
+The rows in a TTL table will be deleted automatically when they are expired. It is useful in some scenarios, for example, deleting expired verification codes. A TTL table will have a column of type DATE/DATETIME/TIMESTAMP which will be compared with the current time; if the interval between them exceeds some threshold, the corresponding row will be deleted.

## Detailed Design

@@ -34,7 +34,7 @@

#### Create TTL Table

-The following example shows how to create a TTL table. The column `create_at` is used by TTL to identify the creation time of the rows which will be deleted after 3 months after created.
+The following example shows how to create a TTL table. The column `created_at` is used to specify the creation time of the rows, which will be deleted 3 months later.

@@ -43,7 +43,7 @@ CREATE TABLE t1 (
 ) TTL = `created_at` + INTERVAL 3 MONTH;
 ```

-We can use another `TTL_ENABLE` option to disable/enable the TTL feature for the table. For example:
+We can use another `TTL_ENABLE` option to disable/enable the TTL job for the table. For example:

@@ -52,9 +52,9 @@ CREATE TABLE t1 (
 ) TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF';
 ```

-The above table will not delete expired rows automatically because `TTL_ENABLE` is set to `OFF`. When the `TTL_ENABLE` is omitted, it uses the `ON` value by default.
+The above table will not delete expired rows automatically because `TTL_ENABLE` is set to `OFF`. When `TTL_ENABLE` is omitted, it uses the value `ON` by default. 
-To make it compatible with mysql, TTL options also support comment format. For example:
+To make it compatible with MySQL, TTL options also support the comment format. For example:

@@ -71,17 +71,11 @@ We can alter an existing table with TTL options, for example:
 ALTER TABLE t1 TTL = `created_at` + INTERVAL 3 MONTH;
 ```

-OR
-
-```sql
-ALTER TABLE t1 TTL_ENABLE = 'OFF';
-```
-
-We should allow to alter a TTL table with some new options. When a table's TTL option changed, the running background job for this table should stop or restart according to the newest settings.
+We should allow updating the existing TTL options. When they are updated, the running background job for this table should stop or restart according to the newest settings.

#### Alter to a non-TTL Table

-If we want to remove the TTL option in a table, we can just do:
+If we want to remove a table's TTL options, we can just do:

```sql
ALTER TABLE t1 NO_TTL;
```

@@ -93,9 +87,9 @@ ALTER TABLE t1 NO_TTL;

### TTL Job Management

-We use a SQL-layer approach to delete expired rows. The "SQL-layer approach" means that the background jobs are using SQL protol to scan or delete rows. It is simple to implement and should have a good compatibility with the tools such as BR and TiCDC.
+We use a SQL-layer approach to delete expired rows. The "SQL-layer approach" means that the background jobs use the SQL protocol to scan and delete rows. It is simple to implement and has good compatibility with tools such as BR and TiCDC.

-In the current design, we'll schedule a job for each TTL table when needed. We will try to schedule the jobs from different tables to different TiDB nodes to reduce the performance affect and for one table, the job will be running in one TiDB node. The partition table will be recognized as several physical tables, so there can be multiple jobs running in different TiDB nodes for it.
+In the current design, we'll schedule a job for each TTL table when needed. 
We will try to schedule the jobs from different tables to different TiDB nodes to reduce the performance impact. For one physical table, the job will run in one TiDB node, and a partitioned table will be recognized as several physical tables, so there can be multiple jobs for it running in different TiDB nodes at one time.

The TTL table status will be recorded in a new system table `mysql.tidb_ttl_table_status` with the definition:

@@ -116,17 +110,17 @@ CREATE TABLE `tidb_ttl_table_status` (
 );
 ```

-It stores some TTL job information for each TTL table. The fields with prefix `last_job_` present the information of the last job which is successfully executed, and the fields with prefix `current_job_` present the current job which has not been finished yet.
+It stores some information for each TTL table. The fields with the prefix `last_job_` represent the information of the last successfully executed job, and the fields with the prefix `current_job_` represent the current job which has not finished yet.

The explanation of the fields:

- `table_id`: The id of the TTL table. If the table is a partitioned table, it stands for the physical table id of each partition.
-- `last_job_id`: The job id of last successfully job.
-- `last_job_start_time`: The start time of last job.
-- `last_job_finish_time`: The finish time of last job.
+- `last_job_id`: The id of the last successful job.
+- `last_job_start_time`: The start time of the last job.
+- `last_job_finish_time`: The finish time of the last job.
 - `last_job_ttl_expire`: The expiration time used by the last job.
 - `current_job_id`: The id of the current unfinished job. It includes not only the running job, but also a job that failed or was cancelled by the user.
-- `current_job_owner_id`: The id of the owner (a TiDB node) that runs the job. When it is NULL, that means the previous owner yield this job on its own and this job has not been taken over by other TiDB yet. 
+- `current_job_owner_id`: The id of the owner (a TiDB node) that runs the job.
 - `current_job_owner_hb_time`: The owner of the job updates this field with the current timestamp periodically. If it is not updated for a long time, it means the previous owner is offline and this job should be taken over by another node later.
 - `current_job_start_time`: The start time of the current job.
 - `current_job_ttl_expire`: The expiration time used by the current job.
@@ -141,23 +135,23 @@ If you want to cancel a running job, you can execute a statement like this:
 ADMIN CANCEL TTL JOB 123456789
 ```

-In the above example, the status of TTL job with ID `123456789` will first become `cancelling` and then updated to `cancelled` finally.
+In the above example, the status of the TTL job with ID `123456789` will first become `cancelling` and then finally be updated to `cancelled`.

### TTL Job Details

TiDB schedules TTL jobs to delete expired rows. One job is related to one TTL table and runs in one TiDB node. One TiDB node serves multiple workers where tasks from the TTL jobs are running.

-A running job contains two kinds of tasks: scan task and delete task. Currently, we have one scan task and several delete tasks in one job.
+A running job contains two kinds of tasks: scan tasks and delete tasks. Scan tasks are used to filter out expired rows from the table and then send them to delete tasks, which perform the delete operations in batches. When all expired rows are deleted, the job is finished. Let's talk about scan and delete tasks in detail.

#### Scan Task

-Scan task runs in a scan worker. It scans the full table to find out all expired rows. The pseudocode below shows how it works:
+When a job starts to run, it first splits the table into N (N >= 1) ranges according to its primary key. Each range will be assigned to a scan task, and each scan task performs a range scan over the specified range. 
The pseudocode below shows how it works:

```
-func doScanTask(tbl, expire, ch) {
+func doScanTask(tbl, range, expire, ch) {
    var lastRow
    for {
-        selectSQL := buildSelect(tbl, lastRow, expire, LIMIT)
+        selectSQL := buildSelect(tbl, range, lastRow, expire, LIMIT)
        rows := execute(selectSQL)
        ch <- deleteTask{tbl, expire, rows}
        if len(rows) < LIMIT {
@@ -168,39 +162,37 @@ func doScanTask(tbl, expire, ch) {
    }
}
```

-As we see above, it builds some select queries in a loop. The first query is built like below:
+As we see above, it builds select queries in a loop. The first query is built like this:

```sql
SELECT id FROM t1
-WHERE create_time < '2022-01-01 00:00:00'
+WHERE create_time < '2022-01-01 00:00:00' AND id >= 12345 AND id < 45678
ORDER BY id ASC
LIMIT 500;
```

-The `id` in above example is the primary key of the table. We use a condition `create_time < '2022-01-01 00:00:00'` to filter out some expired rows. The value `2022-01-01 00:00:00` is computed before the scan task starts and all the rows created before it will be seen as 'expired'. We also use `LIMIT 500` to limit the max rows returned in one query, the limit value of the query is read from the system variable `tidb_ttl_scan_batch_size`.
+In the above example, the expiration time is '2022-01-01 00:00:00', which is computed when the job started, and the key range for the current scan task is `[12345, 45678)`. We also limit the maximum number of returned rows to 500, which can be changed via the system variable `tidb_ttl_scan_batch_size`.

-If the row count from the above query equals to the limit, we should schedule a next query to read the following rows. For example:
+In most cases, we cannot get all expired rows in one query. So if the returned row count equals the limit we set, there are still some rows not read yet. 
Suppose the latest id we just queried is `23456`; we should then schedule the next query like this:

```sql
SELECT id FROM t1
-WHERE create_time < '2022-01-01 00:00:00' AND id > 123456
+WHERE create_time < '2022-01-01 00:00:00' AND id > 23456 AND id < 45678
ORDER BY id ASC
LIMIT 500;
```

-Different with the first query, in the second example, we add an extra condition `id < 123456`. `123456` is the largest row id fetched from the last query. Using this condition, we can scan the table from the position last query end without scanning table from the beginning. The scan task will execute the select queries continuously until all the expired rows are fetched.
+The only difference from the first query is that the second one starts after `23456`, skipping the rows that have already been read. This procedure continues until all expired records are read.

-Once the query is returned, a delete task including expired rows will be sent to the delete workers.
+The expired rows will be wrapped as a `deleteTask` and then sent to the delete workers. Before we talk about the delete workers, there are still some things we should mention:

-Some other descriptions:
-
-- As we see, scan task is heavy because it performs a table full scan. For a large table, it is recommended to set `tidb_ttl_job_run_interval` as a longer interval.
+- As we see, the scan operation is heavy because it scans the whole table. For a large table, it is recommended to set the system variable `tidb_ttl_job_run_interval` to a longer value to reduce the daily resource cost. 
+- When a table has no primary key or its primary key is not clustered, we'll use the hidden column `_tidb_rowid` instead as the row id.
+- Though we support a generated column as the TTL time column, it is not efficient, because TiDB currently cannot push down a generated column's condition to the TiKV side; TiDB has to do the filtering itself, which requires more network traffic and CPU time.

#### Delete Tasks

-A delete worker consumes the delete tasks from the chan. The following pseudocode shows it works:
+There are several delete workers running in one TiDB node, and they consume tasks sent from the scan phase to delete expired rows. The following pseudocode shows how it works:

```
 func doDelTask(ch) {
@@ -213,7 +205,7 @@ func doDelTask(ch) {
     }
 }
```

-The delete worker splits the received rows to several batches according to the system variable `tidb_ttl_delete_batch_size` and then delete the batches one by one with a `DELETE` query. For example:
+A delete worker receives tasks from a channel and then splits the rows in each task into several batches according to the system variable `tidb_ttl_delete_batch_size`. After that, the batches are deleted one by one with a `DELETE` query. For example:

```sql
DELETE FROM t
@@ -222,9 +214,7 @@ WHERE id in (1, 2, 3, ...) AND create_time < '2022-01-01 00:00:00';

Notice that we still use the condition `create_time < '2022-01-01 00:00:00'` to avoid deleting "not expired" rows by mistake, for example, a row that was expired when scanning but was updated to a not-expired value before being deleted.

-Some other descriptions:
-
-- If there is no secondary index in a TTL table, we can assume that most of the delete operations can do a commit with 1PC. That is because the incoming rows are in the same region in most cases. So it is recommended NOT to create any secondary index in a TTL table.
+If there is no secondary index in a TTL table, we can assume that most of the delete operations can commit with 1PC. 
That is because the incoming rows are in the same region in most cases. So it is recommended NOT to create any secondary index in a TTL table.

### New System Variables

@@ -243,10 +233,10 @@ Some new system variables will be introduced:
   - Default: 1h

- `tidb_ttl_scan_worker_count`
-  - The worker count for the scan tasks in each TiDB node
+  - The count of the scan workers in each TiDB node
  - Scope: Global
  - Range: [1, 1024]
-  - Default: 1
+  - Default: 4

- `tidb_ttl_scan_batch_size`
  - The limit value of each SELECT query in a scan task
@@ -255,15 +245,15 @@ Some new system variables will be introduced:
   - Default: 500

- `tidb_ttl_delete_worker_count`
-  - The worker count for the delete tasks in each TiDB node
+  - The count of the delete workers in each TiDB node
  - Scope: Global
  - Range: [1, 1024]
  - Default: 4

- `tidb_ttl_delete_batch_size`
-  - The batch size in one delete query when deleting expired rows
+  - The batch size in one delete query when deleting expired rows. 0 means that no limit is set.
  - Scope: Global
-  - Range: [1, 10240]
+  - Range: [0, MaxInt64]
  - Default: 500

@@ -306,14 +296,15 @@ We'll introduce some new metrics to monitor the TTL jobs:

## Known Issues

-- The TTL works for one table stays in a single TiDB node. It will cause a hotspot if the table is very large.
+- Though the TTL jobs for different tables run distributively, one job for a table runs in a single TiDB node. If a table is very large, there may be a bottleneck.
- Currently, the condition of a generated column cannot be pushed down to TiKV. If a table uses a generated column as the TTL time column, the filter will be performed on the TiDB side. It brings some unnecessary network traffic and makes the query slow.

## Future Works

-- Split the scan task by table ranges and schedule them to different nodes in the cluster. This work will take full advantage of the cluster resource, especially for big tables. 
+- Schedule the scan tasks from one jobs to different nodes. This work will take full advantage of the cluster resource, especially for big tables. - If an index with the prefix of the TTL time column exists, we can use it to query expire rows instead of scanning the full table. It will reduce the execution time of the scan tasks. -- Support TTL table as a parent table referred by a child table by 'ON DELETE CASCADE'. When rows in a TTL table are deleted, the related rows in child table will be deleted too. +- Support TTL table as a parent table referred by a child table with 'ON DELETE CASCADE'. When some rows in a TTL table are deleted, the related rows in child table will be deleted too. +- Support pushing down generated column condition to TiKV side. ## Alternative Solutions From 91f78acc5f0fe17f0dc98e5f8c1df4cbb72c1800 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Sun, 20 Nov 2022 10:34:12 +0800 Subject: [PATCH 03/37] update --- docs/design/2022-11-17-ttl-table.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 16b4b15884a4d..fd62c684de39d 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -304,7 +304,7 @@ We'll introduce some new metrics to monitor the TTL jobs: - Schedule the scan tasks from one jobs to different nodes. This work will take full advantage of the cluster resource, especially for big tables. - If an index with the prefix of the TTL time column exists, we can use it to query expire rows instead of scanning the full table. It will reduce the execution time of the scan tasks. - Support TTL table as a parent table referred by a child table with 'ON DELETE CASCADE'. When some rows in a TTL table are deleted, the related rows in child table will be deleted too. -- Support pushing down generated column condition to TiKV side. +- Support pushing down generated column condition to TiKV side. 
Or use the definition of generated column to construct condition directly. ## Alternative Solutions From 02c30acb37421acf658efe01fa03be17c9472fb7 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Sun, 20 Nov 2022 10:43:07 +0800 Subject: [PATCH 04/37] update --- docs/design/2022-11-17-ttl-table.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index fd62c684de39d..7abbd68966baf 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -305,6 +305,7 @@ We'll introduce some new metrics to monitor the TTL jobs: - If an index with the prefix of the TTL time column exists, we can use it to query expire rows instead of scanning the full table. It will reduce the execution time of the scan tasks. - Support TTL table as a parent table referred by a child table with 'ON DELETE CASCADE'. When some rows in a TTL table are deleted, the related rows in child table will be deleted too. - Support pushing down generated column condition to TiKV side. Or use the definition of generated column to construct condition directly. +- Scan the table from TiFlash (if table has any TiFlash replica) instead of TiKV to reduce performance effect on TP business. 
## Alternative Solutions From e201649db500eeb4357ab0119c1170cb640165b3 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Sun, 20 Nov 2022 11:01:19 +0800 Subject: [PATCH 05/37] update --- docs/design/2022-11-17-ttl-table.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 7abbd68966baf..09a6c12e96626 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -102,6 +102,7 @@ CREATE TABLE `tidb_ttl_table_status` ( `last_job_ttl_expire` timestamp NULL DEFAULT NULL, `current_job_id` varchar(64) DEFAULT NULL, `current_job_owner_id` varchar(64) DEFAULT NULL, + `current_job_owner_addr` varchar(256) DEFAULT NULL, `current_job_owner_hb_time` timestamp, `current_job_start_time` timestamp NULL DEFAULT NULL, `current_job_ttl_expire` timestamp NULL DEFAULT NULL, @@ -121,6 +122,7 @@ The explanation of the fields: - `last_job_ttl_expire`: The expired time used by the last job for TTL works. - `current_job_id`: The id of the current job that is not finished. It not only includes the running job, but also includes the job that is failed or cancelled by user. - `current_job_owner_id`: The id of the owner (a TiDB node) that runs the job. +- `current_job_owner_addr`: The network address of the owner that runs the job. - `current_job_owner_hb_time`: The owner of the job updates this field with the current timestamp periodically. If it is not updated for a long time, it means the previous owner is offline and this job should be taken over by other node later. - `current_job_start_time`: The start time of the current job. - `current_job_ttl_expire`: The expired time used by the current job for TTL works. 
From 25850fa3ad540ed36c8306364f284c4c567d35df Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Sun, 20 Nov 2022 11:23:56 +0800 Subject: [PATCH 06/37] update --- docs/design/2022-11-17-ttl-table.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 09a6c12e96626..ca3fbffcf849b 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -253,9 +253,9 @@ Some new system variables will be introduced: - Default: 4 - `tidb_ttl_delete_batch_size` - - The batch size in one delete query when deleting expired rows. 0 is for no limit will be set. + - The batch size in one delete query when deleting expired rows. - Scope: Global - - Range: [0, MaxInt64] + - Range: [0, 10240] - Default: 500 - `tidb_ttl_delete_rate_limit` From 56f9da4dfc104a644502173e1474b37369bf3b63 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Sun, 20 Nov 2022 11:24:24 +0800 Subject: [PATCH 07/37] update --- docs/design/2022-11-17-ttl-table.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index ca3fbffcf849b..41114462f7079 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -256,7 +256,7 @@ Some new system variables will be introduced: - The batch size in one delete query when deleting expired rows. - Scope: Global - Range: [0, 10240] - - Default: 500 + - Default: 100 - `tidb_ttl_delete_rate_limit` - The rate limit of the delete operations in each TiDB node. 
0 is for no limit From cb6b2d3b60f6db3905634ae5f9a6788894bbeae8 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Mon, 21 Nov 2022 09:44:53 +0800 Subject: [PATCH 08/37] add issues --- docs/design/2022-11-17-ttl-table.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 41114462f7079..ac41d9bfa5357 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -1,5 +1,6 @@ # Proposal: Support TTL Table - Author(s): [lcwangchao](https://github.com/lcwangchao) +- Tracking Issue: https://github.com/pingcap/tidb/issues/39262 ## Table of Contents From d2a960558ed8170636eaf73eb81b23b03a973091 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Mon, 21 Nov 2022 09:47:33 +0800 Subject: [PATCH 09/37] update --- docs/design/2022-11-17-ttl-table.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index ac41d9bfa5357..d232b4c79343c 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -16,7 +16,7 @@ * [Constraints](#constraints) * [TTL Job Management](#ttl-job-management) * [TTL Job Details](#ttl-job-details) - * [Scan Task](#scan-task) + * [Scan Tasks](#scan-tasks) * [Delete Tasks](#delete-tasks) * [New System Variables](#new-system-variables) * [New Metrics](#new-metrics) @@ -146,7 +146,7 @@ TiDB schedules TTL jobs to delete expired rows. One job is related to a TTL tabl A running job contains two kinds of tasks: scan tasks and delete tasks. Scan tasks are used to filter out expired rows from the table and then send them to delete tasks which will do delete operations in batch. When all expired rows are deleted, the job will be finished. Let's talk about scan and delete tasks in detail. -#### Scan Task +#### Scan Tasks When a job starts to run, it first splits the table to N (N >= 1) ranges according to their primary key. 
Each range will be assigned to a scan task and each scan task performs a range scan for the specified range. The pseudocode below shows how it works: From bfe2ed7e21305647a1df8a798ffd3fd4c1d3d722 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Mon, 21 Nov 2022 11:43:47 +0800 Subject: [PATCH 10/37] update --- docs/design/2022-11-17-ttl-table.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index d232b4c79343c..28ceb52c6be64 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -254,10 +254,10 @@ Some new system variables will be introduced: - Default: 4 - `tidb_ttl_delete_batch_size` - - The batch size in one delete query when deleting expired rows. - - Scope: Global - - Range: [0, 10240] - - Default: 100 + - The batch size in one delete query when deleting expired rows. + - Scope: Global + - Range: [0, 10240] + - Default: 100 - `tidb_ttl_delete_rate_limit` - The rate limit of the delete operations in each TiDB node. 0 is for no limit @@ -309,6 +309,7 @@ We'll introduce some new metrics to monitor the TTL jobs: - Support TTL table as a parent table referred by a child table with 'ON DELETE CASCADE'. When some rows in a TTL table are deleted, the related rows in child table will be deleted too. - Support pushing down generated column condition to TiKV side. Or use the definition of generated column to construct condition directly. - Scan the table from TiFlash (if table has any TiFlash replica) instead of TiKV to reduce performance effect on TP business. +- Dynamically adjust the runtime settings according to the current overhead of the cluster. 
## Alternative Solutions From 4867da9ad63192fb9cb7c0584f0d915cbedf28da Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Mon, 21 Nov 2022 13:09:36 +0800 Subject: [PATCH 11/37] add sysvar `tidb_ttl_enable_instance_worker` --- docs/design/2022-11-17-ttl-table.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 28ceb52c6be64..e93f08de6ecfd 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -265,6 +265,12 @@ Some new system variables will be introduced: - Range: [0, MaxInt64] - Default: 0 +- `tidb_ttl_enable_instance_worker` + - Whether to start TTL workers in the current instance or not. + - Scope: Instance + - Values: [ON, OFF] + - Default: ON + ### New Metrics We'll introduce some new metrics to monitor the TTL jobs: From ce3fb336c3b3cbd11d609611cb054133b29e077f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E7=8E=8B=E8=B6=85?= Date: Tue, 22 Nov 2022 19:17:46 +0800 Subject: [PATCH 12/37] Update docs/design/2022-11-17-ttl-table.md Co-authored-by: Mattias Jonsson --- docs/design/2022-11-17-ttl-table.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index e93f08de6ecfd..6e4d0375e741c 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -176,7 +176,7 @@ LIMIT 500; In above example, the expired time is '2022-01-01 00:00:00' which is computed when the job started and the key range for the current scan task is `[12345, 45678)`. We also limit the max count of return rows as 500 which you can set the system variable `tidb_ttl_scan_batch_size` to change it. -For most cases, we cannot get all expired rows in one query. So if the return row count equals to the limit we set, that means there are still some rows not read yet. 
Support the latest id we just queried is `23456`, we should schedule the next query like this: +For most cases, we cannot get all expired rows in one query. So if the return row count equals to the limit we set, that means there are still some rows not read yet. Suppose the latest id we just queried is `23456`, we should schedule the next query like this: ```sql SELECT id FROM t1 From 501a330eb2f2906b2e368a418e75eb96db9c6be0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E7=8E=8B=E8=B6=85?= Date: Tue, 22 Nov 2022 19:18:11 +0800 Subject: [PATCH 13/37] Update docs/design/2022-11-17-ttl-table.md Co-authored-by: Mattias Jonsson --- docs/design/2022-11-17-ttl-table.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 6e4d0375e741c..f71ae85966ed3 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -208,7 +208,7 @@ func doDelTask(ch) { } ``` -A delete worker receives tasks from a chain and then splits the rows in it to several batches according to the system variable `tidb_ttl_delete_batch_size`. After that, each batch will be deleted by a `DELETE` query one by one. For example: +A delete worker receives tasks from a chain and then splits the rows in it to several batches according to the system variable `tidb_ttl_delete_batch_size`. After that, each batch expired rows will be deleted by a multi row `DELETE` query. 
For example: ```sql DELETE FROM t From c75ff9ee6e7bb09103db851723615bbbc46fd780 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E7=8E=8B=E8=B6=85?= Date: Tue, 22 Nov 2022 19:18:27 +0800 Subject: [PATCH 14/37] Update docs/design/2022-11-17-ttl-table.md Co-authored-by: Mattias Jonsson --- docs/design/2022-11-17-ttl-table.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index f71ae85966ed3..037d7c67580df 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -199,12 +199,12 @@ There are several delete workers running in one TiDB node, and they consume task ``` func doDelTask(ch) { - for _, task := range ch { - batches := splitRowsToDeleteBatches(task.rows) - for _, batch := range batches { - deleteBatch(task.tbl, task.batch, task.expire) - } - } + for _, task := range ch { + batches := splitRowsToDeleteBatches(task.rows) + for _, batch := range batches { + deleteBatch(task.tbl, task.batch, task.expire) + } + } } ``` From 18fe079fef75f5c7ceac0396b5187096634cf1b5 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Tue, 22 Nov 2022 19:20:30 +0800 Subject: [PATCH 15/37] update --- docs/design/2022-11-17-ttl-table.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 037d7c67580df..ac6b27815e0b5 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -61,7 +61,7 @@ To make it compatible with mysql, TTL options also support the comment format. 
For example:

 ```sql
 CREATE TABLE t1 (
   id int PRIMARY KEY,
   created_at TIMESTAMP
-) /*T![ttl] TTL = `created_at` + INTERVAL 3 MONTH */;
+) /*T![ttl] TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF'*/;
 ```

 #### Alter a Table with TTL

From e19989e01a8cff94bef068d01628a4e59ed008 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=8E=8B=E8=B6=85?=
Date: Tue, 22 Nov 2022 19:43:01 +0800
Subject: [PATCH 16/37] Update docs/design/2022-11-17-ttl-table.md

Co-authored-by: Mattias Jonsson
---
 docs/design/2022-11-17-ttl-table.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index ac6b27815e0b5..1308792262aed 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -84,7 +84,7 @@ ALTER TABLE t1 NO_TTL;

 #### Constraints

-- TTL does NOT work on a table that is referenced by a foreign key. For example, you cannot add TTL to the parent table, because it is referenced by a foreign key in the child table and deleting parent table could violate this constraint.
+- TTL does NOT work on a table that is referenced by a foreign key. For example, you cannot add TTL to the parent table, because it is referenced by a foreign key in the child table and deleting a row from the parent table could violate this constraint.
### TTL Job Management

From 3511a2ba0c57c250bdb25e301cc5e7c634fc20d1 Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Tue, 22 Nov 2022 19:51:28 +0800
Subject: [PATCH 17/37] update

---
 docs/design/2022-11-17-ttl-table.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index 1308792262aed..0bbece2bd7303 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -208,7 +208,7 @@ func doDelTask(ch) {
 }
 ```

-A delete worker receives tasks from a chain and then splits the rows in it to several batches according to the system variable `tidb_ttl_delete_batch_size`. After that, each batch expired rows will be deleted by a multi row `DELETE` query. For example:
+A delete worker receives tasks from a channel (a chan object in golang) and then splits the rows in it into several batches according to the system variable `tidb_ttl_delete_batch_size`. After that, each batch of expired rows will be deleted by a multi-row `DELETE` query. For example:

 ```sql
 DELETE FROM t

From 51b87b7c18069be74af3b98b7bc6fdfbf1e10a52 Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Tue, 22 Nov 2022 20:21:10 +0800
Subject: [PATCH 18/37] update remove ttl

---
 docs/design/2022-11-17-ttl-table.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index 0bbece2bd7303..9e3d98ea6e463 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -79,7 +79,7 @@ We should allow updating the existing TTL options.
When it is updated, the running background job for this table should stop or restart according to the newest settings.

 If we want to remove a table's TTL options, we can just do:

 ```sql
-ALTER TABLE t1 NO_TTL;
+ALTER TABLE t1 REMOVE TTL;
 ```

 #### Constraints

From 29e372627e8ce3a9dd8206f2f73efaf4dd5dd5f5 Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Wed, 23 Nov 2022 15:59:54 +0800
Subject: [PATCH 19/37] update document

---
 docs/design/2022-11-17-ttl-table.md | 59 ++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 22 deletions(-)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index 9e3d98ea6e463..cbc9e6bfc2cc1 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -101,6 +101,7 @@ CREATE TABLE `tidb_ttl_table_status` (
   `last_job_start_time` timestamp NULL DEFAULT NULL,
   `last_job_finish_time` timestamp NULL DEFAULT NULL,
   `last_job_ttl_expire` timestamp NULL DEFAULT NULL,
+  `last_job_summary` text DEFAULT NULL,
   `current_job_id` varchar(64) DEFAULT NULL,
   `current_job_owner_id` varchar(64) DEFAULT NULL,
   `current_job_owner_addr` varchar(256) DEFAULT NULL,
@@ -108,7 +109,8 @@ CREATE TABLE `tidb_ttl_table_status` (
   `current_job_start_time` timestamp NULL DEFAULT NULL,
   `current_job_ttl_expire` timestamp NULL DEFAULT NULL,
   `current_job_state` text DEFAULT NULL,
-  `current_job_status` varchar(64) DEFAULT NULL
+  `current_job_status` varchar(64) DEFAULT NULL,
+  `current_job_status_update_time` timestamp NULL DEFAULT NULL
 );
 ```

@@ -121,6 +123,7 @@ The explanation of the fields:
 - `last_job_start_time`: The start time of the last job.
 - `last_job_finish_time`: The finish time of the last job.
 - `last_job_ttl_expire`: The expired time used by the last job for TTL works.
+- `last_job_summary`: The summary info for the last job.
 - `current_job_id`: The id of the current job that is not finished. It not only includes the running job, but also includes the job that failed or was cancelled by the user.
 - `current_job_owner_id`: The id of the owner (a TiDB node) that runs the job.
- `current_job_owner_addr`: The network address of the owner that runs the job. @@ -128,7 +131,8 @@ The explanation of the fields: - `current_job_start_time`: The start time of the current job. - `current_job_ttl_expire`: The expired time used by the current job for TTL works. - `current_job_state`: Some inner state for the current job. It can be used for the job's fail over. -- `current_job_status`: A enum with one of values: running, cancelling, cancelled, error +- `current_job_status`: A enum with one of values: waiting, running, cancelling, cancelled, error +- `current_job_status_update_time`: The update time of the current status TTL job for each table runs periodically according to the configuration of the system variable `tidb_ttl_job_run_interval` . For example, if we configure `set @@global.tidb_ttl_job_run_interval='1h'`, the cluster will schedule TTL jobs for each table every one hour to delete expired rows. @@ -223,11 +227,11 @@ If there is no secondary index in a TTL table, we can assume that most of the de Some new system variables will be introduced: -- `tidb_ttl_job_pause` - - When this variable is `ON`, the cluster will stop to schedule TTL jobs and the running jobs will be cancelled. +- `tidb_ttl_job_enable` + - When this variable is `OFF`, the cluster will stop to schedule TTL jobs and the running jobs will be cancelled. - Scope: Global - Values: [ON, OFF] - - Default: OFF + - Default: ON - `tidb_ttl_job_run_interval` - The schedule interval between two jobs for one TTL table @@ -235,6 +239,18 @@ Some new system variables will be introduced: - Range: [10m0s, 8760h0m0s] - Default: 1h +- `tidb_ttl_job_schedule_window_start_time` + - This variable is used to restrict the start time of the time window of scheduling the ttl jobs. + - Scope: Global + - Type: Time + - Default: 00:00 +0000 + +- `tidb_ttl_job_schedule_window_end_time` + - This variable is used to restrict the end time of the time window of scheduling the ttl jobs. 
+ - Scope: Global + - Type: Time + - Default: 23:59 +0000 + - `tidb_ttl_scan_worker_count` - The count of the scan workers in each TiDB node - Scope: Global @@ -275,33 +291,32 @@ Some new system variables will be introduced: We'll introduce some new metrics to monitor the TTL jobs: -- `ttl_select_queries` +- `ttl_queries` - The total count of select queries in TTL jobs - Type: Counter - - Labels: table + - Labels: type, table, result -- `ttl_select_expire_rows` - - The total count of expired rows selected in TTL jobs +- `ttl_processed_expired_rows` + - The total count of expired rows processed in TTL jobs - Type: Counter - - Labels: table + - Labels: type, table -- `ttl_select_duration` - - The duration of the select queries in TTL jobs +- `ttl_query_duration` + - The duration of the queries in TTL jobs + - Labels: type - Type: Histogram -- `ttl_delete_queries` - - The total count of delete queries in TTL jobs - - Type: Counter - - Labels: table +- `ttl_job_status` + - The status for the current TTL job. 
When the job is in the specified status, the value will be 1; otherwise, it will be 0
+  - Labels: table, status
+  - Type: Gauge

-- `ttl_delete_expire_rows`
-  - The total count of expired rows deleted in TTL jobs
-  - Type: Counter
+- `ttl_job_scan_workers`
+  - The count of running scan workers for the TTL jobs
   - Labels: table
+  - Type: Gauge

-- `ttl_delete_duration`
-  - The duration of the delete queries in TTL jobs
-  - Type: Histogram
+In the above metrics, the optional values for the type label are 'select' and 'delete', and the optional values for the result label are `success` and `error`

 ## Known Issues

From d4196fad8a61df148fdd9296cf713321ced1ebb3 Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Fri, 25 Nov 2022 12:12:18 +0800
Subject: [PATCH 20/37] update future works

---
 docs/design/2022-11-17-ttl-table.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index cbc9e6bfc2cc1..dba958e55a305 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -331,6 +331,7 @@ In the above metrics, the optional values for the type label are 'select' and 'delete
 - Support pushing down generated column condition to TiKV side. Or use the definition of generated column to construct condition directly.
 - Scan the table from TiFlash (if table has any TiFlash replica) instead of TiKV to reduce performance effect on TP business.
 - Dynamically adjust the runtime settings according to the current overhead of the cluster.
+- Add some builtin alerts for cloud environments.
## Alternative Solutions

From 49b388a01e311fc304f738cc53b4c1d30d1a0a27 Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Fri, 25 Nov 2022 12:17:43 +0800
Subject: [PATCH 21/37] update table

---
 docs/design/2022-11-17-ttl-table.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index dba958e55a305..a5031c7af79cd 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -97,6 +97,8 @@ The TTL table status will be recorded in a new system table `mysql.tidb_ttl_table_
 ```sql
 CREATE TABLE `tidb_ttl_table_status` (
   `table_id` bigint(64) PRIMARY KEY,
+  `parent_table_id` bigint(64),
+  `table_statistics` TEXT DEFAULT NULL,
   `last_job_id` varchar(64) DEFAULT NULL,
   `last_job_start_time` timestamp NULL DEFAULT NULL,
   `last_job_finish_time` timestamp NULL DEFAULT NULL,
@@ -119,6 +121,8 @@ It stores some information for each TTL table. The fields prefix `last_job_` pre
 The explanation of the fields:
 - `table_id`: The id of the TTL table. If the table is a partitioned table, it stands for the physical table id of each partition.
+- `parent_table_id`: If the current row is for a table partition, it is the table id of its parent table. Otherwise, it equals `table_id`.
+- `table_statistics`: some statistics of the table.
 - `last_job_id`: The id of the last successful job.
 - `last_job_start_time`: The start time of the last job.
 - `last_job_finish_time`: The finish time of the last job.
From a0137d3649c56e30ca7a7a62233dcc078856633b Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Fri, 25 Nov 2022 17:47:45 +0800
Subject: [PATCH 22/37] update

---
 docs/design/2022-11-17-ttl-table.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index a5031c7af79cd..0bd13fa52fa88 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -225,7 +225,7 @@ WHERE id in (1, 2, 3, ...) AND create_time < '2022-01-01 00:00:00';

 Notice that we are still using the condition `create_time < '2022-01-01 00:00:00'` to avoid deleting some "not expired" rows by mistake. For example, a row is expired when scanning, but updated to a non-expired value before deleting it.

-If there is no secondary index in a TTL table, we can assume that most of the delete operations can do a commit with 1PC. That is because the incoming rows are in the same region in most cases. So it is recommended NOT to create any secondary index in a TTL table.
+If there is no secondary index in a TTL table, we can assume that most of the delete operations can do a commit with 1PC. That is because the incoming rows are in the same region in most cases. So if you have a big table, keeping no secondary indexes on this table may achieve better performance for TTL jobs.
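The batched delete with its re-checked expire bound can be sketched as a small statement builder. This is an editor's illustration, not TiDB's internal code; all names are assumptions, and a real implementation would bind the values as query parameters instead of interpolating strings:

```go
package main

import (
	"fmt"
	"strings"
)

// buildDeleteSQL turns one batch of scanned primary keys into a single
// multi-row DELETE. Re-checking the expire bound in the WHERE clause keeps
// rows that were updated after the scan from being deleted by mistake.
func buildDeleteSQL(table, pkCol, timeCol string, ids []string, expire string) string {
	return fmt.Sprintf(
		"DELETE FROM %s WHERE %s IN (%s) AND %s < '%s'",
		table, pkCol, strings.Join(ids, ", "), timeCol, expire,
	)
}

func main() {
	sql := buildDeleteSQL("t", "id", "create_time",
		[]string{"1", "2", "3"}, "2022-01-01 00:00:00")
	// DELETE FROM t WHERE id IN (1, 2, 3) AND create_time < '2022-01-01 00:00:00'
	fmt.Println(sql)
}
```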
### New System Variables From c2f27dac275a5c6502caf7a12edb88609f142bcf Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Fri, 25 Nov 2022 18:11:46 +0800 Subject: [PATCH 23/37] update timezone --- docs/design/2022-11-17-ttl-table.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 0bd13fa52fa88..de8960701ccc4 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -227,6 +227,14 @@ Notice that we are still using the condition `create_time < '2022-01-01 00:00:00 If there is no secondary index in a TTL table, we can assume that most of the delete operations can do a commit with 1PC. That is because the incoming rows are in the same region in most cases. So if you have a big table, keep no secondary indexes in this table may achieve a better performance for TTL jobs. +#### Time Zone Consideration + +Currently, we support three field types as the time column for TTL: `Date`, `DateTime` and `TimeStamp` . However, for `Date` and `DateTime`, they are not an absolute time, and we need another "time zone" information to determine what their accurate time point is. We have two options to select the time zone: + +1. Use the system time zone which can be fetched from global variable `system_time_zone` and is determined when the cluster bootstraps. Because the `system_time_zone` will never changes, the row will be deleted at a certain time. However, it will have some problem when the user sets the cluster time zone `time_zone` to a different value other than `SYSTEM`. For example, if the `system_time_zone` is `+08:00` and `time_zone` is set to `UTC`, the record will be seen as expired immediately when the `tidb_ttl_job_run_interval` is less than 8 hours. + +2. Use the cluster time zone `time_zone`. It is fine when cluster time zone `time_zone` does not change. 
But if the user changes the `time_zone` settings, the TTL job should be aware of it in time to avoid deleting some unexpected rows. In this case, whether one row is expired or not cannot be determined without knowing the `time_zone` at that time.

From 2d094f2c2a814f8db75afe1abf824a8d7b1d2e9b Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Fri, 25 Nov 2022 19:49:59 +0800
Subject: [PATCH 24/37] update

---
 docs/design/2022-11-17-ttl-table.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index de8960701ccc4..ad863dbe72456 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -353,3 +353,5 @@ TiKV supports TTL on RawKV, so we may get the question of whether we can implement
 - No future foreign key support.
 - Not table awareness: TTL on RawKV is kv-level, can't easily set TTL on/off by Table
 - Not compatible with CDC, Backup, Secondary Index
+
+There is another proposal to implement TTL by pushing configurations to TiKV: https://github.com/pingcap/tidb/pull/22763. However, there are still some unresolved problems. For example, it may break the constraint of snapshot isolation.
From 72000430bcf9505db1ce34c7a93e203d39004757 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E7=8E=8B=E8=B6=85?= Date: Wed, 30 Nov 2022 16:14:18 +0800 Subject: [PATCH 25/37] Update docs/design/2022-11-17-ttl-table.md Co-authored-by: Morgan Tocker --- docs/design/2022-11-17-ttl-table.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index ad863dbe72456..4fdca36ec2ed5 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -266,7 +266,7 @@ Some new system variables will be introduced: - `tidb_ttl_scan_worker_count` - The count of the scan workers in each TiDB node - Scope: Global - - Range: [1, 1024] + - Range: [1, 256] - Default: 4 - `tidb_ttl_scan_batch_size` From 9301a17578fca15cc74ba5713c5689c22b3d2aae Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E7=8E=8B=E8=B6=85?= Date: Wed, 30 Nov 2022 16:15:29 +0800 Subject: [PATCH 26/37] Update docs/design/2022-11-17-ttl-table.md Co-authored-by: Morgan Tocker --- docs/design/2022-11-17-ttl-table.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 4fdca36ec2ed5..d17f4b99e6837 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -278,7 +278,7 @@ Some new system variables will be introduced: - `tidb_ttl_delete_worker_count` - The count of the delete workers in each TiDB node - Scope: Global - - Range: [1, 1024] + - Range: [1, 256] - Default: 4 - `tidb_ttl_delete_batch_size` From 3a759ff807fc2fc19d38679181a98c478007b788 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Wed, 30 Nov 2022 17:00:31 +0800 Subject: [PATCH 27/37] update --- docs/design/2022-11-17-ttl-table.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index d17f4b99e6837..90ed9c7dac0a7 100644 --- 
a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -76,7 +76,7 @@ We should allow to update the existed TTL options. When it is updated, the runni #### Alter to a non-TTL Table -If we want to remove a table's TTL options, we can just do: +Similar with mysql syntax "ALTER TABLE t REMOVE PARTITIONING", if we want to remove a table's TTL options, we can just do: ```sql ALTER TABLE t1 REMOVE TTL; @@ -355,3 +355,15 @@ TiKV supports TTL on RawKV, so we may get question that whether we can implement - Not compatible with CDC, Backup, Secondary Index There is another proposal to implement TTL by pushing configurations to TiKV: https://github.com/pingcap/tidb/pull/22763 . However, there are still some unresolved problems. For example, it may break the constraint of snapshot isolation. + +CockroachDB also supports row-level TTL feature, and it is very similar with our design, for example: + +```sql +CREATE TABLE events ( + id UUID PRIMARY KEY default gen_random_uuid(), + description TEXT, + inserted_at TIMESTAMP default current_timestamp() +) WITH (ttl_expire_after = '3 months'); +``` + +The difference is that the CockroachDB is using a hidden column to store the "expire" time instead of "create_time". 
So when altering a table's TTL options, it will result in some data change and affect the performance depending on the table size, see: https://www.cockroachlabs.com/docs/stable/row-level-ttl.html#add-or-update-the-row-level-ttl-for-an-existing-table From 4b5aa556ec58db42a6c55a2c2ec1ac28022138ac Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Thu, 1 Dec 2022 10:49:08 +0800 Subject: [PATCH 28/37] update --- docs/design/2022-11-17-ttl-table.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 90ed9c7dac0a7..a12460fb29513 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -41,7 +41,7 @@ The following example shows how to create a TTL table. The column `create_at` is CREATE TABLE t1 ( id int PRIMARY KEY, created_at TIMESTAMP -) TTL = `created_at` + INTERVAL 3 MONTH; +) TTL = `created_at` + INTERVAL 3 MONTHS; ``` We can use another `TTL_ENABLE` option to disable/enable the TTL job for the table. For example: @@ -50,7 +50,7 @@ We can use another `TTL_ENABLE` option to disable/enable the TTL job for the tab CREATE TABLE t1 ( id int PRIMARY KEY, created_at TIMESTAMP -) TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF'; +) TTL = `created_at` + INTERVAL 3 MONTHS TTL_ENABLE = 'OFF'; ``` The above table will not delete expired rows automatically because `TTL_ENABLE` is set to `OFF`. When the `TTL_ENABLE` is omitted, it uses value `ON` by default. @@ -61,7 +61,7 @@ To make it compatible with mysql, TTL options also support the comment format. 
F CREATE TABLE t1 ( id int PRIMARY KEY, created_at TIMESTAMP -) /*T![ttl] TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF'*/; +) /*T![ttl] TTL = `created_at` + INTERVAL 3 MONTHS TTL_ENABLE = 'OFF'*/; ``` #### Alter a Table with TTL @@ -69,7 +69,7 @@ CREATE TABLE t1 ( We can alter an exist table with TTL options, for example: ```sql -ALTER TABLE t1 TTL = `created_at` + INTERVAL 3 MONTH; +ALTER TABLE t1 TTL = `created_at` + INTERVAL 3 MONTHS; ``` We should allow to update the existed TTL options. When it is updated, the running background job for this table should stop or restart according to the newest settings. From 630b1e69c5a9d4be5b46fb7620f330420772d9b9 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Fri, 2 Dec 2022 09:33:45 +0800 Subject: [PATCH 29/37] update doc --- docs/design/2022-11-17-ttl-table.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index a12460fb29513..17dc31f673e85 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -13,11 +13,13 @@ * [Create TTL Table](#create-ttl-table) * [Alter a Table with TTL](#alter-a-table-with-ttl) * [Alter to a non-TTL Table](#alter-to-a-non-ttl-table) + * [Generated Columns](#generated-columns) * [Constraints](#constraints) * [TTL Job Management](#ttl-job-management) * [TTL Job Details](#ttl-job-details) * [Scan Tasks](#scan-tasks) * [Delete Tasks](#delete-tasks) + * [Time Zone Consideration](#time-zone-consideration) * [New System Variables](#new-system-variables) * [New Metrics](#new-metrics) * [Known Issues](#known-issues) @@ -82,6 +84,13 @@ Similar with mysql syntax "ALTER TABLE t REMOVE PARTITIONING", if we want to rem ALTER TABLE t1 REMOVE TTL; ``` +#### Generated Columns + +TTL table supports a generated column as the time column, but currently, there maybe a performance degradation if you use it. 
The main reason is that a generated column's expression cannot be pushed down to the TiKV side, so the scan phase has a higher performance cost, as we'll discuss later. Another reason is that most date functions cannot be pushed down either. So, to solve this problem, we must:
+
+1. support pushing down generated column expression to TiKV side.
+2. support pushing down some common date functions to TiKV side.
+
 #### Constraints
 
 - TTL does NOT work on a table that is referenced by a foreign key. For example, you cannot add TTL to the parent table, because it is referenced by a foreign key in the child table and deleting a row from the parent table could violate this constraint.
@@ -199,7 +208,7 @@ The expired rows will be wrapped as `deleteTask` and then be sent to the delete
 
 - As we see, the scan operation is heavy because it scans the whole table. For a large table, it is recommended to set the system variable `tidb_ttl_job_run_interval` to a longer value to reduce the resource cost in one day.
 - When a table has no primary key or its primary key is not clustered, we'll use the hidden column `_tidb_rowid` instead as the row id.
-- Though we support generated column as the TTL time column, it is not efficient because TiDB currently cannot push down a generated column's condition to TiKV side, the TiDB has to do the filter works that requires more network traffics and CPU times.
+- As we just mentioned, using a generated column as the time column is not efficient because TiDB currently cannot push down a generated column's condition to the TiKV side; TiDB has to do the filtering itself, which requires more network traffic and CPU time.
#### Delete Tasks

From a290eea5d0ec2a90b72eaf78d28c3b44cafd42bb Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Mon, 5 Dec 2022 18:14:19 +0800
Subject: [PATCH 30/37] add test plans

---
 docs/design/2022-11-17-ttl-table.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index 17dc31f673e85..95817ff45eedf 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -25,6 +25,9 @@
   * [Known Issues](#known-issues)
   * [Future Works](#future-works)
   * [Alternative Solutions](#alternative-solutions)
+  * [Testing Plan](#testing-plan)
+    * [Functional Test](#functional-test)
+    * [Performance Test](#performance-test)
@@ -376,3 +379,21 @@ CREATE TABLE events (
 ```
 
 The difference is that CockroachDB uses a hidden column to store the "expire" time instead of "create_time". So when altering a table's TTL options, it will result in some data change and affect the performance depending on the table size, see: https://www.cockroachlabs.com/docs/stable/row-level-ttl.html#add-or-update-the-row-level-ttl-for-an-existing-table
+
+## Testing Plan
+
+### Functional Test
+
+- Test DDL operations including:
+  - Create a table with TTL options
+  - Alter a non-TTL table with TTL options
+  - Alter a TTL table with new TTL options
+  - Remove a table's TTL options
+- Test that a TTL table schedules background jobs and deletes expired rows.
+- Test that a TTL table does not schedule jobs when `TTL_ENABLE` or `@@global.tidb_ttl_job_enable` is `OFF`
+- Test that TTL jobs are only scheduled within the schedule time window.
+
+### Performance Test
+
+- Set up a 10-million-row table with 10 percent expired rows. Test the time it takes to clear the expired rows.
+- Start a benchmark test and then start a TTL job. Collect the QPS/latency of the benchmark and compare them with the results when no TTL job is running.
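As a side note on the delete path these tests exercise: the design's delete workers split the expired rows into batches of `tidb_ttl_delete_batch_size` before issuing multi-row `DELETE` statements. A minimal Go sketch of that batching step — the helper name and the fixed batch size are illustrative, not TiDB internals:

```go
package main

import "fmt"

// splitBatches splits expired row handles into batches of at most `size`,
// mirroring how a delete worker would group rows before building each
// multi-row DELETE statement.
func splitBatches(ids []int64, size int) [][]int64 {
	var batches [][]int64
	for len(ids) > 0 {
		n := size
		if len(ids) < n {
			n = len(ids)
		}
		batches = append(batches, ids[:n])
		ids = ids[n:]
	}
	return batches
}

func main() {
	ids := []int64{1, 2, 3, 4, 5, 6, 7}
	for _, b := range splitBatches(ids, 3) {
		fmt.Println(b) // [1 2 3], then [4 5 6], then [7]
	}
}
```

Each resulting batch maps to one `DELETE ... WHERE id IN (...)` statement, so the last, possibly short, batch still gets its own query.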
From c0cbec0636cb4791204b1bae495137cbe639b9f9 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Tue, 6 Dec 2022 19:12:27 +0800 Subject: [PATCH 31/37] update future works --- docs/design/2022-11-17-ttl-table.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 95817ff45eedf..8e80cd1749bac 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -349,11 +349,18 @@ In the above metrics, the optional values for type label is 'select' and 'delete ## Future Works -- Schedule the scan tasks from one jobs to different nodes. This work will take full advantage of the cluster resource, especially for big tables. -- If an index with the prefix of the TTL time column exists, we can use it to query expire rows instead of scanning the full table. It will reduce the execution time of the scan tasks. +We'll do some performance optimizations in the future to reduce the execution time and performance cost of the TTL job: + +- If an index with the prefix of the TTL time column exists, we can use it to query expire rows instead of scanning the full table. It will reduce the execution time of the scan tasks. +- In most times, secondary indexes will not be created to avoid the hotspot of the insert. In this scene, we can also reduce some unnecessary scans by caching the statistical information. For example, we can cache the created time of the oldest row for each region after a job finished. When the next job starts, it can check the all the regions, if the data of one region do not have any updates and its cached time is not expired, just skip that region. +- If a TTL table has some Tiflash replicas, we can the TiFlash instead of TiKV. +- In the future, we can schedule the tasks from one job to multiple nodes instead of executing them only in one node. This approach will improve the resource utilization of the cluster. 
It also means we can execute more tasks in concurrency at the same time that makes the scan and delete faster. +- If a table does not have any secondary index, we can do some further optimizations. One optimization is that to push down the scan and delete to TiKV side without data exchanging between TiDB and TiKV. It is somewhat like what GCWorker dose. In this way, a new coprocessor command "TTLGC" will be introduced and when a job starts, TiDB will send "TTLGC" commands to each region and TiKV will then scan and delete the expired rows (TiKV should delete expired rows in a non-transactional way). + +There are also some features we can support in the future: + - Support TTL table as a parent table referred by a child table with 'ON DELETE CASCADE'. When some rows in a TTL table are deleted, the related rows in child table will be deleted too. - Support pushing down generated column condition to TiKV side. Or use the definition of generated column to construct condition directly. -- Scan the table from TiFlash (if table has any TiFlash replica) instead of TiKV to reduce performance effect on TP business. - Dynamically adjust the runtime settings according to the current overhead of the cluster. - Add some builtin alters for cloud environment. 
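The scan tasks described in this design page through expired rows with keyset pagination on the row id (`id >= last_seen AND id < range_end ... LIMIT n`). A rough Go sketch of how such a query could be assembled — the helper is purely illustrative, not TiDB's actual query builder:

```go
package main

import "fmt"

// buildScanQuery assembles one page of a scan task's SELECT, using keyset
// pagination on the row id. Illustrative only; TiDB builds these queries
// internally.
func buildScanQuery(table, timeCol, expire string, startID, endID, limit int64) string {
	return fmt.Sprintf(
		"SELECT id FROM %s WHERE %s < '%s' AND id >= %d AND id < %d ORDER BY id ASC LIMIT %d",
		table, timeCol, expire, startID, endID, limit)
}

func main() {
	// First page of the task's range [12345, 45678).
	fmt.Println(buildScanQuery("t1", "create_time", "2022-01-01 00:00:00", 12345, 45678, 500))
	// A full batch came back and the last id seen was 23456, so the next
	// page resumes from that id within the same range.
	fmt.Println(buildScanQuery("t1", "create_time", "2022-01-01 00:00:00", 23456, 45678, 500))
}
```

Because the expired-time literal is fixed when the job starts, every page of the scan uses the same cutoff, so rows inserted during the job are never mistakenly matched.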
From ed0c3f895e0efbaa40865cc03c77ec76f18e20a6 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Tue, 6 Dec 2022 19:14:40 +0800 Subject: [PATCH 32/37] update --- docs/design/2022-11-17-ttl-table.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 8e80cd1749bac..84f3a4735e7a0 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -24,6 +24,8 @@ * [New Metrics](#new-metrics) * [Known Issues](#known-issues) * [Future Works](#future-works) + * [Performance Optimizations](#performance-optimizations) + * [More Features to Support](#more-features-to-support) * [Alternative Solutions](#alternative-solutions) * [Testing Plan](#testing-plan) * [Functional Test](#functional-test) @@ -349,6 +351,8 @@ In the above metrics, the optional values for type label is 'select' and 'delete ## Future Works +### Performance Optimizations + We'll do some performance optimizations in the future to reduce the execution time and performance cost of the TTL job: - If an index with the prefix of the TTL time column exists, we can use it to query expire rows instead of scanning the full table. It will reduce the execution time of the scan tasks. @@ -357,6 +361,8 @@ We'll do some performance optimizations in the future to reduce the execution ti - In the future, we can schedule the tasks from one job to multiple nodes instead of executing them only in one node. This approach will improve the resource utilization of the cluster. It also means we can execute more tasks in concurrency at the same time that makes the scan and delete faster. - If a table does not have any secondary index, we can do some further optimizations. One optimization is that to push down the scan and delete to TiKV side without data exchanging between TiDB and TiKV. It is somewhat like what GCWorker dose. 
In this way, a new coprocessor command "TTLGC" will be introduced and when a job starts, TiDB will send "TTLGC" commands to each region and TiKV will then scan and delete the expired rows (TiKV should delete expired rows in a non-transactional way). +### More Features to Support + There are also some features we can support in the future: - Support TTL table as a parent table referred by a child table with 'ON DELETE CASCADE'. When some rows in a TTL table are deleted, the related rows in child table will be deleted too. From 41b45fb0956129a8b99b763bdf8b93794c39005b Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Thu, 8 Dec 2022 11:28:31 +0800 Subject: [PATCH 33/37] update --- docs/design/2022-11-17-ttl-table.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index 84f3a4735e7a0..91c59790fcb91 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -190,7 +190,7 @@ func doScanTask(tbl, range, expire, ch) { As we see above, it builds some select queries in a loop. The first query is built like this: ```sql -SELECT id FROM t1 +SELECT LOW_PRIORITY id FROM t1 WHERE create_time < '2022-01-01 00:00:00' AND id >= 12345 AND id < 45678 ORDER BY id ASC LIMIT 500; @@ -201,7 +201,7 @@ In above example, the expired time is '2022-01-01 00:00:00' which is computed wh For most cases, we cannot get all expired rows in one query. So if the return row count equals to the limit we set, that means there are still some rows not read yet. 
Suppose the latest id we just queried is `23456`, we should schedule the next query like this: ```sql -SELECT id FROM t1 +SELECT LOW_PRIORITY id FROM t1 WHERE create_time < '2022-01-01 00:00:00' AND id >= 23456 AND id < 45678 ORDER BY id ASC LIMIT 500; @@ -233,7 +233,7 @@ func doDelTask(ch) { A delete worker receives tasks from a channel (a chan object in goland) and then splits the rows in it to several batches according to the system variable `tidb_ttl_delete_batch_size`. After that, each batch expired rows will be deleted by a multi row `DELETE` query. For example: ```sql -DELETE FROM t +DELETE LOW_PRIORITY FROM t WHERE id in (1, 2, 3, ...) AND create_time < '2022-01-01 00:00:00'; ``` @@ -355,11 +355,12 @@ In the above metrics, the optional values for type label is 'select' and 'delete We'll do some performance optimizations in the future to reduce the execution time and performance cost of the TTL job: +- In the future, we can schedule the tasks from one job to multiple nodes instead of executing them only in one node. This approach will improve the resource utilization of the cluster. It also means we can execute more tasks in concurrency at the same time that makes the scan and delete faster. +- If a TTL table has some Tiflash replicas, we can scan the TiFlash instead of TiKV. - If an index with the prefix of the TTL time column exists, we can use it to query expire rows instead of scanning the full table. It will reduce the execution time of the scan tasks. - In most times, secondary indexes will not be created to avoid the hotspot of the insert. In this scene, we can also reduce some unnecessary scans by caching the statistical information. For example, we can cache the created time of the oldest row for each region after a job finished. When the next job starts, it can check the all the regions, if the data of one region do not have any updates and its cached time is not expired, just skip that region. 
-- If a TTL table has some Tiflash replicas, we can the TiFlash instead of TiKV.
-- In the future, we can schedule the tasks from one job to multiple nodes instead of executing them only in one node. This approach will improve the resource utilization of the cluster. It also means we can execute more tasks in concurrency at the same time that makes the scan and delete faster.
 - If a table does not have any secondary index, we can do some further optimizations. One optimization is that to push down the scan and delete to TiKV side without data exchanging between TiDB and TiKV. It is somewhat like what GCWorker dose. In this way, a new coprocessor command "TTLGC" will be introduced and when a job starts, TiDB will send "TTLGC" commands to each region and TiKV will then scan and delete the expired rows (TiKV should delete expired rows in a non-transactional way).
+- The resource control work is ongoing: https://github.com/pingcap/tidb/issues/38025 , and the global resource control framework is still in design. We can take advantage of it in the future to maximize the resource utilization of the cluster.
 
 ### More Features to Support

From 4a0cf1e8b071700a46d62a2b01e809445864a272 Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Mon, 12 Dec 2022 10:29:29 +0800
Subject: [PATCH 34/37] update

---
 docs/design/2022-11-17-ttl-table.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index 91c59790fcb91..c80b44c0e0d85 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -48,7 +48,7 @@ The following example shows how to create a TTL table. The column `create_at` is
 CREATE TABLE t1 (
     id int PRIMARY KEY,
     created_at TIMESTAMP
-) TTL = `created_at` + INTERVAL 3 MONTHS;
+) TTL = `created_at` + INTERVAL 3 MONTH;
 ```
 
 We can use another `TTL_ENABLE` option to disable/enable the TTL job for the table.
For example: @@ -57,7 +57,7 @@ We can use another `TTL_ENABLE` option to disable/enable the TTL job for the tab CREATE TABLE t1 ( id int PRIMARY KEY, created_at TIMESTAMP -) TTL = `created_at` + INTERVAL 3 MONTHS TTL_ENABLE = 'OFF'; +) TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF'; ``` The above table will not delete expired rows automatically because `TTL_ENABLE` is set to `OFF`. When the `TTL_ENABLE` is omitted, it uses value `ON` by default. @@ -68,7 +68,7 @@ To make it compatible with mysql, TTL options also support the comment format. F CREATE TABLE t1 ( id int PRIMARY KEY, created_at TIMESTAMP -) /*T![ttl] TTL = `created_at` + INTERVAL 3 MONTHS TTL_ENABLE = 'OFF'*/; +) /*T![ttl] TTL = `created_at` + INTERVAL 3 MONTH TTL_ENABLE = 'OFF'*/; ``` #### Alter a Table with TTL @@ -76,7 +76,7 @@ CREATE TABLE t1 ( We can alter an exist table with TTL options, for example: ```sql -ALTER TABLE t1 TTL = `created_at` + INTERVAL 3 MONTHS; +ALTER TABLE t1 TTL = `created_at` + INTERVAL 3 MONTH; ``` We should allow to update the existed TTL options. When it is updated, the running background job for this table should stop or restart according to the newest settings. 
From f9b0515b021482c7a0d9f35c654c7c24f9ab1c78 Mon Sep 17 00:00:00 2001 From: Chao Wang Date: Tue, 13 Dec 2022 14:53:25 +0800 Subject: [PATCH 35/37] update --- docs/design/2022-11-17-ttl-table.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md index c80b44c0e0d85..f10bec4bad224 100644 --- a/docs/design/2022-11-17-ttl-table.md +++ b/docs/design/2022-11-17-ttl-table.md @@ -320,29 +320,29 @@ We'll introduce some new metrics to monitor the TTL jobs: - `ttl_queries` - The total count of select queries in TTL jobs - Type: Counter - - Labels: type, table, result + - Labels: sql_type, result - `ttl_processed_expired_rows` - The total count of expired rows processed in TTL jobs - Type: Counter - - Labels: type, table + - Labels: sql_type - `ttl_query_duration` - The duration of the queries in TTL jobs - - Labels: type + - Labels: sql_type, result - Type: Histogram - `ttl_job_status` - The status for the current TTL job. 
When the job is in the specified status, the value will be 1; otherwise, it will be 0
-  - Labels: table, status
-  - Type: Gauge
-
-- `ttl_job_scan_workers`
-  - The running scan workers count for the ttl jobs
   - Labels: table
   - Type: Gauge
 
+- `ttl_phase_time`
+  - The time spent in different phases of each worker
+  - Labels: type, phase
+  - Type: Counter
+
-In the above metrics, the optional values for type label is 'select' and 'delete' and the optional values for result label is `success` and `error`
+In the above metrics, the optional values for the sql_type label are 'select' and 'delete', and the optional values for the result label are `ok` and `error`
 
 ## Known Issues

From a202c348684988a45371681e61e1415a6b41c3cb Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Fri, 17 Mar 2023 17:33:53 +0800
Subject: [PATCH 36/37] update

---
 docs/design/2022-11-17-ttl-table.md | 34 ++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index f10bec4bad224..9cf981a53ce1e 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -79,6 +79,14 @@ We can alter an exist table with TTL options, for example:
 ALTER TABLE t1 TTL = `created_at` + INTERVAL 3 MONTH;
 ```
 
+A TTL job is scheduled every hour by default. If we want to customize the schedule interval, we can do it like this:
+
+```sql
+ALTER TABLE t1 TTL_JOB_INTERVAL='1d';
+```
+
+This alters the TTL job's schedule interval of table t1 to 1 day.
+
 We should allow to update the existed TTL options. When it is updated, the running background job for this table should stop or restart according to the newest settings.
 
 #### Alter to a non-TTL Table
@@ -241,6 +249,10 @@ Notice that we are still using the condition `create_time < '2022-01-01 00:00:00
 If there is no secondary index in a TTL table, we can assume that most of the delete operations can do a commit with 1PC.
That is because the incoming rows are in the same region in most cases. So if you have a big table, keeping no secondary indexes in this table may achieve better performance for TTL jobs.
+
+#### Distributed Task Executions
+
+As we mentioned above, a TTL job will be split into several scan tasks. In order to maximize the use of cluster resources, we can distribute these tasks to all TiDB nodes. Every time a new job is created, some rows will be inserted into a table named `mysql.tidb_ttl_task`. Each row in this table stands for a scan task. Every TiDB node scans this table periodically; once a new task is discovered, the node sets itself as the task's owner and then executes it.
+
 #### Time Zone Consideration
 
 Currently, we support three field types as the time column for TTL: `Date`, `DateTime` and `TimeStamp` . However, for `Date` and `DateTime`, they are not an absolute time, and we need another "time zone" information to determine what their accurate time point is. We have two options to select the time zone:
@@ -259,12 +271,6 @@ Some new system variables will be introduced:
   - Values: [ON, OFF]
   - Default: ON
 
-- `tidb_ttl_job_run_interval`
-  - The schedule interval between two jobs for one TTL table
-  - Scope: Global
-  - Range: [10m0s, 8760h0m0s]
-  - Default: 1h
-
 - `tidb_ttl_job_schedule_window_start_time`
   - This variable is used to restrict the start time of the time window of scheduling the ttl jobs.
   - Scope: Global
@@ -307,11 +313,11 @@ Some new system variables will be introduced:
   - Range: [0, MaxInt64]
   - Default: 0
 
-- `tidb_ttl_enable_instance_worker`
-  - Whether to start TTL workers in the current instance or not.
-  - Scope: Instance
-  - Values: [ON, OFF]
-  - Default: ON
+- `tidb_ttl_running_tasks`
+  - The max count of running tasks in the cluster at the same time. -1 stands for auto determined by TiDB.
+ - Scope: Global + - Range: -1 or [1, MaxInt64] + - Default: 0 ### New Metrics @@ -342,11 +348,14 @@ We'll introduce some new metrics to monitor the TTL jobs: - Labels: type, phase - Type: Counter +- `ttl_insert_rows` + - The total inserted rows to TTL tables + - Type: Counter + In the above metrics, the optional values for sql_type label is 'select' and 'delete' and the optional values for result label is `ok` and `error` ## Known Issues -- Though the TTL jobs from different table runs distributively. However, one job from a table runs in a single TiDB node. If a table is very large, there may be a bottleneck. - Currently, the condition of generated column cannot be pushed down to TiKV. If a table uses generated column as a TTL time column, the filter will be performed in TiDB side. It brings some necessary network traffics and makes the query slow. ## Future Works @@ -355,7 +364,6 @@ In the above metrics, the optional values for sql_type label is 'select' and 'de We'll do some performance optimizations in the future to reduce the execution time and performance cost of the TTL job: -- In the future, we can schedule the tasks from one job to multiple nodes instead of executing them only in one node. This approach will improve the resource utilization of the cluster. It also means we can execute more tasks in concurrency at the same time that makes the scan and delete faster. - If a TTL table has some Tiflash replicas, we can scan the TiFlash instead of TiKV. - If an index with the prefix of the TTL time column exists, we can use it to query expire rows instead of scanning the full table. It will reduce the execution time of the scan tasks. - In most times, secondary indexes will not be created to avoid the hotspot of the insert. In this scene, we can also reduce some unnecessary scans by caching the statistical information. For example, we can cache the created time of the oldest row for each region after a job finished. 
When the next job starts, it can check the all the regions, if the data of one region do not have any updates and its cached time is not expired, just skip that region.

From 87c3465ef8329fc138dce5d883a2d080463ab610 Mon Sep 17 00:00:00 2001
From: Chao Wang
Date: Fri, 17 Mar 2023 17:37:11 +0800
Subject: [PATCH 37/37] update

---
 docs/design/2022-11-17-ttl-table.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/2022-11-17-ttl-table.md b/docs/design/2022-11-17-ttl-table.md
index 9cf981a53ce1e..e9046ded2c084 100644
--- a/docs/design/2022-11-17-ttl-table.md
+++ b/docs/design/2022-11-17-ttl-table.md
@@ -314,7 +314,7 @@ Some new system variables will be introduced:
   - Default: 0
 
 - `tidb_ttl_running_tasks`
-  - The max count of running tasks in the cluster at the same time. -1 stands for auto determined by TiDB.
+  - The max count of running tasks in a cluster. -1 means the limit is determined automatically by TiDB.
   - Scope: Global
   - Range: -1 or [1, MaxInt64]
   - Default: 0
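To close with a sketch of the distributed task execution introduced above: each TiDB node claims pending rows in `mysql.tidb_ttl_task` by setting itself as the task's owner. One way to express that hand-off is a compare-and-set style UPDATE, so that only one node can win a given task; the column names below (`job_id`, `scan_id`, `owner_id`) are assumptions for illustration, not the actual schema:

```go
package main

import "fmt"

// claimTaskSQL builds an UPDATE that atomically claims one scan task for the
// given owner: the WHERE clause only matches while the task is still
// unowned, so two nodes racing for the same task cannot both succeed.
// Column names are hypothetical.
func claimTaskSQL(jobID string, scanID int, ownerID string) string {
	return fmt.Sprintf(
		"UPDATE mysql.tidb_ttl_task SET owner_id = '%s' WHERE job_id = '%s' AND scan_id = %d AND owner_id IS NULL",
		ownerID, jobID, scanID)
}

func main() {
	fmt.Println(claimTaskSQL("job-1", 0, "tidb-0:4000"))
}
```

A node would treat the claim as successful only when the UPDATE reports exactly one affected row, and would then start the corresponding scan task.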