From 1dc2a81d9ceb8018fc56dd0b4082fe972e797419 Mon Sep 17 00:00:00 2001 From: CaitinChen Date: Fri, 20 Apr 2018 16:38:43 +0800 Subject: [PATCH 1/8] docs/op-guide: create GC document --- README.md | 1 + op-guide/gc.md | 77 ++++++++++++++++++++++++++++++++++++++++ op-guide/history-read.md | 41 +++++---------------- 3 files changed, 86 insertions(+), 33 deletions(-) create mode 100644 op-guide/gc.md diff --git a/README.md b/README.md index 8933543da048a..1e83abdae7c23 100644 --- a/README.md +++ b/README.md @@ -72,6 +72,7 @@ - [TiDB Memory Control](sql/tidb-memory-control.md) + Advanced Usage - [Read Data From History Versions](op-guide/history-read.md) + - [Garbage Collection (GC)](op-guide/gc.md) + TiDB Operations Guide - [Hardware and Software Requirements](op-guide/recommendation.md) + Deploy diff --git a/op-guide/gc.md b/op-guide/gc.md new file mode 100644 index 0000000000000..af80bf9db10cf --- /dev/null +++ b/op-guide/gc.md @@ -0,0 +1,77 @@ +--- +title: TiDB Garbage Collection (GC) +category: advanced +--- + +# TiDB Garbage Collection (GC) + +TiDB uses MVCC to control concurrency. When you update or delete data, the data is not deleted immediately and it is saved for a period during which it can be read. Thus the write operation and the read operation are not mutually exclusive and it is possible to read the previous data. + +The data versions whose duration exceeds a specific time and that are not used any more will be cleared, or they will occupy the disk space, affecting the system performance. TiDB uses Garbage Collection (GC) to clear the obsolete data. + +## Working mechanism + +GC runs periodically on TiDB. When a TiDB server is started, a `gc_worker` is enabled in the background. In a TiDB cluster, one `gc_worker` is elected to be the leader which is used to maintain the GC status and send GC commands to all the TiKV Region leaders. 
+ +## Configuration and monitor + +The GC configuration and operation status are recorded in the `mysql.tidb` system table, which can be monitored and configured using the following SQL statement: + +```sql +mysql> select VARIABLE_NAME, VARIABLE_VALUE from mysql.tidb; ++-----------------------+------------------------------------------------------------------------------------------------+ +| VARIABLE_NAME | VARIABLE_VALUE | ++-----------------------+------------------------------------------------------------------------------------------------+ +| bootstrapped | True | +| tidb_server_version | 18 | +| tikv_gc_leader_uuid | 58accebfa7c0004 | +| tikv_gc_leader_desc | host:ip-172-16-30-5, pid:95472, start at 2018-04-11 13:43:30.73076656 +0800 CST m=+0.068873865 | +| tikv_gc_leader_lease | 20180418-11:02:30 +0800 CST | +| tikv_gc_run_interval | 10m0s | +| tikv_gc_life_time | 10m0s | +| tikv_gc_last_run_time | 20180418-10:59:30 +0800 CST | +| tikv_gc_safe_point | 20180418-10:58:30 +0800 CST | +| tikv_gc_concurrency | 1 | ++-----------------------+------------------------------------------------------------------------------------------------+ +10 rows in set (0.02 sec) +``` + +In the table above, `tikv_gc_run_interval`, `tikv_gc_life_time` and `tikv_gc_concurrency` can be configured manually. Other `tikv_gc`- variables record the current status, which are automatically updated by TiDB. Do not modify these variables. + +`tikv_gc_run_interval` (10 min by default) indicates the interval of GC work. `tikv_gc_life_time` (10 min by default) indicates the retaining time of data versions. When GC works, the outdated data is cleared. The `tikv_gc_run_interval` and `tikv_gc_life_time` should be not less than 10 minutes. You can set them using SQL statements. 
For example, if you want to retain the data within a day, you can execute the operation as below: + +```sql +update mysql.tidb set VARIABLE_VALUE = '24h' where VARIABLE_NAME = 'tikv_gc_life_time'; +``` + +The duration strings are a sequence of a number with the time unit, such as `24h`, `2h30m` and `2.5h`. The time units you can use include "h", "m" and "s". + +> **Note**: When you set `tikv_gc_life_time` to a large number (like days or even months) in a data updated frequently scenario, some problems as follows may occur: + + - The more versions of the data, the more disk storage space is occupied. + - A large number of history versions might slow down the query. They may affect range queries like `select count(*) from t`. + - If `tikv_gc_life_time` is suddenly turned to a smaller value during operation, a great deal of old data may be deleted in a short time, causing I/O pressure. + +`tikv_gc_concurrency` indicates the GC concurrency. It is set to `1` by default. In this case, a single thread operates and threads send request to each Region and wait for the response one by one. You can set the variable value larger to improve the system performance, but keep the value smaller than 128. + +`tikv_gc_leader_uuid`, `tikv_gc_leader_desc`, `tikv_gc_leader_lease` indicate the current GC leader information. `tikv_gc_last_run_time` indicates the last time GC works. + +`tikv_gc_safe_point` indicates the time, versions before which are cleared by GC and versions after which are readable. + +## Implementation details + +The GC implementation process is complex. Clearing the data that is not used any more should be on the premise that data consistency is guaranteed. The process of doing GC is as below: + +### 1. Resolve locks + +The TiDB transaction is based on Google Percolator. The transaction committing is a two-phase committing process. When the first phase is finished, all the related keys are locked. 
Among these locks, one is the primary lock and the others are secondary locks which contain a pointer of the primary locks; in the secondary phase, the key with the primary lock gets a write record and its lock is removed. The write record indicates the write or delete operation in the history or the transactional rollback record of this key. Replacing the primary lock by which write record indicates whether the corresponding transaction is committed successfully. Then all the secondary locks are replaced successively. If the threads to replace the secondary locks fail, these locks are retained. During GC, locks whose timestamp is before the safe point will be replaced by the corresponding write record based on the transaction committing status. + +This step is necessary, because you are not informed whether this transaction is successful if GC has cleared the write record of the primary lock, thus data consistency cannot be guaranteed. + +### 2. Delete ranges + +The `DeleteRanges` operation is usually necessary after the operation such as `drop table`. It is used to delete a range which may be very large. If the `use_delete_range` option of TiKV is not enabled, TiKV deletes the keys in the range. + +### 3. Do GC + +Clear the data before the safe point of each key and the write record. \ No newline at end of file diff --git a/op-guide/history-read.md b/op-guide/history-read.md index 95002ace05c92..d737a3a1f603b 100644 --- a/op-guide/history-read.md +++ b/op-guide/history-read.md @@ -29,39 +29,14 @@ After reading data from history versions, you can read data from the latest vers ## How TiDB manages the data versions -TiDB implements Multi-Version Concurrency Control (MVCC) to manage data versions. The history versions of data are kept because each update / removal creates a new version of the data object instead of updating / removing the data object in-place. But not all the versions are kept. 
If the versions are older than a specific time, they will be removed completely to reduce the storage occupancy and the performance overhead caused by too many history versions. - -In TiDB, Garbage Collection (GC) runs periodically to remove the obsolete data versions. GC is triggered in the following way: There is a `gc_worker` goroutine running in the background of each TiDB server. In a cluster with multiple TiDB servers, one of the `gc_worker` goroutines will be automatically selected to be the leader. The leader is responsible for maintaining the GC state and sends GC commands to each TiKV region leader. - -The running record of GC is recorded in the system table of `mysql.tidb` as follows and can be monitored and configured using the SQL statements: - -``` -mysql> select variable_name, variable_value from mysql.tidb; -+-----------------------+----------------------------+ -| variable_name | variable_value | -+-----------------------+----------------------------+ -| bootstrapped | True | -| tikv_gc_leader_uuid | 55daa0dfc9c0006 | -| tikv_gc_leader_desc | host:pingcap-pc5 pid:10549 | -| tikv_gc_leader_lease | 20160927-13:18:28 +0800 CST| -| tikv_gc_run_interval | 10m0s | -| tikv_gc_life_time | 10m0s | -| tikv_gc_last_run_time | 20160927-13:13:28 +0800 CST| -| tikv_gc_safe_point | 20160927-13:03:28 +0800 CST| -+-----------------------+----------------------------+ -7 rows in set (0.00 sec) -``` - -Pay special attention to the following two rows: - -- `tikv_gc_life_time`: This row is to configure the retention time of the history version and its default value is 10m. You can use SQL statements to configure it. For example, if you want all the data within one day to be readable, set this row to 24h by using the `update mysql.tidb set variable_value='24h' where variable_name='tikv_gc_life_time'` statement. The format is: "24h", "2h30m", "2.5h". The unit of time can be: "h", "m", "s". 
- -> **Note:** If your data is updated very frequently, the following issues might occur if the value of the `tikv_gc_life_time` is set to be too large like in days or months: -> -> - The more versions of the data, the more disk storage is occupied. -> - A large amount of the history versions might slow down the query, especially the range queries like `select count(*) from t`. -> - If the value of the `tikv_gc_life_time` variable is suddenly changed to be smaller while the database is running, it might lead to the removal of large amounts of history data and cause huge I/O burden. -> - `tikv_gc_safe_point`: This row records the current safePoint. You can safely create the Snapshot to read the history data using the timestamp that is later than the safePoint. The safePoint automatically updates every time GC runs. +TiDB implements Multi-Version Concurrency Control (MVCC) to manage data versions. The history versions of data are kept because each update/removal creates a new version of the data object instead of updating/removing the data object in-place. But not all the versions are kept. If the versions are older than a specific time, they will be removed completely to reduce the storage occupancy and the performance overhead caused by too many history versions. + +In TiDB, Garbage Collection (GC) runs periodically to remove the obsolete data versions. For GC details, see [TiDB Garbage Collection (GC)](gc.md) + +Pay special attention to the following two variables: + +- `tikv_gc_life_time`: It is used to configure the retention time of the history version. You can modify it manually. +- `tikv_gc_safe_point`: It records the current `safePoint`. You can safely create the snapshot to read the history data using the timestamp that is later than `safePoint`. `safePoint` automatically updates every time GC runs. 
## Example From 0311955a3eb881c6d40542d5c13156f841010c77 Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Mon, 23 Apr 2018 17:38:47 +0800 Subject: [PATCH 2/8] Update gc.md --- op-guide/gc.md | 50 ++++++++++++++++++++++++++++---------------------- 1 file changed, 28 insertions(+), 22 deletions(-) diff --git a/op-guide/gc.md b/op-guide/gc.md index af80bf9db10cf..4cf9b8df35549 100644 --- a/op-guide/gc.md +++ b/op-guide/gc.md @@ -5,17 +5,17 @@ category: advanced # TiDB Garbage Collection (GC) -TiDB uses MVCC to control concurrency. When you update or delete data, the data is not deleted immediately and it is saved for a period during which it can be read. Thus the write operation and the read operation are not mutually exclusive and it is possible to read the previous data. +TiDB uses MVCC to control concurrency. When you update or delete data, the data is not deleted immediately but is saved for a period during which it can be read. Thus the write operation and the read operation are not mutually exclusive and it is possible to read the previous data. -The data versions whose duration exceeds a specific time and that are not used any more will be cleared, or they will occupy the disk space, affecting the system performance. TiDB uses Garbage Collection (GC) to clear the obsolete data. +The data versions whose duration exceeds a specific time and that are not used any more will be cleared, otherwise they will occupy the disk space, affecting the system performance. TiDB uses Garbage Collection (GC) to clear the obsolete data. ## Working mechanism -GC runs periodically on TiDB. When a TiDB server is started, a `gc_worker` is enabled in the background. In a TiDB cluster, one `gc_worker` is elected to be the leader which is used to maintain the GC status and send GC commands to all the TiKV Region leaders. +GC runs periodically on TiDB. When a TiDB server is started, a `gc_worker` is enabled in the background. 
In each TiDB cluster, one `gc_worker` is elected to be the leader which is used to maintain the GC status and send GC commands to all the TiKV Region leaders. ## Configuration and monitor -The GC configuration and operation status are recorded in the `mysql.tidb` system table, which can be monitored and configured using the following SQL statement: +The GC configuration and operational status are recorded in the `mysql.tidb` system table, which can be monitored and configured using the following SQL statement: ```sql mysql> select VARIABLE_NAME, VARIABLE_VALUE from mysql.tidb; @@ -36,42 +36,48 @@ mysql> select VARIABLE_NAME, VARIABLE_VALUE from mysql.tidb; 10 rows in set (0.02 sec) ``` -In the table above, `tikv_gc_run_interval`, `tikv_gc_life_time` and `tikv_gc_concurrency` can be configured manually. Other `tikv_gc`- variables record the current status, which are automatically updated by TiDB. Do not modify these variables. +In the table above, `tikv_gc_run_interval`, `tikv_gc_life_time` and `tikv_gc_concurrency` can be configured manually. Other variables with the `tikv_gc prefix` prefix record the current status, which are automatically updated by TiDB. Do not modify these variables. -`tikv_gc_run_interval` (10 min by default) indicates the interval of GC work. `tikv_gc_life_time` (10 min by default) indicates the retaining time of data versions. When GC works, the outdated data is cleared. The `tikv_gc_run_interval` and `tikv_gc_life_time` should be not less than 10 minutes. You can set them using SQL statements. For example, if you want to retain the data within a day, you can execute the operation as below: +- `tikv_gc_leader_uuid`, `tikv_gc_leader_desc`, `tikv_gc_leader_lease`: the current GC leader information. -```sql -update mysql.tidb set VARIABLE_VALUE = '24h' where VARIABLE_NAME = 'tikv_gc_life_time'; -``` +- `tikv_gc_run_interval`: the interval of GC work. The value is 10 min by default and cannot be smaller than 10 min. 
+ +- `tikv_gc_life_time`: the retention period of data versions; The value is 10 min by default and cannot be smaller than 10 min. + + When GC works, the outdated data is cleared. You can set it using the SQL statement. For example, if you want to retain the data within a day, you can execute the operation as below: + + ```sql + update mysql.tidb set VARIABLE_VALUE = '24h' where VARIABLE_NAME = 'tikv_gc_life_time'; + ``` -The duration strings are a sequence of a number with the time unit, such as `24h`, `2h30m` and `2.5h`. The time units you can use include "h", "m" and "s". + The duration strings are a sequence of a number with the time unit, such as 24h, 2h30m and 2.5h. The time units you can use include "h", "m" and "s". -> **Note**: When you set `tikv_gc_life_time` to a large number (like days or even months) in a data updated frequently scenario, some problems as follows may occur: + > **Note**: When you set `tikv_gc_life_time` to a large number (like days or even months) in a data updated frequently scenario, some problems as follows may occur: - - The more versions of the data, the more disk storage space is occupied. - - A large number of history versions might slow down the query. They may affect range queries like `select count(*) from t`. - - If `tikv_gc_life_time` is suddenly turned to a smaller value during operation, a great deal of old data may be deleted in a short time, causing I/O pressure. + - The more versions of the data, the more disk storage space is occupied. + - A large number of history versions might slow down the query. They may affect range queries like `select count(*) from t`. + - If `tikv_gc_life_time` is suddenly turned to a smaller value during operation, a great deal of old data may be deleted in a short time, causing I/O pressure. -`tikv_gc_concurrency` indicates the GC concurrency. It is set to `1` by default. In this case, a single thread operates and threads send request to each Region and wait for the response one by one. 
You can set the variable value larger to improve the system performance, but keep the value smaller than 128. +- `tikv_gc_last_run_time`: the last time GC works. -`tikv_gc_leader_uuid`, `tikv_gc_leader_desc`, `tikv_gc_leader_lease` indicate the current GC leader information. `tikv_gc_last_run_time` indicates the last time GC works. +- `tikv_gc_safe_point`: the time that versions before which are cleared by GC and versions after which are readable. -`tikv_gc_safe_point` indicates the time, versions before which are cleared by GC and versions after which are readable. +- `tikv_gc_concurrency`: the GC concurrency. It is set to 1 by default. In this case, a single thread operates and threads send request to each Region and wait for the response one by one. You can set the variable value larger to improve the system performance, but keep the value smaller than 128. ## Implementation details -The GC implementation process is complex. Clearing the data that is not used any more should be on the premise that data consistency is guaranteed. The process of doing GC is as below: +The GC implementation process is complex. When the obsolete data is cleared, data consistency is guaranteed. The process of doing GC is as below: ### 1. Resolve locks -The TiDB transaction is based on Google Percolator. The transaction committing is a two-phase committing process. When the first phase is finished, all the related keys are locked. Among these locks, one is the primary lock and the others are secondary locks which contain a pointer of the primary locks; in the secondary phase, the key with the primary lock gets a write record and its lock is removed. The write record indicates the write or delete operation in the history or the transactional rollback record of this key. Replacing the primary lock by which write record indicates whether the corresponding transaction is committed successfully. Then all the secondary locks are replaced successively. 
If the threads to replace the secondary locks fail, these locks are retained. During GC, locks whose timestamp is before the safe point will be replaced by the corresponding write record based on the transaction committing status. +The TiDB transaction model is inspired by Google's Percolator. It's mainly a two-phase commit protocol with some practical optimizations. When the first phase is finished, all the related keys are locked. Among these locks, one is the primary lock and the others are secondary locks which contain a pointer of the primary locks; in the secondary phase, the key with the primary lock gets a write record and its lock is removed. The write record indicates the write or delete operation in the history or the transactional rollback record of this key. Replacing the primary lock with which write record indicates whether the corresponding transaction is committed successfully. Then all the secondary locks are replaced successively. If the threads to replace the secondary locks fail, these locks are retained. During GC, the lock whose timestamp is before the safe point is replaced with the corresponding write record based on the transaction committing status. -This step is necessary, because you are not informed whether this transaction is successful if GC has cleared the write record of the primary lock, thus data consistency cannot be guaranteed. +**Note**: This is a required step. Once GC has cleared the write record of the primary lock, you can never know whether this transaction is successful or not. As a result, data consistency cannot be guaranteed. ### 2. Delete ranges -The `DeleteRanges` operation is usually necessary after the operation such as `drop table`. It is used to delete a range which may be very large. If the `use_delete_range` option of TiKV is not enabled, TiKV deletes the keys in the range. +`DeleteRanges` is usually executed after operations like `drop table`, used to delete a range which might be very large. 
If the `use_delete_range` option of TiKV is not enabled, TiKV deletes the keys in the range. ### 3. Do GC -Clear the data before the safe point of each key and the write record. \ No newline at end of file +Clear the data before the safe point of each key and the write record. From 4215a65c256f64d5130319f5c7b094d096b0d825 Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Mon, 23 Apr 2018 18:04:34 +0800 Subject: [PATCH 3/8] Update gc.md --- op-guide/gc.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/op-guide/gc.md b/op-guide/gc.md index 4cf9b8df35549..911ed7a87e92f 100644 --- a/op-guide/gc.md +++ b/op-guide/gc.md @@ -53,10 +53,10 @@ In the table above, `tikv_gc_run_interval`, `tikv_gc_life_time` and `tikv_gc_con The duration strings are a sequence of a number with the time unit, such as 24h, 2h30m and 2.5h. The time units you can use include "h", "m" and "s". > **Note**: When you set `tikv_gc_life_time` to a large number (like days or even months) in a data updated frequently scenario, some problems as follows may occur: - - - The more versions of the data, the more disk storage space is occupied. - - A large number of history versions might slow down the query. They may affect range queries like `select count(*) from t`. - - If `tikv_gc_life_time` is suddenly turned to a smaller value during operation, a great deal of old data may be deleted in a short time, causing I/O pressure. + + - The more versions of the data, the more disk storage space is occupied. + - A large number of history versions might slow down the query. They may affect range queries like `select count(*) from t`. + - If `tikv_gc_life_time` is suddenly turned to a smaller value during operation, a great deal of old data may be deleted in a short time, causing I/O pressure. - `tikv_gc_last_run_time`: the last time GC works. 
From 8dd94589da8e9397369a258adc76c52241eef18b Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Mon, 23 Apr 2018 19:51:22 +0800 Subject: [PATCH 4/8] Update GC via: https://github.com/pingcap/docs-cn/pull/689 --- op-guide/gc.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/op-guide/gc.md b/op-guide/gc.md index 911ed7a87e92f..16737358b9182 100644 --- a/op-guide/gc.md +++ b/op-guide/gc.md @@ -72,7 +72,7 @@ The GC implementation process is complex. When the obsolete data is cleared, dat The TiDB transaction model is inspired by Google's Percolator. It's mainly a two-phase commit protocol with some practical optimizations. When the first phase is finished, all the related keys are locked. Among these locks, one is the primary lock and the others are secondary locks which contain a pointer of the primary locks; in the secondary phase, the key with the primary lock gets a write record and its lock is removed. The write record indicates the write or delete operation in the history or the transactional rollback record of this key. Replacing the primary lock with which write record indicates whether the corresponding transaction is committed successfully. Then all the secondary locks are replaced successively. If the threads to replace the secondary locks fail, these locks are retained. During GC, the lock whose timestamp is before the safe point is replaced with the corresponding write record based on the transaction committing status. -**Note**: This is a required step. Once GC has cleared the write record of the primary lock, you can never know whether this transaction is successful or not. As a result, data consistency cannot be guaranteed. +> **Note**: This is a required step. Once GC has cleared the write record of the primary lock, you can never know whether this transaction is successful or not. As a result, data consistency cannot be guaranteed. ### 2. 
Delete ranges @@ -81,3 +81,4 @@ The TiDB transaction model is inspired by Google's Percolator. It's mainly a two ### 3. Do GC Clear the data before the safe point of each key and the write record. +> **Note**: if the last record in all the write records of `Put` and `Delete` types before the safe point is `Put`, this record and its data cannot be deleted directly. Otherwise, you cannot successfully perform the read operation whose timestamp is after the safe point and before the next key version. From 9f169f282a0cbe63c911e1d8582123b0c06761d8 Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 24 Apr 2018 10:33:27 +0800 Subject: [PATCH 5/8] Update gc.md --- op-guide/gc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/op-guide/gc.md b/op-guide/gc.md index 16737358b9182..2fc1aac407280 100644 --- a/op-guide/gc.md +++ b/op-guide/gc.md @@ -81,4 +81,4 @@ The TiDB transaction model is inspired by Google's Percolator. It's mainly a two ### 3. Do GC Clear the data before the safe point of each key and the write record. -> **Note**: if the last record in all the write records of `Put` and `Delete` types before the safe point is `Put`, this record and its data cannot be deleted directly. Otherwise, you cannot successfully perform the read operation whose timestamp is after the safe point and before the next key version. +> **Note**: If the last record in all the write records of `Put` and `Delete` types before the safe point is `Put`, this record and its data cannot be deleted directly. Otherwise, you cannot successfully perform the read operation whose timestamp is after the safe point and before the next key version. 
From 7c47836256cdfe9926b64ab9e44668ddf71ab1d5 Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 24 Apr 2018 11:02:12 +0800 Subject: [PATCH 6/8] Update gc.md --- op-guide/gc.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/op-guide/gc.md b/op-guide/gc.md index 2fc1aac407280..02cf9c15c696e 100644 --- a/op-guide/gc.md +++ b/op-guide/gc.md @@ -5,9 +5,9 @@ category: advanced # TiDB Garbage Collection (GC) -TiDB uses MVCC to control concurrency. When you update or delete data, the data is not deleted immediately but is saved for a period during which it can be read. Thus the write operation and the read operation are not mutually exclusive and it is possible to read the previous data. +TiDB uses MVCC to control concurrency. When you update or delete data, the original data is not deleted immediately but is kept for a period during which it can be read. Thus the write operation and the read operation are not mutually exclusive and it is possible to read the history versions of the data. -The data versions whose duration exceeds a specific time and that are not used any more will be cleared, otherwise they will occupy the disk space, affecting the system performance. TiDB uses Garbage Collection (GC) to clear the obsolete data. +The data versions whose duration exceeds a specific time and that are not used any more will be cleared, otherwise they will occupy the disk space and affect TiDB's performance. TiDB uses Garbage Collection (GC) to clear the obsolete data. 
## Working mechanism From d1ec16a5c7ee65dd26211981b28e12661f8467ae Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 24 Apr 2018 14:47:56 +0800 Subject: [PATCH 7/8] Update gc.md --- op-guide/gc.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/op-guide/gc.md b/op-guide/gc.md index 02cf9c15c696e..1d53a19d3315e 100644 --- a/op-guide/gc.md +++ b/op-guide/gc.md @@ -15,7 +15,7 @@ GC runs periodically on TiDB. When a TiDB server is started, a `gc_worker` is en ## Configuration and monitor -The GC configuration and operational status are recorded in the `mysql.tidb` system table, which can be monitored and configured using the following SQL statement: +The GC configuration and operational status are recorded in the `mysql.tidb` system table as below, which can be monitored and configured using SQL statements: ```sql mysql> select VARIABLE_NAME, VARIABLE_VALUE from mysql.tidb; @@ -36,7 +36,7 @@ mysql> select VARIABLE_NAME, VARIABLE_VALUE from mysql.tidb; 10 rows in set (0.02 sec) ``` -In the table above, `tikv_gc_run_interval`, `tikv_gc_life_time` and `tikv_gc_concurrency` can be configured manually. Other variables with the `tikv_gc prefix` prefix record the current status, which are automatically updated by TiDB. Do not modify these variables. +In the table above, `tikv_gc_run_interval`, `tikv_gc_life_time` and `tikv_gc_concurrency` can be configured manually. Other variables with the `tikv_gc` prefix record the current status, which are automatically updated by TiDB. Do not modify these variables. - `tikv_gc_leader_uuid`, `tikv_gc_leader_desc`, `tikv_gc_leader_lease`: the current GC leader information. 
From cbd1158e36968a60a2ad5eebbb0bfab0bb3b98c5 Mon Sep 17 00:00:00 2001 From: Caitin <34535727+CaitinChen@users.noreply.github.com> Date: Tue, 24 Apr 2018 17:00:55 +0800 Subject: [PATCH 8/8] Update gc.md --- op-guide/gc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/op-guide/gc.md b/op-guide/gc.md index 1d53a19d3315e..694e0d1e5a004 100644 --- a/op-guide/gc.md +++ b/op-guide/gc.md @@ -81,4 +81,4 @@ The TiDB transaction model is inspired by Google's Percolator. It's mainly a two ### 3. Do GC Clear the data before the safe point of each key and the write record. -> **Note**: If the last record in all the write records of `Put` and `Delete` types before the safe point is `Put`, this record and its data cannot be deleted directly. Otherwise, you cannot successfully perform the read operation whose timestamp is after the safe point and before the next key version. +> **Note**: If the last record in all the write records of `Put` and `Delete` types before the safe point is `Put`, this record and its data cannot be deleted directly. Otherwise, you cannot successfully perform the read operation whose timestamp is after the safe point and before the next version of the key.
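
The retention rule in the "Do GC" note above can be sketched in a few lines of Python. This is only an illustration of the rule as the note states it, not TiKV's actual implementation: `WriteRecord`, `gc_key`, and `read` are hypothetical names, timestamps are plain integers, and treating a record committed exactly at the safe point as "before" it is an assumption.

```python
from typing import List, NamedTuple, Optional


class WriteRecord(NamedTuple):
    """Hypothetical model of one committed version of a key."""
    commit_ts: int
    kind: str                    # "Put" or "Delete"
    value: Optional[str] = None  # payload for "Put", None for "Delete"


def gc_key(records: List[WriteRecord], safe_point: int) -> List[WriteRecord]:
    """Return the records of a single key that survive GC."""
    # Versions after the safe point are always kept.
    kept = [r for r in records if r.commit_ts > safe_point]
    # Assumption: commit_ts == safe_point counts as "before" the safe point.
    old = [r for r in records if r.commit_ts <= safe_point]
    if old:
        latest_old = max(old, key=lambda r: r.commit_ts)
        # If the newest old record is a Put, it is still the visible value
        # for reads between the safe point and the next version: keep it.
        if latest_old.kind == "Put":
            kept.append(latest_old)
    return sorted(kept, key=lambda r: r.commit_ts)


def read(records: List[WriteRecord], read_ts: int) -> Optional[str]:
    """Snapshot read: value of the newest version at or before read_ts."""
    visible = [r for r in records if r.commit_ts <= read_ts]
    if not visible:
        return None
    latest = max(visible, key=lambda r: r.commit_ts)
    return latest.value if latest.kind == "Put" else None


history = [
    WriteRecord(10, "Put", "a"),
    WriteRecord(20, "Put", "b"),
    WriteRecord(50, "Delete"),
]
survivors = gc_key(history, safe_point=30)
print([r.commit_ts for r in survivors])  # [20, 50]
print(read(survivors, 40))               # b
```

With the safe point at 30, the `Put` at timestamp 20 is retained even though it is older than the safe point, so a read at timestamp 40 still returns `b`. If that record were dropped, the same read would find no visible version, which is the failure the note warns about.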