
docs: Some proposal for renaming and configurations for Lock View #24718

Merged on May 19, 2021 (9 commits)
46 changes: 33 additions & 13 deletions docs/design/2021-04-26-lock-view.md
@@ -1,7 +1,7 @@
# TiDB Design Documents

- Author(s): [longfangsong](https://github.com/longfangsong), [MyonKeminta](http://github.com/MyonKeminta)
- Last updated: May 18, 2021
- Discussion PR: N/A
- Tracking Issue: https://github.com/pingcap/tidb/issues/24199

@@ -35,14 +35,14 @@ Several tables will be provided in `information_schema`. Some tables has both lo

| Field | Type | Comment |
|------------|------------|---------|
| `TRX_ID` | `unsigned bigint` | The transaction ID (aka. start ts) |
| `TRX_STARTED` | `time` | Human-readable start time of the transaction |
| `DIGEST` | `text` | The digest of the currently executing SQL statement |
| `ALL_SQLS` | `text` | A list of all executed SQL statements' digests |
**Contributor Author:**

The original `SQLS` doesn't look understandable. `ALL_SQLS` may still be somewhat confusing; maybe it's better to name it something like `ALL_SQL_DIGESTS` or `FINISHED_SQL_DIGESTS` 🤔

**Contributor (@longfangsong, May 19, 2021):**

This might be too big for this sprint, but would it be better if we made them a separate table, e.g. `TRX_STATEMENT`?

| Field | Type | Comment |
|------------|------------|---------|
| `TRX_ID` | `unsigned bigint` | |
| `digest` | `text` | |
| `index` | `int` | This SQL is the index-th statement in this transaction |

And we can probably add more fields here, e.g. each statement's execution time, etc.

**Contributor Author:**

It looks logically correct to split it into another table, but I'm afraid it's too complicated, and it would be hard to get consistent data from this table and `TIDB_TRX` when the user wants to join them. It would also be somewhat difficult to attach the information to the `DEADLOCK` table like we designed before.
What's your opinion? @youjiali1995 @cfzjywxk

By the way, I think we need to implement it in this sprint since we've planned to. Actually, I'm about to start trying it.

**Contributor:**

Currently, maybe we could do it in a simple way.

| `STATE` | `enum('Running', 'Lock waiting', 'Committing', 'RollingBack')` | The state of the transaction |
| `WAITING_START_TIME` | `time` | The start time of the current lock wait (if any) |
| `SCOPE` | `enum('Global', 'Local')` | The scope of the transaction |
| `ISOLATION_LEVEL` | `enum('REPEATABLE-READ', 'READ-COMMITTED')` | |
| `AUTOCOMMIT` | `bool` | |
| `SESSION_ID` | `unsigned bigint` | |
| `USER` | `varchar` | |
@@ -79,24 +79,28 @@ Several tables will be provided in `information_schema`. Some tables has both lo
* Permission:
* `PROCESS` privilege is needed to access this table.

### Table `(CLUSTER_)DEADLOCKS`

| Field | Type | Comment |
|------------|------------|---------|
| `DEADLOCK_ID` | `int` | Multiple rows are needed to represent the information of a single deadlock event. This field distinguishes different events. |
| `OCCUR_TIME` | `time` | The physical time when the deadlock occurs |
| `RETRYABLE` | `bool` | Whether the deadlock is retryable. TiDB tries to determine whether the current statement is (indirectly) waiting for a lock locked by the current statement itself. |
| `TRY_LOCK_TRX_ID` | `unsigned bigint` | The transaction ID (start ts) of the transaction that's trying to acquire the lock |
| `CURRENT_SQL_DIGEST` | `text` | The digest of the SQL statement that's currently blocked |
| `KEY` | `varchar` | The key that the transaction is trying to lock but that is held by another transaction in the deadlock event |
| `ALL_SQLS` | `text` | A list of the digests of the SQL statements that the transaction has executed |
| `TRX_HOLDING_LOCK` | `unsigned bigint` | The transaction that's currently holding the lock. There will be another record in the table with the same `DEADLOCK_ID` for that transaction. |

* Life span of rows:
  * Created after TiDB receives a deadlock error
  * FIFO; the oldest rows are evicted once the buffer is full
* Collecting, storing and querying:
  * All of this information can be collected on the TiDB side: TiDB just needs to add it to the table when it receives a deadlock error from TiKV. The information about the other transactions involved in the deadlock circle needs to be fetched from elsewhere (the `CLUSTER_TIDB_TRX` table) when handling the deadlock error.
  * TiKV needs to report richer information in the deadlock error for collecting.
  * There are two types of deadlock errors internally: retryable and non-retryable. The transaction retries internally on retryable deadlocks and doesn't report the error to the client, so users are typically more interested in non-retryable deadlocks.
    * Retryable deadlock errors are not collected by default; collecting them can be enabled via configuration.
    * Collecting `CLUSTER_TIDB_TRX` to get richer information for retryable deadlocks may hurt performance. Whether it will be collected for retryable deadlocks will be decided after some tests.
* Permission:
* `PROCESS` privilege is needed to access this table.
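The buffer behavior described above (one set of records per deadlock error, oldest evicted first once the buffer is full) can be sketched as follows. The `DeadlockRecord` struct and `deadlockHistory` type are hypothetical illustrations of the design, not TiDB's actual implementation:

```go
package main

import "fmt"

// DeadlockRecord is a hypothetical row of the DEADLOCKS table; the fields
// loosely follow the columns described above.
type DeadlockRecord struct {
	DeadlockID   uint64
	Retryable    bool
	TryLockTrxID uint64
}

// deadlockHistory sketches the FIFO buffer: when the buffer reaches its
// configured capacity, the oldest record is evicted first.
type deadlockHistory struct {
	records  []DeadlockRecord
	capacity int
}

func (h *deadlockHistory) push(r DeadlockRecord) {
	if len(h.records) >= h.capacity {
		// FIFO: drop the oldest entry once the buffer is full.
		h.records = h.records[1:]
	}
	h.records = append(h.records, r)
}

func main() {
	h := &deadlockHistory{capacity: 2}
	for id := uint64(1); id <= 3; id++ {
		h.push(DeadlockRecord{DeadlockID: id})
	}
	// The oldest record (ID 1) has been evicted.
	fmt.Println(h.records[0].DeadlockID, h.records[1].DeadlockID) // 2 3
}
```

The capacity here would come from the `pessimistic-txn.tidb_deadlock_history_capacity` config item described later in this document.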

@@ -151,9 +155,25 @@ The locking key and `resource_group_tag` that comes from the `Context` of the pe

The wait chain will be added to the `Deadlock` error which is returned by the `PessimisticLock` request, so that when deadlock happens, the full wait chain information can be passed to TiDB.
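As a sketch of how one deadlock event from such a wait chain maps to multiple `DEADLOCKS` rows sharing a `DEADLOCK_ID`, assuming a hypothetical flattened `WaitChainEntry` shape (the actual fields carried in TiKV's `Deadlock` error may differ):

```go
package main

import "fmt"

// WaitChainEntry is a hypothetical flattened form of one edge in the
// deadlock wait chain carried by the Deadlock error.
type WaitChainEntry struct {
	Txn        uint64 // transaction waiting for the lock
	WaitForTxn uint64 // transaction holding the lock
	KeyDigest  string
	SQLDigest  string
}

// DeadlockRow mirrors a subset of the DEADLOCKS table columns.
type DeadlockRow struct {
	DeadlockID       uint64
	TryLockTrxID     uint64
	CurrentSQLDigest string
	Key              string
	TrxHoldingLock   uint64
}

// rowsFromWaitChain turns one deadlock event into multiple rows sharing
// the same DEADLOCK_ID, as the table layout above requires.
func rowsFromWaitChain(deadlockID uint64, chain []WaitChainEntry) []DeadlockRow {
	rows := make([]DeadlockRow, 0, len(chain))
	for _, e := range chain {
		rows = append(rows, DeadlockRow{
			DeadlockID:       deadlockID,
			TryLockTrxID:     e.Txn,
			CurrentSQLDigest: e.SQLDigest,
			Key:              e.KeyDigest,
			TrxHoldingLock:   e.WaitForTxn,
		})
	}
	return rows
}

func main() {
	// A two-transaction deadlock circle: 10 waits for 11, 11 waits for 10.
	chain := []WaitChainEntry{{Txn: 10, WaitForTxn: 11}, {Txn: 11, WaitForTxn: 10}}
	rows := rowsFromWaitChain(1, chain)
	fmt.Println(len(rows), rows[0].TrxHoldingLock, rows[1].TrxHoldingLock) // 2 11 10
}
```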

### Configurations
**Contributor:**

Maybe some background HTTP API could be used. Currently, we need to be careful about adding system variables, as there are already many, and once released we can't take them back.


#### TiDB Config File `pessimistic-txn.tidb_deadlock_history_capacity`

Specifies how many recent deadlock events each TiDB node should keep.
Dynamically changeable via HTTP API.
Value: 0 to 10000
Default: 10

#### TiDB Config File `pessimistic-txn.tidb_deadlock_history_collect_retryable`
**Contributor Author:**

This will be a low-priority development task and perhaps out of this sprint, depending on the time.


Specifies whether to collect retryable deadlock errors to the `(CLUSTER_)DEADLOCKS` table.
Dynamically changeable via HTTP API.
Value: 0 (do not collect) or 1 (collect)
Default: 0
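Put together, the two options above might appear in the TiDB config file roughly like this; the `[pessimistic-txn]` section layout is an assumption based on the dotted key names, and the exact placement may differ:

```toml
# Hypothetical excerpt of a TiDB config file using the proposed options.
[pessimistic-txn]
# How many recent deadlock events each TiDB node keeps (0-10000).
tidb_deadlock_history_capacity = 10
# Whether retryable deadlock errors are collected (0 = no, 1 = yes).
tidb_deadlock_history_collect_retryable = 0
```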

## Compatibility

This feature is not expected to be incompatible with other features. During an upgrade, when TiDB nodes of different versions exist at the same time, the `CLUSTER_`-prefixed tables may encounter errors. However, since this feature is typically used manually by the user, this shouldn't be a severe problem, so we don't need to worry much about it.

## Test Design

@@ -190,7 +210,7 @@ This feature is not expected to be incompatible with other features. During upgr

* Since lock waiting on TiKV may time out and retry, a single query to the `DATA_LOCK_WAITS` table may not show all (logical) lock waits.
* Information about internal transactions may not be collected in our first version of implementation.
* Since TiDB needs to query transaction information after it receives the deadlock error, the transactions' status may change in the meantime. As a result, the information in the `(CLUSTER_)DEADLOCKS` table is not guaranteed to be accurate and complete.
* Statistics about transaction conflicts are still insufficient.
* Historical information of `TIDB_TRX` and `DATA_LOCK_WAITS` is not kept, which may still make it difficult to investigate certain kinds of problems.
* The SQL digest that's holding the lock and blocking the current transaction is hard to retrieve and is not included in the current design.