Skip to content

Latest commit

 

History

History
284 lines (203 loc) · 13.9 KB

2021-04-26-lock-view.md

File metadata and controls

284 lines (203 loc) · 13.9 KB

TiDB Design Documents

Table of Contents

Introduction

This document describes the design of the feature Lock View, which provides tools to analyze problems about transaction's lock waiting, lock contentions and deadlocks.

Motivation or Background

Currently, it's very hard to analyze lock contentions and deadlocks for transactions. One may need to enable general log, try to reproduce the problem, and try to analyze the log to find the cause, which is very difficult and inconvenient. Even if you have the log, many conflict's information contains only a start_ts, which won't tell you much useful information (like the SQLs in the transaction) for reproducing the problem. We also found that this way of analyzing is not feasible in some scenarios. It's highly required to provide some better approach to analyze these kinds of problems.

Detailed Design

Several tables will be provided in information_schema. Some tables has both local version (fetches data on the current TiDB node) and global version (fetches data among the whole cluster), and the global version's table name has the "CLUSTER_" prefix.

Table (CLUSTER_)TIDB_TRX

Field Type Comment
TRX_ID unsigned bigint The transaction ID (aka. start ts)
TRX_STARTED time Human readable start time of the transaction
CURRENT_SQL_DIGEST varchar(64) The digest of the current executing SQL statement
ALL_SQL_DIGESTS text A list of all executed SQL statements' digests
STATE enum('Running', 'Lock waiting', 'Committing', 'RollingBack') The state of the transaction
WAITING_START_TIME time The elapsed time since the start of the current lock waiting (if any)
SCOPE enum('Global', 'Local') The scope of the transaction
ISOLATION_LEVEL enum('REPEATABLE-READ', 'READ-COMMITTED')
AUTOCOMMIT bool
SESSION_ID unsigned bigint
USER varchar
DB varchar
SET_COUNT int Modified keys of the current transaction
LOCKED_COUNT int Locked keys of the current transaction
MEM_BUFFER_KEYS int Entries in transaction's membuffer
MEM_BUFFER_BYTES int Size occupied by the transaction's membuffer
  • Life span of rows:
    • Create on first writing or locking operation in a transaction
    • Remove after the transaction is done
  • Collecting, storing and querying:
    • All these information can be collected on TiDB side. Since the amount of concurrent transactions won't be too large, and it doesn't need to be persisted, so it's ok to implement it as a memory table. For querying among the cluster, just register the table under infoschema/cluster.go and write the global table name with the local one.
    • As the simplest way of implementing, most information can be passed with a similar way like ProcessInfo, or even directly passed via the ProcessInfo struct.
  • Permission:
    • PROCESS privilege is needed to access the full content of this table. For users without PROCESS permission, only transactions started by the current user will be shown, and others will be filtered out, which is similar to the processlist table.

Table DATA_LOCK_WAITS

Field Type Comment
KEY varchar The key that's being waiting on
TRX_ID unsigned bigint The current transaction that's waiting for the lock
SQL_DIGEST varchar(64) The digest of the SQL that's trying to acquire the lock
CURRENT_HOLDING_TRX_ID unsigned bigint The transaction that's holding the lock and blocks the current transaction
  • Life span of rows:
    • Created on a lock come into LockManager
    • Removed after a lock leave LockManager
  • Collecting, storing and querying:
    • All these will be collected on TiKV LockManager, and will need a new RPC entry for TiDB to query. LockManager won't store the un-hashed key or SQL_DIGEST for now, so we need to modify it.
  • The SQL Digest of the transaction that's currently holding the lock may be helpful, but it's hard to implement under the current architecture. So it won't be included in the first version of the feature.
  • Permission:
    • PROCESS privilege is needed to access this table.

Table (CLUSTER_)DEADLOCKS

Field Type Comment
DEADLOCK_ID int There needs multiple rows to represent information of a single deadlock event. This field is used to distinguish different events.
OCCUR_TIME time The physical time when the deadlock occurs
RETRYABLE bool Is the deadlock retryable. TiDB tries to determine if the current statement is (indirectly) waiting for a lock locked by the current statement.
TRY_LOCK_TRX_ID unsigned bigint The transaction ID (start ts) of the transaction that's trying to acquire the lock
CURRENT_SQL_DIGEST text The SQL that's being blocked
KEY varchar The key that's being locked, but locked by another transaction in the deadlock event
ALL_SQL_DIGESTS text A list of the digest of SQL statements that the transaction has executed
TRX_HOLDING_LOCK unsigned bigint The transaction that's currently holding the lock. There will be another record in the table with the same DEADLOCK_ID for that transaction.
  • Life span of rows:
    • Create after TiDB receive a deadlock error
    • FIFO,clean the oldest after buffer is full
  • Collecting, storing and querying:
    • All of these information can be collected on TiDB side. It just need to add the information to the table when receives deadlock error from TiKV. The information of other transactions involved in the deadlock circle needed to be fetched from elsewhere (the CLUSTER_TIDB_TRX table) when handling the deadlock error.
    • TiKV needs to report more rich information in the deadlock error for collecting.
    • There are two types of deadlock errors internally: retryable or non-retryable. The transaction will internally retry on retryable deadlocks and won't report error to the client. Therefore, the user are typically more interested in the non-retryable deadlocks.
      • Retryable deadlock errors are by default not collected, and can be enabled with configuration.
      • Collecting CLUSTER_TIDB_TRX for more rich information for retryable deadlock is possible to make the performance worse. Whether it will be collected for retryable deadlock will be decided after some tests.
  • Permission:
    • PROCESS privilege is needed to access this table.

TABLE (CLUSTER_)TRANSACTION_SUMMARY

Field Type Comment
DIGEST varchar(16) Digest of a transaction, calculated with ALL_SQL_DIGEST
ALL_SQL_DIGEST text A json array which contains all SQLs' digest executed in this kind of transaction
  • Life span of rows:
    • Create after first this kind of transaction ended.
    • LRU,clean the least existed(in TRANSACTION_ID_DIGEST, see below) after buffer is full.
  • Collecting, storing and querying:
    • All of these information can be collected on TiDB side. It just need to add the information to the table when a transaction ended.
  • Permission:
    • PROCESS privilege is needed to access this table.

TABLE (CLUSTER_)TRANSACTION_ID_DIGEST

Field Type Comment
DIGEST varchar(16) Digest of a transaction, calculated with ALL_SQL_DIGEST
TRX_ID bigint The transaction ID (aka. start ts)
  • Life span of rows:
    • Create after a transaction ended and it meets some certain condition (due to the memory is limited).
    • Currently, the condition is that the transaction executed too slow, which has more chance to be conflict with other transactions.
    • FIFO,clean the oldest after buffer is full.
  • Collecting, storing and querying:
    • All of these information can be collected on TiDB side. It just need to add the information to the table when a transaction ended.
  • Permission:
    • PROCESS privilege is needed to access this table.

Protocol

To pass necessary information between TiDB and TiKV to make this feature possible, there needs some additional information carried in the protocol defined in kvproto.

deadlockpb:

  message WaitForEntry {
    ...
+   bytes key = ...;
+   bytes resource_group_tag = ...;
  }

  message DeadlockResponse {
    ...
+   repeated WaitiForEntry wait_chain = ...;
  }

kvrpcpb:

  message Context {
    ...
+   bytes resource_group_tag = ...;
  }

  message Deadlock {
    ...
+   repeated deadlock.WaitForEntry wait_chain = ...;
  }

+ message GetLockWaitInfoRequest {
+     Context context = 1;
+ }
+ 
+ message GetLockWaitInfoResponse {
+     errorpb.Error region_error = 1;
+     string error = 2;
+     repeated deadlock.WaitForEntry entries = 3;
+ }

A field resource_group_tag will be added to Context. The SQL digest (and maybe more information) will be serialized and carried in this field. This field is expected to be reused by another feature named Top SQL which wants to carry SQL digest and plan to most transactional requests.

A new KV RPC GetLockWait will be added to allow getting the lock waiting status from TiKV. This is a store-level (instead of region level) request, like UnsafeDestroyRange, and those Green GC related RPCs. The request can carry some filtering options to filter out those information the user don't care about. But the current memory table implementation only allow TiDB to scan the whole table and then filter it. This may need further optimization in the future.

The locking key and resource_group_tag that comes from the Context of the pessimistic lock request is added to the deadlock detect request, and the wait chain is added to the deadlock detect response.

The wait chain will be added to the Deadlock error which is returned by the PessimisticLock request, so that when deadlock happens, the full wait chain information can be passed to TiDB.

Configurations

TiDB Config File pessimistic-txn.deadlock-history-capacity

Specifies how many recent deadlock events each TiDB node should keep.

Dynamically changeable via HTTP API.

Value: 0 to 10000

Default: 10

TiDB Config File pessimistic-txn.deadlock-history-collect-retryable

Specifies whether to collect retryable deadlock errors to the (CLUSTER_)DEADLOCKS table.

Dynamically changeable via HTTP API.

Value: 0 (do not collect) or 1 (collect)

Default: 0

TiDB Config File transaction-summary.transaction-id-digest-capacity

Specifies how many transaction in transaction_id_digest each TiDB node should keep.

Dynamically changeable via HTTP API.

Value: 0 to 100000

Default: 10000

TiDB Config File transaction-summary.transaction-id-digest-min-duration

Specifies how long a transaction should be executed to make it be recorded in transaction_id_digest and considered when calculating trx_summary.

Dynamically changeable via HTTP API.

Value: 0 to 2147483647

Unit: ms

Default: 1000

TiDB Config File transaction-summary.transaction-summary-capacity

Specifies how many transaction summary in trx_summary each TiDB node should keep.

Dynamically changeable via HTTP API.

Value: 0 to 5000

Default: 500

Compatibility

This feature is not expected to be incompatible with other features. During upgrading, when there are different versions of TiDB nodes exists at the same time, it's possible that the CLUSTER_ prefixed tables may encounter errors. However, since this feature is typically used by user manually, this shouldn't be a severe problem. So we don't need to care much about that.

Test Design

Functional Tests

  • Querying the tables defined above gives correct result.

Scenario Tests

  • In a scenario where there's lock contention, this feature helps locating the problem.
  • In a scenario where some a SQL is blocked by another transaction, this feature helps locating the problem.
  • In a scenario where deadlock happens, this feature helps finding how the deadlock is formed.

Compatibility Tests

  • N/A

Benchmark Tests

  • The feature shouldn't cause any obvious performance regression (< 2%) on normal scenarios.
  • Accessing these tables shouldn't increase latency of concurrent normal queries.

Impacts & Risks

  • To be investigated

Investigation & Alternatives

  • MySQL provides data_locks and data_lock_waits tables.
  • Oracle provides v$lock view.
  • CRDB provides crdb_internal.node_transaction_statistics that shows rich information for transactions.

Unresolved Questions

  • Since lock waiting on TiKV may timeout and retry, it's possible that in a single query to DATA_LOCK_WAIT table doesn't shows all (logical) lock waiting.
  • Information about internal transactions may not be collected in our first version of implementation.
  • Since TiDB need to query transaction information after it receives the deadlock error, the transactions' status may be changed during that time. As a result the information in (CLUSTER_)DEADLOCKS table can't be promised to be accurate and complete.
  • Statistics about transaction conflicts is still not enough.
  • Historical information of TIDB_TRX and DATA_LOCK_WAITS is not kept, which possibly makes it still difficult to investigate some kind of problems.