This document proposes a design for controlling the global memory usage of a TiDB instance.
Currently, TiDB has a query-level memory control strategy, `mem-quota-query`, which cancels a SQL statement when its memory usage exceeds `mem-quota-query`. However, there is no global memory control strategy.
When TiDB runs multiple SQL statements that each stay below `mem-quota-query`, or when memory tracking is inaccurate, the instance can reach high memory usage or even run out of memory (OOM).
Therefore, we need an observer that checks whether the memory usage of the current system is normal. When it detects a problem, it should try to stop TiDB's memory from growing further, reducing the risk of a process crash.
- Keep TiDB's execution memory within the limit set by the system variable `tidb_server_memory_limit`.
New system variables:
- `tidb_server_memory_limit`: TiDB keeps its overall memory usage within `tidb_server_memory_limit`.
- `tidb_server_memory_limit_gc_trigger`: When TiDB memory usage reaches this percentage of `tidb_server_memory_limit`, TiDB proactively triggers Go GC to release memory.
- `tidb_server_memory_limit_sess_min_size`: The minimum memory usage a session must have before TiDB can kill it.
We need to implement the following three functions to control the memory usage of TiDB:
- Kill the SQL with the highest memory usage in the current system when `HeapInuse` is larger than `tidb_server_memory_limit`.
- Proactively trigger `runtime.GC()` when `HeapInuse` is larger than `tidb_server_memory_limit` * `tidb_server_memory_limit_gc_trigger`.
- Introduce some memory tables to observe the memory status of the current system.
New variables:
- Global variable `MemUsageTop1Tracker atomic.Pointer[Tracker]`: Indicates the Tracker with the largest memory usage.
- The flag `NeedKill atomic.Bool` in the structure `Tracker`: Indicates whether the SQL for the current Tracker needs to be killed.
- `SessionID int64` in the structure `Tracker`: Indicates the session ID corresponding to the current Tracker.
Implementation:
When `Tracker.Consume()` is called, check the following conditions. If all are satisfied, update `MemUsageTop1Tracker`.
- Is it a session-level Tracker?
- Is the `NeedKill` flag false? (This avoids canceling the current SQL twice.)
- Does the memory usage exceed the threshold `tidb_server_memory_limit_sess_min_size` (default 128 MB, dynamically adjustable)? Only then can the Tracker be a candidate for `MemUsageTop1Tracker`.
- Is the memory usage of the current Tracker greater than that of the current `MemUsageTop1Tracker`?
Killing the top-1 SQL and reclaiming its memory then proceeds as follows:
- Create a goroutine that calls Go's `runtime.ReadMemStats` every 100 ms. (Gets the memory usage of the current TiDB instance.)
- If the `HeapInuse` of the current instance is greater than `tidb_server_memory_limit`, set the `NeedKill` flag of `MemUsageTop1Tracker`. (Sends a kill signal.)
- When the SQL calls `Tracker.Consume()`, it checks its own `NeedKill` flag. If the flag is true, it triggers a panic and exits. (Terminates the execution of the SQL.)
- Get the `SessionID` from the Tracker and continuously query the session's status, waiting for it to exit completely. Once the SQL has exited, explicitly trigger Go GC to release memory. (Waits for the SQL to exit completely and releases its memory.)
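The kill procedure above might look roughly like the following. This is a sketch under stated assumptions rather than TiDB's implementation: `checkOnce`, `monitor`, and `runSQL` are hypothetical helper names, and the `exited` flag is a stand-in for querying the session's status by `SessionID`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// Tracker models only the fields this sketch needs; exited is a hypothetical
// stand-in for querying the session's status.
type Tracker struct {
	SessionID int64
	NeedKill  atomic.Bool
	exited    atomic.Bool
}

var MemUsageTop1Tracker atomic.Pointer[Tracker]

// Consume panics once the Tracker has been flagged, terminating the SQL.
func (t *Tracker) Consume(bytes int64) {
	if t.NeedKill.Load() {
		panic(fmt.Sprintf("session %d: server memory limit exceeded", t.SessionID))
	}
}

// checkOnce flags the top-1 Tracker when inuse exceeds limit and returns it.
// It returns nil when nothing was flagged.
func checkOnce(inuse, limit uint64) *Tracker {
	if inuse <= limit {
		return nil
	}
	t := MemUsageTop1Tracker.Load()
	if t == nil || !t.NeedKill.CompareAndSwap(false, true) {
		return nil
	}
	return t
}

// monitor samples HeapInuse every interval until stop is closed.
func monitor(limit uint64, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	var m runtime.MemStats
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			runtime.ReadMemStats(&m)
			if t := checkOnce(m.HeapInuse, limit); t != nil {
				for !t.exited.Load() { // wait for the SQL to exit completely
					time.Sleep(time.Millisecond)
				}
				runtime.GC() // release the memory the SQL held
			}
		}
	}
}

// runSQL simulates a statement that keeps consuming memory until it is killed.
func runSQL(t *Tracker) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
		t.exited.Store(true)
	}()
	for {
		t.Consume(1)
		time.Sleep(time.Millisecond)
	}
}

func main() {
	t := &Tracker{SessionID: 1}
	MemUsageTop1Tracker.Store(t)
	stop := make(chan struct{})
	go monitor(1, 100*time.Millisecond, stop) // 1-byte limit: always exceeded
	fmt.Println(runSQL(t))
	close(stop)
}
```

Setting the flag with `CompareAndSwap` means a given Tracker is flagged at most once, which matches the intent of checking `NeedKill` before choosing a top-1 candidate.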
The inspiration for this design comes from uber-go-gc-tuner:
- Use the Go 1.19 `SetMemoryLimit` feature to set the soft limit to `tidb_server_memory_limit` * `tidb_server_memory_limit_gc_trigger`, ensuring that GC is triggered when memory reaches that threshold.
- After each GC, check whether it was caused by the memory limit. If so, temporarily set the memory limit to infinite, then set it back to the specified threshold after 1 minute. This avoids the frequent GC that would otherwise occur while `HeapInuse` stays above the soft limit.
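A minimal sketch of the soft-limit toggle, using `debug.SetMemoryLimit` from Go's `runtime/debug` package. Two things here are assumptions rather than part of the design: the type `limitTuner` and its methods are hypothetical names, and a GC is treated as limit-triggered whenever `HeapInuse` is at or above the soft limit. How `afterGC` gets invoked after each GC cycle is left out of the sketch.

```go
package main

import (
	"fmt"
	"math"
	"runtime/debug"
	"sync/atomic"
	"time"
)

// softLimit computes tidb_server_memory_limit * tidb_server_memory_limit_gc_trigger.
func softLimit(serverMemoryLimit uint64, gcTrigger float64) int64 {
	return int64(float64(serverMemoryLimit) * gcTrigger)
}

// limitTuner toggles the Go runtime's soft memory limit (hypothetical type).
type limitTuner struct {
	limit    int64         // soft limit in bytes
	cooldown time.Duration // 1 minute in the design
	waiting  atomic.Bool   // true while the limit is lifted
}

// newLimitTuner installs the soft limit and returns a tuner for it.
func newLimitTuner(limit int64, cooldown time.Duration) *limitTuner {
	debug.SetMemoryLimit(limit)
	return &limitTuner{limit: limit, cooldown: cooldown}
}

// currentLimit reads the runtime's limit; a negative input does not change it.
func (t *limitTuner) currentLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

// afterGC is meant to run after each GC cycle with the current HeapInuse.
// Assumption: the GC counts as limit-triggered when HeapInuse >= soft limit.
// It reports whether the limit was lifted by this call.
func (t *limitTuner) afterGC(heapInuse uint64) bool {
	if heapInuse < uint64(t.limit) {
		return false
	}
	if !t.waiting.CompareAndSwap(false, true) {
		return false // limit already lifted; a reset is pending
	}
	debug.SetMemoryLimit(math.MaxInt64) // effectively "infinite"
	time.AfterFunc(t.cooldown, func() {
		debug.SetMemoryLimit(t.limit) // restore the threshold after the cooldown
		t.waiting.Store(false)
	})
	return true
}

func main() {
	// Example values only: an 8 GiB server limit with a 0.7 trigger ratio.
	t := newLimitTuner(softLimit(8<<30, 0.7), time.Minute)
	fmt.Println(t.afterGC(1 << 30)) // below the soft limit
	fmt.Println(t.afterGC(7 << 30)) // above the soft limit
	fmt.Println(t.currentLimit() == math.MaxInt64)
}
```

Lifting the limit for a fixed period trades a short window of weaker control for fewer back-to-back GC cycles while the heap cannot shrink below the threshold.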
Introduce `performance_schema.memory_usage` and `performance_schema.memory_usage_ops_history` to display the current system memory usage and historical operations.
This can be implemented by maintaining a set of global data and reading directly from it when the tables are queried.
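One way to picture the global data behind the two tables is a mutex-guarded struct holding the current usage plus a bounded operation history. Every name and column below (`memStatus`, `opRecord`, `Op`, `MemBytes`, `maxHist`) is hypothetical, since the design does not specify the table schemas.

```go
package main

import (
	"fmt"
	"sync"
)

// opRecord is one row of the history table (hypothetical columns).
type opRecord struct {
	Op        string // e.g. "SessionKill" or "GC"
	SessionID int64
	MemBytes  uint64
}

// memStatus is the global data set that both tables read from.
type memStatus struct {
	mu      sync.RWMutex
	limit   uint64
	current uint64
	history []opRecord
	maxHist int // number of history rows to retain
}

var globalMemStatus = &memStatus{maxHist: 50}

// update records the latest limit and usage sample.
func (s *memStatus) update(limit, current uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.limit, s.current = limit, current
}

// recordOp appends an operation and drops the oldest rows beyond maxHist.
func (s *memStatus) recordOp(r opRecord) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.history = append(s.history, r)
	if len(s.history) > s.maxHist {
		s.history = s.history[len(s.history)-s.maxHist:]
	}
}

// usageRow returns the data for the current-usage table.
func (s *memStatus) usageRow() (limit, current uint64) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.limit, s.current
}

// historyRows returns a copy so that readers never alias the live slice.
func (s *memStatus) historyRows() []opRecord {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]opRecord(nil), s.history...)
}

func main() {
	globalMemStatus.update(8<<30, 6<<30)
	globalMemStatus.recordOp(opRecord{Op: "SessionKill", SessionID: 1, MemBytes: 2 << 30})
	limit, current := globalMemStatus.usageRow()
	fmt.Println(limit, current, len(globalMemStatus.historyRows()))
}
```

Because queries only copy from this in-memory state, reading the tables adds no extra memory sampling work to the running instance.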