-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Acquire read lock in LockKeys to avoid data race #585
Conversation
@@ -86,6 +86,7 @@ func (actionPessimisticRollback) tiKVTxnRegionsNumHistogram() prometheus.Observe | |||
func (action actionPessimisticLock) handleSingleBatch(c *twoPhaseCommitter, bo *retry.Backoffer, batch batchMutations) error { | |||
m := batch.mutations | |||
mutations := make([]*kvrpcpb.Mutation, m.Len()) | |||
c.txn.GetMemBuffer().RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the contention happens here would it be possible that actionPessimisticLock
tasks are processed one by one according to the feature of the RWLock
? I'm not quite sure if it may introduce extra risks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that all later RLock
requests will wait for a queuing write lock request? Could you point out which write lock may cause it to run one by one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example:
Update // Try to acquire the write lock.
Some possible parallel executors(like parallel projection in the early issue)
selectLockExec // Try to acquire read locks.
The update execution may try to Set
new key-value pairs into the memory buffer which needs to acquire the write lock, while the parallel actionPessimisticLock
operations triggered by LockKeys
are being processed. If an actionPessimisticLock
gets the read lock successfully first and then the write lock acquisition is pending, other actionPessimisticLock
requests may have to wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Yet I think it will have little impact if all these operations are fast. And the goroutines that are blocked after the write lock will then acquire the read lock at the same time, so they can run concurrently as before.
I will try to design a worst case to see how much impact it will bring.
We may need a test case to verify that the panic could be avoided, some benchmark tests are needed too. |
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
dd2de13
to
9a1aa06
Compare
Table schema: CREATE TABLE `t` (
`id` bigint(20) NOT NULL,
`v` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY HASH (`id`) PARTITIONS 10 Workload: BEGIN;
UPDATE t SET v = v + 1 WHERE id BETWEEN {x} AND {x} + 1000
ROLLBACK; Execution plan:
I revert pingcap/tidb#30290 to allow concurrency for the projection executor in this plan. I limit the CPU of TiDB to only 4 cores to let TiDB be a bottleneck.
In benchbot TPC-C 400 threads test, I find no regression but the TiDB CPU is not 100% used. This PR: 69682 QPS, Baseline: 68184. (It should not improve performance, it should be just small environment error). I'll also benchmark against other cases when TiDB is the bottleneck. |
I also test sysbench read_write in the benchbot, which almost runs out of TiDB CPU. (Sorry, the links below are not publicly accessible) There seems no big difference |
Signed-off-by: Yilin Chen <sticnarf@gmail.com> Signed-off-by: Yilin Chen <sticnarf@gmail.com>
* Clear intersecting regions in the cache when inserting a region (#566) Signed-off-by: Yilin Chen <sticnarf@gmail.com> * Acquire read lock in LockKeys to avoid data race (#585) Signed-off-by: Yilin Chen <sticnarf@gmail.com> Signed-off-by: Yilin Chen <sticnarf@gmail.com> Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Signed-off-by: Yilin Chen <sticnarf@gmail.com> Signed-off-by: Yilin Chen <sticnarf@gmail.com> Signed-off-by: guo-shaoge <shaoge1994@163.com>
This PR tries to reduce the risk of data race in memDB like pingcap/tidb#26832, by acquiring the read lock when it needs to read the memDB in the execution path of
LockKeys
. The sameRWMutex
is used inmemDB.set
, so we won't have concurrent write and read.Throughout our observations about similar problems, most of them are connected with locking keys while updating the membuf. So, although this PR does not thoroughly solve the problem of memDB data race, I think it will largely reduce its possibility in the real world.
The granularity of locking is relatively coarse to minimize the performance impact as much as possible.
extractKeyExistsErr
is not in the critical path, so the lock is acquired in each call.