Acquire read lock in LockKeys to avoid data race #585

Merged: 3 commits into tikv:master on Oct 10, 2022

Conversation

sticnarf (Collaborator)
This PR tries to reduce the risk of data races in memDB, such as pingcap/tidb#26832, by acquiring the read lock whenever memDB needs to be read in the execution path of LockKeys. The same RWMutex is used in memDB.set, so concurrent reads and writes can no longer happen.

In our observations of similar problems, most of them involve locking keys while the membuf is being updated. So, although this PR does not thoroughly solve the memDB data race problem, it should greatly reduce the chance of hitting it in the real world.

The locking granularity is relatively coarse to keep the performance impact as small as possible. extractKeyExistsErr is not in the critical path, so the lock is acquired in each call.
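
As a minimal sketch of the idea (not the actual client-go code; the memDB type, field, and method names below are simplified stand-ins), writers such as Set take the write lock while the LockKeys execution path holds the read lock across its membuf reads:

package membuf

import "sync"

type memDB struct {
	sync.RWMutex
	data map[string][]byte
}

// Set writes a key-value pair under the exclusive write lock.
func (db *memDB) Set(key, value []byte) {
	db.Lock()
	defer db.Unlock()
	db.data[string(key)] = value
}

// readForLockKeys models the read side used while building pessimistic-lock
// mutations: one coarse read lock covers all lookups of the batch, so no
// concurrent Set can run in the middle of it.
func (db *memDB) readForLockKeys(keys [][]byte) [][]byte {
	db.RLock()
	defer db.RUnlock()
	values := make([][]byte, 0, len(keys))
	for _, k := range keys {
		values = append(values, db.data[string(k)])
	}
	return values
}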

@@ -86,6 +86,7 @@ func (actionPessimisticRollback) tiKVTxnRegionsNumHistogram() prometheus.Observe
func (action actionPessimisticLock) handleSingleBatch(c *twoPhaseCommitter, bo *retry.Backoffer, batch batchMutations) error {
m := batch.mutations
mutations := make([]*kvrpcpb.Mutation, m.Len())
c.txn.GetMemBuffer().RLock()
Contributor

If contention happens here, could the actionPessimisticLock tasks end up being processed one by one because of how the RWLock works? I'm not quite sure whether this may introduce extra risks.

sticnarf (Collaborator, Author)

Do you mean that all later RLock requests will wait behind a queued write-lock request? Could you point out which write lock could cause them to run one by one?

Contributor

For example:

Update  // Tries to acquire the write lock.
    Some possible parallel executors (like the parallel projection in the earlier issue)
              selectLockExec // Tries to acquire read locks.

The update execution may try to Set new key-value pairs into the memory buffer, which requires acquiring the write lock, while the parallel actionPessimisticLock operations triggered by LockKeys are still being processed. If one actionPessimisticLock acquires the read lock first and the write-lock acquisition is left pending, the other actionPessimisticLock requests may have to wait behind it.
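
To illustrate the queuing effect, here is a toy standalone example (not code from this PR; the goroutines merely stand in for the Update and actionPessimisticLock paths): with Go's sync.RWMutex, once a writer is waiting, later RLock calls block until that writer has acquired and released the lock.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex

	mu.RLock() // an in-flight actionPessimisticLock holds the read lock

	go func() {
		mu.Lock() // the Update path requests the write lock and must wait
		time.Sleep(50 * time.Millisecond)
		mu.Unlock()
	}()
	time.Sleep(10 * time.Millisecond) // let the writer start waiting

	start := time.Now()
	go func() {
		mu.RLock() // a later reader is blocked behind the pending writer
		fmt.Println("second reader waited", time.Since(start))
		mu.RUnlock()
	}()
	time.Sleep(10 * time.Millisecond)

	mu.RUnlock() // the first reader finishes; the writer runs, then the blocked reader
	time.Sleep(200 * time.Millisecond)
}

So later readers queue behind the pending writer, although once the writer releases the lock they can all proceed concurrently again.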

sticnarf (Collaborator, Author)

Got it. Still, I think the impact will be small if all these operations are fast. The goroutines blocked behind the write lock will acquire the read lock together once it is released, so they can run concurrently as before.

I will try to design a worst case to see how much impact this brings.

@cfzjywxk (Contributor)

We may need a test case to verify that the panic can be avoided; some benchmark tests are needed too.

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
@sticnarf (Collaborator, Author)

Table schema:

CREATE TABLE `t` (
  `id` bigint(20) NOT NULL,
  `v` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY HASH (`id`) PARTITIONS 10

Workload:

BEGIN;
UPDATE t SET v = v + 1 WHERE id BETWEEN {x} AND {x} + 1000
ROLLBACK;

Execution plan:

id                         	task     	estRows	operator info                                                             	actRows	execution info                                                                                                                                                                                                                                                                                       	memory 	disk
Update_5                   	root     	0      	N/A                                                                       	0      	time:440.4ms, loops:5, lock_keys: {time:3.39ms, region:10, keys:1001, slowest_rpc: {total: 0.002s, region_id: 100075, store: 127.0.0.1:30160, tikv_wall_time: 1.83ms, scan_detail: {get_snapshot_time: 2.99µs, rocksdb: {block: {cache_hit_count: 501}}}, }, lock_rpc:12.518293ms, rpc_count:10}    	38.4 KB	N/A
└─Projection_18            	root     	10000  	test.t.id, test.t.v                                                       	5005   	time:421.8ms, loops:55, Concurrency:5                                                                                                                                                                                                                                                                	26.7 KB	N/A
  └─SelectLock_19          	root     	10000  	for update 0                                                              	5005   	time:436ms, loops:55                                                                                                                                                                                                                                                                                 	N/A    	N/A
    └─PartitionUnion_20    	root     	10000  	                                                                          	5005   	time:152.7ms, loops:55                                                                                                                                                                                                                                                                               	N/A    	N/A
      ├─TableReader_22     	root     	1000   	data:TableRangeScan_21                                                    	505    	time:55.8ms, loops:10, cop_task: {num: 5, max: 29ms, min: 403.8µs, avg: 11ms, p95: 29ms, max_proc_keys: 101, p95_proc_keys: 101, rpc_num: 9, rpc_time: 80.7ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                                 	2.66 KB	N/A
      │ └─TableRangeScan_21	cop[tikv]	1000   	table:t, partition:p0, range:[78180,79180], keep order:false, stats:pseudo	505    	tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:15, tasks:5}, scan_detail: {total_process_keys: 505, total_process_keys_size: 20200, total_keys: 510, get_snapshot_time: 19µs, rocksdb: {delete_skipped_count: 505, key_skipped_count: 2210, block: {cache_hit_count: 25}}}          	N/A    	N/A
      ├─TableReader_24     	root     	1000   	data:TableRangeScan_23                                                    	500    	time:31.5ms, loops:10, cop_task: {num: 5, max: 16.8ms, min: 690.4µs, avg: 6.12ms, p95: 16.8ms, max_proc_keys: 100, p95_proc_keys: 100, rpc_num: 9, rpc_time: 44.3ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                           	2.63 KB	N/A
      │ └─TableRangeScan_23	cop[tikv]	1000   	table:t, partition:p1, range:[78180,79180], keep order:false, stats:pseudo	500    	tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:15, tasks:5}, scan_detail: {total_process_keys: 500, total_process_keys_size: 20000, total_keys: 505, get_snapshot_time: 15.9µs, rocksdb: {delete_skipped_count: 500, key_skipped_count: 2282, block: {cache_hit_count: 20}}}        	N/A    	N/A
      ├─TableReader_26     	root     	1000   	data:TableRangeScan_25                                                    	500    	time:44.3ms, loops:10, cop_task: {num: 5, max: 28.9ms, min: 332.4µs, avg: 8.68ms, p95: 28.9ms, max_proc_keys: 100, p95_proc_keys: 100, rpc_num: 9, rpc_time: 77.4ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                           	2.64 KB	N/A
      │ └─TableRangeScan_25	cop[tikv]	1000   	table:t, partition:p2, range:[78180,79180], keep order:false, stats:pseudo	500    	tikv_task:{proc max:4ms, min:0s, avg: 800µs, p80:4ms, p95:4ms, iters:15, tasks:5}, scan_detail: {total_process_keys: 500, total_process_keys_size: 20000, total_keys: 505, get_snapshot_time: 16.2µs, rocksdb: {delete_skipped_count: 500, key_skipped_count: 2191, block: {cache_hit_count: 20}}} 	N/A    	N/A
      ├─TableReader_28     	root     	1000   	data:TableRangeScan_27                                                    	500    	time:44.3ms, loops:10, cop_task: {num: 5, max: 22.1ms, min: 241.2µs, avg: 8.69ms, p95: 22.1ms, max_proc_keys: 100, p95_proc_keys: 100, rpc_num: 9, rpc_time: 70.4ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                           	2.64 KB	N/A
      │ └─TableRangeScan_27	cop[tikv]	1000   	table:t, partition:p3, range:[78180,79180], keep order:false, stats:pseudo	500    	tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:15, tasks:5}, scan_detail: {total_process_keys: 500, total_process_keys_size: 20000, total_keys: 505, get_snapshot_time: 15.6µs, rocksdb: {delete_skipped_count: 500, key_skipped_count: 2191, block: {cache_hit_count: 20}}}        	N/A    	N/A
      ├─TableReader_30     	root     	1000   	data:TableRangeScan_29                                                    	500    	time:54.8ms, loops:10, cop_task: {num: 5, max: 30.5ms, min: 463.6µs, avg: 10.8ms, p95: 30.5ms, max_proc_keys: 100, p95_proc_keys: 100, rpc_num: 9, rpc_time: 77.4ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                           	2.64 KB	N/A
      │ └─TableRangeScan_29	cop[tikv]	1000   	table:t, partition:p4, range:[78180,79180], keep order:false, stats:pseudo	500    	tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:15, tasks:5}, scan_detail: {total_process_keys: 500, total_process_keys_size: 20000, total_keys: 1005, get_snapshot_time: 13.7µs, rocksdb: {delete_skipped_count: 500, key_skipped_count: 2694, block: {cache_hit_count: 20}}}       	N/A    	N/A
      ├─TableReader_32     	root     	1000   	data:TableRangeScan_31                                                    	500    	time:26.2ms, loops:10, cop_task: {num: 5, max: 11.3ms, min: 423.1µs, avg: 4.52ms, p95: 11.3ms, max_proc_keys: 100, p95_proc_keys: 100, rpc_num: 9, rpc_time: 35.3ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                           	2.64 KB	N/A
      │ └─TableRangeScan_31	cop[tikv]	1000   	table:t, partition:p5, range:[78180,79180], keep order:false, stats:pseudo	500    	tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:15, tasks:5}, scan_detail: {total_process_keys: 500, total_process_keys_size: 20000, total_keys: 1005, get_snapshot_time: 17.3µs, rocksdb: {delete_skipped_count: 500, key_skipped_count: 2789, block: {cache_hit_count: 20}}}       	N/A    	N/A
      ├─TableReader_34     	root     	1000   	data:TableRangeScan_33                                                    	500    	time:26.4ms, loops:10, cop_task: {num: 5, max: 12.8ms, min: 376.7µs, avg: 5.16ms, p95: 12.8ms, max_proc_keys: 100, p95_proc_keys: 100, rpc_num: 9, rpc_time: 38.8ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                           	2.64 KB	N/A
      │ └─TableRangeScan_33	cop[tikv]	1000   	table:t, partition:p6, range:[78180,79180], keep order:false, stats:pseudo	500    	tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:15, tasks:5}, scan_detail: {total_process_keys: 500, total_process_keys_size: 20000, total_keys: 1005, get_snapshot_time: 18.3µs, rocksdb: {delete_skipped_count: 500, key_skipped_count: 2694, block: {cache_hit_count: 20}}}       	N/A    	N/A
      ├─TableReader_36     	root     	1000   	data:TableRangeScan_35                                                    	500    	time:29.8ms, loops:10, cop_task: {num: 5, max: 17.5ms, min: 420µs, avg: 5.32ms, p95: 17.5ms, max_proc_keys: 100, p95_proc_keys: 100, rpc_num: 9, rpc_time: 45.8ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                             	2.64 KB	N/A
      │ └─TableRangeScan_35	cop[tikv]	1000   	table:t, partition:p7, range:[78180,79180], keep order:false, stats:pseudo	500    	tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:15, tasks:5}, scan_detail: {total_process_keys: 500, total_process_keys_size: 20000, total_keys: 1005, get_snapshot_time: 17.9µs, rocksdb: {delete_skipped_count: 500, key_skipped_count: 2787, block: {cache_hit_count: 20}}}       	N/A    	N/A
      ├─TableReader_38     	root     	1000   	data:TableRangeScan_37                                                    	500    	time:16.6ms, loops:10, cop_task: {num: 5, max: 7.3ms, min: 308.6µs, avg: 2.69ms, p95: 7.3ms, max_proc_keys: 100, p95_proc_keys: 100, rpc_num: 9, rpc_time: 22.1ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                             	2.64 KB	N/A
      │ └─TableRangeScan_37	cop[tikv]	1000   	table:t, partition:p8, range:[78180,79180], keep order:false, stats:pseudo	500    	tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:15, tasks:5}, scan_detail: {total_process_keys: 500, total_process_keys_size: 20000, total_keys: 1005, get_snapshot_time: 17.5µs, rocksdb: {delete_skipped_count: 500, key_skipped_count: 2787, block: {cache_hit_count: 20}}}       	N/A    	N/A
      └─TableReader_40     	root     	1000   	data:TableRangeScan_39                                                    	500    	time:64.4ms, loops:10, cop_task: {num: 5, max: 34.7ms, min: 193.5µs, avg: 12.6ms, p95: 34.7ms, max_proc_keys: 100, p95_proc_keys: 100, rpc_num: 9, rpc_time: 91.4ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}                                                                           	2.64 KB	N/A
        └─TableRangeScan_39	cop[tikv]	1000   	table:t, partition:p9, range:[78180,79180], keep order:false, stats:pseudo	500    	tikv_task:{proc max:4ms, min:0s, avg: 800µs, p80:4ms, p95:4ms, iters:15, tasks:5}, scan_detail: {total_process_keys: 500, total_process_keys_size: 20000, total_keys: 1005, get_snapshot_time: 15.7µs, rocksdb: {delete_skipped_count: 500, key_skipped_count: 2687, block: {cache_hit_count: 20}}}	N/A    	N/A

I reverted pingcap/tidb#30290 to allow concurrency for the projection executor in this plan.

I limited TiDB to only 4 CPU cores to make TiDB the bottleneck.

Threads           	4            	100
Master w/o 30290  	Panic quickly	285 TPS
This PR w/o 30290 	287 TPS      	271 TPS
Master w/ 30290   	278 TPS      	278 TPS

In the benchbot TPC-C 400-thread test, I find no regression, though the TiDB CPU is not 100% utilized. This PR: 69682 QPS, baseline: 68184 QPS. (The PR should not actually improve performance; the difference is likely just environment noise.)

I'll also benchmark other cases where TiDB is the bottleneck.

@sticnarf (Collaborator, Author)

I also tested sysbench read_write in the benchbot, which pushes the TiDB CPU close to full utilization.

(Sorry, the links below are not publicly accessible)
http://perf.pingcap.net/d/MB_crJk4z/oltp-bench-details?orgId=1&var-oltp_summary_log_id=4020030 (baseline) vs http://perf.pingcap.net/d/MB_crJk4z/oltp-bench-details?orgId=1&var-oltp_summary_log_id=4050030 (with this PR)

There seems to be no big difference.


cfzjywxk merged commit 8dfd76b into tikv:master on Oct 10, 2022
sticnarf added a commit to sticnarf/client-go that referenced this pull request Oct 12, 2022
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
cfzjywxk pushed a commit that referenced this pull request Oct 12, 2022
* Clear intersecting regions in the cache when inserting a region (#566)

Signed-off-by: Yilin Chen <sticnarf@gmail.com>

* Acquire read lock in LockKeys to avoid data race (#585)

Signed-off-by: Yilin Chen <sticnarf@gmail.com>
guo-shaoge pushed a commit to guo-shaoge/client-go that referenced this pull request Oct 13, 2022
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>