
kvserver: revisit COCKROACH_SCHEDULER_CONCURRENCY cap #99063

Closed
erikgrinaker opened this issue Mar 20, 2023 · 4 comments · Fixed by #105521
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

erikgrinaker (Contributor) commented Mar 20, 2023

We currently cap COCKROACH_SCHEDULER_CONCURRENCY at 96, to avoid the mutex collapse seen in #56851.

```go
var storeSchedulerConcurrency = envutil.EnvOrDefaultInt(
	// For small machines, we scale the scheduler concurrency by the number of
	// CPUs. 8*NumCPU was determined in 9a68241 (April 2017) as the optimal
	// concurrency level on 8 CPU machines. For larger machines, we've seen
	// (#56851) that this scaling curve can be too aggressive and lead to too much
	// contention in the Raft scheduler, so we cap the concurrency level at 96.
	//
	// As of November 2020, this default value could be re-tuned.
	"COCKROACH_SCHEDULER_CONCURRENCY", min(8*runtime.GOMAXPROCS(0), 96))
```

However, with recent changes such as scheduler sharding (#98854) and async IO (#94165), this may no longer be appropriate, and we should revisit the policy here.

Jira issue: CRDB-25685

@erikgrinaker erikgrinaker added C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-kv-replication labels Mar 20, 2023
blathers-crl bot commented Mar 20, 2023

cc @cockroachdb/replication

@erikgrinaker erikgrinaker changed the title kvnemesis: revisit COCKROACH_SCHEDULER_CONCURRENCY cap kvserver: revisit COCKROACH_SCHEDULER_CONCURRENCY cap May 30, 2023
@exalate-issue-sync exalate-issue-sync bot changed the title kvserver: revisit COCKROACH_SCHEDULER_CONCURRENCY cap kvnemesis: revisit COCKROACH_SCHEDULER_CONCURRENCY cap May 30, 2023
@exalate-issue-sync exalate-issue-sync bot changed the title kvnemesis: revisit COCKROACH_SCHEDULER_CONCURRENCY cap kvserver: revisit COCKROACH_SCHEDULER_CONCURRENCY cap May 30, 2023
erikgrinaker (Contributor, Author) commented:
Tried dropping the 96-worker cap now that we have scheduler sharding. A few quick runs of kv0/enc=false/nodes=3 with 32 and 96 CPUs show results similar to the roachperf baseline, once kv flow control was disabled (#105508).

It might have more of an effect with 8 stores. Will try a few runs.

erikgrinaker (Contributor, Author) commented Jun 25, 2023

I've run a few experiments. Setup:

  • kv0 for 20 minutes, with workload concurrency of 8 per CPU per node (e.g. 96 for 3x 4-CPU nodes).
  • Single runs, to reduce costs.
  • 3-node clusters, with single local SSD store unless otherwise noted.
  • Raft scheduler sharding left at default 16 workers per shard.
  • Disabled kv flow control, due to known mutex contention (kvflowcontroller: mutex contention, #105508).
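For context on the sharding setting above, here is a sketch of the shard-count arithmetic (an assumed model of what scheduler sharding in #98854 does, not CockroachDB source): the worker pool is split into shards of at most `shardSize` workers, each with its own mutex and queue, so lock contention no longer grows with the total worker count.

```go
package main

import "fmt"

// numShards returns how many shards a worker pool of the given size is
// split into, with at most shardSize workers per shard (default 16 in
// these experiments). Sketch only; the real partitioning lives in kvserver.
func numShards(workers, shardSize int) int {
	return (workers + shardSize - 1) / shardSize // ceil(workers / shardSize)
}

func main() {
	for _, workers := range []int{16, 32, 96, 128} {
		fmt.Printf("%d workers -> %d shards of <=16\n",
			workers, numShards(workers, 16))
	}
}
```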

Results show throughput as a function of Raft scheduler concurrency per CPU. An additional column with a fixed 96 workers compares against the current default cap. Current defaults are highlighted in bold.

| CPUs \ Raft workers    | 2/cpu   | 4/cpu  | 6/cpu   | 8/cpu      | 10/cpu | 96          |
|------------------------|---------|--------|---------|------------|--------|-------------|
| 2                      | 3,940   | 3,692  | 3,995   | **3,721**  | 3,566  |             |
| 4                      | 8,521   | 8,681  | 8,131   | **8,485**  | 7,738  |             |
| 8                      | 17,240  | 17,291 | 16,746  | **16,833** | 17,218 |             |
| 32                     | 48,420  | 48,815 | 51,081  | 49,060     | 51,491 | **50,957**  |
| 32, 8 SSDs             | 45,686  | 50,704 | 47,851  | 45,786     | 47,875 | **47,562**  |
| 96                     | 106,142 | 95,195 | 92,158  | 89,168     | 95,231 | **112,619** |
| 96, 10k ranges         | 86,404  | 85,842 | 83,105  | 81,033     | 79,331 | **87,953**  |
| 96, 10k ranges, 8 SSDs | 99,638  | 99,443 | 100,769 | 101,123    | 99,284 | **95,943**  |

Overall, the number of workers per CPU does not significantly affect the results.

With 2/4 CPUs we see some possible degradation at 10 workers/CPU, but 8 workers/CPU seems reasonable.

With 96 cores, we see significant degradation as worker counts increase, with the nodes only running at about 50% CPU. Mutex profiles mostly showed contention on per-replica mutexes (see below), so a further set of benchmarks was run with 10k rather than 1k ranges, but this showed similar degradation. Yet another set was run with 8 stores, which removed the performance degradation at higher worker counts but failed to significantly increase throughput -- at this point the nodes were running at about 65-70% CPU.

Given the above, it seems reasonable to retain 8 workers/CPU with a hard cap. However, I'll increase the cap from 96 to 128 -- this did not appear to impact results for 32 cores and below (we don't recommend running beyond 32 cores anyway), and it somewhat reduces the starvation risk with sharding and multi-store.
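The proposed change above amounts to a one-line tweak to the default formula. A sketch (assuming `numCPU` comes from `runtime.GOMAXPROCS(0)` in practice; `min` is a Go 1.21+ builtin):

```go
package main

import "fmt"

// schedulerConcurrency sketches the revised default proposed above: keep
// 8 workers per CPU for small machines, but raise the hard cap from 96
// to 128.
func schedulerConcurrency(numCPU int) int {
	return min(8*numCPU, 128)
}

func main() {
	for _, cpus := range []int{2, 8, 16, 32, 96} {
		fmt.Printf("%d CPUs -> %d workers\n", cpus, schedulerConcurrency(cpus))
	}
}
```

Note that the cap now binds at 16 CPUs and above (8*16 = 128), so the 32- and 96-core configurations benchmarked here all run at 128 workers by default.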

Mutex profile of 96-core 1k range single-store run with 6 workers/CPU primarily shows per-replica and Pebble contention, as well as Raft transport (likely introduced by recent Raft flow control work). This goes some way towards explaining why multi-store performed better.

[mutex profile image: 96-workers-576]

erikgrinaker (Contributor, Author) commented Jun 26, 2023

> Another set of benchmarks were run with 8 stores, which removed the performance degradation with increasing worker counts, but failed to significantly increase throughput -- at this point the nodes were running at about 65-70% CPU.

I suppose it's possible that we hit a resource ceiling on the worker node at this point; I didn't examine it. However, we were able to push 10% more throughput with 96 CPUs on 96 Raft workers, which indicates this wasn't the case.
