
kvserver: revisit COCKROACH_SCHEDULER_CONCURRENCY cap #99063

Closed
erikgrinaker opened this issue Mar 20, 2023 · 4 comments · Fixed by #105521
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

erikgrinaker (Contributor) commented Mar 20, 2023

We currently cap COCKROACH_SCHEDULER_CONCURRENCY at 96, to avoid the mutex collapse seen in #56851.

```go
var storeSchedulerConcurrency = envutil.EnvOrDefaultInt(
	// For small machines, we scale the scheduler concurrency by the number of
	// CPUs. 8*NumCPU was determined in 9a68241 (April 2017) as the optimal
	// concurrency level on 8 CPU machines. For larger machines, we've seen
	// (#56851) that this scaling curve can be too aggressive and lead to too much
	// contention in the Raft scheduler, so we cap the concurrency level at 96.
	//
	// As of November 2020, this default value could be re-tuned.
	"COCKROACH_SCHEDULER_CONCURRENCY", min(8*runtime.GOMAXPROCS(0), 96))
```

However, with recent changes such as scheduler sharding (#98854) and async IO (#94165), this may no longer be appropriate, and we should revisit the policy here.

Jira issue: CRDB-25685

@erikgrinaker erikgrinaker added C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-kv-replication labels Mar 20, 2023
blathers-crl bot commented Mar 20, 2023

cc @cockroachdb/replication

@erikgrinaker erikgrinaker changed the title kvnemesis: revisit COCKROACH_SCHEDULER_CONCURRENCY cap kvserver: revisit COCKROACH_SCHEDULER_CONCURRENCY cap May 30, 2023
@exalate-issue-sync exalate-issue-sync bot changed the title kvserver: revisit COCKROACH_SCHEDULER_CONCURRENCY cap kvnemesis: revisit COCKROACH_SCHEDULER_CONCURRENCY cap May 30, 2023
@exalate-issue-sync exalate-issue-sync bot changed the title kvnemesis: revisit COCKROACH_SCHEDULER_CONCURRENCY cap kvserver: revisit COCKROACH_SCHEDULER_CONCURRENCY cap May 30, 2023
erikgrinaker (Contributor, Author) commented:
Tried dropping the 96-worker cap now that we have scheduler sharding. A few quick runs of kv0/enc=false/nodes=3 with 32 and 96 CPUs show results similar to the roachperf baseline, once kv flow control was disabled (#105508).

It might have more of an effect with 8 stores. Will try a few runs.

erikgrinaker (Contributor, Author) commented Jun 25, 2023

I've run a few experiments. Setup:

  • kv0 for 20 minutes, with workload concurrency of 8 per CPU per node (e.g. 96 for 3x 4-CPU nodes).
  • Single runs, to reduce costs.
  • 3-node clusters, with single local SSD store unless otherwise noted.
  • Raft scheduler sharding left at default 16 workers per shard.
  • Disabled kv flow control, due to known mutex contention (kvflowcontroller: mutex contention, #105508).
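For context on the sharding setting above, here is a sketch of the shard-count arithmetic (an assumed model of what scheduler sharding in #98854 does, not CockroachDB source): the worker pool is split into shards of at most `shardSize` workers, each with its own mutex and queue, so lock contention no longer grows with the total worker count.

```go
package main

import "fmt"

// numShards returns how many shards a worker pool of the given size is
// split into, with at most shardSize workers per shard (default 16 in
// these experiments). Sketch only; the real partitioning lives in kvserver.
func numShards(workers, shardSize int) int {
	return (workers + shardSize - 1) / shardSize // ceil(workers / shardSize)
}

func main() {
	for _, workers := range []int{16, 32, 96, 128} {
		fmt.Printf("%d workers -> %d shards of <=16\n",
			workers, numShards(workers, 16))
	}
}
```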

Results show throughput as a function of Raft scheduler concurrency per CPU. An additional column with a fixed 96 workers compares against the current default cap. Current defaults are highlighted in bold.

| CPUs \ Raft workers    | 2/cpu   | 4/cpu  | 6/cpu   | 8/cpu      | 10/cpu | 96          |
|------------------------|---------|--------|---------|------------|--------|-------------|
| 2                      | 3,940   | 3,692  | 3,995   | **3,721**  | 3,566  |             |
| 4                      | 8,521   | 8,681  | 8,131   | **8,485**  | 7,738  |             |
| 8                      | 17,240  | 17,291 | 16,746  | **16,833** | 17,218 |             |
| 32                     | 48,420  | 48,815 | 51,081  | 49,060     | 51,491 | **50,957**  |
| 32, 8 SSDs             | 45,686  | 50,704 | 47,851  | 45,786     | 47,875 | **47,562**  |
| 96                     | 106,142 | 95,195 | 92,158  | 89,168     | 95,231 | **112,619** |
| 96, 10k ranges         | 86,404  | 85,842 | 83,105  | 81,033     | 79,331 | **87,953**  |
| 96, 10k ranges, 8 SSDs | 99,638  | 99,443 | 100,769 | 101,123    | 99,284 | **95,943**  |

Overall, the number of workers per CPU does not significantly affect the results.

With 2/4 CPUs we see some possible degradation at 10 workers/CPU, but 8 workers/CPU seems reasonable.

With 96 cores, we see significant degradation as worker counts increase, with the nodes only running at about 50% CPU. Mutex profiles mostly showed contention on per-replica mutexes (see below), so a further set of benchmarks was run with 10k rather than 1k ranges, but this showed similar degradation. Yet another set was run with 8 stores, which removed the performance degradation at higher worker counts but failed to significantly increase throughput -- at this point the nodes were running at about 65-70% CPU.

Given the above, it seems reasonable to retain 8 workers/CPU with a hard cap. However, I'll increase the cap from 96 to 128 -- this did not appear to impact results for 32 cores and below (we don't recommend running beyond 32 cores anyway), and it somewhat reduces the starvation risk with sharding and multi-store.
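The proposed change above amounts to a one-line tweak to the default formula. A sketch (assuming `numCPU` comes from `runtime.GOMAXPROCS(0)` in practice; `min` is a Go 1.21+ builtin):

```go
package main

import "fmt"

// schedulerConcurrency sketches the revised default proposed above: keep
// 8 workers per CPU for small machines, but raise the hard cap from 96
// to 128.
func schedulerConcurrency(numCPU int) int {
	return min(8*numCPU, 128)
}

func main() {
	for _, cpus := range []int{2, 8, 16, 32, 96} {
		fmt.Printf("%d CPUs -> %d workers\n", cpus, schedulerConcurrency(cpus))
	}
}
```

Note that the cap now binds at 16 CPUs and above (8*16 = 128), so the 32- and 96-core configurations benchmarked here all run at 128 workers by default.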

Mutex profile of 96-core 1k range single-store run with 6 workers/CPU primarily shows per-replica and Pebble contention, as well as Raft transport (likely introduced by recent Raft flow control work). This goes some way towards explaining why multi-store performed better.

[mutex profile image: 96-workers-576]

erikgrinaker (Contributor, Author) commented Jun 26, 2023

> Another set of benchmarks were run with 8 stores, which removed the performance degradation with increasing worker counts, but failed to significantly increase throughput -- at this point the nodes were running at about 65-70% CPU.

I suppose it's possible that we hit a resource ceiling on the worker node at this point; I didn't examine it. However, we were able to push 10% more throughput with 96 CPUs on 96 Raft workers, which indicates this wasn't the case.
