kvserver: improve `AddSSTable` throttling #73979
Labels: A-admission-control, A-kv-server (Relating to the KV-level RPC server), C-performance (Perf of queries or internals. Solution not expected to change functional behavior.), T-kv-replication

erikgrinaker added the C-performance, A-kv-server, and T-kv-replication labels on Dec 17, 2021
Related to #75066.

We're going to do or evaluate much of what this issue describes in the coming few months, using #82556 to track the work (also referenced in milestone docs, etc.). Closing as duplicate.
This issue came out of #73904.

When ingesting `AddSSTable` requests, e.g. during index backfills, we have to throttle incoming requests at the store level to avoid overwhelming Pebble. Otherwise we would see steadily increasing read amplification as data is ingested faster than it can be compacted away. See:

- cockroach/pkg/kv/kvserver/store_send.go, lines 278 to 303 in e0d97ca
- cockroach/pkg/storage/engine.go, lines 1055 to 1074 in e05a787
However, following the merge of #73904, there are a number of settings that affect this throttling:

- `kv.bulk_io_write.concurrent_addsstable_requests`: concurrent regular `AddSSTable` requests per store (1)
- `kv.bulk_io_write.concurrent_addsstable_as_writes_requests`: concurrent `AddSSTable` requests ingested as normal write batches per store (10)
- `rocksdb.ingest_backpressure.l0_file_count_threshold`: number of files in L0 before delaying `AddSSTable` requests (20)
- `rocksdb.ingest_backpressure.max_delay`: maximum delay to apply per `AddSSTable` request (5 seconds)
- `kv.bulk_io_write.small_write_size`: SST size below which it will be ingested as a write batch (400 KiB)

This can be quite hard to tune optimally. Also, even when Pebble is struggling with very high read amplification and delaying SSTs, the delay is capped at `max_delay`: if the size and rate of the incoming SSTs are large enough, they will hit the max delay and go through at a rate that's still faster than Pebble can keep up with, leading to read amplification growing out of control.

We should simplify this throttling model, and consider more sophisticated policies. A very simple model might be for operators to configure a max cap on read amplification (or `l0_file_count_threshold`), and then block all incoming `AddSSTable` requests until it drops back below the threshold -- but this might lead to starvation. We would probably want to limit or stagger concurrent requests too. It would be great if we could come up with an adaptive policy here.

Additionally, we also have below-Raft throttling of `AddSSTable` ingestion, which we should aim to remove, as it produces head-of-line blocking for the range and holds onto Raft scheduler goroutines. See #57247.

Jira issue: CRDB-11838
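The blocking-gate idea floated above (admit nothing while read amplification is over the cap, and stagger admitted requests) could be sketched as follows. All names here are hypothetical, using the L0 file count as a stand-in for read amplification; this is not CockroachDB's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// sstGate blocks AddSSTable admission while the L0 file count is at or
// above a configured cap, and additionally bounds the number of
// concurrently admitted requests with a semaphore channel.
type sstGate struct {
	mu      sync.Mutex
	cond    *sync.Cond
	l0Files int
	l0Cap   int           // block admission at or above this count
	sem     chan struct{} // limits concurrent admitted requests
}

func newSSTGate(l0Cap, maxConcurrent int) *sstGate {
	g := &sstGate{l0Cap: l0Cap, sem: make(chan struct{}, maxConcurrent)}
	g.cond = sync.NewCond(&g.mu)
	return g
}

// admit blocks until L0 is below the cap, then takes a concurrency slot.
func (g *sstGate) admit() {
	g.mu.Lock()
	for g.l0Files >= g.l0Cap {
		g.cond.Wait()
	}
	g.mu.Unlock()
	g.sem <- struct{}{}
}

// done releases the concurrency slot once ingestion finishes.
func (g *sstGate) done() { <-g.sem }

// updateL0 records a new L0 file count (e.g. after a compaction) and wakes
// any requests blocked on the gate.
func (g *sstGate) updateL0(n int) {
	g.mu.Lock()
	g.l0Files = n
	g.mu.Unlock()
	g.cond.Broadcast()
}

func main() {
	g := newSSTGate(20, 2)
	g.updateL0(10) // below cap: admission proceeds immediately
	g.admit()
	fmt.Println("SST admitted")
	g.done()
}
```

This illustrates the starvation risk noted above: while L0 stays over the cap, every waiter blocks indefinitely, which is why some staggering or adaptive policy would likely be needed on top.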