
kvserver: improve AddSSTable throttling #73979

Closed

erikgrinaker opened this issue Dec 17, 2021 · 2 comments
Labels: A-admission-control · A-kv-server (Relating to the KV-level RPC server) · C-performance (Perf of queries or internals. Solution not expected to change functional behavior.)

Comments

erikgrinaker (Contributor) commented Dec 17, 2021

This issue came out of #73904.

When ingesting AddSSTable requests, e.g. during index backfills, we have to throttle incoming requests at the store level to avoid overwhelming Pebble. Otherwise we would see steadily increasing read amplification as data is ingested faster than it can be compacted away.

case *roachpb.AddSSTableRequest:
	// Limit the number of concurrent AddSSTable requests, since they're
	// expensive and block all other writes to the same span. However, don't
	// limit AddSSTable requests that are going to ingest as a WriteBatch.
	if t.IngestAsWrites {
		return nil, nil
	}
	before := timeutil.Now()
	res, err := s.limiters.ConcurrentAddSSTableRequests.Begin(ctx)
	if err != nil {
		return nil, err
	}
	beforeEngineDelay := timeutil.Now()
	s.engine.PreIngestDelay(ctx)
	after := timeutil.Now()
	waited, waitedEngine := after.Sub(before), after.Sub(beforeEngineDelay)
	s.metrics.AddSSTableProposalTotalDelay.Inc(waited.Nanoseconds())
	s.metrics.AddSSTableProposalEngineDelay.Inc(waitedEngine.Nanoseconds())
	if waited > time.Second {
		log.Infof(ctx, "SST ingestion was delayed by %v (%v for storage engine back-pressure)",
			waited, waitedEngine)
	}
	return res, nil

func calculatePreIngestDelay(settings *cluster.Settings, metrics *pebble.Metrics) time.Duration {
	maxDelay := ingestDelayTime.Get(&settings.SV)
	l0ReadAmpLimit := ingestDelayL0Threshold.Get(&settings.SV)
	const ramp = 10
	l0ReadAmp := metrics.Levels[0].NumFiles
	if metrics.Levels[0].Sublevels >= 0 {
		l0ReadAmp = int64(metrics.Levels[0].Sublevels)
	}
	if l0ReadAmp > l0ReadAmpLimit {
		delayPerFile := maxDelay / time.Duration(ramp)
		targetDelay := time.Duration(l0ReadAmp-l0ReadAmpLimit) * delayPerFile
		if targetDelay > maxDelay {
			return maxDelay
		}
		return targetDelay
	}
	return 0
}
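For concreteness, here is a minimal standalone sketch of that formula (illustrative only, not the CockroachDB code paths), plugging in the default values of the settings listed below: with an L0 threshold of 20 files and a 5 second max delay, the ramp works out to 500ms per file above the threshold, so a read amplification of 26 yields a 3s delay and anything at or above 30 is already capped at the full 5s.

// Standalone sketch of the delay formula above, hard-coding the default
// settings quoted below; illustrative only, not the CockroachDB implementation.
package main

import (
	"fmt"
	"time"
)

func preIngestDelay(l0ReadAmp int64) time.Duration {
	const (
		maxDelay       = 5 * time.Second // rocksdb.ingest_backpressure.max_delay
		l0ReadAmpLimit = int64(20)       // rocksdb.ingest_backpressure.l0_file_count_threshold
		ramp           = 10
	)
	if l0ReadAmp <= l0ReadAmpLimit {
		return 0
	}
	delayPerFile := maxDelay / time.Duration(ramp) // 500ms per file above the threshold
	targetDelay := time.Duration(l0ReadAmp-l0ReadAmpLimit) * delayPerFile
	if targetDelay > maxDelay {
		return maxDelay
	}
	return targetDelay
}

func main() {
	for _, readAmp := range []int64{10, 22, 26, 30, 100} {
		// Prints 0s, 1s, 3s, 5s, 5s respectively: the delay saturates quickly.
		fmt.Printf("read amp %3d -> delay %v\n", readAmp, preIngestDelay(readAmp))
	}
}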

However, following the merge of #73904, there are a number of settings that affect this throttling:

  • kv.bulk_io_write.concurrent_addsstable_requests: concurrent regular AddSSTable requests per store (1)
  • kv.bulk_io_write.concurrent_addsstable_as_writes_requests: concurrent AddSSTable requests ingested as normal write batches per store (10)
  • rocksdb.ingest_backpressure.l0_file_count_threshold: number of files in L0 before delaying AddSSTable requests (20)
  • rocksdb.ingest_backpressure.max_delay: maximum delay to apply per AddSSTable request (5 seconds)
  • kv.bulk_io_write.small_write_size: SST size below which it will be ingested as a write batch (400 KiB)

This can be quite hard to tune optimally. Moreover, even when Pebble is struggling with very high read amplification and delaying SSTs, the delay is capped at max_delay: if incoming SSTs are large and frequent enough, they all hit the cap and are still admitted faster than Pebble can compact them away, so read amplification keeps growing out of control.
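A back-of-the-envelope sketch of why the cap is a problem (the 64 MiB SST size below is a hypothetical example, not a measured value): with the default limit of one concurrent AddSSTable request per store and a 5 second maximum delay, the store never admits SSTs slower than one per five seconds per slot, which puts a floor on the ingestion rate that compaction may not be able to match.

// Back-of-the-envelope sketch of the throughput floor implied by the delay
// cap; the 64 MiB SST size is a hypothetical example value.
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		concurrentRequests = 1               // kv.bulk_io_write.concurrent_addsstable_requests
		maxDelay           = 5 * time.Second // rocksdb.ingest_backpressure.max_delay
		sstBytes           = 64 << 20        // hypothetical 64 MiB SST
	)
	// Even at the maximum delay, each slot admits one SST per maxDelay.
	floorBytesPerSec := float64(concurrentRequests*sstBytes) / maxDelay.Seconds()
	fmt.Printf("ingestion floor: %.1f MiB/s per store\n", floorBytesPerSec/(1<<20))
}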

We should simplify this throttling model and consider more sophisticated policies. A very simple model might be for operators to configure a hard cap on read amplification (or reuse l0_file_count_threshold), and then block all incoming AddSSTable requests until read amplification drops back below that threshold -- but this might lead to starvation, and we would probably still want to limit or stagger concurrent requests. Ideally we would come up with an adaptive policy here.
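As a strawman, here is roughly what that simple blocking policy could look like. This is a hypothetical sketch (the Throttler type, the readAmp callback, and the polling loop are all invented for illustration), not a proposal for the actual Store code:

// Hypothetical sketch of the simple policy described above: admit AddSSTable
// requests through a fixed number of slots, and before each admission wait
// until L0 read amplification has dropped back below a configured cap.
package addsstthrottle

import (
	"context"
	"time"
)

type Throttler struct {
	sem        chan struct{} // limits concurrent AddSSTable requests
	readAmp    func() int64  // current L0 read amplification (e.g. sublevel count)
	maxReadAmp int64         // cap above which no new SSTs are admitted
	pollEvery  time.Duration
}

func NewThrottler(concurrency int, maxReadAmp int64, readAmp func() int64) *Throttler {
	return &Throttler{
		sem:        make(chan struct{}, concurrency),
		readAmp:    readAmp,
		maxReadAmp: maxReadAmp,
		pollEvery:  time.Second,
	}
}

// Admit blocks until a concurrency slot is free and read amplification is
// below the cap, or until the context is canceled. Callers must invoke the
// returned release function once the ingestion completes.
func (t *Throttler) Admit(ctx context.Context) (release func(), err error) {
	select {
	case t.sem <- struct{}{}:
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	for t.readAmp() >= t.maxReadAmp {
		select {
		case <-time.After(t.pollEvery):
		case <-ctx.Done():
			<-t.sem
			return nil, ctx.Err()
		}
	}
	return func() { <-t.sem }, nil
}

As noted above, a hard gate like this risks starving AddSSTable traffic indefinitely, which is part of why an adaptive policy (or integration with admission control) looks preferable to a fixed read amplification cap.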

We also have below-Raft throttling of AddSSTable ingestion, which we should aim to remove, since it causes head-of-line blocking for the range and holds onto Raft scheduler goroutines. See #57247.

Jira issue: CRDB-11838

erikgrinaker added the C-performance, A-kv-server, and T-kv-replication labels on Dec 17, 2021
erikgrinaker (Contributor, Author) commented:

Related to #75066.

irfansharif (Contributor) commented:

We're going to do/evaluate a lot of what this issue text describes in the coming few months, using #82556 to track the work (also being referenced in milestone docs/etc.)

Closing as duplicate.

irfansharif closed this as not planned (duplicate) on Oct 3, 2022