storage,admission: investigate read-only batch latency during high-volume snapshot ingest #89788
Labels
C-investigation
T-admission-control
Describe the problem
Experiment discussed internally here. While trying to reproduce snapshot-induced latency hits using the roachtest added in #89191, we noticed that p99.9 latencies increase for read traffic over data that is not currently receiving snapshots. In outlier traces, the time is spent entirely below Pebble. There is little trace information from within Pebble to explain why; this issue tracks investigating just that.
To Reproduce
Using #89191-ish:
The first red annotation marks leases for the foreground load being transferred to the node that is about to start receiving snapshots. The second red annotation marks when that node starts receiving snapshots and service latencies start going through the roof. A set of outlier traces can be found here: trace-snapshot-latency.tar.gz. They look roughly like the one below:
+cc @andrewbaptist, @sumeerbhola.
Jira issue: CRDB-20434