roachtest: add admission/follower-overload #81516

Conversation

@tbg tbg commented May 19, 2022

This is a less ad-hoc version of the experiment in #81289, where I
messed with the EBS configuration. That can't be done programmatically,
so here we use an IO nemesis on n3 instead.
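
Concretely, the nemesis throttles the cockroach systemd unit's disk writes via the cgroup IO controller. A minimal sketch of how the test might apply it, mirroring the `c.Run` line visible in the review thread below (20971520 bytes is 20 MiB/s per backing device):

```
import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster"
)

// limitN3WriteBandwidth caps the cockroach unit's write throughput on n3
// using systemd's cgroup IO controller. Because the store sits on a
// two-device raid0, the effective limit is roughly 2x the per-device
// value. A sketch; the actual test code is authoritative.
func limitN3WriteBandwidth(ctx context.Context, c cluster.Cluster) {
	c.Run(ctx, c.Node(3), "sudo", "systemctl", "set-property", "cockroach",
		"'IOWriteBandwidthMax={store-dir} 20971520'")
}
```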

open "$(roachprod adminui $c:1)/#/debug/chart?charts=%5B%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A3%2C%22aggregator%22%3A3%2C%22derivative%22%3A0%2C%22perNode%22%3Afalse%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.store.raft.quota_pool.percent_used-max%22%7D%2C%7B%22downsampler%22%3A3%2C%22aggregator%22%3A3%2C%22derivative%22%3A0%2C%22perNode%22%3Afalse%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.store.raft.quota_pool.percent_used-p50%22%7D%5D%2C%22axisUnits%22%3A0%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A0%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.store.storage.l0-sublevels%22%7D%5D%2C%22axisUnits%22%3A0%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A2%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.node.sys.host.disk.write.bytes%22%7D%5D%2C%22axisUnits%22%3A1%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A0%2C%22perNode%22%3Afalse%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.node.exec.latency-p50%22%7D%2C%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A0%2C%22perNode%22%3Afalse%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.node.exec.latency-p99%22%7D%5D%2C%22axisUnits%22%3A2%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A2%2C%22perNode%22%3Afalse%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.node.exec.latency-count%22%7D%5D%2C%22axisUnits%22%3A0%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A0%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.store.storage.l0-num-files%22%7D%5D%2C%22axisUnits%22%3A0%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A2%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.store.rocksdb.compacted-bytes-written%22%7D%5D%2C%22axisUnits%22%3A1%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A0%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.store.replicas.leaseholders%22%7D%5D%2C%22axisUnits%22%3A0%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A2%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.node.admission.wait_durations.kv-count%22%7D%5D%2C%22axisUnits%22%3A0%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A0%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.node.sys.cpu.combined.percent-normalized%22%7D%5D%2C%22axisUnits%22%3A0%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A0%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.node.sys.go.totalbytes%22%7D%5D%2C%22axisUnits%22%3A1%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A2%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.store.raft.rcvd.admission_wait%22%7D%5D%2C%22axisUnits%22%3A2%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A2%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.store.raft.rcvd.dropped_bytes%22%7D%5D%2C%22axisUnits%22%3A1%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A0%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22met
ric%22%3A%22cr.store.raft.rcvd.queued_bytes%22%7D%5D%2C%22axisUnits%22%3A1%7D%2C%7B%22metrics%22%3A%5B%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A0%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.store.raft.process.handleready.latency-p75%22%7D%2C%7B%22downsampler%22%3A1%2C%22aggregator%22%3A2%2C%22derivative%22%3A0%2C%22perNode%22%3Atrue%2C%22source%22%3A%22%22%2C%22metric%22%3A%22cr.store.raft.process.handleready.latency-p50%22%7D%5D%2C%22axisUnits%22%3A2%7D%5D"
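
For readability, the `charts=` parameter of that URL decodes to a custom dashboard plotting, in order: raft.quota_pool.percent_used (max and p50), storage.l0-sublevels, sys.host.disk.write.bytes (as a rate), exec.latency (p50 and p99), exec.latency-count (as a rate), storage.l0-num-files, rocksdb.compacted-bytes-written (as a rate), replicas.leaseholders, admission.wait_durations.kv-count (as a rate), sys.cpu.combined.percent-normalized, sys.go.totalbytes, raft.rcvd.admission_wait (as a rate), raft.rcvd.dropped_bytes (as a rate), raft.rcvd.queued_bytes, and raft.process.handleready.latency (p75 and p50).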

open "http://$(roachprod ip --external $c:4):9090/graph?g0.expr=storage_l0_sublevels&g0.tab=0&g0.stacked=0&g0.range_input=1h&g1.expr=storage_l0_num_files&g1.tab=0&g1.stacked=0&g1.range_input=1h&g2.expr=rate(rocksdb_compacted_bytes_written%5B1m%5D)&g2.tab=0&g2.stacked=0&g2.range_input=1h&g3.expr=histogram_quantile(0.90%2C%20rate(workload_kv_write_duration_seconds_bucket%5B1m%5D))&g3.tab=0&g3.stacked=0&g3.range_input=1h&g4.expr=histogram_quantile(0.90%2C%20rate(workload_kv_read_duration_seconds_bucket%5B1m%5D))&g4.tab=0&g4.stacked=0&g4.range_input=1h&g5.expr=rate(workload_kv_write_duration_seconds_count%5B1m%5D)&g5.tab=0&g5.stacked=0&g5.range_input=1h&g6.expr=histogram_quantile(1.0%2C%20rate(raft_quota_pool_percent_used_bucket%5B1m%5D))&g6.tab=0&g6.stacked=0&g6.range_input=1h&g7.expr=histogram_quantile(1.0%2C%20rate(admission_wait_durations_kv_stores_bucket%5B1m%5D))&g7.tab=0&g7.stacked=0&g7.range_input=1h"

Part of #79215.
Closes #81834.

Release note: None

@cockroach-teamcity

This change is Reviewable

@tbg tbg force-pushed the roachtest-admission-control-follower-overload branch 4 times, most recently from 936a217 to 2c20ed4 Compare May 19, 2022 21:01
craig bot pushed a commit that referenced this pull request May 23, 2022
77481: sql: add multitenant fairness tests r=cucaroach a=cucaroach

Informs: #65954

Roachtests intended to validate that kv and store admission control
queues distribute database resources fairly.

There are 8 tests: for each of "kv" (CPU-stressing) and "store"
(LSM-stressing) workloads, a "same" variant where each of N SQL pods hits the
kvserver with equal concurrency and a "concurrency-skew" variant which
varies the concurrency from each pod, each run with and without admission
control.

We measure the variation in throughput across the pods ("max_tput_delta")
and the max/min throughput and latency across the pods. Sample results:

```
multitenant/fairness/kv/concurrency-skew/admission/run_1/1.perf/stats.json
{ "max_tput_delta": 0.340925, "max_tput": 483.200000, "min_tput": 256.056667, "max_latency": 3.282347, "min_latency": 0.771148}
multitenant/fairness/kv/concurrency-skew/no-admission/run_1/1.perf/stats.json
{ "max_tput_delta": 0.330760, "max_tput": 205.740000, "min_tput": 108.903333, "max_latency": 7.151178, "min_latency": 1.618236}

multitenant/fairness/kv/same/admission/run_1/1.perf/stats.json
{ "max_tput_delta": 0.245294, "max_tput": 293.990000, "min_tput": 197.026667, "max_latency": 0.831686, "min_latency": 0.762475}
multitenant/fairness/kv/same/no-admission/run_1/1.perf/stats.json
{ "max_tput_delta": 0.031199, "max_tput": 132.443333, "min_tput": 124.676667, "max_latency": 1.915801, "min_latency": 1.776664}

multitenant/fairness/store/same/admission/run_1/1.perf/stats.json
{ "max_tput_delta": 0.018095, "max_tput": 139.950000, "min_tput": 136.336667, "max_latency": 0.346295, "min_latency": 0.341212}
multitenant/fairness/store/same/no-admission/run_1/1.perf/stats.json
{ "max_tput_delta": 0.001886, "max_tput": 149.296667, "min_tput": 148.878333, "max_latency": 0.306853, "min_latency": 0.303392}

multitenant/fairness/store/concurrency-skew/admission/run_1/1.perf/stats.json
{ "max_tput_delta": 0.024872, "max_tput": 143.875000, "min_tput": 138.346667, "max_latency": 1.094262, "min_latency": 0.346674}
multitenant/fairness/store/concurrency-skew/no-admission/run_1/1.perf/stats.json
{ "max_tput_delta": 0.005439, "max_tput": 148.848333, "min_tput": 147.500000, "max_latency": 1.007741, "min_latency": 0.313597}
```
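
For orientation, one plausible definition of "max_tput_delta" is the maximum relative deviation of any pod's throughput from the mean, which is consistent with the numbers above. This is an assumption for illustration; the roachtest's own stats code is authoritative:

```
package sketch

import "math"

// maxTputDelta returns the largest relative deviation of any pod's
// throughput from the mean. An assumed definition for illustration;
// the roachtest writes the authoritative value into stats.json.
func maxTputDelta(podTputs []float64) float64 {
	var sum float64
	for _, t := range podTputs {
		sum += t
	}
	mean := sum / float64(len(podTputs))
	var maxDelta float64
	for _, t := range podTputs {
		maxDelta = math.Max(maxDelta, math.Abs(t-mean)/mean)
	}
	return maxDelta
}
```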

Release note: None

81660: kvserver: fix raft handling stats when no ready present r=erikgrinaker a=tbg

We were previously adding `time.Time{}.Sub(timeutil.Now())` to
the metrics some of the time, which produces bogus results.
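
The zero `time.Time` lies centuries before any plausible `timeutil.Now()`, so the subtraction saturates `time.Duration`. A standalone illustration using the standard library directly:

```
package main

import (
	"fmt"
	"time"
)

func main() {
	var zero time.Time // the zero value, January 1 of year 1
	d := zero.Sub(time.Now())
	// The true difference overflows int64 nanoseconds, so Sub saturates
	// at the minimum Duration (about -2562047h47m16.9s): useless as a
	// raft-handling latency sample.
	fmt.Println(d)
}
```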

Noticed while working on #81516.

Release note: None


81666: ui: Fix accidental commit of `it.only` r=jocrl a=jocrl

This commit fixes an accidental commit of `it.only` to frontend tests.
It also adds a comment explaining why the test does not go further.

Release note: None

Co-authored-by: Tommy Reilly <treilly@cockroachlabs.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Josephine Lee <josephine@cockroachlabs.com>

tbg commented May 24, 2022

Here's what this roachtest looks like in action (10 min): https://www.loom.com/share/f6bec2f6ce26449f9bb04e269375b048

@tbg tbg force-pushed the roachtest-admission-control-follower-overload branch from 2c20ed4 to be0b4d3 Compare May 24, 2022 12:30
@tbg tbg force-pushed the roachtest-admission-control-follower-overload branch 7 times, most recently from 0b55925 to 8279ff4 Compare May 30, 2022 08:10
@tbg tbg force-pushed the roachtest-admission-control-follower-overload branch 4 times, most recently from 3db6df0 to 24c8e5a Compare June 3, 2022 11:20
tbg added a commit to tbg/cockroach that referenced this pull request Jun 7, 2022
The interface used to take `*testing.T` could be trimmed down a bit,
which helps with cockroachdb#81516.

Release note: None
@tbg tbg force-pushed the roachtest-admission-control-follower-overload branch from 8ac037d to 90bff48 Compare June 7, 2022 07:00
@tbg tbg force-pushed the roachtest-admission-control-follower-overload branch 2 times, most recently from 0605b1b to a4b231b Compare June 8, 2022 20:19

tbg commented Aug 12, 2022

Verified that this test still works with the latest updates to pausing, and in particular #85739.

[image]

tbg added a commit to tbg/cockroach that referenced this pull request Aug 12, 2022
This commit makes the following changes:

- Track *all* IOThresholds in the store's map, not just the ones for
  overloaded stores.
- Improve the container for these IOThresholds to be easier to work
  with.
- Rather than "hard-coding" a value of "1.0" to mean overloaded,
  use (and plumb) the value of the cluster setting. "1.0" is the
  value at which I/O admission control chooses to engage, but
  the cluster setting is usually smaller and determines when to
  consider followers on a remote store pausable. The API now
  reflects that and avoids this kind of confusion.
- Rename all uses of the container away from "overload" towards
  "IOThreshold".
- Add a Sequence() method that is bumped whenever the set of Stores
  whose IOThreshold score indicates pausability changes.

I originally started to work on this to address cockroachdb#84465, but realized
that we couldn't "just" leave the set of paused followers untouched
absent sequence changes. This is because the set of paused followers
has additional inputs, most importantly the set of live followers.
This set is per-Replica and subject to change, so we can't be too
sure the outcome would be the same, and we do want to be reactive
to followers becoming nonresponsive by, if necessary, unpausing
followers.

I think we will have to address cockroachdb#84465 by reducing the frequency
at which the paused stores are revisited, but adding an eager
pass whenever the sequence is bumped.

Additionally, for cockroachdb#84252, we are likely also going to be able to rely on
the sequence number to trigger unquiescing of ranges that were
previously quiesced in the presence of a paused follower.

Regardless of these future possible uses, this is a nice conceptual
clean-up and a good last PR to land for pausing in the 22.2 cycle
(unless we find time to add quiescence in presence of paused followers,
in which case that would be worthy follow-up).

I verified that with this commit, the [roachtest] still works and
effectively avoids I/O admission control activation a large percentage
of the time at a setting of 0.8. This gives good confidence - at least
for this exact test - that with 0.5 we'd probably never see admission
control throttle foreground writes. However, the test is fairly
specific since it severely constrains n3's disk throughput, so
0.8 might be perfectly appropriate in practice still. We'll need
some more experience to tell.

[roachtest]: cockroachdb#81516

Touches cockroachdb#84465.
Touches cockroachdb#84252.

Release note: None
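
To make the Sequence() idea concrete, a minimal sketch of the shape such a container could take (hypothetical names and shapes; the real container lives in pkg/kv/kvserver):

```
package sketch

// ioThresholdMap tracks the latest gossiped IOThreshold score per store
// and bumps a sequence number whenever the set of pausable stores
// changes. Hypothetical shapes for illustration only.
type ioThresholdMap struct {
	threshold float64 // pausability cutoff, plumbed from the cluster setting
	scores    map[int32]float64
	seq       int64
}

func (m *ioThresholdMap) pausable(score float64) bool {
	return score >= m.threshold
}

// Update records a store's score and bumps the sequence iff the store's
// pausability flipped, so consumers can cheaply detect changes to the
// pausable set.
func (m *ioThresholdMap) Update(storeID int32, score float64) {
	old, ok := m.scores[storeID]
	if m.scores == nil {
		m.scores = make(map[int32]float64)
	}
	m.scores[storeID] = score
	if m.pausable(score) != (ok && m.pausable(old)) {
		m.seq++
	}
}

// Sequence changes whenever the set of stores whose IOThreshold score
// indicates pausability changes.
func (m *ioThresholdMap) Sequence() int64 { return m.seq }
```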
tbg added a commit to tbg/cockroach that referenced this pull request Aug 15, 2022
craig bot pushed a commit that referenced this pull request Aug 15, 2022
85735: streamingccl: multi-node unit tests r=samiskin a=samiskin

None of our tests used to run with multiple nodes and a scattered table,
so this PR re-enables the unavailable node test and creates a new basic
multinode test.

It also fixes a bug where, if the stream creation statement was retried, it
would error on an already-existing tenant, since the tenant creation wasn't
associated with the overall transaction.

Release note: None

85739: kvserver: simplify and track entire set of gossiped IOThresholds r=erikgrinaker a=tbg

Release justification: low-risk improvement to new functionality


85933: opt: remove TableMeta.IsSkipLocked r=rytaft a=mgartner

#### opt: fix duplicate join multiplicity test

Test case 9 in `TestGetJoinMultiplicity` was a duplicate of test case 2.
It has been updated to stay similar but use an `INNER JOIN` instead of a
`LEFT JOIN`, which I believe it was originally meant to use; there is no
similar existing test case.

Release note: None

#### opt: remove TableMeta.IsSkipLocked

`TableMeta.IsSkipLocked` was used by the multiplicity builder to
determine whether a column was filtered or not by a `SKIP LOCKED` scan.
However, this field was unnecessary. The `deriveUnfilteredCols` function
in the multiplicity builder already traverses expressions all the way
down to `ScanExpr`s, only collecting unfiltered columns if they
originate from a scan where `ScanExpr.IsUnfiltered` returns true.
`ScanExpr.IsUnfiltered` returns false if the scan is a `SKIP LOCKED`
scan. Therefore, no additional mechanism is required to detect columns
filtered by `SKIP LOCKED` scans.
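
A minimal sketch of that reasoning, with simplified stand-in types (the real traversal lives in the multiplicity builder in `pkg/sql/opt`):

```
package sketch

// ColSet is a simplified stand-in for the optimizer's column set.
type ColSet map[int]struct{}

// ScanExpr is a simplified stand-in for the optimizer's scan expression.
type ScanExpr struct {
	Cols       ColSet
	SkipLocked bool // set for SELECT ... FOR UPDATE SKIP LOCKED scans
}

// IsUnfiltered reports whether the scan provably returns every row of
// its table. A SKIP LOCKED scan may silently omit locked rows, so it is
// never unfiltered -- which is why no separate TableMeta flag is needed.
func (s *ScanExpr) IsUnfiltered() bool { return !s.SkipLocked }

// deriveUnfilteredCols collects output columns that are provably
// unfiltered. Only the scan leaf is shown; the real traversal walks all
// the way down through joins and projections.
func deriveUnfilteredCols(s *ScanExpr) ColSet {
	if s.IsUnfiltered() {
		return s.Cols
	}
	return nil
}
```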

I noticed that `TableMeta.IsSkipLocked` was not needed because the
multiplicity builder unit tests added in #85720 never set it, yet the
tests passed.

This commit also adds more multiplicity builder unit tests for
self-joins with `SKIP LOCKED`.

Release note: None

#### opt: ensure min cardinality of zero for expressions with SKIP LOCKED

This commit ensures that the minimum cardinality of a scan or
lookup-join with `SKIP LOCKED` is zero. This shouldn't be needed because
the logic that derives the cardinality for these expressions should never
produce a non-zero minimum cardinality, but this provides extra safety.

Release note: None

#### opt: improve comments in multiplicity builder

Release note: None


86053: util: optimize FastIntSet r=rytaft a=mgartner

This PR contains several commits which speed up `util.FastIntSet` and reduce the
allocations it makes in some cases. See the individual commit messages for
details.

The speedup and reduction in allocations can be observed in the optimizer's
`BenchmarkPhases`:

```
name                                                                   old time/op    new time/op    delta
kv-read/Simple/OptBuildNorm-16                                    34.9µs ± 2%    35.0µs ± 4%     ~     (p=1.000 n=5+5)
kv-read/Simple/Explore-16                                         43.9µs ± 2%    43.7µs ± 1%     ~     (p=0.222 n=5+5)
kv-read/Prepared/ExecBuild-16                                     1.18µs ± 1%    1.13µs ± 1%   -3.72%  (p=0.008 n=5+5)
kv-read-const/Simple/OptBuildNorm-16                              35.2µs ± 2%    34.9µs ± 3%     ~     (p=0.151 n=5+5)
kv-read-const/Simple/Explore-16                                   45.2µs ± 5%    44.3µs ± 1%     ~     (p=0.056 n=5+5)
kv-read-const/Prepared/ExecBuild-16                               1.02µs ± 4%    0.97µs ± 1%   -5.18%  (p=0.008 n=5+5)
tpcc-new-order/Simple/OptBuildNorm-16                             71.1µs ± 1%    70.8µs ± 0%     ~     (p=0.548 n=5+5)
tpcc-new-order/Simple/Explore-16                                   117µs ± 1%     116µs ± 2%     ~     (p=0.095 n=5+5)
tpcc-new-order/Prepared/ExecBuild-16                              1.48µs ± 2%    1.44µs ± 1%   -2.67%  (p=0.008 n=5+5)
tpcc-delivery/Simple/OptBuildNorm-16                              65.5µs ± 2%    63.9µs ± 1%   -2.37%  (p=0.032 n=5+4)
tpcc-delivery/Simple/Explore-16                                   96.2µs ± 1%    95.7µs ± 1%     ~     (p=0.690 n=5+5)
tpcc-delivery/Prepared/AssignPlaceholdersNorm-16                  12.5µs ± 1%    12.3µs ± 1%     ~     (p=0.095 n=5+5)
tpcc-delivery/Prepared/Explore-16                                 40.5µs ± 1%    39.4µs ± 0%   -2.75%  (p=0.008 n=5+5)
tpcc-stock-level/Simple/OptBuildNorm-16                            291µs ± 2%     288µs ± 2%     ~     (p=0.310 n=5+5)
tpcc-stock-level/Simple/Explore-16                                 424µs ± 0%     417µs ± 2%   -1.79%  (p=0.016 n=5+5)
tpcc-stock-level/Prepared/AssignPlaceholdersNorm-16               53.8µs ± 0%    53.3µs ± 1%     ~     (p=0.151 n=5+5)
tpcc-stock-level/Prepared/Explore-16                               181µs ± 2%     176µs ± 3%   -2.44%  (p=0.032 n=5+5)
many-columns-and-indexes-a/Simple/OptBuildNorm-16                  309µs ± 0%     254µs ± 2%  -17.78%  (p=0.008 n=5+5)
many-columns-and-indexes-a/Simple/Explore-16                       467µs ± 1%     389µs ± 1%  -16.65%  (p=0.008 n=5+5)
many-columns-and-indexes-a/Prepared/AssignPlaceholdersNorm-16      275µs ± 1%     223µs ± 1%  -18.93%  (p=0.008 n=5+5)
many-columns-and-indexes-a/Prepared/Explore-16                     434µs ± 2%     358µs ± 1%  -17.68%  (p=0.008 n=5+5)
many-columns-and-indexes-b/Simple/OptBuildNorm-16                  331µs ± 1%     281µs ± 2%  -14.91%  (p=0.008 n=5+5)
many-columns-and-indexes-b/Simple/Explore-16                       500µs ± 2%     424µs ± 1%  -15.23%  (p=0.008 n=5+5)
many-columns-and-indexes-b/Prepared/AssignPlaceholdersNorm-16      279µs ± 1%     228µs ± 1%  -17.98%  (p=0.008 n=5+5)
many-columns-and-indexes-b/Prepared/Explore-16                     449µs ± 5%     367µs ± 0%  -18.20%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Simple/OptBuildNorm-16                 13.7ms ± 1%    10.6ms ± 1%  -22.68%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Simple/Explore-16                      45.7ms ± 2%    40.0ms ± 2%  -12.42%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/AssignPlaceholdersNorm-16     11.1ms ± 1%     8.6ms ± 2%  -22.77%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/Explore-16                    43.3ms ± 0%    37.8ms ± 2%  -12.65%  (p=0.008 n=5+5)
ored-preds-100/Simple/OptBuildNorm-16                             2.59ms ± 1%    2.60ms ± 1%     ~     (p=0.690 n=5+5)
ored-preds-100/Simple/Explore-16                                  3.29ms ± 3%    3.31ms ± 2%     ~     (p=0.548 n=5+5)
ored-preds-100/Prepared/ExecBuild-16                              85.5µs ± 1%    84.7µs ± 1%     ~     (p=0.095 n=5+5)
ored-preds-using-params-100/Simple/OptBuildNorm-16                1.88ms ± 1%    1.88ms ± 1%     ~     (p=0.421 n=5+5)
ored-preds-using-params-100/Simple/Explore-16                     2.56ms ± 1%    2.57ms ± 1%     ~     (p=0.413 n=4+5)
ored-preds-using-params-100/Prepared/AssignPlaceholdersNorm-16     468µs ± 3%     472µs ± 2%     ~     (p=0.421 n=5+5)
ored-preds-using-params-100/Prepared/Explore-16                   1.16ms ± 1%    1.17ms ± 1%     ~     (p=0.222 n=5+5)

name                                                                   old alloc/op   new alloc/op   delta
kv-read/Simple/OptBuildNorm-16                                    12.3kB ± 0%    12.3kB ± 0%     ~     (all equal)
kv-read/Simple/Explore-16                                         14.7kB ± 0%    14.7kB ± 0%     ~     (all equal)
kv-read/Prepared/ExecBuild-16                                       704B ± 0%      704B ± 0%     ~     (all equal)
kv-read-const/Simple/OptBuildNorm-16                              12.4kB ± 0%    12.4kB ± 0%     ~     (all equal)
kv-read-const/Simple/Explore-16                                   14.8kB ± 0%    14.8kB ± 0%     ~     (all equal)
kv-read-const/Prepared/ExecBuild-16                                 520B ± 0%      520B ± 0%     ~     (all equal)
tpcc-new-order/Simple/OptBuildNorm-16                             24.0kB ± 0%    24.0kB ± 0%     ~     (p=0.881 n=5+5)
tpcc-new-order/Simple/Explore-16                                  34.0kB ± 0%    33.9kB ± 0%   -0.01%  (p=0.032 n=5+5)
tpcc-new-order/Prepared/ExecBuild-16                                768B ± 0%      768B ± 0%     ~     (all equal)
tpcc-delivery/Simple/OptBuildNorm-16                              19.8kB ± 0%    19.8kB ± 0%     ~     (p=0.722 n=5+5)
tpcc-delivery/Simple/Explore-16                                   26.3kB ± 0%    26.3kB ± 0%     ~     (p=0.802 n=5+5)
tpcc-delivery/Prepared/AssignPlaceholdersNorm-16                  5.44kB ± 0%    5.44kB ± 0%     ~     (all equal)
tpcc-delivery/Prepared/Explore-16                                 13.1kB ± 0%    13.1kB ± 0%     ~     (p=1.000 n=5+5)
tpcc-stock-level/Simple/OptBuildNorm-16                           89.8kB ± 0%    89.8kB ± 0%     ~     (p=0.817 n=5+5)
tpcc-stock-level/Simple/Explore-16                                 120kB ± 0%     120kB ± 0%     ~     (p=0.937 n=5+5)
tpcc-stock-level/Prepared/AssignPlaceholdersNorm-16               18.8kB ± 0%    18.8kB ± 0%     ~     (p=1.000 n=5+5)
tpcc-stock-level/Prepared/Explore-16                              46.9kB ± 0%    46.9kB ± 0%     ~     (p=0.262 n=5+5)
many-columns-and-indexes-a/Simple/OptBuildNorm-16                 16.4kB ± 0%    16.4kB ± 0%     ~     (all equal)
many-columns-and-indexes-a/Simple/Explore-16                      23.1kB ± 0%    23.1kB ± 0%     ~     (p=1.000 n=5+5)
many-columns-and-indexes-a/Prepared/AssignPlaceholdersNorm-16     3.81kB ± 0%    3.81kB ± 0%     ~     (all equal)
many-columns-and-indexes-a/Prepared/Explore-16                    10.5kB ± 0%    10.5kB ± 0%     ~     (p=1.000 n=5+5)
many-columns-and-indexes-b/Simple/OptBuildNorm-16                 23.6kB ± 0%    23.6kB ± 0%     ~     (p=0.730 n=5+5)
many-columns-and-indexes-b/Simple/Explore-16                      30.0kB ± 0%    30.0kB ± 0%     ~     (p=0.159 n=5+5)
many-columns-and-indexes-b/Prepared/AssignPlaceholdersNorm-16     6.05kB ± 0%    6.05kB ± 0%     ~     (p=1.000 n=5+5)
many-columns-and-indexes-b/Prepared/Explore-16                    12.4kB ± 0%    12.4kB ± 0%     ~     (p=0.540 n=5+5)
many-columns-and-indexes-c/Simple/OptBuildNorm-16                 5.43MB ± 0%    4.75MB ± 0%  -12.49%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Simple/Explore-16                      27.0MB ± 0%    26.3MB ± 0%   -2.58%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/AssignPlaceholdersNorm-16     4.27MB ± 0%    3.65MB ± 0%  -14.54%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/Explore-16                    25.8MB ± 0%    25.2MB ± 0%   -2.48%  (p=0.008 n=5+5)
ored-preds-100/Simple/OptBuildNorm-16                              979kB ± 0%     979kB ± 0%     ~     (p=0.889 n=5+5)
ored-preds-100/Simple/Explore-16                                  1.37MB ± 0%    1.37MB ± 0%     ~     (p=0.690 n=5+5)
ored-preds-100/Prepared/ExecBuild-16                              29.6kB ± 0%    29.6kB ± 0%     ~     (all equal)
ored-preds-using-params-100/Simple/OptBuildNorm-16                 568kB ± 0%     568kB ± 0%     ~     (p=0.794 n=5+5)
ored-preds-using-params-100/Simple/Explore-16                      962kB ± 0%     962kB ± 0%     ~     (p=0.841 n=5+5)
ored-preds-using-params-100/Prepared/AssignPlaceholdersNorm-16     348kB ± 0%     348kB ± 0%     ~     (p=0.548 n=5+5)
ored-preds-using-params-100/Prepared/Explore-16                    742kB ± 0%     742kB ± 0%     ~     (p=0.841 n=5+5)

name                                                                   old allocs/op  new allocs/op  delta
kv-read/Simple/OptBuildNorm-16                                      77.0 ± 0%      77.0 ± 0%     ~     (all equal)
kv-read/Simple/Explore-16                                           88.0 ± 0%      88.0 ± 0%     ~     (all equal)
kv-read/Prepared/ExecBuild-16                                       9.00 ± 0%      9.00 ± 0%     ~     (all equal)
kv-read-const/Simple/OptBuildNorm-16                                79.0 ± 0%      79.0 ± 0%     ~     (all equal)
kv-read-const/Simple/Explore-16                                     90.0 ± 0%      90.0 ± 0%     ~     (all equal)
kv-read-const/Prepared/ExecBuild-16                                 6.00 ± 0%      6.00 ± 0%     ~     (all equal)
tpcc-new-order/Simple/OptBuildNorm-16                                128 ± 0%       128 ± 0%     ~     (all equal)
tpcc-new-order/Simple/Explore-16                                     171 ± 0%       171 ± 0%     ~     (all equal)
tpcc-new-order/Prepared/ExecBuild-16                                9.00 ± 0%      9.00 ± 0%     ~     (all equal)
tpcc-delivery/Simple/OptBuildNorm-16                                 120 ± 0%       120 ± 0%     ~     (all equal)
tpcc-delivery/Simple/Explore-16                                      169 ± 0%       169 ± 0%     ~     (all equal)
tpcc-delivery/Prepared/AssignPlaceholdersNorm-16                    27.0 ± 0%      27.0 ± 0%     ~     (all equal)
tpcc-delivery/Prepared/Explore-16                                   77.0 ± 0%      77.0 ± 0%     ~     (all equal)
tpcc-stock-level/Simple/OptBuildNorm-16                              552 ± 0%       552 ± 0%     ~     (all equal)
tpcc-stock-level/Simple/Explore-16                                   711 ± 0%       711 ± 0%     ~     (all equal)
tpcc-stock-level/Prepared/AssignPlaceholdersNorm-16                  105 ± 0%       105 ± 0%     ~     (all equal)
tpcc-stock-level/Prepared/Explore-16                                 264 ± 0%       264 ± 0%     ~     (all equal)
many-columns-and-indexes-a/Simple/OptBuildNorm-16                   69.0 ± 0%      69.0 ± 0%     ~     (all equal)
many-columns-and-indexes-a/Simple/Explore-16                        89.0 ± 0%      89.0 ± 0%     ~     (all equal)
many-columns-and-indexes-a/Prepared/AssignPlaceholdersNorm-16       21.0 ± 0%      21.0 ± 0%     ~     (all equal)
many-columns-and-indexes-a/Prepared/Explore-16                      41.0 ± 0%      41.0 ± 0%     ~     (all equal)
many-columns-and-indexes-b/Simple/OptBuildNorm-16                    128 ± 0%       128 ± 0%     ~     (all equal)
many-columns-and-indexes-b/Simple/Explore-16                         145 ± 0%       146 ± 0%     ~     (p=0.167 n=5+5)
many-columns-and-indexes-b/Prepared/AssignPlaceholdersNorm-16       36.0 ± 0%      36.0 ± 0%     ~     (all equal)
many-columns-and-indexes-b/Prepared/Explore-16                      53.0 ± 0%      53.0 ± 0%     ~     (all equal)
many-columns-and-indexes-c/Simple/OptBuildNorm-16                  72.3k ± 0%     61.7k ± 0%  -14.66%  (p=0.029 n=4+4)
many-columns-and-indexes-c/Simple/Explore-16                        295k ± 0%      284k ± 0%   -3.68%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/AssignPlaceholdersNorm-16      66.1k ± 0%     56.4k ± 0%  -14.69%  (p=0.008 n=5+5)
many-columns-and-indexes-c/Prepared/Explore-16                      289k ± 0%      279k ± 0%   -3.46%  (p=0.008 n=5+5)
ored-preds-100/Simple/OptBuildNorm-16                              16.5k ± 0%     16.5k ± 0%     ~     (p=0.333 n=5+4)
ored-preds-100/Simple/Explore-16                                   17.9k ± 0%     17.9k ± 0%     ~     (all equal)
ored-preds-100/Prepared/ExecBuild-16                                 413 ± 0%       413 ± 0%     ~     (all equal)
ored-preds-using-params-100/Simple/OptBuildNorm-16                 4.08k ± 0%     4.08k ± 0%     ~     (all equal)
ored-preds-using-params-100/Simple/Explore-16                      5.54k ± 0%     5.54k ± 0%     ~     (all equal)
ored-preds-using-params-100/Prepared/AssignPlaceholdersNorm-16     1.44k ± 0%     1.44k ± 0%     ~     (all equal)
ored-preds-using-params-100/Prepared/Explore-16                    2.89k ± 0%     2.89k ± 0%     ~     (all equal)
```
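
For context on where such wins come from, a simplified sketch of the small/large split that makes a FastIntSet-style set cheap for small elements. The real util.FastIntSet keeps small values in an inline bitmap and uses a denser spill structure than the map shown here:

```
package sketch

import "math/bits"

// fastIntSet keeps elements in [0, 64) in an inline bitmap and spills
// anything else to a map. A sketch of the small/large split only.
type fastIntSet struct {
	small uint64
	large map[int]struct{} // nil until an out-of-range element is added
}

func (s *fastIntSet) Add(i int) {
	if i >= 0 && i < 64 {
		s.small |= 1 << uint(i) // no allocation on the common path
		return
	}
	if s.large == nil {
		s.large = make(map[int]struct{})
	}
	s.large[i] = struct{}{}
}

func (s *fastIntSet) Contains(i int) bool {
	if i >= 0 && i < 64 {
		return s.small&(1<<uint(i)) != 0
	}
	_, ok := s.large[i]
	return ok
}

func (s *fastIntSet) Len() int {
	return bits.OnesCount64(s.small) + len(s.large)
}
```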


Co-authored-by: Shiranka Miskin <shiranka.miskin@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
This is a less ad-hoc version of the experiment in cockroachdb#81289, where I
messed with the EBS configuration. That can't be done programmatically,
so here we use an IO nemesis on n3 instead.

Part of cockroachdb#79215.
Closes cockroachdb#81834.

Release note: None
@tbg tbg force-pushed the roachtest-admission-control-follower-overload branch from 5ab5622 to a3f80e6 Compare August 18, 2022 08:44
```
// └─md0 9:0 0 872.3G 0 raid0 /mnt/data1
//
// and so the actual write throttle is about 2x what was set.
c.Run(ctx, c.Node(3), "sudo", "systemctl", "set-property", "cockroach", "'IOWriteBandwidthMax={store-dir} 20971520'")
```
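
If the limit ever needs lifting mid-experiment, systemd's resource-control semantics suggest that assigning the property an empty value (`sudo systemctl set-property cockroach IOWriteBandwidthMax=`) clears prior assignments; this is an assumption from the systemd documentation, not something the test does.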

@tbg tbg closed this Feb 23, 2023
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Sep 21, 2023
This is just resuscitating cockroachdb#81516, tests we added when developing
follower pausing but never checked in. Some of this work is applicable
in the context of replication admission control, when we apply flow
control for regular writes. The tests add an IO nemesis on n3/s3,
restricting disk bandwidth to 20MiB/s. They then observe what happens
when remote nodes n1 and n2 issue follower write traffic to it, either
across many ranges or just 1 (potentially triggering the per-replica
proposal quota pool). We can observe LSM state on n3/s3 under such
conditions, and also what would happen if the concentrated follower node
also serves some foreground load.

Part of cockroachdb#89208.

Release note: None
aadityasondhi pushed a commit to irfansharif/cockroach that referenced this pull request Oct 25, 2023
craig bot pushed a commit that referenced this pull request Oct 25, 2023
111070: roachtest: add admission-overload/follower-overload r=sumeerbhola a=irfansharif


112660: kvflowcontroller: fix logging of blocked streams r=pavelkalinnikov,aadityasondhi a=sumeerbhola

There were two bugs that are fixed:
- The blocked_stream_count metric was incorrectly capped to 100.
- Streams that were never blocked were being logged with stats.

Some additional improvements/fixes:
- Controller.mu was being unnecessarily acquired for read paths that don't care about concurrent additions to the map.
- WorkClass.SafeFormat was calling redact.SafePrinter.Print in some cases, so "elastic" was not being treated as safe.

There is a unit test to test the overflow logic of the logs, and to verify that the metric is correct even when the logs overflow.
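
A sketch of the corrected shape, with hypothetical names and cap (the real logic lives in kvflowcontroller): the metric counts every blocked stream, while only the log output is bounded:

```
package sketch

// maxLoggedStreams bounds log volume only; it must not bound the metric.
// Hypothetical value for illustration.
const maxLoggedStreams = 20

type stream struct {
	name    string
	blocked bool
}

// blockedStreams returns the full blocked count (fed to the
// blocked_stream_count metric) and a bounded sample of blocked stream
// names (fed to the log). Skipping unblocked streams avoids logging
// stats for streams that were never blocked.
func blockedStreams(streams []stream) (count int, logged []string) {
	for _, s := range streams {
		if !s.blocked {
			continue
		}
		count++
		if len(logged) < maxLoggedStreams {
			logged = append(logged, s.name)
		}
	}
	return count, logged
}
```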

Epic: none

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>