kv/sql: index backfill and truncate temporarily crater write throughput on large tables #62672

nvanbenschoten · 2021-03-26T17:24:01Z

Note: This issue started off as an investigation into the disruptive impact of a TRUNCATE on foreground traffic. However, we later observed that this problem generalizes to index backfills as well. See #62672 (comment) below.

We've seen that TRUNCATE operations on large, write-heavy tables are very disruptive. They can cause a dramatic drop in throughput that takes 10s of minutes to recover from.

During a recent investigation, we determined that this was due to how TRUNCATE impacts the load distribution in KV. TRUNCATE replaces the existing table keyspace with a new one, initially backed by a single range. This causes all load that had been previously well-distributed over the cluster to be directed at a single leaseholder. This leaseholder struggles to serve the load while also splitting into multiple ranges (due to range size and load) and balancing these new ranges across the cluster.

While this hotspot persists, throughput is severely impacted.

Reproduction

roachprod create ${USER}-truncate --nodes 20 --clouds=aws --aws-machine-type-ssd=m5d.8xlarge
roachprod stage  ${USER}-truncate release v20.2.6
roachprod start  ${USER}-truncate:1-16 --store-count=2
roachprod ssh ${USER}-truncate:17 --    './cockroach workload init kv {pgurl:1}'
roachprod ssh ${USER}-truncate:17-20 -- './cockroach workload run kv --concurrency=256 {pgurl:1-16}'
# after 10 minutes in other terminal
roachprod ssh ${USER}-truncate:1 -- './cockroach sql --insecure -e "TRUNCATE TABLE kv.kv"'

This causes the following effect:

Notice that throughput does not return to its pre-truncated level for at least 10 minutes. Things are likely even worse with larger clusters.

Next steps

Determine why load-based splitting is not kicking in immediately after the truncate to creating many ranges.
Determine why load-based rebalancing is not kicking in immediately after the truncate to spread out the many ranges. Could the wave of range merges due to the loss of load on the old keyspace be starving out new rebalancing? (see storage: merge less aggressively after load-based splitting #41317)
Determine whether there's something better we can do to avoid creating a single-range hotspot in the first place. Can we use table statistics to prepare a more appropriate keyspace to switch over to? If we could pre-split and pre-scatter the keyspace before hitting it with load, we'd be in a much better position.

gz#8170

Epic: CRDB-2625

gz#8886

gz#9032

The text was updated successfully, but these errors were encountered:

ajwerner · 2021-03-26T17:51:04Z

Have you tried on master?

nvanbenschoten · 2021-03-26T17:54:15Z

No, I have not, but I can. What do we expect to be different on master?

ajwerner · 2021-03-26T18:15:17Z

I could have sworn that we mapped this slowdown to cockroachdb/pebble#981 months ago.

nvanbenschoten · 2021-03-26T18:22:42Z

Here is the behavior on 8b552c7. Looks mostly the same:

During this reproduction, I explored why ranges were being split after the TRUNCATE, as there had been some confusion over whether this was due to size or load. It does look like ranges are being split due to load (which is good!), but at a rate of only about 2 splits per range per minute.

I210326 18:07:19.110863 8560363 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r85/3:‹/{Table/53/1/9…-Max}›] 698  initiating a split of this range at key ‹/Table/53/2/-1186540398063376634› [r154] (‹load at key /Table/53/2/-1186540398063376634 (41252.18 splitQPS, 1609.92 batches/sec, 1417.86 raft mutations/sec)›)‹›
I210326 18:07:30.323017 9009701 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r154/3:‹/{Table/53/2/-…-Max}›] 835  initiating a split of this range at key ‹/Table/53/2/4882772393790623368› [r155] (‹load at key /Table/53/2/4882772393790623368 (50996.76 splitQPS, 23954.83 batches/sec, 1318.69 raft mutations/sec)›)‹; r154/4 is being probed (may or may not need a Raft snapshot)›
I210326 18:07:41.469750 9515716 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r155/3:‹/{Table/53/2/4…-Max}›] 940  initiating a split of this range at key ‹/Table/53/2/7071725178704415234› [r156] (‹load at key /Table/53/2/7071725178704415234 (23063.61 splitQPS, 11357.11 batches/sec, 935.31 raft mutations/sec)›)‹; r155/6 is waiting for a Raft snapshot›
I210326 18:07:41.772243 9531429 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r154/3:‹/Table/53/2/{-11865…-488277…}›] 942  initiating a split of this range at key ‹/Table/53/2/1964751474247208420› [r437] (‹load at key /Table/53/2/1964751474247208420 (27728.04 splitQPS, 15743.88 batches/sec, 1058.71 raft mutations/sec)›)‹›
I210326 18:07:43.768370 9621888 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r85/3:‹/Table/53/{1/91650…-2/-1186…}›] 959  initiating a split of this range at key ‹/Table/53/2/-5890102151327250091› [r438] (‹load at key /Table/53/2/-5890102151327250091 (43029.29 splitQPS, 19624.21 batches/sec, 1779.29 raft mutations/sec)›)‹›
I210326 18:07:52.533886 10161357 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r156/3:‹/{Table/53/2/7…-Max}›] 996  initiating a split of this range at key ‹/Table/53/2/8189182208545922397› [r439] (‹load at key /Table/53/2/8189182208545922397 (15987.60 splitQPS, 7103.84 batches/sec, 633.06 raft mutations/sec)›)‹; r156/4 is waiting for a Raft snapshot›
I210326 18:07:52.864334 10183632 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r437/3:‹/Table/53/2/{196475…-488277…}›] 1004  initiating a split of this range at key ‹/Table/53/2/3378926683732238032› [r440] (‹load at key /Table/53/2/3378926683732238032 (21990.95 splitQPS, 9876.53 batches/sec, 762.01 raft mutations/sec)›)‹›
I210326 18:07:55.058714 10303293 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r438/3:‹/Table/53/2/-{589010…-118654…}›] 1014  initiating a split of this range at key ‹/Table/53/2/-3419204553766014128› [r441] (‹load at key /Table/53/2/-3419204553766014128 (25009.98 splitQPS, 16445.23 batches/sec, 1273.94 raft mutations/sec)›)‹›
I210326 18:07:55.775786 10339893 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r155/3:‹/Table/53/2/{488277…-707172…}›] 1024  initiating a split of this range at key ‹/Table/53/2/6014787348318689137› [r442] (‹load at key /Table/53/2/6014787348318689137 (13105.03 splitQPS, 7185.43 batches/sec, 683.46 raft mutations/sec)›)‹; r155/6 is waiting for a Raft snapshot›
I210326 18:07:57.774800 10468756 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r154/3:‹/Table/53/2/{-11865…-196475…}›] 1043  initiating a split of this range at key ‹/Table/53/2/386339293792052538› [r443] (‹load at key /Table/53/2/386339293792052538 (27119.77 splitQPS, 10744.07 batches/sec, 893.66 raft mutations/sec)›)‹›
I210326 18:08:03.618528 10781201 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r439/3:‹/{Table/53/2/8…-Max}›] 1067  initiating a split of this range at key ‹/Table/53/2/8705645444285876232› [r444] (‹load at key /Table/53/2/8705645444285876232 (10842.10 splitQPS, 4383.45 batches/sec, 415.41 raft mutations/sec)›)‹›
I210326 18:08:03.954046 10798192 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r440/3:‹/Table/53/2/{337892…-488277…}›] 1071  initiating a split of this range at key ‹/Table/53/2/4157848410948475527› [r445] (‹load at key /Table/53/2/4157848410948475527 (15798.49 splitQPS, 6536.19 batches/sec, 530.17 raft mutations/sec)›)‹›
I210326 18:08:06.244836 10909176 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r441/3:‹/Table/53/2/-{341920…-118654…}›] 1074  initiating a split of this range at key ‹/Table/53/2/-2180130495874161662› [r446] (‹load at key /Table/53/2/-2180130495874161662 (23021.75 splitQPS, 10670.59 batches/sec, 880.91 raft mutations/sec)›)‹›
I210326 18:08:06.854375 10937864 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r442/3:‹/Table/53/2/{601478…-707172…}›] 1077  initiating a split of this range at key ‹/Table/53/2/6540827195747335702› [r777] (‹load at key /Table/53/2/6540827195747335702 (10540.73 splitQPS, 5180.68 batches/sec, 458.39 raft mutations/sec)›)‹›
I210326 18:08:15.019081 11277085 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r445/3:‹/Table/53/2/4{157848…-882772…}›] 1124  initiating a split of this range at key ‹/Table/53/2/4512809824461552487› [r778] (‹load at key /Table/53/2/4512809824461552487 (8171.23 splitQPS, 3895.43 batches/sec, 351.14 raft mutations/sec)›)‹›
I210326 18:08:17.907134 11391792 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r777/3:‹/Table/53/2/{654082…-707172…}›] 1145  initiating a split of this range at key ‹/Table/53/2/6795481456936418537› [r779] (‹load at key /Table/53/2/6795481456936418537 (5932.83 splitQPS, 2942.06 batches/sec, 293.04 raft mutations/sec)›)‹›
I210326 18:08:18.785298 11423467 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r155/3:‹/Table/53/2/{488277…-601478…}›] 1159  initiating a split of this range at key ‹/Table/53/2/5423887395245668889› [r780] (‹load at key /Table/53/2/5423887395245668889 (13188.91 splitQPS, 5903.44 batches/sec, 614.73 raft mutations/sec)›)‹›
I210326 18:08:19.111082 11435322 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r85/3:‹/Table/53/{1/91650…-2/-5890…}›] 1165  initiating a split of this range at key ‹/Table/53/2/-7642450099548430550› [r781] (‹load at key /Table/53/2/-7642450099548430550 (38065.34 splitQPS, 15501.95 batches/sec, 1982.71 raft mutations/sec)›)‹›
I210326 18:08:29.797078 11902284 kv/kvserver/replica_command.go:398 ⋮ [n10,split,s19,r440/3:‹/Table/53/2/{337892…-415784…}›] 1207  initiating a split of this range at key ‹/Table/53/2/3732974195458568389› [r782] (‹load at key /Table/53/2/3732974195458568389 (12008.18 splitQPS, 4861.11 batches/sec, 512.34 raft mutations/sec)›)‹›

I wonder whether there are any easy wins here to be had regarding not resetting a range's load-based splitting tracker during a load-based split:

cockroach/pkg/kv/kvserver/split_queue.go

Lines 261 to 262 in cafa371

    
           // Reset the splitter now that the bounds of the range changed. 
        
           r.loadBasedSplitter.Reset()

ajwerner · 2021-03-26T18:45:17Z

Not resetting but rather splitting out the samples makes a lot of sense.

nvanbenschoten · 2021-03-28T19:12:32Z

Determine whether there's something better we can do to avoid creating a single-range hotspot in the first place. Can we use table statistics to prepare a more appropriate keyspace to switch over to? If we could pre-split and pre-scatter the keyspace before hitting it with load, we'd be in a much better position.

I think we're going to need to do something like this if we want to make a TRUNCATE fully non-disruptive. As @lunevalex astutely pointed out, the ramp-up period after a TRUNCATE looks a lot like immediately after the load is started, though maybe a fixed multiple slower. Even if we could make this just as efficient as the initial ramp-up period, that likely still wouldn't be sufficient, as that still takes 2-3 minutes to reach peak throughput, and longer on larger clusters.

petermattis · 2021-03-29T01:28:54Z

Determine whether there's something better we can do to avoid creating a single-range hotspot in the first place. Can we use table statistics to prepare a more appropriate keyspace to switch over to? If we could pre-split and pre-scatter the keyspace before hitting it with load, we'd be in a much better position.

We could split the new range along the same boundaries as the pre-truncated table. Or some fraction of those boundaries (e.g. every Nth range boundary). I bet we could hack together something which did this using crdb_internal.ranges in order to verify that it did address the problem. This is presuming that the data inserted into the table post-truncate has the same distribution as the data pre-truncate, but we'd also need to make that assumption if we used table statistics.

nvanbenschoten · 2021-03-29T14:42:54Z

I like this idea a lot. Using the existing range boundaries instead of trying to recreate them from table statistics seems easier, more precise, and has the potential of piggybacking on top of existing load-based splits in case of uneven (but also not divergent) load distribution.

@aliher1911 do you have interest in doing some exploration here? We can walk through the relevant code during our pod meeting later today.

andreimatei · 2021-03-29T23:15:51Z

If we pre-split, would we use the range descriptor's "sticky bit" with some arbitrary expiration to keep the splits around for a while?

ajwerner · 2021-03-29T23:19:30Z

If we did add the sticky bit, it might make sense to jitter the expiration times.

nvanbenschoten · 2021-03-30T00:41:53Z

Yes, we'd set some split expiration, similar to what IMPORT does. I was thinking 10 minutes.

Jittering the expiration times is a good idea. We should do that in other places as well to avoid waves of simultaneous range merges.

See cockroachdb#62672 (comment).

nvanbenschoten · 2021-04-02T21:49:46Z

I hacked together something for this to test out whether pre-splitting based on existing range boundaries would help out, as suggested by @petermattis in #62672 (comment). I then realized @jordanlewis was already testing this out on his LARGE DATA BANK stream in a much cleaner way - sorry to step on your toes! I didn't realize you had actually taken the stream suggestion. I'll let you take over from here.

Even the hacky version demonstrated a major improvement. Here's a screenshot of a series of three TRUNCATE operations using the same setup as listed above. TRUNCATE operations occur under three different configurations:

9:24: pre-splitting enabled, range merging enabled
9:29: pre-splitting enabled, range merging disabled to avoid a few load-based range merges after load halts on the truncated table, which we found to be disruptive in kv/sql: GC after TRUNCATE temporarily craters write throughput #62700 and will likely be fixed by improving Raft snapshots
9:34: pre-splitting disabled

As we see, pre-splitting the new keyspace into some number of ranges and scattering before switching over the load made a very large difference. We should proceed with that approach.

jordanlewis · 2021-04-02T22:05:37Z

Haha, no worries @nvanbenschoten! I didn't tell you I was going to work on this on the stream. I also haven't been able to figure out how to nicely test as well as you have here, so hopefully we can merge our two approaches into something that is great.

63043: sql: TRUNCATE preserves split points of indexes r=jordanlewis a=jordanlewis Touches #62672 Previously, TRUNCATE unconditionally blew away all index split points when creating the replacement empty indexes. This manifested in some bad behavior when performing TRUNCATE on tables or indexes that had heavy load and were heavily split already, because all of a sudden all of the traffic that was nicely dispersed across all of the ranges would redirect to the single new range that TRUNCATE created. The bad performance would remediate over time as the database re-split the new ranges, but preserving the splits across the index swap boundaries is a faster way to get there. Release note (sql change): TRUNCATE is now less disruptive on tables with a lot of concurrent traffic. 64395: roachtest: improve varz regexp r=andreimatei a=andreimatei It was recognizing: follower_reads_success_count{store="1"} 27606 but not: sql_select_count 1.652807e+06 Release note: None Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>

nvanbenschoten · 2021-05-05T19:03:08Z

I ran through the original reproduction steps on master. The result...

Things look significantly better. We do see a very minor latency blip during the pre-split-and-scatter and then later as the new table splits further due to load, but these are inconsequential compared to the latency spikes and corresponding throughput craters we were seeing before.

Nice job on this @jordanlewis!

aayushshah15 · 2021-05-17T02:34:37Z

I believe we're seeing a very analogous series of events play out due to an index backfill. The same customer that prompted this issue has seen similar impact to foreground traffic on a table the moment an index backfill is kicked off.

On 20.2.8, with @nvanbenschoten's repro steps from above, but instead of TRUNCATE, kicking off an index creation (1:44) gets us a very similar set of observable symptoms:

The rate of load-based range splits seems to be about the same as what @nvanbenschoten observed in his repro -- I'm seeing 1 split per range once every ~25 seconds.

The story is similar on master but we don't see a dip in Replicas per Node like we do on 20.2, presumably due to #64201.

From my very limited understanding of this area, it does seem to check out. The index backfiller only seems to create size based splits as it ingests SSTs, so the moment an index backfill is kicked off, all write traffic will be bottlenecked due to the (new) secondary index being consolidated to a small set of ranges until load based splitting catches up.

rp run ${USER}-backfill:1-16 -- grep -ER "/Table/53/3/.*initiating a split of this range" logs/ | ag -o "[0-9]+.[0-9]+ splitQPS"
>
...
3216.26 splitQPS
3216.26 splitQPS
3318.35 splitQPS
3318.35 splitQPS
25791.39 splitQPS
24843.94 splitQPS
11742.71 splitQPS
11575.02 splitQPS
...

I reckon a fix to this would be very similar to #63043 in theory. However, we might not want to do any downsampling in the case of an index backfill.

I'm re-opening this issue for now since it has a ton of valuable context.

cc @dt as you were looking into this.

nvanbenschoten · 2021-05-17T14:35:11Z

Thanks for writing this up @aayushshah15!

The story is similar on master but we don't see a dip in Replicas per Node like we do on 20.2, presumably due to #64201.

Beyond the impact on Replicas per Node, did this have any effect on latency or throughput?

I reckon a fix to this would be very similar to #63043 in theory. However, we might not want to do any downsampling in the case of an index backfill.

It seems a little less clear what we should do in this case, because we don't trivially know the distribution of rows that will land in the secondary index. With TRUNCATE, things were simpler, because we had some reason to believe that the indexes would look similar to how they looked before the TRUNCATE.

Have you given any thought to this? Perhaps we could sample the existing PK and perform the secondary index translation to get a sense for what the secondary index will look like ahead of time?

Also, have we confirmed that this is still a pressing issue for the customer? Have we seen the behavior of an index backfill now that the Pebble write stall issues have been resolved? I have to imagine that AddSSTable requests were some of the most impacted operations when file creation was taking 60ms and leading to full engine stalls. So it seems very possible that even though there is work to do here, it no longer has the same urgency for the customer.

As much as we can, I think we should re-assess all of the items this customer is interested in to determine whether they were being exacerbated by the write stall issues.

aayushshah15 · 2021-05-17T17:18:40Z

Beyond the impact on Replicas per Node, did this have any effect on latency or throughput

On master:

On 20.2.8:

So it seems like there's no real difference besides the less aggressive range-merges.

Have we seen the behavior of an index backfill now that the Pebble write stall issues have been resolved?

Just to confirm, you're referring to cockroachdb/pebble#1125, right? The run on master included the corresponding vendor bump for this change, so it seems like it did not have much of impact here. (That makes sense right? If the consolidation of all the write traffic to a handful of ranges[1] is the bottleneck then we wouldn't expect those pebble fixes to help, would we?)

because we don't trivially know the distribution of rows that will land in the secondary index

Have you given any thought to this? Perhaps we could sample the existing PK and perform the secondary index translation to get a sense for what the secondary index will look like ahead of time?

That's a good point. I'm having a hard time understanding your proposal though. Is the idea that, since the system already some idea of the mapping from a primary index's data distribution to its load distribution, we can somehow come up with a way to form a mapping between a secondary index's data distribution to its load distribution (and pre-split based on that)? Maybe I'm totally misunderstanding you and this is not what you mean at all.

[1]: Range counts on the new secondary index right after the backfill is kicked off

while true
do
        rp ssh ${USER}-backfill:1 -- './cockroach sql --insecure -e "SELECT count(*) FROM [ SHOW RANGES FROM INDEX kv.kv@test ]"';
        sleep 3
done
>
count
2
count
2
count
2
count
3
count
3
count
4
count
4
count
4
count
6
...

dt · 2021-05-17T17:35:32Z

If this hypothesis is correct, we should see this dip in SQL throughput just by replacing the index backfill itself with a sleep 300, and not reading/adding SSTables at all, right?

If so, then I wonder if it might help to instead ramp traffic to the index, e.g. making new DELETE_ONLY_RAMP and DELETE_AND_WRITE_ONLY_RAMP state that we step through where SQL writes only probabilistically opt to update the new index, and then let load-based splitting take care of it, based on real observed traffic. Then after the ramp period, we step to the current DELETE_ONLY, then DELETE_AND_WRITE_ONLY, wait for one version and do the regular backfill? During the ramp phase we could probabilistically choose to include the update based on time elapsed since the state change or something.

dt · 2021-05-17T17:38:17Z

That said, when we looked at a customer's cluster that had seen throughput fall to single-digit percentage of what it was before the CREATE INDEX, it stayed that way for a long time -- much longer than one or two minutes. While it was in that state, we enabled diagnostics collection for one of the slow queries on the table and observed it waited nearly 1s for quota pool (it usually ran in single-digit millis). Given it stayed in that state for a long period, maybe just waiting for load-based splitting wouldn't help after all.

aayushshah15 · 2021-05-17T17:47:44Z

That said, when we looked at a customer's cluster that had seen throughput fall to single-digit percentage of what it was before the CREATE INDEX, it stayed that way for a long time -- much longer than one or two minutes.

That's more a function of the size of the table and the overall traffic to that table, right? Like, in this reproduction, the entire cluster had ~400 ranges before the backfill. It seems like it would just be a longer climb back to the peak if the table had more data. I can run a much larger repro than this and confirm this.

While it was in that state, we enabled diagnostics collection for one of the slow queries on the table and observed it waited nearly 1s for quota pool (it usually ran in single-digit millis).

Would you mind pointing me to the trace event that you saw?

It also seems like what we saw here is mostly expected -- given that the proposal quota pool has a size of 8mb (per range) with the default log truncation threshold. If all write traffic was consolidated to a single range at the beginning, we would expect the quota pool to step in and backpressure write traffic. This would've been exacerbated by the write-heavy nature of the customer's workload.

dt · 2021-05-17T18:20:40Z

Unfortunately the trace was collected and viewed during a live debugging session with the customer. I don't know if it was retained, but can follow-up with you offline for details.

nvanbenschoten · 2021-05-17T18:21:58Z

Just to confirm, you're referring to cockroachdb/pebble#1125, right? The run on master included the corresponding vendor bump for this change, so it seems like it did not have much of impact here. (That makes sense right? If the consolidation of all the write traffic to a handful of ranges[1] is the bottleneck then we wouldn't expect those pebble fixes to help, would we?)

I'm also referring to the encryption=plain config that was causing file creation and deletion to take 60+ ms (which is what allowed cockroachdb/pebble#1125 to become such an issue). It makes sense that it wouldn't have an impact on this test, but another important question is whether the immediate motivation for this issue was due to index backfills triggering and/or waiting on write stalls. In other words, is this issue (even including the TRUNCATE part) motivated by the 3 minutes of reduced write throughput, or were things significantly worse on the customer's cluster due to the write stall issue? David's comment in #62672 (comment) provides some evidence for that. And if so, are they still this bad? Now that both of those issues are fixed, things may look very different on their cluster.

I'm not saying we shouldn't continue exploring how to improve this, just that we should re-evaluate (in part, for the customer's sake) how bad things remain now that the other potentially related issues have been resolved.

aayushshah15 · 2021-05-26T20:35:07Z

@lunevalex suggested an approach to mitigate the foreground impact that we seem to have previously considered for some other use cases as well.

The idea was that we would perform 2 backfills -- one that corresponds to when the index creation is kicked off, say with ts = 10. Until this first backfill completes, we would make it such that foreground write traffic on the table doesn't update this index. Say this first backfill completes at ts = 20. At this point, we start a second backfill that ingests mutations from timestamps [11, 20] while also having foreground write traffic update the index (i.e. what we currently do with the index when it is DELETE_AND_WRITE_ONLY). This second backfill should be able to use a time-bound iterator, and should be considerably faster because of that.

This should get us out of the main issue of load consolidation on a handful of ranges, as by the time ongoing writes have to update the new index, that index at least has a bunch of size based splits. Notably though, this approach wouldn't capture any load-based splits until write traffic has to start updating the index, so it probably warrants some prototyping to figure out if this'll be adequate.

cc @shermanCRL and @pbardea, we should discuss where the prototyping / implementation work items for this should be routed, as the work is mostly going to reside in areas owned by bulk.

dt · 2021-05-26T20:51:11Z

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv/sql: index backfill and truncate temporarily crater write throughput on large tables #62672

kv/sql: index backfill and truncate temporarily crater write throughput on large tables #62672

nvanbenschoten commented Mar 26, 2021 •

edited by RoachietheSupportRoach

Loading

ajwerner commented Mar 26, 2021

nvanbenschoten commented Mar 26, 2021

ajwerner commented Mar 26, 2021

nvanbenschoten commented Mar 26, 2021

ajwerner commented Mar 26, 2021

nvanbenschoten commented Mar 28, 2021

petermattis commented Mar 29, 2021

nvanbenschoten commented Mar 29, 2021

andreimatei commented Mar 29, 2021

ajwerner commented Mar 29, 2021

nvanbenschoten commented Mar 30, 2021

nvanbenschoten commented Apr 2, 2021

jordanlewis commented Apr 2, 2021

nvanbenschoten commented May 5, 2021

aayushshah15 commented May 17, 2021 •

edited

Loading

nvanbenschoten commented May 17, 2021

aayushshah15 commented May 17, 2021

dt commented May 17, 2021

dt commented May 17, 2021

aayushshah15 commented May 17, 2021 •

edited

Loading

dt commented May 17, 2021

nvanbenschoten commented May 17, 2021

aayushshah15 commented May 26, 2021

dt commented May 26, 2021

dt commented May 28, 2021 •

edited

Loading

aayushshah15 commented May 31, 2021

lunevalex commented Oct 12, 2021

kv/sql: index backfill and truncate temporarily crater write throughput on large tables #62672

kv/sql: index backfill and truncate temporarily crater write throughput on large tables #62672

Comments

nvanbenschoten commented Mar 26, 2021 • edited by RoachietheSupportRoach Loading

Reproduction

Next steps

ajwerner commented Mar 26, 2021

nvanbenschoten commented Mar 26, 2021

ajwerner commented Mar 26, 2021

nvanbenschoten commented Mar 26, 2021

ajwerner commented Mar 26, 2021

nvanbenschoten commented Mar 28, 2021

petermattis commented Mar 29, 2021

nvanbenschoten commented Mar 29, 2021

andreimatei commented Mar 29, 2021

ajwerner commented Mar 29, 2021

nvanbenschoten commented Mar 30, 2021

nvanbenschoten commented Apr 2, 2021

jordanlewis commented Apr 2, 2021

nvanbenschoten commented May 5, 2021

aayushshah15 commented May 17, 2021 • edited Loading

nvanbenschoten commented May 17, 2021

aayushshah15 commented May 17, 2021

dt commented May 17, 2021

dt commented May 17, 2021

aayushshah15 commented May 17, 2021 • edited Loading

dt commented May 17, 2021

nvanbenschoten commented May 17, 2021

aayushshah15 commented May 26, 2021

dt commented May 26, 2021

dt commented May 28, 2021 • edited Loading

aayushshah15 commented May 31, 2021

lunevalex commented Oct 12, 2021

nvanbenschoten commented Mar 26, 2021 •

edited by RoachietheSupportRoach

Loading

aayushshah15 commented May 17, 2021 •

edited

Loading

aayushshah15 commented May 17, 2021 •

edited

Loading

dt commented May 28, 2021 •

edited

Loading