We've been seeing significantly reduced throughput and increased latency for OLTP foreground traffic during index backfills in certain cases, as well as long running times for the index backfill itself.
This is related to what @awoods187 has also seen while testing with TPCC 1k (as reported in #34744), but this issue focuses on a workload specifically intended to demonstrate this behavior; it's possible that the variety and complexity of TPCC's queries obscures some of the impact of a single index backfill.
I set up a roachtest to run the bulkingest workload on 4 nodes, which creates a 20-million-row table with the following schema:
```sql
CREATE TABLE bulkingest (
    a INT,
    b INT,
    c INT,
    payload BYTES,
    PRIMARY KEY (a, b, c)
)
```
where `a` is populated with sequential values and `payload` is populated with random values. (To reproduce, run the test in master...lucy-zhang:repro with the environment variable `COCKROACH_IMPORT_WORKLOAD_FASTER` set to true.) The test creates an index on `(payload, a)` and then on `(a, payload)`, and issues updates to the table in parallel. The two backfills are visible below (and took 8min and 2min, respectively):
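For concreteness, here is a minimal sketch (in Go, not the actual roachtest code, which lives in the master...lucy-zhang:repro branch) of what the test does: build the two indexes one after the other while concurrently issuing single-row updates as foreground traffic. The connection string, index names, update statement, key bounds, and concurrency level are all illustrative assumptions.

```go
// Rough sketch of the experiment, not the actual roachtest: run the two index
// backfills back to back while hammering the table with single-row updates.
// The connection string, index names, update keys, and concurrency are all
// illustrative assumptions.
package main

import (
	"database/sql"
	"log"
	"math/rand"
	"sync"
	"time"

	_ "github.com/lib/pq" // CockroachDB speaks the Postgres wire protocol.
)

func main() {
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Foreground traffic: concurrent single-row updates to the payload column.
	// The key bounds are made up and may not match the actual data distribution.
	stop := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-stop:
					return
				default:
				}
				payload := make([]byte, 64)
				rand.Read(payload)
				if _, err := db.Exec(
					`UPDATE bulkingest SET payload = $1 WHERE a = $2 AND b = $3 AND c = $4`,
					payload, rand.Intn(1000), rand.Intn(1000), rand.Intn(20),
				); err != nil {
					log.Printf("update error: %v", err)
				}
			}
		}()
	}

	// Backfill 1: index key order is uncorrelated with the primary key order.
	runIndexBuild(db, `CREATE INDEX payload_a ON bulkingest (payload, a)`)
	// Backfill 2: index key order is correlated with the primary key order.
	runIndexBuild(db, `CREATE INDEX a_payload ON bulkingest (a, payload)`)

	close(stop)
	wg.Wait()
}

func runIndexBuild(db *sql.DB, stmt string) {
	start := time.Now()
	if _, err := db.Exec(stmt); err != nil {
		log.Fatal(err)
	}
	log.Printf("%s took %s", stmt, time.Since(start))
}
```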
There are multiple issues here:
- When an `AddSSTable` command is queued, all other writes to the same span are blocked. When the index key order is uncorrelated with the primary key order (as it is here, since `payload` has random values), the generated SSTs cover very wide spans relative to the number of keys they contain, which is why adding the `(payload, a)` index is worse than adding the `(a, payload)` index. This is confirmed by logging the number of existing keys in the span in `EvalAddSSTable`: for the `(payload, a)` index, the number of existing keys in each span is much higher, and we add many SSTs containing only a few thousand keys that span an entire range already holding hundreds of thousands of keys. @dt has pointed out that this could be mitigated by flushing an SST every time we detect a large gap in the index keys being written, producing a larger number of denser SSTs (a rough sketch of this idea follows the list below).
- When writing SSTs to non-empty ranges, repeatedly computing MVCC stats for the range is expensive, and we run into the known issue where a single node spends a lot of time computing stats while other nodes are blocked on `AddSSTable` calls (see storage/bulk: split and scatter during ingest #35016 (comment) and the accompanying PR). I ran this test with batcheval: estimate stats in EvalAddSSTable #35231, which skips the stats recomputation in `AddSSTable` and defers accurate recomputation until ranges are split (a toy illustration of this idea also follows the list below). This makes the first backfill much faster (from 8min to 4min), and I saw a similar 2x improvement when running a larger version of this test, as well as in the inverted index roachtest (which went from ~50 minutes to ~25 minutes). The effect of the backfill on foreground traffic seems just as large (maybe larger?), but #35231 will still be an improvement if the total amount of time the backfill holds up traffic goes down.
- Even with the improvement from #35231, and with the "well-behaved" index on `(a, payload)`, the effect on foreground traffic is still significant, and I don't know of other reasons to pinpoint for this. Is this cluster underprovisioned?
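Below is a minimal, self-contained sketch of the mitigation @dt suggested in the first bullet above: a buffering adder that flushes an SST not only when its buffer is full, but also whenever it detects a large gap between consecutive index keys, so each SST covers a narrower, denser span and blocks fewer overlapping writes. The types, gap metric, and thresholds are invented for illustration; an actual change would live in the bulk-ingestion SST writer and operate on encoded index keys.

```go
// Conceptual sketch only: flush buffered KVs as an "SST" when a large key gap
// is detected, instead of relying purely on buffer size.
package main

import (
	"bytes"
	"fmt"
)

type kv struct {
	key, value []byte
}

// gapFlushingAdder buffers KVs (which arrive in sorted key order during a
// backfill chunk) and flushes the buffer when it is full or when the next key
// is far from the previous one.
type gapFlushingAdder struct {
	buf      []kv
	maxBuf   int
	gapLimit int
	lastKey  []byte
	flush    func(kvs []kv) // stand-in for writing and ingesting an SST
}

func (a *gapFlushingAdder) add(k, v []byte) {
	if a.lastKey != nil && keyGap(a.lastKey, k) > a.gapLimit {
		a.flushBuf() // large gap: cut the SST here to keep spans dense
	}
	a.buf = append(a.buf, kv{key: append([]byte(nil), k...), value: v})
	a.lastKey = a.buf[len(a.buf)-1].key
	if len(a.buf) >= a.maxBuf {
		a.flushBuf()
	}
}

func (a *gapFlushingAdder) flushBuf() {
	if len(a.buf) == 0 {
		return
	}
	a.flush(a.buf)
	a.buf = nil
}

// keyGap is a toy distance between two sorted keys: the difference at the
// first differing byte. A real implementation would compare decoded index
// prefixes or estimate how many existing keys fall in between.
func keyGap(prev, next []byte) int {
	n := len(prev)
	if len(next) < n {
		n = len(next)
	}
	for i := 0; i < n; i++ {
		if prev[i] != next[i] {
			return int(next[i]) - int(prev[i])
		}
	}
	return 0
}

func main() {
	adder := &gapFlushingAdder{
		maxBuf:   4,
		gapLimit: 10,
		flush: func(kvs []kv) {
			fmt.Printf("flush SST covering [%q, %q] with %d keys\n",
				kvs[0].key, kvs[len(kvs)-1].key, len(kvs))
		},
	}
	for _, k := range [][]byte{
		[]byte("aa"), []byte("ab"), []byte("ac"), // dense run
		[]byte("zz"), // large gap: forces a flush before this key
	} {
		adder.add(k, bytes.Repeat([]byte("x"), 4))
	}
	adder.flushBuf()
}
```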
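And here is a toy illustration of the idea behind batcheval: estimate stats in EvalAddSSTable #35231, as described in the second bullet. The data structures are stand-ins, not the actual CockroachDB code: the exact path has to read existing range data to keep stats precise, while the estimated path blindly adds the SST's own stats, flags them as estimates, and relies on a later recomputation (e.g. at a range split) to restore accuracy.

```go
// Toy model of exact vs. estimated stats on SST ingestion; not CockroachDB code.
package main

import "fmt"

// stats is a toy stand-in for a range's MVCC stats.
type stats struct {
	liveKeys          int64
	containsEstimates bool
}

// rng is a toy stand-in for a range: latest value per key, plus stats.
type rng struct {
	data  map[string]string
	stats stats
}

// ingestExact is the slow path: for every SST key, read the existing data to
// see whether the key is new or shadows an existing one, so the stats stay
// exact. These reads are what make wide, sparse SSTs expensive.
func (r *rng) ingestExact(sst map[string]string) {
	for k, v := range sst {
		if _, ok := r.data[k]; !ok {
			r.stats.liveKeys++
		}
		r.data[k] = v
	}
}

// ingestEstimated is the fast path: skip reads of existing data, blindly add
// the SST's own key count, and flag the stats as estimates.
func (r *rng) ingestEstimated(sst map[string]string) {
	for k, v := range sst {
		r.data[k] = v
	}
	r.stats.liveKeys += int64(len(sst))
	r.stats.containsEstimates = true
}

// recomputeStats restores exact stats later, e.g. when the range is split.
func (r *rng) recomputeStats() {
	r.stats = stats{liveKeys: int64(len(r.data))}
}

func main() {
	r := &rng{data: map[string]string{"a": "old"}}
	r.recomputeStats()

	r.ingestEstimated(map[string]string{"a": "new", "b": "x"})
	fmt.Printf("after ingest:    %+v\n", r.stats) // liveKeys over-counted, estimates flagged

	r.recomputeStats()
	fmt.Printf("after recompute: %+v\n", r.stats) // exact again
}
```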
Other notes:
- The inverted index roachtest, which also builds an index on a randomly generated payload, consistently shows the same severe drop in throughput for foreground traffic, with p99 latency increasing to 1s-10s.
- I tried running the test on @vivekmenezes's branch (WIP: sql: flush #35855) to store more KVs in memory before flushing SSTs and it didn't make an appreciable difference, at least for this workload.
Epic CRDB-8816
Jira issue: CRDB-4530
