YCSB/E performance regression on December 4th #62198
The ycsb/E benchmarks had a sharp drop of 52% (1580→755) between December 3rd and 5th: https://roachperf.crdb.dev/?filter=&view=ycsb%2FE%2Fnodes%3D3&tab=gce

This seems to have been caused by defaulting to the vectorized SQL execution engine for all queries in #55713, which was merged on December 3rd. Using commit 5766c37 from March 17th, I ran …
Hmm, this is interesting and not good. AFAICT workload E is 95% reads, 5% writes, which is the same profile we tested with on kv95, so I wonder what the difference is here. @yuzefovich, do you mind taking a look at this?
I can clearly see the difference there, but the profiles are quite puzzling to me - it appears as if things at the storage level become slower when the vectorized engine is used, as if the …, whereas in a 30s profile of ….
I think the difference is that in the vectorized engine we perform a deep copy of …. Note that in both cases we also perform another deep copy when creating ….
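To illustrate the distinction in isolation, here is a minimal Go sketch (a toy, not the actual fetcher code; all names are made up) contrasting a deep copy, which severs the link to the source buffer, with an alias, which keeps the source buffer reachable:

```go
package main

import "fmt"

// appendDeepCopy copies src into a freshly allocated buffer, so dst does not
// retain a reference to src's underlying array (e.g. a storage-layer slab).
func appendDeepCopy(dst [][]byte, src []byte) [][]byte {
	buf := make([]byte, len(src))
	copy(buf, src)
	return append(dst, buf)
}

// appendAlias keeps src's underlying array alive for as long as dst is
// reachable, which can pin large upstream allocations.
func appendAlias(dst [][]byte, src []byte) [][]byte {
	return append(dst, src)
}

func main() {
	row := []byte("some KV value")
	copied := appendDeepCopy(nil, row)
	aliased := appendAlias(nil, row)
	row[0] = 'X' // mutate the source buffer

	fmt.Printf("deep copy: %s\n", copied[0])  // unchanged
	fmt.Printf("alias:     %s\n", aliased[0]) // observes the mutation
}
```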
Never mind, that behavior is expected (that we allocate many …).
Simply adding a deep copy to … doesn't seem to help. I think the problem might be that we still keep the reference to ….
Hm, unsetting those values doesn't seem to help either. My best guess at the moment is that in the vectorized engine we create more pointers, which puts pressure on the GC. In the profiles we do see a significant increase in GC pressure, but I'm still not sure how to alleviate that.
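As a rough illustration of the "more pointers means more GC work" hypothesis (a toy benchmark, not the engine code): the collector has to trace every pointer slot it can reach, so a pointer-dense layout of the same data makes each GC cycle more expensive than a flat one:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

const n = 1 << 22

// timeGC forces a collection and reports how long it took.
func timeGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	// Pointer-dense: n separately allocated ints. The collector has to
	// trace every one of these pointers on each cycle.
	boxed := make([]*int, n)
	for i := range boxed {
		v := i
		boxed[i] = &v
	}
	fmt.Println("GC with n boxed ints:", timeGC())
	runtime.KeepAlive(boxed)
	boxed = nil // drop the pointer-dense structure

	// Flat: a single pointer-free allocation holding the same data; the
	// collector does not scan its interior at all.
	flat := make([]int, n)
	for i := range flat {
		flat[i] = i
	}
	fmt.Println("GC with a flat []int:", timeGC())
	runtime.KeepAlive(flat)
}
```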
Investigating this further, I found a more minimal reproduction of the difference. The benchmarks were done on a GCEWorker machine with a single-node cluster.
Interestingly, pinning the scan-length to 1000 shows no difference with and without vectorize, and pinning the scan-length to 1 also shows no difference. So I'm hypothesizing that there is some kind of pathological behavior in the way we're doing the batch resizing that triggers slowness at 500. 250 is also a terrible value: you see more than a 2x difference with and without vectorize. At 250, I see something like ~300 ops/s with vectorized off, and ~150 ops/s with vectorized on.
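To make the batch-resizing hypothesis easier to reason about, here is a back-of-the-envelope Go model (an assumed doubling policy with a 1024 cap; the real resizing logic may differ) that lists the allocations such a policy would perform at each pinned scan-length:

```go
package main

import "fmt"

// maxBatchSize stands in for the engine's cap on batch length; 1024 is an
// assumption matching the numbers discussed in this thread.
const maxBatchSize = 1024

// growthSteps lists the capacities a doubling policy would allocate while
// producing `rows` rows, starting from a 1-row batch.
func growthSteps(rows int) []int {
	var steps []int
	for size := 1; ; size *= 2 {
		if size > maxBatchSize {
			size = maxBatchSize
		}
		steps = append(steps, size)
		rows -= size
		if rows <= 0 || size == maxBatchSize {
			// At the cap the batch is reused rather than regrown.
			break
		}
	}
	return steps
}

func main() {
	for _, rows := range []int{1, 250, 500, 1000} {
		fmt.Printf("scan-length %4d -> allocations at sizes %v\n", rows, growthSteps(rows))
	}
}
```

Under this model the relative cost of the growth chain is highest for mid-sized scans: a 250-row scan pays for eight reallocations without ever reaching a large steady-state batch, whereas a 1000-row scan amortizes its chain over four times as many rows.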
I've just had a quick run of YCSB/E on 685751f, and it showed the following:
This shows that #62282 addressed the largest part of the regression on this benchmark caused by the switch to the vectorized engine back in December, so I'm considering this issue closed.
Kicked off 10 runs of YCSB/E on a 3-node n1-standard-8 cluster in GCE with 96 concurrency (basically using the config from the roachtest):
So it looks like on this benchmark in particular the vectorized engine even slightly outruns the row-by-row engine with the recent improvement. I think this is somewhat expected given that YCSB/E performs 95% scans, with each scan randomly reading up to 1000 rows, and we know that the vectorized read path is faster once a certain threshold on the number of rows is exceeded (I don't know the exact number now, though; it was on the order of 1000 before - thus the original default value of …).

I'd call it a major success :) cc @jordanlewis
Not so fast: I found the cause of the kv95 regression: #62524. Basically, if there is no estimate (i.e. 0), we're defaulting to allocating batches of size 1024, which is really expensive when we're just fetching a single row. I'm trying out a patch that defaults to 1 if there's no estimate. It fixed kv95; I'm running ycsb/E now -- hopefully we have actual estimates there so it won't take a hit.
Ah, indeed. I think we want to default to the previous behavior of exponentially increasing the batch sizes starting from 1. |
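A sketch of what that strategy could look like (hypothetical names, not the actual #62524 patch): seed the first batch from the optimizer's row-count estimate when one exists, and otherwise start at 1 and double, so a single-row fetch never pays for a 1024-row allocation:

```go
package main

import "fmt"

// maxBatchSize is an assumed cap, matching the 1024 discussed above.
const maxBatchSize = 1024

// batchSizer picks capacities for successive batches. All names here are
// hypothetical; this illustrates the strategy, not the real code.
type batchSizer struct {
	next int
}

// newBatchSizer seeds the first capacity from an optional row-count estimate.
// estimate == 0 means "unknown", in which case we start at 1 and double.
func newBatchSizer(estimate int) *batchSizer {
	first := 1
	if estimate > 0 {
		first = estimate
		if first > maxBatchSize {
			first = maxBatchSize
		}
	}
	return &batchSizer{next: first}
}

// nextCapacity returns the capacity for the next batch and doubles the
// follow-up size, capped at maxBatchSize.
func (s *batchSizer) nextCapacity() int {
	c := s.next
	if s.next < maxBatchSize {
		s.next *= 2
		if s.next > maxBatchSize {
			s.next = maxBatchSize
		}
	}
	return c
}

func main() {
	// Point lookup with no estimate: capacities grow 1, 2, 4, ... on demand.
	noEst := newBatchSizer(0)
	fmt.Println(noEst.nextCapacity(), noEst.nextCapacity(), noEst.nextCapacity())

	// Large scan with an estimate: start near the expected size immediately.
	est := newBatchSizer(900)
	fmt.Println(est.nextCapacity(), est.nextCapacity())
}
```

Starting from 1 keeps point lookups cheap at the cost of a few extra reallocations on unestimated large scans, and the doubling keeps that chain logarithmic in the scan length.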
Luckily it didn't affect YCSB/E much: #62534