YCSB/E performance regression on December 4th #62198
The ycsb/E benchmarks had a sharp drop of 52% (1580→755) between December 3rd and 5th: https://roachperf.crdb.dev/?filter=&view=ycsb%2FE%2Fnodes%3D3&tab=gce

This seems to have been caused by defaulting to the vectorized SQL execution engine for all queries in #55713, which was merged on December 3rd. Using commit 5766c37 from March 17th, I ran …
Hmm, this is interesting and not good. AFAICT workload E is 95% reads, 5% writes, which is the same profile we tested with on kv95, so I wonder what the difference is here. @yuzefovich, do you mind taking a look at this?
I can clearly see the difference there, but the profiles are quite puzzling to me - it appears as if things at the storage level become slower when the vectorized engine is used, as if the …, whereas in a 30s profile of ….
I think the difference is that in the vectorized engine we perform a deep copy of …. Note that in both cases we also perform another deep copy when creating ….
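To illustrate the distinction in isolation, here is a minimal Go sketch (a toy, not the actual fetcher code; all names are made up) contrasting a deep copy, which severs the link to the source buffer, with an alias, which keeps the source buffer reachable:

```go
package main

import "fmt"

// appendDeepCopy copies src into a freshly allocated buffer, so dst does not
// retain a reference to src's underlying array (e.g. a storage-layer slab).
func appendDeepCopy(dst [][]byte, src []byte) [][]byte {
	buf := make([]byte, len(src))
	copy(buf, src)
	return append(dst, buf)
}

// appendAlias keeps src's underlying array alive for as long as dst is
// reachable, which can pin large upstream allocations.
func appendAlias(dst [][]byte, src []byte) [][]byte {
	return append(dst, src)
}

func main() {
	row := []byte("some KV value")
	copied := appendDeepCopy(nil, row)
	aliased := appendAlias(nil, row)
	row[0] = 'X' // mutate the source buffer

	fmt.Printf("deep copy: %s\n", copied[0])  // unchanged
	fmt.Printf("alias:     %s\n", aliased[0]) // observes the mutation
}
```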
Never mind, that behavior is expected (that we allocate many …).
Simply adding a deep copy to … doesn't seem to help. I think the problem might be that we still keep the reference to ….
Hm, unsetting those values doesn't seem to help either. My best guess at the moment is that in the vectorized engine we create more pointers, which puts pressure on the GC. In the profiles we do see a significant increase in GC pressure, but I'm still not sure how to alleviate that.
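As a rough illustration of the "more pointers means more GC work" hypothesis (a toy benchmark, not the engine code): the collector has to trace every pointer slot it can reach, so a pointer-dense layout of the same data makes each GC cycle more expensive than a flat one:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

const n = 1 << 22

// timeGC forces a collection and reports how long it took.
func timeGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

func main() {
	// Pointer-dense: n separately allocated ints. The collector has to
	// trace every one of these pointers on each cycle.
	boxed := make([]*int, n)
	for i := range boxed {
		v := i
		boxed[i] = &v
	}
	fmt.Println("GC with n boxed ints:", timeGC())
	runtime.KeepAlive(boxed)
	boxed = nil // drop the pointer-dense structure

	// Flat: a single pointer-free allocation holding the same data; the
	// collector does not scan its interior at all.
	flat := make([]int, n)
	for i := range flat {
		flat[i] = i
	}
	fmt.Println("GC with a flat []int:", timeGC())
	runtime.KeepAlive(flat)
}
```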
Investigating this further, I found a more minimal reproduction of the difference. The benchmarks were done on a GCEWorker machine with a single-node cluster.
Interestingly, pinning the scan-length to 1000 shows no difference with and without vectorize, and pinning the scan-length to 1 also shows no difference. So I'm hypothesizing that there is some kind of pathological behavior in the way we're doing the batch resizing that triggers slowness at 500. 250 is also a terrible value: you see more than a 2x difference with and without vectorize. At 250, I see something like ~300 ops/s with vectorized off, and ~150 ops/s with vectorized on.
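To make the batch-resizing hypothesis easier to reason about, here is a back-of-the-envelope Go model (an assumed doubling policy with a 1024 cap; the real resizing logic may differ) that lists the allocations such a policy would perform at each pinned scan-length:

```go
package main

import "fmt"

// maxBatchSize stands in for the engine's cap on batch length; 1024 is an
// assumption matching the numbers discussed in this thread.
const maxBatchSize = 1024

// growthSteps lists the capacities a doubling policy would allocate while
// producing `rows` rows, starting from a 1-row batch.
func growthSteps(rows int) []int {
	var steps []int
	for size := 1; ; size *= 2 {
		if size > maxBatchSize {
			size = maxBatchSize
		}
		steps = append(steps, size)
		rows -= size
		if rows <= 0 || size == maxBatchSize {
			// At the cap the batch is reused rather than regrown.
			break
		}
	}
	return steps
}

func main() {
	for _, rows := range []int{1, 250, 500, 1000} {
		fmt.Printf("scan-length %4d -> allocations at sizes %v\n", rows, growthSteps(rows))
	}
}
```

Under this model the relative cost of the growth chain is highest for mid-sized scans: a 250-row scan pays for eight reallocations without ever reaching a large steady-state batch, whereas a 1000-row scan amortizes its chain over four times as many rows.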
I've just had a quick run of YCSB/E on 685751f, and it showed the following:
This shows that #62282 addressed the largest part of the regression on this benchmark caused by the switch to the vectorized engine back in December, so I'm considering this issue closed.
Kicked off 10 runs of YCSB/E on a 3-node n1-standard-8 cluster in GCE with 96 concurrency (basically using the config from the roachtest):
So it looks like on this benchmark in particular the vectorized engine even slightly outruns the row-by-row engine with the recent improvement. I think this is somewhat expected given that YCSB/E performs 95% scans, with each scan randomly reading up to 1000 rows, and we know that the vectorized read path is faster once a certain threshold on the number of rows is exceeded (I don't know the exact number now, though; it was on the order of 1000 before - thus the original default value of …).

I'd call it a major success :) cc @jordanlewis
Not so fast: I found the cause of the kv95 regression: #62524. Basically, if there is no estimate (i.e. 0), we're defaulting to allocating batches of size 1024, which is really expensive when we're just fetching a single row. I'm trying out a patch that defaults to 1 if there's no estimate. It fixed kv95; I'm running ycsb/E now -- hopefully we have actual estimates there so it won't take a hit.
Ah, indeed. I think we want to default to the previous behavior of exponentially increasing the batch sizes starting from 1. |
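A sketch of what that strategy could look like (hypothetical names, not the actual #62524 patch): seed the first batch from the optimizer's row-count estimate when one exists, and otherwise start at 1 and double, so a single-row fetch never pays for a 1024-row allocation:

```go
package main

import "fmt"

// maxBatchSize is an assumed cap, matching the 1024 discussed above.
const maxBatchSize = 1024

// batchSizer picks capacities for successive batches. All names here are
// hypothetical; this illustrates the strategy, not the real code.
type batchSizer struct {
	next int
}

// newBatchSizer seeds the first capacity from an optional row-count estimate.
// estimate == 0 means "unknown", in which case we start at 1 and double.
func newBatchSizer(estimate int) *batchSizer {
	first := 1
	if estimate > 0 {
		first = estimate
		if first > maxBatchSize {
			first = maxBatchSize
		}
	}
	return &batchSizer{next: first}
}

// nextCapacity returns the capacity for the next batch and doubles the
// follow-up size, capped at maxBatchSize.
func (s *batchSizer) nextCapacity() int {
	c := s.next
	if s.next < maxBatchSize {
		s.next *= 2
		if s.next > maxBatchSize {
			s.next = maxBatchSize
		}
	}
	return c
}

func main() {
	// Point lookup with no estimate: capacities grow 1, 2, 4, ... on demand.
	noEst := newBatchSizer(0)
	fmt.Println(noEst.nextCapacity(), noEst.nextCapacity(), noEst.nextCapacity())

	// Large scan with an estimate: start near the expected size immediately.
	est := newBatchSizer(900)
	fmt.Println(est.nextCapacity(), est.nextCapacity())
}
```

Starting from 1 keeps point lookups cheap at the cost of a few extra reallocations on unestimated large scans, and the doubling keeps that chain logarithmic in the scan length.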
Luckily it didn't affect YCSB/E much: #62534