perf: kv0/kv95 regression around Feb 25 [loosely coupled log truncation] #78412
The "winner" is #76902
"Without regression" looks consistently like this:
Based on extra logging, the loosely-coupled truncations are never being merged, isDeltaTrusted is always true, and ~11% of the calls to tryEnactTruncations fail because the raft applied index has not advanced sufficiently. So loosely-coupled truncation is behaving as expected.
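(For context, a minimal Go sketch of the gate described above; the names `pendingTruncation`, `tryEnactTruncations`, and the fields here are illustrative stand-ins, not the actual CockroachDB implementation. A queued truncation is only enacted once the raft applied index has reached its target index; otherwise it stays queued, and the attempt counts as one of the ~11% of "failed" calls.)

```go
package main

import "fmt"

// pendingTruncation is an illustrative stand-in for a queued, loosely-coupled
// log truncation: it asks to drop all raft log entries up to Index.
type pendingTruncation struct {
	Index uint64 // truncate entries <= Index
}

// replica holds the minimal state needed for the sketch.
type replica struct {
	appliedIndex uint64              // highest raft log index applied to the state machine
	truncatedTo  uint64              // everything <= truncatedTo has already been dropped
	pending      []pendingTruncation // queued truncations, in increasing Index order
}

// tryEnactTruncations enacts queued truncations whose target index the applied
// index has already reached; the rest stay queued and are retried later.
func (r *replica) tryEnactTruncations() (enacted, deferred int) {
	i := 0
	for ; i < len(r.pending); i++ {
		t := r.pending[i]
		if t.Index > r.appliedIndex {
			break // cannot drop entries the state machine has not applied yet
		}
		r.truncatedTo = t.Index
		enacted++
	}
	deferred = len(r.pending) - i
	r.pending = r.pending[i:]
	return enacted, deferred
}

func main() {
	r := &replica{appliedIndex: 100}
	r.pending = []pendingTruncation{{Index: 90}, {Index: 105}}
	enacted, deferred := r.tryEnactTruncations()
	fmt.Printf("enacted=%d deferred=%d truncatedTo=%d\n", enacted, deferred, r.truncatedTo)
	// prints: enacted=1 deferred=1 truncatedTo=90
}
```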
If you turn the loosely coupled truncations off, do you see the difference in qps? I'm still confused how this is even happening (as are you, I assume).
I think we should try to find the root cause based on
The fact that we only see this on benchmarks with large writes is interesting. If we suspect that the lag in truncation is related, then we should look for effects that would be more pronounced with larger raft log entries. I was thinking that maybe the quota pool could be involved somehow, but once we're truncating up to a log index, the quota pool should have already advanced past that index. Looking at the raft entry cache hit rate is a good idea. However, once we're truncating up to a log index, even if that truncation hasn't been applied yet, we should never need to go back and read entries below the truncation index, so those entries shouldn't be causing entries above the truncation index to be evicted. So what effect could more entries in the entry cache have? One theory is that we efficiently evict entries from the entry cache once they are truncated (see
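(To make the eviction theory concrete, here is a toy, self-contained sketch; it is not the real raft entry cache, and the byte budget, entry sizes, and eviction policy are assumptions chosen only to illustrate the contrast. With prompt truncation, eviction is driven by the truncation index and the size cap rarely fires; when truncation lags, the size cap does the evicting and can push out entries that are still needed.)

```go
package main

import "fmt"

// entryCache is a toy stand-in for a per-store raft entry cache with a byte
// budget. It is keyed by log index and evicts the lowest indexes first when
// the budget is exceeded.
type entryCache struct {
	budget int
	used   int
	sizes  map[uint64]int // log index -> entry size in bytes
	minIdx uint64
	maxIdx uint64
}

func newEntryCache(budget int) *entryCache {
	return &entryCache{budget: budget, sizes: map[uint64]int{}, minIdx: 1}
}

// add caches one entry and, if over budget, evicts from the low end.
// It returns how many entries were evicted by the size limit alone.
func (c *entryCache) add(idx uint64, size int) (evicted int) {
	c.sizes[idx] = size
	c.used += size
	if idx >= c.maxIdx {
		c.maxIdx = idx + 1
	}
	for c.used > c.budget && c.minIdx < c.maxIdx {
		if sz, ok := c.sizes[c.minIdx]; ok {
			delete(c.sizes, c.minIdx)
			c.used -= sz
			evicted++
		}
		c.minIdx++
	}
	return evicted
}

// truncateTo mirrors truncation-driven eviction: drop everything <= idx.
func (c *entryCache) truncateTo(idx uint64) {
	for i := c.minIdx; i <= idx && i < c.maxIdx; i++ {
		if sz, ok := c.sizes[i]; ok {
			delete(c.sizes, i)
			c.used -= sz
		}
	}
	if idx+1 > c.minIdx {
		c.minIdx = idx + 1
	}
}

func main() {
	prompt := newEntryCache(16 << 10) // 16 KiB budget, 4 KiB entries
	lazy := newEntryCache(16 << 10)
	var promptEvicted, lazyEvicted int
	for idx := uint64(1); idx <= 64; idx++ {
		promptEvicted += prompt.add(idx, 4<<10)
		lazyEvicted += lazy.add(idx, 4<<10)
		if idx%4 == 0 {
			prompt.truncateTo(idx) // truncation keeps up with writes
		}
		// lazy never truncates inside this window, so the size cap does the evicting.
	}
	fmt.Printf("size-cap evictions: prompt=%d lazy=%d\n", promptEvicted, lazyEvicted)
}
```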
0.036% when running with loosely-coupled for kv0/enc=false/nodes=3/size=4kb. The full profile is in https://drive.google.com/file/d/1JNJgnJseHeMEsqTBgyiHY_uU-uhB2lQ9/view?usp=sharing
@sumeerbhola Will you have a chance to dig into this further? The release isn't that far off, so we should try to resolve this.
I am out of ideas and would appreciate some help. Both tests have low CPU, and one of them does not overload the store with writes either. Loosely-coupled truncation is not adding additional writes, and it adds only a small amount of extra CPU on its own thread. So the small latency increase in the workload, which must be what is causing the small throughput drop, is hard to explain.
No ideas off-the-cuff, but I'll have a closer look when I have time.
The bottleneck here is disk throughput, which is probably significant. The first runs here are with 192 workers (the default for this test), the latest with 384 workers:
Doesn't seem related to the entry cache. I tried a couple of builds that either removed the
This seems to be a storage issue. Running with
I wonder if this could have something to do with the fact that the old truncation uses the Raft batch for writes, while the loosely coupled truncation uses a new write batch (which adds about 1.4k additional batch commits per second). However, as far as I could tell, there didn't seem to be any other commands in the batches used for log truncation. Will dig into this a bit further. Charts below. Top one is always
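(A small sketch of the difference being described here, using Pebble's public batch API; the keys and values are made up and this is not the actual CockroachDB code path. The tightly-coupled flavor adds the truncation's range deletion to the batch that is already being committed by the apply loop, while the loosely-coupled flavor enacts the truncation later in a batch of its own, which is where the extra ~1.4k batch commits per second would come from.)

```go
package main

import (
	"encoding/binary"
	"fmt"
	"log"

	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

// logKey builds an illustrative raft-log key for a given index.
func logKey(idx uint64) []byte {
	k := make([]byte, len("raftlog/")+8)
	copy(k, "raftlog/")
	binary.BigEndian.PutUint64(k[len("raftlog/"):], idx)
	return k
}

func main() {
	db, err := pebble.Open("demo", &pebble.Options{FS: vfs.NewMem()})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Tightly-coupled flavor: the range deletion for the truncation rides in
	// the same batch as the applied state, so there is one commit total.
	applyBatch := db.NewBatch()
	_ = applyBatch.Set([]byte("appliedState"), []byte("…"), nil)
	_ = applyBatch.DeleteRange(logKey(0), logKey(100), nil) // truncate [0, 100)
	if err := applyBatch.Commit(pebble.Sync); err != nil {
		log.Fatal(err)
	}

	// Loosely-coupled flavor: the apply batch commits on its own, and the
	// truncation is enacted later in a separate batch — one extra commit.
	applyOnly := db.NewBatch()
	_ = applyOnly.Set([]byte("appliedState"), []byte("…"), nil)
	if err := applyOnly.Commit(pebble.Sync); err != nil {
		log.Fatal(err)
	}
	truncBatch := db.NewBatch()
	_ = truncBatch.DeleteRange(logKey(0), logKey(100), nil)
	if err := truncBatch.Commit(pebble.Sync); err != nil {
		log.Fatal(err)
	}

	fmt.Println("committed 1 batch (coupled) vs 2 batches (loosely coupled)")
}
```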
Another observation: I'm seeing the same slowdown even when I'm only using a single worker for the writes. In that case, we're not saturating the disks, nor do we see anything in L0. However, I do see a higher number of Pebble compactions: 1187 vs 1086 over a 3 minute benchmark. That's consistent with the other benchmarks, and could possibly explain this.
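(One way to spot-check a compaction-count comparison like this outside a full roachtest run is Pebble's `Metrics()` API; a rough sketch with a synthetic workload, not how the numbers above were collected:)

```go
package main

import (
	"fmt"
	"log"

	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

func main() {
	db, err := pebble.Open("demo", &pebble.Options{FS: vfs.NewMem()})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Compare the cumulative compaction count before and after a workload of
	// interest; here the workload is just ~40 MiB of 4 KiB values.
	before := db.Metrics().Compact.Count
	for i := 0; i < 10000; i++ {
		key := []byte(fmt.Sprintf("key-%06d", i))
		if err := db.Set(key, make([]byte, 4<<10), pebble.NoSync); err != nil {
			log.Fatal(err)
		}
	}
	after := db.Metrics().Compact.Count
	fmt.Printf("compactions during workload: %d\n", after-before)
}
```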
FWIW, I tried using a common batch for all pending truncations, but it didn't make any difference.
Hmm. I disabled the Raft log truncation entirely (commented it out), and performance tanked, with L0 files and sublevels blowing up. Here's what I'm thinking: the added log truncation delay means that Raft log entries stick around in L0 for longer, increasing the cost of L0 compactions. Does that check out @sumeerbhola?
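(A back-of-envelope version of that hypothesis, with placeholder numbers that are not taken from the benchmark: raft log bytes written while truncation lags are still live at memtable flush time, so they land in L0 and get rewritten by compactions instead of being dropped by the truncation's range tombstone first.)

```go
package main

import "fmt"

// Placeholder numbers only, for illustrating the arithmetic: if raft log
// bytes are written at writeRate and truncation lags by extraDelay seconds,
// roughly writeRate*extraDelay additional log bytes are live in the LSM at
// any time, and the ones still live at flush time become extra L0 input.
func main() {
	const (
		writeRate  = 50 << 20 // hypothetical: 50 MiB/s of raft log writes
		extraDelay = 2.0      // hypothetical: truncation lags an extra 2 seconds
	)
	extraResident := float64(writeRate) * extraDelay
	fmt.Printf("≈ %.0f MiB of extra raft log resident in the LSM\n", extraResident/(1<<20))
}
```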
Yes, we can disable it without affecting future correctness.
But I think we should do one more step to confirm that writes are being distributed across all 1000 replicas, since we don't have an explanation for the truncation-before-flush behavior.
The writes do end up evenly distributed, at least (with the old log truncation):
I'll be away on vacation for the next week and won't be able to dig into this any further. But I think we should disable the loosely coupled truncation on 22.1 for now; we can re-enable it if we do find an explanation.
@sumeerbhola Looks like we're cutting rc1 on April 19th, can you make sure we flip the switch before then? |
…nabled to false This is due to the regression noticed in cockroachdb#78412 Release note: None
My understanding was flawed. The raft log can be truncated after
kv0:
![image](https://user-images.githubusercontent.com/5076964/159876130-c1d18e89-3a09-4eac-b1d0-ea2ec1875085.png)
kv95:
![image](https://user-images.githubusercontent.com/5076964/159876152-ff7c5163-2ef7-4b00-b30c-1f477f3d9822.png)
I'm investigating this.
Bisecting via
- `StoreRebalancer` log message without verbosity levels set #76473
- opt: fetch virtual columns during cascading delete #77052
- `WriteAtRequestTimestamp` parameters #76982
- roachtest: ignore flaky activerecord test #77042
- `rules_go` to pick up cockroachdb/rules_go#4 #77045
- dev: make sure we inherit `stdout` and `stderr` when appropriate #77047
- sql: skip flaky schema_changer/drop_database_cascade test #77049
- `ignore_if_excluded_from_backup` to SpanConfig #76831

Jira issue: CRDB-14119