kvcoord: SingleRoundtripWithLatency performance regression #98887
99023: kv: add log scope to BenchmarkSingleRoundtripWithLatency — Informs #98887. Avoids mixing logs with benchmark results, which breaks benchdiff. Release note: None
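For reference, the change that PR describes follows CockroachDB's usual test-logging pattern. A minimal sketch, not the actual diff — the package name is assumed from the issue title, and the benchmark body is elided:

```go
// Sketch only: package path assumed, benchmark body elided.
package kvcoord_test

import (
	"testing"

	"github.com/cockroachdb/cockroach/pkg/util/log"
)

func BenchmarkSingleRoundtripWithLatency(b *testing.B) {
	// Scope log output to temp files for the duration of the benchmark,
	// so stderr carries only benchmark result lines that benchdiff can
	// parse cleanly.
	defer log.Scope(b).Close(b)

	// ... benchmark body elided ...
}
```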
This regression looks real:
[benchdiff results table]
We've had some recent performance improvements around Raft which I think are showing up here, but there is still a regression. On my GCE worker:

[benchmark comparison results]

On my M1 Mac:

[benchmark comparison results]
CPU profiles reveal that the new time is being spent in the Raft scheduler. I took a look at whether async Raft storage writes were the cause, but I did not see an improvement when disabling them. Instead, I see that the regression was caused by #89632. That change made single-replica Raft groups pass through the Raft scheduler twice instead of once when performing a write. This benchmark runs with replication factor 1x, so it makes sense that we would see a small (~5µs) regression due to that change.

Comparing 2c1c354 and

Comparing fad19d5 and
This impact is within the expectations of that change, so I'll close this issue.
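To make the mechanism concrete, here is a toy model — a hypothetical illustration, not CockroachDB's actual raft scheduler: each scheduler pass means enqueueing an event and waiting for a worker goroutine to process it, so requiring two passes per write roughly doubles the per-write scheduling overhead, on the order of the few microseconds observed above.

```go
// Toy model (hypothetical, not CockroachDB code): a worker goroutine
// processes per-range events from a queue, standing in for the raft
// scheduler.
package main

import (
	"fmt"
	"time"
)

type event struct{ done chan struct{} }

func worker(ch chan event) {
	for ev := range ch {
		time.Sleep(2 * time.Microsecond) // stand-in for scheduling/processing cost
		close(ev.done)
	}
}

// write models a write that must pass through the scheduler `passes`
// times before it is acknowledged to the client.
func write(ch chan event, passes int) time.Duration {
	start := time.Now()
	for i := 0; i < passes; i++ {
		ev := event{done: make(chan struct{})}
		ch <- ev
		<-ev.done
	}
	return time.Since(start)
}

func main() {
	ch := make(chan event, 16)
	go worker(ch)
	fmt.Println("one pass: ", write(ch, 1)) // pre-#89632 behavior for 1x replication
	fmt.Println("two passes:", write(ch, 2)) // post-#89632 behavior
}
```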
See #98068 and https://docs.google.com/spreadsheets/d/10GhYr_91CANCNKOM_gPy7Sk9hQkTyQGNgNwNgfHeUtI/edit#gid=4.
Jira issue: CRDB-25569