-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: adopt new raft MaxCommittedSizePerReady config parameter #32387
storage: adopt new raft MaxCommittedSizePerReady config parameter #32387
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bdarnell should sign off on the default value here, then we should be good.
s/Ready wiht the/Ready with the/
in the commit message
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/storage/store.go, line 143 at r1 (raw file):
Settings: st, AmbientCtx: log.AmbientContext{Tracer: st.Tracer}, Clock: clock,
nit: are you running go1.11's gofmt? We're still using go1.10's at the moment.
Before this change, the size of log committed log entries which a replica could apply at a time was bound to the same configuration as the total size of log entries which could be sent in a message (MaxSizePerMsg) which is generally kilobytes. This limit had an impact on the throughput of writes to a replica, particularly when writing large amounts of data. A new raft configuration option MaxCommittedSizePerReady was adding to etcd/raft in (etcd-io/etcd#10258) which allows these two size parameters to be decoupled. This change adopts the configuration and sets it to a default of 64MB. On the below workload which is set up to always return exactly one entry per Ready with the old configuration we see a massive win in both throughput and latency. ``` ./workload run kv {pgurl:1-3} \ --init --splits=10 \ --duration 60s \ --read-percent=${READ_PERCENT} \ --min-block-bytes=8193 --max-block-bytes=16385 \ --concurrency=1024 ``` ``` name old ops/s new ops/s delta KV0 483 ± 3% 2025 ± 3% +319.32% (p=0.002 n=6+6) ``` Before: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 29570 492.8 1981.2 2281.7 5100.3 5637.1 6442.5 write 60.0s 0 28405 473.4 2074.8 2281.7 5637.1 6710.9 7516.2 write 60.0s 0 28615 476.9 2074.3 2550.1 5905.6 6442.5 8321.5 write 60.0s 0 28718 478.6 2055.4 2550.1 5100.3 6442.5 7516.2 write 60.0s 0 28567 476.1 2079.8 2684.4 4831.8 5368.7 6442.5 write 60.0s 0 29981 499.7 1975.7 1811.9 5368.7 6174.0 6979.3 write ``` After: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 119652 1994.0 510.9 486.5 1006.6 1409.3 4295.0 write 60.0s 0 125321 2088.4 488.5 469.8 906.0 1275.1 4563.4 write 60.0s 0 119644 1993.9 505.2 469.8 1006.6 1610.6 5637.1 write 60.0s 0 119027 1983.6 511.4 469.8 1073.7 1946.2 4295.0 write 60.0s 0 121723 2028.5 500.6 469.8 1040.2 1677.7 4160.7 write 60.0s 0 123697 2061.4 494.1 469.8 1006.6 1610.6 4295.0 write ``` Fixes cockroachdb#31511 Release note: None
ec36024
to
4787e3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the commit message typo.
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/storage/store.go, line 143 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: are you running go1.11's gofmt? We're still using go1.10's at the moment.
Reformatted with gofmt from 1.10
Specifically, 64MB might be a bit high, but I'll defer to Ben's judgment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained
bors r+ |
32387: storage: adopt new raft MaxCommittedSizePerReady config parameter r=ajwerner a=ajwerner Before this change, the size of log committed log entries which a replica could apply at a time was bound to the same configuration as the total size of log entries which could be sent in a message (MaxSizePerMsg) which is generally kilobytes. This limit had an impact on the throughput of writes to a replica, particularly when writing large amounts of data. A new raft configuration option MaxCommittedSizePerReady was adding to etcd/raft in (etcd-io/etcd#10258) which allows these two size parameters to be decoupled. This change adopts the configuration and sets it to a default of 64MB. On the below workload which is set up to always return exactly one entry per Ready wiht the old configuration we see a massive win in both throughput and latency. ``` ./workload run kv {pgurl:1-3} \ --init --splits=10 \ --duration 60s \ --read-percent=${READ_PERCENT} \ --min-block-bytes=8193 --max-block-bytes=16385 \ --concurrency=1024 ``` ``` name old ops/s new ops/s delta KV0 483 ± 3% 2025 ± 3% +319.32% (p=0.002 n=6+6) ``` Before: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 29570 492.8 1981.2 2281.7 5100.3 5637.1 6442.5 write 60.0s 0 28405 473.4 2074.8 2281.7 5637.1 6710.9 7516.2 write 60.0s 0 28615 476.9 2074.3 2550.1 5905.6 6442.5 8321.5 write 60.0s 0 28718 478.6 2055.4 2550.1 5100.3 6442.5 7516.2 write 60.0s 0 28567 476.1 2079.8 2684.4 4831.8 5368.7 6442.5 write 60.0s 0 29981 499.7 1975.7 1811.9 5368.7 6174.0 6979.3 write ``` After: ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 60.0s 0 119652 1994.0 510.9 486.5 1006.6 1409.3 4295.0 write 60.0s 0 125321 2088.4 488.5 469.8 906.0 1275.1 4563.4 write 60.0s 0 119644 1993.9 505.2 469.8 1006.6 1610.6 5637.1 write 60.0s 0 119027 1983.6 511.4 469.8 1073.7 1946.2 4295.0 write 60.0s 0 121723 2028.5 500.6 469.8 1040.2 1677.7 4160.7 write 60.0s 0 123697 2061.4 494.1 469.8 1006.6 1610.6 4295.0 write ``` Fixes #31511 Release note: None Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Build succeeded |
Once #32600 is merged, I think we should backport this to 2.1 as well. |
Before this change, the size of log committed log entries which a replica could
apply at a time was bound to the same configuration as the total size of log
entries which could be sent in a message (MaxSizePerMsg) which is generally
kilobytes. This limit had an impact on the throughput of writes to a replica,
particularly when writing large amounts of data. A new raft configuration option
MaxCommittedSizePerReady was adding to etcd/raft in (etcd-io/etcd#10258)
which allows these two size parameters to be decoupled. This change adopts the
configuration and sets it to a default of 64MB.
On the below workload which is set up to always return exactly one entry per
Ready wiht the old configuration we see a massive win in both throughput and
latency.
Before:
After:
Fixes #31511
Release note: None