
storage: adopt new raft MaxCommittedSizePerReady config parameter #32387

Merged · 1 commit · Nov 15, 2018

Conversation

ajwerner
Contributor

Before this change, the size of committed log entries which a replica could
apply at a time was bound by the same configuration parameter that caps the
total size of log entries sent in a single message (MaxSizePerMsg), which is
generally on the order of kilobytes. This limit hurt write throughput to a
replica, particularly when writing large amounts of data. A new raft
configuration option, MaxCommittedSizePerReady, was added to etcd/raft in
(etcd-io/etcd#10258), which allows these two size parameters to be decoupled.
This change adopts the new option and sets it to a default of 64MB.

On the workload below, which with the old configuration always returns exactly
one entry per Ready, we see a large win in both throughput and latency.

```
./workload run kv {pgurl:1-3} \
    --init --splits=10 \
    --duration 60s \
    --read-percent=${READ_PERCENT} \
    --min-block-bytes=8193 --max-block-bytes=16385 \
    --concurrency=1024
```

```
name       old ops/s  new ops/s  delta
KV0         483 ± 3%  2025 ± 3%  +319.32%  (p=0.002 n=6+6)
```

Before:

```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0          29570          492.8   1981.2   2281.7   5100.3   5637.1   6442.5  write
   60.0s        0          28405          473.4   2074.8   2281.7   5637.1   6710.9   7516.2  write
   60.0s        0          28615          476.9   2074.3   2550.1   5905.6   6442.5   8321.5  write
   60.0s        0          28718          478.6   2055.4   2550.1   5100.3   6442.5   7516.2  write
   60.0s        0          28567          476.1   2079.8   2684.4   4831.8   5368.7   6442.5  write
   60.0s        0          29981          499.7   1975.7   1811.9   5368.7   6174.0   6979.3  write
```

After:

```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         119652         1994.0    510.9    486.5   1006.6   1409.3   4295.0  write
   60.0s        0         125321         2088.4    488.5    469.8    906.0   1275.1   4563.4  write
   60.0s        0         119644         1993.9    505.2    469.8   1006.6   1610.6   5637.1  write
   60.0s        0         119027         1983.6    511.4    469.8   1073.7   1946.2   4295.0  write
   60.0s        0         121723         2028.5    500.6    469.8   1040.2   1677.7   4160.7  write
   60.0s        0         123697         2061.4    494.1    469.8   1006.6   1610.6   4295.0  write
```

Fixes #31511

Release note: None
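The decoupling described above amounts to giving committed entries their own byte budget per Ready, set via the MaxCommittedSizePerReady field on etcd/raft's Config, instead of reusing the KB-scale MaxSizePerMsg budget. A minimal, illustrative sketch of that budgeting logic (the helper below is hypothetical, not the actual etcd/raft code):

```go
package main

import "fmt"

// entriesWithinBudget returns how many of the given committed entry
// sizes fit within a maxCommittedSize byte budget. At least one entry
// is always returned so that application can make progress, mirroring
// how etcd/raft treats its size limits. Illustrative only.
func entriesWithinBudget(entrySizes []int, maxCommittedSize int) int {
	total := 0
	for i, sz := range entrySizes {
		total += sz
		if total > maxCommittedSize && i > 0 {
			return i // budget exceeded; stop before this entry
		}
	}
	return len(entrySizes)
}

func main() {
	sizes := []int{16 << 10, 16 << 10, 16 << 10} // three 16KB entries
	// A KB-scale budget (old behavior, tied to MaxSizePerMsg) hands
	// back only one entry per Ready for large writes.
	fmt.Println(entriesWithinBudget(sizes, 8<<10)) // 1
	// A 64MB budget (the new MaxCommittedSizePerReady default) lets
	// all of them through in a single Ready.
	fmt.Println(entriesWithinBudget(sizes, 64<<20)) // 3
}
```

With the old shared limit, a workload of 8-16KB blocks pays one Ready cycle per entry; raising the committed-entry budget lets many entries be applied per cycle, which is where the throughput win above comes from.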

@ajwerner ajwerner requested review from a team November 15, 2018 16:15
@cockroach-teamcity
Member

This change is Reviewable

Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm_strong: @bdarnell should sign off on the default value here, then we should be good.

s/Ready wiht the/Ready with the/ in the commit message

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/storage/store.go, line 143 at r1 (raw file):

```go
		Settings:                    st,
		AmbientCtx:                  log.AmbientContext{Tracer: st.Tracer},
		Clock:                       clock,
```

nit: are you running go1.11's gofmt? We're still using go1.10's at the moment.

Contributor Author

@ajwerner ajwerner left a comment


Fixed the commit message typo.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/storage/store.go, line 143 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: are you running go1.11's gofmt? We're still using go1.10's at the moment.

Reformatted with gofmt from 1.10

@nvanbenschoten
Member

should sign off on the default value here

Specifically, 64MB might be a bit high, but I'll defer to Ben's judgment.

Contributor

@bdarnell bdarnell left a comment


:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained

@ajwerner
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Nov 15, 2018
32387: storage: adopt new raft MaxCommittedSizePerReady config parameter r=ajwerner a=ajwerner

@craig
Contributor

craig bot commented Nov 15, 2018

Build succeeded

@craig craig bot merged commit 4787e3c into cockroachdb:master Nov 15, 2018
@nvanbenschoten
Member

Once #32600 is merged, I think we should backport this to 2.1 as well.
