Disk utilization fixes #9225

Merged
jwilder merged 4 commits into master from jw-snapshost-concurrency on Dec 15, 2017

Conversation

@jwilder jwilder commented Dec 13, 2017

Required for all non-trivial PRs
  • Rebased/mergeable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

This is a follow-up to #9206 to fix the higher disk utilization and lower write throughput in 1.4.2. One change in #9206 was to increase the snapshot size in order to create fewer, larger level 1 TSM files, which helped reduce the frequency of compactions. Unfortunately, users with many databases or lots of active shards could OOM the process because each shard now consumes more memory.

This PR reverts that change and addresses the issue differently:

  1. Snapshot concurrency is now adjusted based on snapshot latency rather than cardinality.
  2. All TSM writing (compaction/snapshotting) is rate limited to reduce the impact of bursty snapshots and TSM fsyncs. The WAL is not part of this change.
  3. Compaction planning now runs less frequently, and the criteria for plans were adjusted. Since compactions complete roughly 2x-3x faster in 1.4 than in 1.3, we need to slow them down significantly.
  4. Fixed a compaction bug where only "fast" compactions would be run. This caused TSM files to be larger and less compressed until a final full compaction ran.

This is a run on a c4.4xlarge with 10B values, 2.5M series, and 5 writers. The 1.4.2 run is only 1/5 of the way through, but the performance difference is already visible: 1.4.2 runs too many compactions (combined with other issues), which increases disk utilization and lowers write throughput. With this PR, performance is similar to or better than 1.3 in the different areas; there are significantly fewer compactions, and write throughput is now ~30% better than 1.3.

[metrics screenshot]

Fixes #9217, #9201

This changes the approach for adjusting the amount of concurrency
used for snapshotting to be based on snapshot latency rather than
cardinality.  The cardinality approach could use too much concurrency
and increase the number of level 1 TSM files too quickly, which incurs
more disk IO.

The latency model seems to adjust better to different workloads.
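For illustration only, here is a minimal Go sketch of the general idea: raise snapshot concurrency while snapshots are slow and lower it once they complete quickly. The function name, threshold, and bounds are assumptions, not the code in this PR.

```go
// Illustrative sketch only: scale snapshot concurrency from observed
// snapshot latency instead of series cardinality. The names, threshold,
// and bounds are assumptions, not this PR's implementation.
package main

import (
	"fmt"
	"time"
)

// adjustSnapshotConcurrency raises concurrency while snapshots are slow
// (writes are backing up) and lowers it once they complete quickly, so
// level 1 TSM files are not created faster than the disk can absorb them.
func adjustSnapshotConcurrency(current int, lastSnapshot time.Duration) int {
	const (
		slowThreshold  = 30 * time.Second // hypothetical "snapshot is lagging" cutoff
		minConcurrency = 1
		maxConcurrency = 4
	)

	switch {
	case lastSnapshot > slowThreshold && current < maxConcurrency:
		return current + 1
	case lastSnapshot < slowThreshold/2 && current > minConcurrency:
		return current - 1
	default:
		return current
	}
}

func main() {
	c := 1
	for _, d := range []time.Duration{45 * time.Second, 40 * time.Second, 5 * time.Second} {
		c = adjustSnapshotConcurrency(c, d)
		fmt.Printf("snapshot took %v -> concurrency %d\n", d, c)
	}
}
```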
@jwilder jwilder requested a review from stuartcarnie December 13, 2017 22:49
@ghost ghost assigned jwilder Dec 13, 2017
@ghost ghost added the review label Dec 13, 2017

@stuartcarnie stuartcarnie left a comment


LGTM 👍

@jwilder jwilder force-pushed the jw-snapshost-concurrency branch 6 times, most recently from ebb518d to 9b8929f on December 15, 2017 04:50
This limits the disk IO for writing TSM files during compactions
and snapshots.  This helps reduce the spiky IO patterns on SSDs and
when compactions run very quickly.
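As a rough sketch of this kind of throttling (not the actual implementation), a TSM writer could be wrapped with a shared token-bucket limiter such as golang.org/x/time/rate. The throttledWriter type, the 64 KiB chunk size, and the 48 MB/s limit below are made up for illustration.

```go
// Illustrative sketch only: throttle TSM writes with a token-bucket rate
// limiter so compactions and snapshots cannot saturate the disk in bursts.
package main

import (
	"context"
	"io"
	"os"

	"golang.org/x/time/rate"
)

// throttledWriter blocks before each chunk until the shared limiter
// allows that many bytes through.
type throttledWriter struct {
	w   io.Writer
	lim *rate.Limiter
}

func (t *throttledWriter) Write(p []byte) (int, error) {
	const chunk = 64 * 1024 // write in 64 KiB slices so each wait stays short
	written := 0
	for len(p) > 0 {
		n := len(p)
		if n > chunk {
			n = chunk
		}
		if err := t.lim.WaitN(context.Background(), n); err != nil {
			return written, err
		}
		m, err := t.w.Write(p[:n])
		written += m
		if err != nil {
			return written, err
		}
		p = p[n:]
	}
	return written, nil
}

func main() {
	// Hypothetical limit: 48 MB/s shared across all compaction/snapshot writers.
	lim := rate.NewLimiter(rate.Limit(48*1024*1024), 64*1024)
	out := &throttledWriter{w: os.Stdout, lim: lim}
	io.Copy(out, io.LimitReader(os.Stdin, 1<<20)) // copy up to 1 MiB at the limited rate
}
```

Sharing one limiter across all compaction and snapshot goroutines is what smooths the bursts; per-writer limiters would still allow spikes when several run at once.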
@jwilder jwilder force-pushed the jw-snapshost-concurrency branch 2 times, most recently from e321ee9 to 5893d49 on December 15, 2017 05:23
Increase level 1 min criteria, fix only fast compactions getting run,
and fix very large generations getting included in optimize plans.
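For illustration, a hedged sketch of what checks like these could look like; the tsmGeneration type, the minimum level 1 file count, and the size cap are hypothetical, not the planner's real code.

```go
// Illustrative sketch only: require more level 1 files before planning a
// compaction, and skip very large generations when building optimize plans.
package main

import "fmt"

type tsmGeneration struct {
	level int
	size  int64 // bytes
}

const (
	minLevel1Files   = 8                      // hypothetical minimum before a level 1 plan runs
	maxOptimizeBytes = 2 * 1024 * 1024 * 1024 // hypothetical cap: skip very large generations
)

// shouldPlanLevel1 waits until enough level 1 files accumulate so plans
// run less frequently but do more work per run.
func shouldPlanLevel1(gens []tsmGeneration) bool {
	count := 0
	for _, g := range gens {
		if g.level == 1 {
			count++
		}
	}
	return count >= minLevel1Files
}

// optimizeCandidates drops generations that are already very large, since
// rewriting them buys little and costs a lot of IO.
func optimizeCandidates(gens []tsmGeneration) []tsmGeneration {
	var out []tsmGeneration
	for _, g := range gens {
		if g.size < maxOptimizeBytes {
			out = append(out, g)
		}
	}
	return out
}

func main() {
	gens := []tsmGeneration{{1, 1 << 20}, {1, 2 << 20}, {2, 3 << 30}}
	fmt.Println(shouldPlanLevel1(gens), len(optimizeCandidates(gens)))
}
```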
@jwilder jwilder force-pushed the jw-snapshost-concurrency branch from 5893d49 to 2d85ff1 on December 15, 2017 05:41
@jwilder jwilder merged commit 31f1ec2 into master Dec 15, 2017
@ghost ghost removed the review label Dec 15, 2017
@jwilder jwilder deleted the jw-snapshost-concurrency branch December 15, 2017 06:07
@jwilder jwilder mentioned this pull request Dec 15, 2017
jwilder added a commit that referenced this pull request Dec 15, 2017
e-dard added a commit that referenced this pull request Jul 18, 2018
PR #9204 introduced a maximum default concurrent compaction limit of 4.
The idea was to reduce IO utilisation on large systems with many cores,
and high write load. Often on these systems, disks were not scaled
appropriately to the write volume, and while the write path could
keep up, compactions would saturate disks.

In #9225 work was done to reduce IO saturation by limiting the
compaction throughput. To some extent, both #9204 and #9225 work towards
solving the same problem.

We have recently begun to notice that larger clusters suffer from
situations where compactions cannot keep up: the clusters have been
scaled up, but the limit of 4 has stayed in place. While users can
manually override the setting, it seems more user-friendly to remove
the limit by default and set it manually in cases where compactions
are causing too much IO on large boxes.
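As a sketch of the direction described here (assumptions only, not the actual configuration code), the default could be derived from the cores available to the process instead of a fixed cap of 4, with an explicit setting still taking precedence.

```go
// Illustrative sketch only: derive the default max concurrent compactions
// from available CPU rather than capping it at 4. The divisor and names
// are assumptions.
package main

import (
	"fmt"
	"runtime"
)

// defaultMaxConcurrentCompactions returns an explicit override if set,
// otherwise scales with the cores available to the process.
func defaultMaxConcurrentCompactions(override int) int {
	if override > 0 {
		return override
	}
	n := runtime.GOMAXPROCS(0) / 2 // hypothetical: half the cores for compactions
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	fmt.Println(defaultMaxConcurrentCompactions(0)) // derived from cores
	fmt.Println(defaultMaxConcurrentCompactions(8)) // user override wins
}
```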

Successfully merging this pull request may close these issues.

[Bug]Performance decline of the 1.4 version