Disk utilization fixes #9225

Merged
jwilder merged 4 commits into master from jw-snapshost-concurrency on Dec 15, 2017

Conversation

@jwilder jwilder commented Dec 13, 2017

Required for all non-trivial PRs
  • Rebased/mergeable
  • Tests pass
  • CHANGELOG.md updated
  • Sign CLA (if not already signed)

This is a follow-up to #9206 to fix the higher disk utilization and lower write throughput in 1.4.2. One change in #9206 was to increase the snapshot size in order to create fewer, larger level 1 TSM files, which helped reduce the frequency of compactions. Unfortunately, users with many databases or lots of active shards could OOM the process because each shard now consumes more memory.

This PR reverts that change and addresses the issue differently:

  1. Snapshot concurrency is now adjusted based on snapshot latency rather than cardinality.
  2. All TSM writing (compaction/snapshotting) is rate limited to reduce the impact of bursty snapshots and TSM fsyncs. The WAL is not part of this change.
  3. Compaction planning now runs less frequently, and the criteria for plans were adjusted. Since compactions complete roughly 2x-3x faster in 1.4 than in 1.3, we need to slow them down significantly.
  4. Fixed a compaction bug where only "fast" compactions would be run. This caused TSM files to be larger and less compressed until a final full compaction ran.

This is a run on a c4.4xlarge with 10B values, 2.5M series, and 5 writers. The 1.4.2 run is only 1/5 of the way through, but the performance difference is already visible: 1.4.2 runs too many compactions (combined with other issues), which increases disk utilization and lowers write throughput. With this PR, performance is similar to or better than 1.3 in the different areas; there are significantly fewer compactions, and write throughput is now ~30% better than 1.3.

[metrics screenshot]

Fixes #9217, #9201

This changes the approach for adjusting the amount of concurrency
used for snapshotting to be based on snapshot latency rather than
cardinality.  The cardinality approach could use too much concurrency
and increase the number of level 1 TSM files too quickly, which incurs
more disk IO.

The latency model seems to adjust better to different workloads.
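For illustration only, here is a minimal Go sketch of the general idea: raise snapshot concurrency while snapshots are slow and lower it once they complete quickly. The function name, threshold, and bounds are assumptions, not the code in this PR.

```go
// Illustrative sketch only: scale snapshot concurrency from observed
// snapshot latency instead of series cardinality. The names, threshold,
// and bounds are assumptions, not this PR's implementation.
package main

import (
	"fmt"
	"time"
)

// adjustSnapshotConcurrency raises concurrency while snapshots are slow
// (writes are backing up) and lowers it once they complete quickly, so
// level 1 TSM files are not created faster than the disk can absorb them.
func adjustSnapshotConcurrency(current int, lastSnapshot time.Duration) int {
	const (
		slowThreshold  = 30 * time.Second // hypothetical "snapshot is lagging" cutoff
		minConcurrency = 1
		maxConcurrency = 4
	)

	switch {
	case lastSnapshot > slowThreshold && current < maxConcurrency:
		return current + 1
	case lastSnapshot < slowThreshold/2 && current > minConcurrency:
		return current - 1
	default:
		return current
	}
}

func main() {
	c := 1
	for _, d := range []time.Duration{45 * time.Second, 40 * time.Second, 5 * time.Second} {
		c = adjustSnapshotConcurrency(c, d)
		fmt.Printf("snapshot took %v -> concurrency %d\n", d, c)
	}
}
```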
@jwilder jwilder requested a review from stuartcarnie December 13, 2017 22:49
@ghost ghost assigned jwilder Dec 13, 2017
@ghost ghost added the review label Dec 13, 2017

@stuartcarnie stuartcarnie left a comment


LGTM 👍

@jwilder jwilder force-pushed the jw-snapshost-concurrency branch 6 times, most recently from ebb518d to 9b8929f on December 15, 2017 04:50
This limits the disk IO for writing TSM files during compactions
and snapshots.  This helps reduce the spiky IO patterns on SSDs and
when compactions run very quickly.
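As a rough sketch of this kind of throttling (not the actual implementation), a TSM writer could be wrapped with a shared token-bucket limiter such as golang.org/x/time/rate. The throttledWriter type, the 64 KiB chunk size, and the 48 MB/s limit below are made up for illustration.

```go
// Illustrative sketch only: throttle TSM writes with a token-bucket rate
// limiter so compactions and snapshots cannot saturate the disk in bursts.
package main

import (
	"context"
	"io"
	"os"

	"golang.org/x/time/rate"
)

// throttledWriter blocks before each chunk until the shared limiter
// allows that many bytes through.
type throttledWriter struct {
	w   io.Writer
	lim *rate.Limiter
}

func (t *throttledWriter) Write(p []byte) (int, error) {
	const chunk = 64 * 1024 // write in 64 KiB slices so each wait stays short
	written := 0
	for len(p) > 0 {
		n := len(p)
		if n > chunk {
			n = chunk
		}
		if err := t.lim.WaitN(context.Background(), n); err != nil {
			return written, err
		}
		m, err := t.w.Write(p[:n])
		written += m
		if err != nil {
			return written, err
		}
		p = p[n:]
	}
	return written, nil
}

func main() {
	// Hypothetical limit: 48 MB/s shared across all compaction/snapshot writers.
	lim := rate.NewLimiter(rate.Limit(48*1024*1024), 64*1024)
	out := &throttledWriter{w: os.Stdout, lim: lim}
	io.Copy(out, io.LimitReader(os.Stdin, 1<<20)) // copy up to 1 MiB at the limited rate
}
```

Sharing one limiter across all compaction and snapshot goroutines is what smooths the bursts; per-writer limiters would still allow spikes when several run at once.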
@jwilder jwilder force-pushed the jw-snapshost-concurrency branch 2 times, most recently from e321ee9 to 5893d49 on December 15, 2017 05:23
Increase level 1 min criteria, fix only fast compactions getting run,
and fix very large generations getting included in optimize plans.
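For illustration, a hedged sketch of what checks like these could look like; the tsmGeneration type, the minimum level 1 file count, and the size cap are hypothetical, not the planner's real code.

```go
// Illustrative sketch only: require more level 1 files before planning a
// compaction, and skip very large generations when building optimize plans.
package main

import "fmt"

type tsmGeneration struct {
	level int
	size  int64 // bytes
}

const (
	minLevel1Files   = 8                      // hypothetical minimum before a level 1 plan runs
	maxOptimizeBytes = 2 * 1024 * 1024 * 1024 // hypothetical cap: skip very large generations
)

// shouldPlanLevel1 waits until enough level 1 files accumulate so plans
// run less frequently but do more work per run.
func shouldPlanLevel1(gens []tsmGeneration) bool {
	count := 0
	for _, g := range gens {
		if g.level == 1 {
			count++
		}
	}
	return count >= minLevel1Files
}

// optimizeCandidates drops generations that are already very large, since
// rewriting them buys little and costs a lot of IO.
func optimizeCandidates(gens []tsmGeneration) []tsmGeneration {
	var out []tsmGeneration
	for _, g := range gens {
		if g.size < maxOptimizeBytes {
			out = append(out, g)
		}
	}
	return out
}

func main() {
	gens := []tsmGeneration{{1, 1 << 20}, {1, 2 << 20}, {2, 3 << 30}}
	fmt.Println(shouldPlanLevel1(gens), len(optimizeCandidates(gens)))
}
```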
@jwilder jwilder force-pushed the jw-snapshost-concurrency branch from 5893d49 to 2d85ff1 on December 15, 2017 05:41
@jwilder jwilder merged commit 31f1ec2 into master Dec 15, 2017
@ghost ghost removed the review label Dec 15, 2017
@jwilder jwilder deleted the jw-snapshost-concurrency branch December 15, 2017 06:07
@jwilder jwilder mentioned this pull request Dec 15, 2017
jwilder added a commit that referenced this pull request Dec 15, 2017
e-dard added a commit that referenced this pull request Jul 18, 2018
PR #9204 introduced a maximum default concurrent compaction limit of 4.
The idea was to reduce IO utilisation on large systems with many cores,
and high write load. Often on these systems, disks were not scaled
appropriately to the write volume, and while the write path could
keep up, compactions would saturate disks.

In #9225 work was done to reduce IO saturation by limiting the
compaction throughput. To some extent, both #9204 and #9225 work towards
solving the same problem.

We have recently begun to notice that larger clusters suffer from
situations where compactions cannot keep up: the clusters have been
scaled up, but the limit of 4 has stayed in place. While users can
manually override the setting, it seems more user-friendly to remove
the limit by default and set it manually in cases where compactions
are causing too much IO on large boxes.
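As a sketch of the direction described here (assumptions only, not the actual configuration code), the default could be derived from the cores available to the process instead of a fixed cap of 4, with an explicit setting still taking precedence.

```go
// Illustrative sketch only: derive the default max concurrent compactions
// from available CPU rather than capping it at 4. The divisor and names
// are assumptions.
package main

import (
	"fmt"
	"runtime"
)

// defaultMaxConcurrentCompactions returns an explicit override if set,
// otherwise scales with the cores available to the process.
func defaultMaxConcurrentCompactions(override int) int {
	if override > 0 {
		return override
	}
	n := runtime.GOMAXPROCS(0) / 2 // hypothetical: half the cores for compactions
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	fmt.Println(defaultMaxConcurrentCompactions(0)) // derived from cores
	fmt.Println(defaultMaxConcurrentCompactions(8)) // user override wins
}
```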

Successfully merging this pull request may close these issues.

[Bug]Performance decline of the 1.4 version