
mv tuning5

Matthew Von-Maszewski edited this page Jan 30, 2014 · 7 revisions

Status

  • merged to develop
  • code complete
  • development started

History / Context

This page discusses a second set of tunings and fixes made to basho/leveldb as part of the final Riak 2.0 release preparation.

The previous tuning branch, mv-tuning4, raised the compaction threshold count from 4 to 6 for overlapped .sst table files in levels 0 and 1. The gLevelTraits table at the top of db/version_set.cc was adjusted accordingly. This branch further adjusts the gLevelTraits table based upon execution results on various hardware platforms and analysis of leveldb's LOG files. Baseline tests indicate these new adjustments reduce the write amplification in specific tests from 6 to 4 … improving ingest speed and overall throughput.

2i testing showed that the new aggressive delete feature interacted poorly with 2i. Specifically, the compactions that aggressive delete triggered in levels 0 and 1 severely degraded overall leveldb performance. This branch restricts aggressive delete to levels 3 and above.

Branch description

db/version_set.cc

All mv-tuning5 edits are isolated to this file.

gLevelTraits adjustments

  • Levels 0 & 1: unchanged
  • Level 2: this is the "landing" level. It receives the first merge sorting of the extra large overlapping .sst files from level 1. m_MaxBytesForLevel was not raised sufficiently in the mv-tuning4 branch. New logic in mv-tuning4 makes it likely that eight level-1 files will often compact into level-2. The larger value keeps the system from applying the throttle during what is a "normal scenario".
  • Level 3: analysis of leveldb's LOG files showed that compactions from level-2 to level-3 quite often required a large percentage of the level-3 files to participate in each compaction. This happens mostly during early data loading, where Google's grandfather calculation works poorly. The smaller size for level-3 minimizes the impact.
  • Levels 4, 5, 6: these levels now have m_MaxBytesForLevel and m_DesiredBytesForLevel at 20 times the level preceding them (previously 10 times). Early tests show this to be good, but no multi-terabyte tests have yet been run to fully validate it. The multiple may change again in the future.
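
The 20 times scaling described for levels 4, 5, and 6 can be illustrated with a small sketch. This is not the actual gLevelTraits table from db/version_set.cc: the LevelTraits struct and the ScaleLevels helper are hypothetical, any byte values passed in are placeholders, and only the two field names m_DesiredBytesForLevel and m_MaxBytesForLevel come from the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one row of a level traits table.
// Only the two field names are taken from the page; the real
// gLevelTraits rows may carry additional fields.
struct LevelTraits {
    uint64_t m_DesiredBytesForLevel;
    uint64_t m_MaxBytesForLevel;
};

// Derive `count` successive levels from `base`, each one `multiple`
// times the size of the level preceding it (20 in this branch,
// 10 previously, per the description above).
std::vector<LevelTraits> ScaleLevels(LevelTraits base, int count,
                                     uint64_t multiple) {
    assert(count >= 0 && multiple > 0);
    std::vector<LevelTraits> levels;
    LevelTraits prev = base;
    for (int i = 0; i < count; ++i) {
        LevelTraits next{prev.m_DesiredBytesForLevel * multiple,
                         prev.m_MaxBytesForLevel * multiple};
        levels.push_back(next);
        prev = next;
    }
    return levels;
}
```

Calling ScaleLevels with the level-3 row as the base, a count of 3, and a multiple of 20 would yield illustrative rows for levels 4, 5, and 6. The real table lists each value explicitly, so treat this only as a way to see how quickly the limits grow.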

extra Finalize() calls removed

The 1.4 threading model used Finalize() as a method to update a database's (vnode's) next choice for compaction and/or the next choice's compaction priority. The 2.0 threading model does not need these extra calls.

The 2.0 threading model is able to schedule multiple, simultaneous compactions for any database (vnode). There is no longer a need to update compaction decisions / priorities.

Shift delete_threshold logic in VersionSet::Finalize()

This change is a very simple cut / paste. However, the github diff view makes it look really, really complex. Here is a text description of the cut / paste based upon original source line numbers:

The original delete_threshold code comprised an "if" block from lines 1156 to 1175 in the original file. It trailed an "if/else" block that existed from 1113 to 1153. The cut removed 1156 to 1175, then pasted them before line 1153. This moved the delete_threshold code entirely within the "else" clause that preceded it.

Why? The goal was to limit the application of aggressive delete to only non-overlapped levels (2, 3, 4, 5, and 6). The "if/else" block from 1113 to 1153 applied logic to the overlapped levels within the "if" and applied logic to the non-overlapped levels within the "else". Moving the delete_threshold block within the "else" effectively applied it to only non-overlapped levels instead of all.
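
The effect of the cut / paste can be sketched in simplified form. The code below is not the body of VersionSet::Finalize(): LevelState, its fields, kNumOverlappedLevels, and both functions are hypothetical names invented for illustration, and the real triggers are more involved. It assumes only what the text states, namely that levels 0 and 1 are the overlapped levels and that the delete_threshold check moved from after the "if/else" into the "else".

```cpp
#include <cassert>

// Assumption from the text: levels 0 and 1 hold overlapped files.
const int kNumOverlappedLevels = 2;

// Hypothetical per-level summary; the real code computes these
// conditions from the version's file metadata.
struct LevelState {
    int level;
    bool file_count_trigger;        // overlapped-level logic
    bool size_trigger;              // non-overlapped-level logic
    bool delete_threshold_reached;  // aggressive delete condition
};

// Original shape: the delete_threshold check trails the if/else,
// so it applies to every level.
bool NeedsCompactionBefore(const LevelState& s) {
    bool compact = false;
    if (s.level < kNumOverlappedLevels) {
        compact = s.file_count_trigger;
    } else {
        compact = s.size_trigger;
    }
    if (s.delete_threshold_reached) {
        compact = true;
    }
    return compact;
}

// Shape after the move: the delete_threshold check sits inside the
// else clause, so only non-overlapped levels can be triggered by it.
bool NeedsCompactionAfter(const LevelState& s) {
    bool compact = false;
    if (s.level < kNumOverlappedLevels) {
        compact = s.file_count_trigger;
    } else {
        compact = s.size_trigger;
        if (s.delete_threshold_reached) {
            compact = true;
        }
    }
    return compact;
}
```

With a level-0 state whose only true flag is delete_threshold_reached, the first function requests a compaction and the second does not. For a non-overlapped level the two behave identically.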
