perf: L0->Lbase compactions not keeping up with flushing #203
Ah, I think I understand. RocksDB is not escaping from this trap. The L0->L0 compactions were just obscuring the issue. (Concurrent compactions are needed for L0->L0 compactions to take place.) Here is what RocksDB looks like after a longer run:
Only 16 tables in L0. Yay! Except notice that the 16 tables consume 5.6 GB of disk space. Each table is huge. Also notice that L5 is much larger than L6. Debug logs show that the last L5->L6 compaction happened 51s into the 5m run. After that, all RocksDB was doing was L0->L0 and L0->L5 compactions. On the bright side, I understand what is happening in both Pebble and RocksDB. I need to think more about what can be done. #136 is a possibility, where we allow compactions from more than 2 levels, and also allow flushing to generate more than 1 table so that the L0 tables do not cover all of Lbase. @ajkr do you have any additional thoughts?
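To make the #136 idea a bit more concrete, here is a minimal Go sketch of the flush-splitting half of it, assuming a simplified view of the metadata (the `fileMetadata` struct and the helpers below are hypothetical stand-ins, not Pebble's actual internals): cut the flush output at the smallest keys of the existing Lbase files so that each flushed sstable overlaps at most one Lbase file.

```go
package main

import (
	"bytes"
	"fmt"
)

// fileMetadata is a simplified stand-in for an sstable's key bounds.
type fileMetadata struct {
	Smallest, Largest []byte
}

// splitPoints returns the keys at which a flush should start a new sstable:
// the smallest key of each Lbase file after the first.
func splitPoints(lbase []fileMetadata) [][]byte {
	var points [][]byte
	for i := 1; i < len(lbase); i++ {
		points = append(points, lbase[i].Smallest)
	}
	return points
}

// flushWithSplits partitions the sorted keys produced by a flush into multiple
// output tables, starting a new table whenever a split point is crossed. Each
// output then overlaps at most one Lbase file instead of all of Lbase.
func flushWithSplits(keys, splits [][]byte) [][][]byte {
	var tables [][][]byte
	var cur [][]byte
	s := 0
	for _, k := range keys {
		for s < len(splits) && bytes.Compare(k, splits[s]) >= 0 {
			if len(cur) > 0 {
				tables = append(tables, cur)
				cur = nil
			}
			s++
		}
		cur = append(cur, k)
	}
	if len(cur) > 0 {
		tables = append(tables, cur)
	}
	return tables
}

func main() {
	lbase := []fileMetadata{
		{Smallest: []byte("a"), Largest: []byte("f")},
		{Smallest: []byte("g"), Largest: []byte("m")},
		{Smallest: []byte("n"), Largest: []byte("z")},
	}
	keys := [][]byte{[]byte("b"), []byte("h"), []byte("p"), []byte("q")}
	for i, t := range flushWithSplits(keys, splitPoints(lbase)) {
		fmt.Printf("flushed table %d: %d keys\n", i, len(t))
	}
}
```

With flush output shaped like this, an L0->Lbase compaction triggered by any single flushed table would only need to pull in the Lbase files it actually overlaps rather than all of Lbase.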
It is odd that the RocksDB heuristics do not add another level at some point. On a whim, I disabled the "dynamic_level_bytes" option, which produced:
The structure of this LSM looks more sane than the one above, yet L0 is still too large in terms of both size and file count. The overall throughput was also significantly lower in this run. It's interesting how the compaction heuristics get stuck in corners from which they don't seem to be able to break out.
Triaging with @jbowens, we vote to close.
On a `c5d.4xlarge` instance, the `pebble sync` workload shows good write performance, but a problematic behavior: L0->Lbase compactions are not keeping up with flushing, leading to an ever-growing number of files in L0. (I tweaked the Pebble options to set the L0 stop-writes threshold to 1000.)
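For reference, the option tweak mentioned above looks roughly like this in Go (a sketch only; the import path and exact option set depend on the Pebble version, and `bench-data` is just a placeholder directory):

```go
package main

import (
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	opts := &pebble.Options{
		// Allow many more L0 files before writes are stopped, so flushes
		// keep running even while L0->Lbase compactions fall behind.
		L0StopWritesThreshold: 1000,
	}
	db, err := pebble.Open("bench-data", opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```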
The behavior that is happening is that Pebble sees the large number of L0 sstables and decides to compact them into Lbase (L5 in this case). The workload is generating uniformly random keys, so the L0 sstables overlap all of Lbase. That means an L0->Lbase compaction will have `111+374 == 485` input sstables, totaling 6 GB. That compaction necessarily takes a long time, and while it is proceeding, further L0 tables build up. When the L0->Lbase compaction finishes, there are enough L0 tables to require another L0->Lbase compaction. The real starvation here is of the L5->L6 compactions.

RocksDB somehow avoids this egregiously bad behavior, though I'm not quite sure how yet. It seems to be a combination of L0->L0 compactions and concurrent compactions. If I disable L0->L0 compactions, RocksDB sees the same behavior as Pebble. If I disable concurrent compactions, RocksDB sees the same behavior as Pebble. I'm somewhat suspicious it is also related to the lower write throughput I see from RocksDB on this workload. An interesting side-effect of L0->L0 compactions is that they lower the number of files in L0, which lowers the L0 compaction score. Perhaps that is allowing Lbase->Lbase+1 compactions to be scheduled.
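To illustrate the scoring effect in the last paragraph, here is a small Go sketch assuming the common file-count-based L0 score (the real Pebble and RocksDB heuristics also weigh bytes and other signals; the trigger value here is purely illustrative):

```go
package main

import "fmt"

// l0Score mirrors the common heuristic of scoring L0 by its file count
// relative to a compaction trigger; lower levels are typically scored by
// size relative to a target. This is a simplification of what Pebble and
// RocksDB actually do.
func l0Score(numL0Files, l0CompactionThreshold int) float64 {
	return float64(numL0Files) / float64(l0CompactionThreshold)
}

func main() {
	const trigger = 4 // illustrative L0 compaction trigger

	// With 111 files in L0, the L0 score dwarfs every other level's score,
	// so the picker keeps choosing L0->Lbase and L5->L6 never gets a turn.
	fmt.Printf("L0 score with 111 files: %.1f\n", l0Score(111, trigger))

	// After an L0->L0 compaction merges many small tables into a few large
	// ones, the file count (and hence the score) drops, giving L5->L6 a
	// chance to be picked.
	fmt.Printf("L0 score with 16 files:  %.1f\n", l0Score(16, trigger))
}
```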
A limitation both Pebble and RocksDB currently suffer from is that an L0->Lbase compaction locks out a concurrent Lbase->Lbase+1 compaction. This is mentioned in https://github.com/petermattis/pebble/issues/136.
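A rough sketch of the exclusion rule behind that lock-out, assuming a picker that simply refuses to schedule a compaction touching any level already involved in a running compaction (the types here are hypothetical, not Pebble's actual picker):

```go
package main

import "fmt"

// compaction records the levels a running compaction reads from and writes to.
type compaction struct {
	inputLevel, outputLevel int
}

// levelBusy reports whether a level is already involved in a running
// compaction, either as an input or as an output.
func levelBusy(level int, inProgress []compaction) bool {
	for _, c := range inProgress {
		if c.inputLevel == level || c.outputLevel == level {
			return true
		}
	}
	return false
}

// canCompact reports whether a start->output compaction may be scheduled
// concurrently with the in-progress set.
func canCompact(start, output int, inProgress []compaction) bool {
	return !levelBusy(start, inProgress) && !levelBusy(output, inProgress)
}

func main() {
	// An L0->L5 compaction is running (Lbase == L5).
	running := []compaction{{inputLevel: 0, outputLevel: 5}}
	// A concurrent L5->L6 compaction is rejected because L5 is busy as the
	// output of the running compaction, so L5->L6 is starved until it ends.
	fmt.Println("L5->L6 allowed:", canCompact(5, 6, running))
}
```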