
leveldb: introduce trivial version finalization #264

Merged (1 commit, Feb 26, 2019)

Conversation

rjl493456442
Contributor

@rjl493456442 rjl493456442 commented Feb 25, 2019

This PR introduces a bypass for quick version finalization.

For more context, check out the go-ethereum issue.

We use leveldb as the storage engine in the go-ethereum project, and people frequently complain about long compaction pauses on archive nodes (this type of node can hold more than 1 TB of data now).

After some investigation, I found that during the long compaction pauses the I/O is almost idle while one CPU core stays fully loaded. From the pprof data provided by @karalabe, we can also see that most of the time is spent on byte comparison.

Finally I realized that as the database grows, the number of files per level grows as well. The go-ethereum project currently uses the default db settings, which means an archive node can have more than 500,000 sstable files.

After a compaction, leveldb generates a new version by merging the old version with the change set.
During this version generation, the current code applies qsort to every level, even though most levels are unchanged. As the amount of data in the database increases, the number of files per level grows rapidly, so the qsort overhead becomes very large.

The idea of this PR is:

  1. Skip qsort for levels whose content has not changed.
  2. Make full use of the properties of compaction.
    The new files generated by a compaction are strictly ordered and do not overlap any other file on the source+1 level, so we can use binary search to find each new file's insertion index and insert it directly.

This type of trivial version finalization is not suitable for the following events:

  • database version recovery during db open
  • journal recovery
  • table recovery when the manifest is missing
  • transaction compaction

In these events, we cannot guarantee that a newly inserted file in a level does not overlap other files.

@syndtr
Owner

syndtr commented Feb 26, 2019

LGTM.

I will merge this for now. But I think we still need to tackle this, either by reducing the file count per level, finding a data structure that better handles the sheer number of files, or making compaction less frequent.

Anyway, out of curiosity, have you ever tried setting CompactionTableSizeMultiplier greater than 1.0? This should help reduce the file count; the downside is that it might increase disk I/O, as compaction would need to merge larger files on deeper levels.
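For reference, the option discussed above is set when opening the database. A minimal sketch, assuming the published goleveldb API (the path `/tmp/testdb` and the multiplier value are illustrative):

```go
package main

import (
	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// With a multiplier > 1.0, tables on level n may grow to roughly
	// CompactionTableSize * Multiplier^n, trading fewer files per
	// level for larger (more I/O-heavy) merges on deeper levels.
	o := &opt.Options{
		CompactionTableSize:           2 * opt.MiB, // base table size
		CompactionTableSizeMultiplier: 1.5,         // illustrative value
	}
	db, err := leveldb.OpenFile("/tmp/testdb", o)
	if err != nil {
		panic(err)
	}
	defer db.Close()
}
```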

@syndtr syndtr merged commit 7ca0152 into syndtr:master Feb 26, 2019
@rjl493456442
Contributor Author

rjl493456442 commented Feb 27, 2019

@syndtr A few pieces of good news to share here.

After two days of benchmarking Ethereum archive syncing, the experimental branch with my leveldb fix has not suffered any long write pauses (if a write operation is paused for more than 3 seconds, we print a warning log to users) during low-speed compaction, while the master branch pauses write operations for about 30 minutes per hour.

The master branch database size is now about 442 GB, with 234,900 files.

Regarding CompactionTableSizeMultiplier: if we can fix the problem of excessive files, it is better to keep CompactionTableSizeMultiplier at 1, since that is the most precise for compaction and does not pull unnecessary data entries into a compaction.

For more benchmarking information, please check ethereum/go-ethereum#19163.
