
leveldb: introduce trivial version finalization #264

Merged (1 commit, Feb 26, 2019)

Conversation

rjl493456442
Contributor

@rjl493456442 rjl493456442 commented Feb 25, 2019

This PR introduces a bypass for quick version finalization.

For more context, check out the go-ethereum issue.

We use leveldb as the storage engine in the go-ethereum project, and people frequently complain about long compaction pauses on archive nodes (this type of node can hold more than 1 TB of data now).

After some investigation, I found that during the long compaction pauses the I/O is almost idle while one CPU core stays fully loaded. From the pprof data provided by @karalabe, we can also see that most of the time is spent on byte comparison.

Finally I realized that as the database grows, the number of files per level grows as well. The go-ethereum project currently uses the default db settings, which means an archive node can have more than 500,000 sstable files.

After a compaction, leveldb generates a new version by merging the old version with the change set.
During this version generation, the current code applies qsort to every level, even though most levels are unchanged. As the amount of data in the database increases, the number of files per level grows rapidly, so the qsort overhead becomes very large.

The idea of this PR is:

  1. Skip qsort for levels whose content has not changed.
  2. Make full use of the properties of compaction.
    The new files generated by a compaction are strictly ordered and do not overlap any other file on the source+1 level, so we can use binary search to find each new file's insertion index and insert it directly.

This type of trivial version finalization is not suitable for the following events:

  • database version recovery during db open
  • journal recovery
  • table recovery when the manifest is missing
  • transaction compaction

In these events, we cannot guarantee that a newly inserted file in a level does not overlap other files.

@syndtr
Owner

syndtr commented Feb 26, 2019

LGTM.

I will merge this for now. But I think we still need to tackle this, either by reducing the file count per level, finding a data structure that better handles the sheer number of files, or making compaction less frequent.

Anyway, out of curiosity, have you ever tried setting CompactionTableSizeMultiplier greater than 1.0? This should help reduce the file count; the downside is that it might increase disk I/O, as compaction would need to merge larger files on deeper levels.
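For reference, the option discussed above is set when opening the database. A minimal sketch, assuming the published goleveldb API (the path `/tmp/testdb` and the multiplier value are illustrative):

```go
package main

import (
	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// With a multiplier > 1.0, tables on level n may grow to roughly
	// CompactionTableSize * Multiplier^n, trading fewer files per
	// level for larger (more I/O-heavy) merges on deeper levels.
	o := &opt.Options{
		CompactionTableSize:           2 * opt.MiB, // base table size
		CompactionTableSizeMultiplier: 1.5,         // illustrative value
	}
	db, err := leveldb.OpenFile("/tmp/testdb", o)
	if err != nil {
		panic(err)
	}
	defer db.Close()
}
```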

@syndtr syndtr merged commit 7ca0152 into syndtr:master Feb 26, 2019
@rjl493456442
Contributor Author

rjl493456442 commented Feb 27, 2019

@syndtr A few pieces of good news to share here.

After two days of benchmarking Ethereum archive syncing, the experimental branch with my leveldb fix has not suffered any long write pauses (if a write operation is paused for more than 3 seconds, we print a warning log to users) during low-speed compaction, while the master branch pauses write operations for about 30 minutes per hour.

The master branch database size is now about 442 GB, with 234,900 files.

Regarding CompactionTableSizeMultiplier: if we can fix the problem of excessive files, it is better to keep CompactionTableSizeMultiplier at 1, since that is the most precise for compaction and does not pull unnecessary data entries into a compaction.

For more benchmarking information, please check ethereum/go-ethereum#19163.
