Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boltdb takes up 3.5x the space compared to Pebble or Leveldb #863

Open
flywukong opened this issue Nov 21, 2024 · 3 comments
Open

Boltdb takes up 3.5x the space compared to Pebble or Leveldb #863

flywukong opened this issue Nov 21, 2024 · 3 comments

Comments

@flywukong
Copy link

flywukong commented Nov 21, 2024

In my use case, my system's data is stored in SST files of Pebble and Leveldb and I plan to use bbolt db to replace them to enhance read performance . My tool read all the data from the Pebble DB and then wrote the same data to Bolt (into the same default bucket). The total amount of data written was approximately 323GB. After completing the write, I compared the disk usage of Pebble and Bolt DB. Pebble DB used 228GB, while Bolt DB used 800GB. This data amplification seems quite unacceptable.

Additionally, I found that in my scenario, Bolt was expected to have better read performance compared to Pebble DB. When the data volume was under 10GB, Bolt did perform better in read tests. When the data volume was under 5GB, Bolt's read latency was under 100 microseconds, which was 50% better than Pebble. However, when the Bolt DB reached 800GB, the read latency increased to over 2 milliseconds (while Pebble remained under 1 millisecond). This dramatic performance drop seems strange, could it be related to storing all the data in a single bucket? Could you provide some suggestions?

@flywukong flywukong changed the title Boltdb takes up 2x the space compared to Pebble or Leveldb Boltdb takes up 3.5x the space compared to Pebble or Leveldb Nov 21, 2024
@ahrtr
Copy link
Member

ahrtr commented Nov 21, 2024

The total amount of data written was approximately 323GB. After completing the write, I compared the disk usage of Pebble and Bolt DB. Pebble DB used 228GB, while Bolt DB used 800GB.

The default fillpercent of each page is 50%. Increasing this field (i.e. 0.9 or 1.0) should can increase the disk usage. In other words, it should decrease the db file size. But it may hurt the write performance, as it may cause the page to be split on any single K/V insertion. If you have very few write operation, then it makes sense to set a big FillPercent (i.e. 1.0).

Also try to compact the db file , the command is bbolt compact path-2-db-file.

When the data volume was under 5GB, Bolt's read latency was under 100 microseconds, which was 50% better than Pebble. However, when the Bolt DB reached 800GB, the read latency increased to over 2 milliseconds (while Pebble remained under 1 millisecond).

The performance result is aligned with my understanding. Note that bbolt maps the whole db file into memory. When the db file is too big, i.e. far bigger than the physical memory size, then you may encounter frequent page faults. This is one of the areas that we may consider to improve.

@ahrtr
Copy link
Member

ahrtr commented Nov 21, 2024

could it be related to storing all the data in a single bucket?

Evenly distributing the data into different buckets may increase the performance a little bit, because it will decrease the hierarchy levels of a B+tree when reading the data. But if you have a small db file (i.e. 20GiB), then there is no difference.

@ahrtr
Copy link
Member

ahrtr commented Nov 21, 2024

Also setting a proper larger page size (i.e. 32KB or 64KB) may also improve the performance for the super large db size case (i.e. > 100GB)

Refer to #401 (comment)

Please feedback if you have any new performance data, thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants