Boltdb takes up 3.5x the space compared to Pebble or Leveldb #863

flywukong · 2024-11-21T09:05:51Z

In my use case, my system's data is stored in SST files by Pebble and LevelDB, and I plan to replace them with bbolt to improve read performance. My tool read all the data from the Pebble DB and wrote the same data to Bolt (into the same default bucket). The total amount of data written was approximately 323GB. After the write completed, I compared the disk usage of Pebble and Bolt: Pebble used 228GB, while Bolt used 800GB. This space amplification seems unacceptable.

Additionally, I expected Bolt to have better read performance than Pebble in my scenario. When the data volume was under 10GB, Bolt did perform better in read tests. When the data volume was under 5GB, Bolt's read latency was under 100 microseconds, which was 50% better than Pebble. However, when the Bolt DB reached 800GB, its read latency increased to over 2 milliseconds (while Pebble remained under 1 millisecond). This dramatic performance drop seems strange. Could it be related to storing all the data in a single bucket? Could you provide some suggestions?

ahrtr · 2024-11-21T10:41:50Z

> The total amount of data written was approximately 323GB. After completing the write, I compared the disk usage of Pebble and Bolt DB. Pebble DB used 228GB, while Bolt DB used 800GB.

The default FillPercent of each page is 50%. Increasing this field (e.g. to 0.9 or 1.0) should increase disk utilization; in other words, it should decrease the db file size. But it may hurt write performance, as it may cause a page to be split on any single K/V insertion. If you have very few write operations, then it makes sense to set a large FillPercent (e.g. 1.0).
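To make the fill-percent arithmetic concrete, here is a rough sketch in Go. It assumes every leaf page ends up filled to exactly the configured fraction and ignores branch pages, page headers, per-entry metadata and free pages, so it is only a back-of-the-envelope model and not how bbolt accounts for space internally. The helper name `estimateLeafBytes` is made up for illustration.

```go
package main

import "fmt"

// estimateLeafBytes returns the space leaf pages would occupy if every page
// were filled to exactly fillPercent of its capacity. This is a simplified
// model: branch pages, headers, entry metadata and free pages are ignored.
func estimateLeafBytes(logicalBytes, fillPercent float64) float64 {
	return logicalBytes / fillPercent
}

func main() {
	const logicalGB = 323.0 // amount of data written, taken from the report
	for _, fp := range []float64{0.5, 0.9, 1.0} {
		fmt.Printf("FillPercent=%.1f -> roughly %.0f GB of leaf pages\n",
			fp, estimateLeafBytes(logicalGB, fp))
	}
}
```

Under this model the default of 0.5 already doubles 323GB to about 646GB, which accounts for most of the reported 800GB. The remaining gap is not explained by this sketch.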

Also try compacting the db file. The command is bbolt compact -o path-2-new-db-file path-2-db-file.

> When the data volume was under 5GB, Bolt's read latency was under 100 microseconds, which was 50% better than Pebble. However, when the Bolt DB reached 800GB, the read latency increased to over 2 milliseconds (while Pebble remained under 1 millisecond).

The performance result aligns with my understanding. Note that bbolt maps the whole db file into memory. When the db file is too big, i.e. far bigger than the physical memory size, you may encounter frequent page faults. This is one of the areas that we may consider improving.
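A small sketch of why the file-to-memory ratio matters. The 64GB RAM figure is hypothetical (the report does not state the machine's memory), and the model assumes all physical memory is available to cache the mapped file, which is optimistic.

```go
package main

import "fmt"

// residentFraction returns the largest share of a memory-mapped file that can
// be held in RAM at once, capped at 1.0 when the file fits entirely.
func residentFraction(ramBytes, fileBytes float64) float64 {
	if fileBytes <= ramBytes {
		return 1.0
	}
	return ramBytes / fileBytes
}

func main() {
	const gb = 1 << 30
	const ram = 64 * gb // hypothetical machine; RAM size is not in the report
	for _, fileGB := range []float64{5, 10, 800} {
		f := residentFraction(ram, fileGB*gb)
		fmt.Printf("db=%4.0f GB: at most %.0f%% of pages resident\n", fileGB, f*100)
	}
}
```

With uniformly random reads, a lookup that touches a non-resident page pays a page fault, so a file that fits in memory and one that is many times larger than memory can behave very differently.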

ahrtr · 2024-11-21T10:47:56Z

> could it be related to storing all the data in a single bucket?

Evenly distributing the data across different buckets may improve performance a little, because it reduces the number of B+tree levels traversed when reading the data. But if you have a small db file (e.g. 20GiB), then there is no difference.
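One possible way to spread keys evenly is to pick the bucket from a hash of the key. The sketch below uses only the Go standard library. The function name `bucketFor` and the `shard-NNN` naming scheme are hypothetical, and the resulting name would be used as the bucket name when reading or writing.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucketFor maps a key to one of n bucket names using an FNV-1a hash, so that
// keys spread roughly evenly. The "shard-NNN" naming scheme is hypothetical.
func bucketFor(key []byte, n uint32) string {
	if n == 0 {
		n = 1 // avoid division by zero; fall back to a single bucket
	}
	h := fnv.New32a()
	h.Write(key) // Write on a hash.Hash never returns an error
	return fmt.Sprintf("shard-%03d", h.Sum32()%n)
}

func main() {
	counts := map[string]int{}
	for i := 0; i < 100000; i++ {
		key := []byte(fmt.Sprintf("key-%d", i))
		counts[bucketFor(key, 16)]++
	}
	fmt.Println(len(counts), "buckets used")
	fmt.Println("shard-000 holds", counts["shard-000"], "keys")
}
```

The trade-off is that hashing scatters adjacent keys across buckets, so an ordered range scan over the whole key space is no longer a single cursor walk.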

ahrtr · 2024-11-21T11:01:20Z

Setting a suitably larger page size (e.g. 32KB or 64KB) may also improve performance for very large db files (e.g. > 100GB).
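A rough sketch of how page size affects tree depth. The entry count and average entry size are hypothetical (chosen so the total is about 320GB), and the model uses the same fanout for branch and leaf pages. Real branch pages hold only keys and therefore fit more entries, so actual depths will be smaller than these numbers. Only the trend is meaningful.

```go
package main

import "fmt"

// entriesPerPage estimates how many entries fit in one page at a given fill.
// Page headers and per-entry metadata are ignored; entryBytes must be positive.
func entriesPerPage(pageSize, entryBytes int, fill float64) uint64 {
	n := uint64(float64(pageSize) * fill / float64(entryBytes))
	if n < 2 {
		n = 2 // a tree needs at least two entries per page to shrink upward
	}
	return n
}

// treeLevels counts the levels of a tree indexing `entries` items when every
// page holds `fanout` entries. A fanout below 2 is treated as 2.
func treeLevels(entries, fanout uint64) int {
	if fanout < 2 {
		fanout = 2
	}
	levels := 1
	pages := (entries + fanout - 1) / fanout // number of leaf pages
	for pages > 1 {
		pages = (pages + fanout - 1) / fanout // pages needed one level up
		levels++
	}
	return levels
}

func main() {
	const entries = 2000000000 // hypothetical key count
	const entryBytes = 160     // hypothetical average key+value size
	for _, ps := range []int{4096, 32768, 65536} {
		fan := entriesPerPage(ps, entryBytes, 0.5)
		fmt.Printf("page=%5d B fanout=%3d levels=%d\n", ps, fan, treeLevels(entries, fan))
	}
}
```

Each level is one page visited per lookup, and on a file much larger than memory each visit may be a page fault, which is why fewer levels can help.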

Refer to #401 (comment)

Please share any new performance data you collect. Thanks.

flywukong changed the title ~~Boltdb takes up 2x the space compared to Pebble or Leveldb~~ Boltdb takes up 3.5x the space compared to Pebble or Leveldb Nov 21, 2024
