Loading the hits dataset into Databend generates ~16 segments.
Each segment's metadata is stored in JSON format. A segment contains so many fields that the file becomes very large.
The hits dataset is small (~90 million rows, 20 GB of compressed data), yet the metadata alone takes 16 * 12 MB ≈ 192 MB. We need to reduce the metadata size (store it in a binary format and compress it).
❯ du -sh _data/1/208469/_sg/f48d4df4462144a3a234fef0d8cd28bf_v2.json
12M _data/1/208469/_sg/f48d4df4462144a3a234fef0d8cd28bf_v2.json
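A rough sketch of the direction, not Databend's actual segment format: the SegmentInfo/BlockMeta structs below are simplified stand-ins, and bincode + zstd (assumed Cargo deps: serde with the derive feature, serde_json, bincode 1.x, zstd) are just one possible choice of binary encoding and compression. Serializing the same struct both ways shows why JSON is bloated here, since it repeats every field name per block and encodes numbers as text.

```rust
use serde::{Deserialize, Serialize};

// Hypothetical, much-simplified stand-in for a segment's per-block metadata;
// the real format carries per-column statistics, bloom filter locations, etc.
#[derive(Serialize, Deserialize)]
struct BlockMeta {
    row_count: u64,
    block_size: u64,
    location: String,
}

#[derive(Serialize, Deserialize)]
struct SegmentInfo {
    format_version: u64,
    blocks: Vec<BlockMeta>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let segment = SegmentInfo {
        format_version: 2,
        blocks: (0..1000)
            .map(|i| BlockMeta {
                row_count: 100_000,
                block_size: 64 << 20,
                // illustrative location string only
                location: format!("_b/{i:032x}_v0.parquet"),
            })
            .collect(),
    };

    // Today: human-readable JSON, field names repeated for every block.
    let json = serde_json::to_vec(&segment)?;

    // Proposed direction: a compact binary encoding plus generic compression.
    // bincode + zstd are stand-ins; any binary format + codec would do.
    let binary = bincode::serialize(&segment)?;
    let compressed = zstd::encode_all(&binary[..], 3)?;

    println!("json:         {} bytes", json.len());
    println!("binary:       {} bytes", binary.len());
    println!("binary+zstd:  {} bytes", compressed.len());
    Ok(())
}
```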
If we set table_meta_segment_count to zero, each query loads the segment metadata over and over, which is very slow!
Although we already cache this metadata, it would still be better if it were smaller.
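For illustration only, since the issue does not show Databend's cache internals: a minimal bounded cache keyed by segment location, where a capacity of 0 (as with table_meta_segment_count = 0) means nothing is retained and every query falls through to a fresh read and parse of the ~12 MB JSON file. SegmentMetaCache and get_or_load are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};

// Minimal bounded cache keyed by segment location. Capacity 0 disables
// caching entirely, so every lookup re-reads and re-parses the segment file.
struct SegmentMetaCache {
    capacity: usize,
    entries: HashMap<String, Vec<u8>>, // parsed metadata, simplified to bytes
    order: VecDeque<String>,           // insertion order used for eviction
}

impl SegmentMetaCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    fn get_or_load(
        &mut self,
        location: &str,
        load: impl FnOnce(&str) -> Vec<u8>,
    ) -> Vec<u8> {
        if let Some(meta) = self.entries.get(location) {
            return meta.clone(); // cache hit: no I/O, no JSON parsing
        }
        let meta = load(location); // cache miss: read + deserialize the file
        if self.capacity > 0 {
            if self.entries.len() >= self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.entries.remove(&oldest);
                }
            }
            self.entries.insert(location.to_string(), meta.clone());
            self.order.push_back(location.to_string());
        }
        meta
    }
}

fn main() {
    let mut cache = SegmentMetaCache::new(0); // capacity 0: every call reloads
    for _ in 0..3 {
        cache.get_or_load("_sg/f48d4df4462144a3a234fef0d8cd28bf_v2.json", |loc| {
            println!("loading {loc} from storage"); // printed on every query
            vec![0u8; 12 << 20] // pretend the 12 MB JSON was read
        });
    }
}
```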