[Question] RocksDB disk usage issue #2689

Open
1 task done
fantengfeiNana opened this issue Nov 5, 2024 · 4 comments
Labels: inactive, question (Further information is requested), rocksdb (RocksDB backend)

Comments

@fantengfeiNana
Problem Type

configs (configuration options / documentation)

Before submit

  • I have searched the existing Issues and FAQ and confirmed that there is no identical or duplicate question.

Environment

  • Server Version: 1.0.0 (Apache Release Version)
  • Backend: RocksDB x nodes, HDD or SSD
  • OS: xx CPUs, xx G RAM, Ubuntu 2x.x / CentOS 7.x
  • Data Size: xx vertices, xx edges

Your Question

When I use RocksDB as the data storage for HugeGraph, I observe the following phenomenon: every time I create a knowledge graph, I dynamically use the graph name as the storage directory for the database. When the graph has no nodes or edges, the disk usage is nearly 400MB. Currently, my business scenario is similar to a consumer-facing application, and disk usage will be a bottleneck. Could you please advise on how to optimize the RocksDB configuration to reduce the metadata footprint of the graph and save disk space? Thank you. I have configured rocksdb.compression_per_level to SNAPPY, but the effect is not significant.
[Two images attached]

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

dosubot bot commented Nov 5, 2024

I found a similar issue that was previously discussed and closed. The issue was about RocksDB's memory occupation being out of control. The suggested solution included replacing the default ptmalloc with jemalloc to manage memory usage more effectively. This involves downloading and compiling jemalloc, then modifying the server's startup script to preload the jemalloc shared object file [1].

For optimizing RocksDB configuration to reduce the metadata footprint and save disk space, consider the following adjustments:

  1. Compression: Use a more aggressive compression algorithm like zstd for both the general and bottommost levels to reduce data size on disk.

  2. Manifest File Size: Reduce the maximum size of the manifest file to limit its disk usage.

  3. Memtable Size: Adjust the memtable size to control memory usage and potentially reduce disk writes.

  4. Number of Levels: Reduce the number of levels if the data size is small, which can help in reducing metadata.

  5. Direct Reads and Writes: Enable direct reads and writes to potentially reduce overhead.

  6. Optimize Compaction: Use optimized compaction settings to manage space more efficiently [2].

These changes should help in reducing the metadata footprint and saving disk space when using RocksDB with HugeGraph. Adjust these settings based on your specific use case and test the impact on performance and disk usage.
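
One way to test the impact of a setting on disk usage is to measure what actually occupies the graph's data directory before and after the change. The following is a minimal sketch, not part of HugeGraph: the function names and the example path are hypothetical, and the classification relies only on RocksDB's usual file naming (`*.sst` table files, `*.log` write-ahead logs, `MANIFEST-*`, `OPTIONS-*`, and `LOG` / `LOG.old.*` info logs).

```python
import os
from collections import defaultdict


def classify(filename):
    """Map a RocksDB file name to a coarse category."""
    if filename.endswith(".sst"):
        return "sst"            # table files holding the actual data
    if filename.endswith(".log"):
        return "wal"            # write-ahead log files
    if filename.startswith("MANIFEST"):
        return "manifest"       # version / metadata log
    if filename == "LOG" or filename.startswith("LOG.old"):
        return "info_log"       # human-readable info logs
    if filename.startswith("OPTIONS"):
        return "options"        # persisted option snapshots
    return "other"              # CURRENT, IDENTITY, LOCK, ...


def usage_by_category(root):
    """Sum apparent file sizes (bytes) under root, grouped by category."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                totals[classify(name)] += os.path.getsize(path)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical path: replace with the directory of one graph.
    root = "rocksdb-data"
    report = usage_by_category(root)
    for category, size in sorted(report.items(), key=lambda kv: -kv[1]):
        print(f"{category:10s} {size / 1024 / 1024:10.2f} MiB")
```

Note that `os.path.getsize` reports apparent file size, which can differ from the blocks actually allocated on disk, so the totals may not match `du` exactly. If an empty graph is dominated by a category other than `sst`, compression settings are unlikely to help much, since they apply to table files.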

To continue talking to Dosu, mention @dosu.

@dosubot dosubot bot added question Further information is requested rocksdb RocksDB backend labels Nov 5, 2024
@fantengfeiNana fantengfeiNana changed the title from "RocksDB disk usage issue" to "[Question] RocksDB disk usage issue" Nov 6, 2024
@fantengfeiNana
Author

@dosu How can I reduce the number of levels through configuration?

dosubot bot commented Nov 11, 2024

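To reduce the number of levels in RocksDB, you can adjust configuration options such as max_background_jobs and level0_file_num_compaction_trigger. The specific steps are as follows:

  1. max_background_jobs: Moderately reduce the number of background jobs. This affects how often compaction and merging run, and thereby the number of levels.

  2. level0_file_num_compaction_trigger: Increasing this value makes compaction trigger less often, thereby reducing the number of levels.

  3. max_bytes_for_level_base and max_bytes_for_level_multiplier: Adjust these parameters to control the size of each level and how quickly the sizes grow from one level to the next.

Make these configuration changes in the rocksdb.properties file, then restart the service to apply them. Test and tune them against your actual data size and performance requirements.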
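
The effect of max_bytes_for_level_base and max_bytes_for_level_multiplier can be worked through with a little arithmetic. In RocksDB's leveled compaction, the target size of level n (for n ≥ 1) is the base size multiplied by the multiplier raised to n - 1. The sketch below assumes dynamic level sizing is disabled and treats "levels needed" as a rough estimate only; the function names are hypothetical, and the sample values 256 MiB and 10 are RocksDB's documented defaults for the two options.

```python
MIB = 1024 * 1024


def level_targets(base_bytes, multiplier, num_levels):
    """Return target sizes in bytes for L1 .. L(num_levels - 1).

    L0 is governed by file count rather than a byte target, so it is omitted.
    """
    return [int(base_bytes * multiplier ** (n - 1)) for n in range(1, num_levels)]


def levels_needed(data_bytes, base_bytes, multiplier):
    """Rough estimate: smallest n >= 1 whose cumulative targets L1..Ln hold the data."""
    level = 1
    capacity = base_bytes      # target size of the current level
    total = capacity           # cumulative target size of L1..level
    while total < data_bytes:
        level += 1
        capacity *= multiplier
        total += capacity
    return level


if __name__ == "__main__":
    for n, size in enumerate(level_targets(256 * MIB, 10, 4), start=1):
        print(f"L{n}: {size // MIB} MiB")
    print("levels for 1 GiB of data:", levels_needed(1024 * MIB, 256 * MIB, 10))
```

With these sample values, a data set that fits within the base size stays in the first level, so a larger base or multiplier means fewer levels are populated for the same amount of data. The configured maximum number of levels is a separate RocksDB option (`num_levels`); whether and under what key a given HugeGraph version exposes it should be checked against its configuration documentation.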


Due to the lack of activity, the current issue is marked as stale and will be closed after 20 days. Any update will remove the stale label.
