Optimize memory usage for S3StreamMetadataImage. #618

Closed
superhx opened this issue Dec 29, 2023 · 2 comments · Fixed by AutoMQ/automq-for-rocketmq#881

Comments

Collaborator

superhx commented Dec 29, 2023

Who is this for and what problem do they have today?

Why is solving this problem impactful?

In a scenario with 100,000 partitions, even if stream object compaction runs once every hour, one day of operation still generates 100,000 * 24 = 2,400,000 stream objects, which ultimately occupy at least 100 MiB of metadata memory. With a longer retention time and multiple streams per partition, the actual metadata memory usage will be even higher.
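
As a rough sanity check of the figures above, the arithmetic can be sketched as follows. The constant and function names are illustrative only and do not come from the AutoMQ code base, and the sketch assumes exactly one stream (hence one stream object per compaction) per partition.

```python
# Back-of-envelope check; all names here are hypothetical.
PARTITIONS = 100_000                        # "10w" partitions
COMPACTIONS_PER_DAY = 24                    # one stream object compaction per hour
REPORTED_MEMORY_BYTES = 100 * 1024 * 1024   # the "at least 100 MiB" lower bound


def stream_objects_per_day(partitions: int, compactions_per_day: int) -> int:
    """Assumes one stream per partition and one stream object per compaction."""
    return partitions * compactions_per_day


objects = stream_objects_per_day(PARTITIONS, COMPACTIONS_PER_DAY)
# Per-object metadata footprint implied by the 100 MiB lower bound.
implied_bytes_per_object = REPORTED_MEMORY_BYTES / objects

print(objects)
print(round(implied_bytes_per_object, 1))
```

Under these assumptions the 100 MiB lower bound corresponds to a little over 40 bytes of metadata per stream object.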

Additional notes

superhx added the enhancement (New feature or request) label on Dec 29, 2023
Collaborator Author

superhx commented Jan 3, 2024

Assuming the stream compaction strategy is to compact each individual object in a stream up to 1 GiB, a cluster that stores 1 PiB of data holds 1024 * 1024 stream objects. The total metadata memory consumption would then be around 50 MiB, calculated as 1024 * 1024 * (memory usage per object's metadata), i.e. roughly 50 bytes per object.
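
The same estimate as a minimal sketch. The helper name is hypothetical, and the 50 MiB total is taken from the comment above rather than measured here.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
PIB = 1024 ** 5


def object_count(total_bytes: int, object_bytes: int) -> int:
    """Number of objects needed to hold total_bytes, rounded up."""
    return -(-total_bytes // object_bytes)


count = object_count(PIB, GIB)          # 1 PiB split into 1 GiB stream objects
bytes_per_object = 50 * MIB / count     # per-object budget implied by a 50 MiB total

print(count)
print(bytes_per_object)
```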

Therefore, the optimization goals have changed to:

  • Optimize the memory structure of Image, saving memory overhead caused by data structures like Map.
  • Optimize the stream compaction strategy, compacting each stream object to 1GiB and eliminating unnecessary blocks based on the startOffset of the stream during the compaction process.
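
To illustrate the second goal, here is a minimal sketch of a compaction pass that drops blocks lying entirely before the stream's startOffset and then packs the remaining blocks into objects of at most 1 GiB. The `Block` type, the `compact` function, and the greedy packing are assumptions made for illustration; this is not the actual AutoMQ implementation.

```python
from dataclasses import dataclass
from typing import List

MIB = 1024 ** 2
GIB = 1024 ** 3


@dataclass(frozen=True)
class Block:
    """Hypothetical data block inside a stream object."""
    start_offset: int  # inclusive
    end_offset: int    # exclusive
    size_bytes: int


def compact(blocks: List[Block], stream_start_offset: int,
            max_object_bytes: int = GIB) -> List[List[Block]]:
    """Group live blocks into compacted objects of at most max_object_bytes."""
    # Blocks that end at or before the stream's startOffset are already
    # trimmed from the stream, so they are not carried into the output.
    live = [b for b in sorted(blocks, key=lambda b: b.start_offset)
            if b.end_offset > stream_start_offset]

    objects: List[List[Block]] = []
    current: List[Block] = []
    current_size = 0
    for block in live:
        # Start a new object when adding this block would exceed the limit.
        if current and current_size + block.size_bytes > max_object_bytes:
            objects.append(current)
            current, current_size = [], 0
        current.append(block)
        current_size += block.size_bytes
    if current:
        objects.append(current)
    return objects


example = [Block(i * 100, (i + 1) * 100, 400 * MIB) for i in range(4)]
result = compact(example, stream_start_offset=150)
print([[(b.start_offset, b.end_offset) for b in obj] for obj in result])
```

In the example, the block covering offsets 0 to 100 is discarded because the stream starts at offset 150, and the three remaining 400 MiB blocks are packed into two objects.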

Collaborator Author

superhx commented Jan 5, 2024

With 1 PB of data and 100,000 partitions, how many stream objects will there be in the end?
If data is accumulated into 10 GiB objects, there will be about 100,000 stream objects.

Simulation targeting 10,000 S3StreamObjects: writing to 100 partitions for 100 s is sufficient.

Result: 5,000 stream objects generated.
5,000 multiplied by 3 gives 15,000 in-memory copies: Controller Image + Broker Image + Controller processing layer.

15000 s3streamobjects occupy 1MiB
s3objects occupy 1.5MiB

S3StreamsMetadataImage occupies 1.5MiB
S3ObjectsImage occupies 1.3MiB

Calculation: estimated memory consumption is 2 MiB for every 10,000 stream objects generated, and therefore 20 MiB for 100,000 stream objects.
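
The extrapolation above is a simple linear scaling, sketched below. The function name is hypothetical, and the assumption that memory grows linearly with the number of stream objects is taken from the estimate itself rather than verified.

```python
MIB = 1024 ** 2
MIB_PER_10K_OBJECTS = 2.0  # estimate from the simulation above


def estimated_metadata_mib(stream_objects: int,
                           mib_per_10k: float = MIB_PER_10K_OBJECTS) -> float:
    """Linear extrapolation of metadata memory from the measured ratio."""
    return stream_objects / 10_000 * mib_per_10k


# Average footprint per generated stream object implied by the ratio,
# covering all in-memory copies counted in the simulation.
bytes_per_generated_object = MIB_PER_10K_OBJECTS * MIB / 10_000

print(estimated_metadata_mib(100_000))
print(round(bytes_per_generated_object))
```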

daniel-y pushed a commit that referenced this issue Mar 14, 2024
Signed-off-by: Shichao Nie <niesc@automq.com>
ShadowySpirits pushed a commit that referenced this issue Mar 14, 2024
Signed-off-by: Shichao Nie <niesc@automq.com>