
Compactor memory consumption #3127

Open
kolesnikovae opened this issue Mar 21, 2024 · 1 comment
Labels
backend Mostly go code

Comments

@kolesnikovae
Collaborator

During compaction, we load symbolic information into memory, which often causes problems in large-scale deployments. The compaction process typically includes at least two steps – split and merge:

  • Split (optional, but essential for large-scale deployments): The entire block set for a given compaction interval (e.g., one hour) is split into groups (e.g., 32), and each group of G blocks is assigned to one of the compactors. The compactor simultaneously opens all G group blocks and streams profiles into S shards (e.g., 64), deduplicating series. Because it is not known in advance which symbol partition a profile references, all partitions are lazily loaded into memory and never released. A rewrite table is built for each partition, mapping stack traces from the source block to the destination block (see the sketch after this list). Thus, symbols from G*S blocks would be loaded into memory. As an optimization, at the "split" step we merge all symbols from the G group blocks into a single symdb and then copy it to each of the S output blocks, so that only G+1 block symbols are kept in memory. The resulting symbols section is large, however: it includes all symbolic information from the source group, even entries that are not referenced by the block's profiles.
  • Merge: Shards are grouped and assigned to compactors; each shard merge is handled by a single compactor so that profile series are deduplicated. To merge 32 blocks (one block per group for that shard) into a single one, stack traces need to be remapped. Additionally, symbols that are not referenced by the profiles of the resulting block are removed.
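For illustration, here is a minimal, self-contained Go sketch of the rewrite-table idea described above: the table assigns each source stack trace an ID in the destination block, deduplicating on first sight. The names (`RewriteTable`, `Rewrite`) and the string representation of stack traces are hypothetical simplifications, not the actual symdb structures.

```go
// Hypothetical illustration of a per-partition rewrite table: it maps
// stack traces from a source block to IDs in the destination block,
// deduplicating traces as they are appended. Names are simplified and
// do not correspond to the actual symdb implementation.
package main

import "fmt"

type RewriteTable struct {
	dst    []string       // stack traces stored in the destination block
	lookup map[string]int // source stack trace -> destination ID
}

func NewRewriteTable() *RewriteTable {
	return &RewriteTable{lookup: make(map[string]int)}
}

// Rewrite returns the destination ID of a stack trace, appending it
// to the destination block the first time it is seen.
func (t *RewriteTable) Rewrite(trace string) int {
	if id, ok := t.lookup[trace]; ok {
		return id
	}
	id := len(t.dst)
	t.dst = append(t.dst, trace)
	t.lookup[trace] = id
	return id
}

func main() {
	t := NewRewriteTable()
	fmt.Println(t.Rewrite("main;foo;bar")) // 0
	fmt.Println(t.Rewrite("main;baz"))     // 1
	fmt.Println(t.Rewrite("main;foo;bar")) // 0 again: deduplicated
}
```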

For example, if we compact 160 blocks with 50 compactors, 32 groups, and 64 shards (a real-life example; the snippet after this list reproduces the arithmetic):

  • We initiate 32 split jobs, each handling 5 blocks.
  • 32 compactors (out of 50) then split 5 blocks into 64 blocks, potentially producing up to 2048 L2 blocks. Each of these blocks contains a copy of symbols merged from the 5 source blocks.
  • Now we have up to 32 blocks per shard that need to be compacted.
  • The 64 shard merge jobs are then distributed across the 50 compactors; each job merges the 32 blocks of its shard into a single block. This requires loading the symbols of 5 * 32 blocks (all 160 source blocks) into memory. Depending on the block contents, the required memory can reach dozens of gigabytes, leading to out-of-memory errors.
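To make the arithmetic above easy to verify, here is a small Go snippet reproducing the numbers from the example. The constants and relationships come directly from the text; as in the formulas above, G denotes the number of blocks per group:

```go
package main

import "fmt"

func main() {
	const (
		sourceBlocks = 160 // blocks in the compaction interval
		compactors   = 50
		groups       = 32 // split groups
		shards       = 64 // output shards
	)
	blocksPerGroup := sourceBlocks / groups // G = 5 blocks per split job
	l2Blocks := groups * shards             // up to 2048 intermediate blocks

	// Per split job: without the symdb optimization, G*S block symbols
	// would be resident; with it, only the group's source symbols plus
	// one merged symdb (G+1).
	withoutOpt := blocksPerGroup * shards // 320 block symbols
	withOpt := blocksPerGroup + 1         // 6 block symbols

	// Per merge job: each shard merges one block per group, and each of
	// those blocks carries the merged symbols of its G source blocks.
	blocksPerShard := groups                         // 32 blocks to merge
	symbolsLoaded := blocksPerGroup * blocksPerShard // 160: all source blocks

	fmt.Println(blocksPerGroup, l2Blocks, withoutOpt, withOpt,
		blocksPerShard, symbolsLoaded, compactors)
}
```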
@kolesnikovae
Collaborator Author

kolesnikovae commented Apr 5, 2024

Another problem is high memory consumption at the split step:

[image: memory consumption at the split step]

However, it should be possible to mitigate this by configuring the stage size.
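A minimal sketch of why bounding the stage size caps memory: process the blocks in fixed-size batches instead of opening them all at once, so that at most stageSize blocks' symbols are resident at a time. All names here (`Block`, `mergeSymbols`, `compactInStages`) are hypothetical, not the actual Pyroscope API or configuration.

```go
package main

import "fmt"

type Block struct{ ID int }

// mergeSymbols stands in for opening a batch of blocks, merging their
// symbols, and releasing them before the next batch is loaded.
func mergeSymbols(batch []Block) {
	fmt.Printf("merging symbols of %d blocks\n", len(batch))
}

// compactInStages keeps at most stageSize blocks' symbols in memory.
func compactInStages(blocks []Block, stageSize int) {
	for start := 0; start < len(blocks); start += stageSize {
		end := start + stageSize
		if end > len(blocks) {
			end = len(blocks)
		}
		mergeSymbols(blocks[start:end])
	}
}

func main() {
	blocks := make([]Block, 5)
	compactInStages(blocks, 2) // batches of 2, 2, 1
}
```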

@kolesnikovae added the backend (Mostly go code) label on Apr 11, 2024