
Compactor memory consumption #3127

Open
kolesnikovae opened this issue Mar 21, 2024 · 1 comment
Labels
backend Mostly go code

Comments

@kolesnikovae
Collaborator

During compaction, we load symbolic information into memory, which often causes problems in large-scale deployments. The compaction process typically includes at least two steps – split and merge:

  • Split (optional, but essential for large-scale deployments): The entire block set for a given compaction interval (e.g., one hour) is split into groups (e.g., 32), and each group of G blocks is assigned to one of the compactors. The compactor simultaneously opens all G group blocks and streams profiles into S shards (e.g., 64), deduplicating series. Because it is not known in advance which symbol partition a profile references, all partitions are lazily loaded into memory and never released. A rewrite table is built for each partition, mapping stack traces from the source block to the destination block (see the sketch after this list). Thus, symbols from G*S blocks would be loaded into memory. As an optimization, at the "split" step we merge all symbols from the G group blocks into a single symdb and then copy it to each of the S output blocks, so that only G+1 block symbols are kept in memory. The resulting symbols section is large, however: it includes all symbolic information from the source group, even entries that are not referenced by the block's profiles.
  • Merge: Shards are grouped and assigned to compactors; each shard merge is handled by a single compactor so that profile series are deduplicated. To merge 32 blocks (one block per group for that shard) into a single one, stack traces need to be remapped. Additionally, symbols that are not referenced by the profiles of the resulting block are removed.
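For illustration, here is a minimal, self-contained Go sketch of the rewrite-table idea described above: the table assigns each source stack trace an ID in the destination block, deduplicating on first sight. The names (`RewriteTable`, `Rewrite`) and the string representation of stack traces are hypothetical simplifications, not the actual symdb structures.

```go
// Hypothetical illustration of a per-partition rewrite table: it maps
// stack traces from a source block to IDs in the destination block,
// deduplicating traces as they are appended. Names are simplified and
// do not correspond to the actual symdb implementation.
package main

import "fmt"

type RewriteTable struct {
	dst    []string       // stack traces stored in the destination block
	lookup map[string]int // source stack trace -> destination ID
}

func NewRewriteTable() *RewriteTable {
	return &RewriteTable{lookup: make(map[string]int)}
}

// Rewrite returns the destination ID of a stack trace, appending it
// to the destination block the first time it is seen.
func (t *RewriteTable) Rewrite(trace string) int {
	if id, ok := t.lookup[trace]; ok {
		return id
	}
	id := len(t.dst)
	t.dst = append(t.dst, trace)
	t.lookup[trace] = id
	return id
}

func main() {
	t := NewRewriteTable()
	fmt.Println(t.Rewrite("main;foo;bar")) // 0
	fmt.Println(t.Rewrite("main;baz"))     // 1
	fmt.Println(t.Rewrite("main;foo;bar")) // 0 again: deduplicated
}
```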

For example, if we compact 160 blocks with 50 compactors, 32 groups, and 64 shards (a real-life example; the snippet after this list reproduces the arithmetic):

  • We initiate 32 split jobs, each handling 5 blocks.
  • 32 compactors (out of 50) then split 5 blocks into 64 blocks, potentially producing up to 2048 L2 blocks. Each of these blocks contains a copy of symbols merged from the 5 source blocks.
  • Now we have up to 32 blocks per shard that need to be compacted.
  • The 64 shard merge jobs are then distributed across the 50 compactors; each job merges the 32 blocks of its shard into a single block. This requires loading the symbols of 5 * 32 blocks (all 160 source blocks) into memory. Depending on the block contents, the required memory can reach dozens of gigabytes, leading to out-of-memory errors.
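To make the arithmetic above easy to verify, here is a small Go snippet reproducing the numbers from the example. The constants and relationships come directly from the text; as in the formulas above, G denotes the number of blocks per group:

```go
package main

import "fmt"

func main() {
	const (
		sourceBlocks = 160 // blocks in the compaction interval
		compactors   = 50
		groups       = 32 // split groups
		shards       = 64 // output shards
	)
	blocksPerGroup := sourceBlocks / groups // G = 5 blocks per split job
	l2Blocks := groups * shards             // up to 2048 intermediate blocks

	// Per split job: without the symdb optimization, G*S block symbols
	// would be resident; with it, only the group's source symbols plus
	// one merged symdb (G+1).
	withoutOpt := blocksPerGroup * shards // 320 block symbols
	withOpt := blocksPerGroup + 1         // 6 block symbols

	// Per merge job: each shard merges one block per group, and each of
	// those blocks carries the merged symbols of its G source blocks.
	blocksPerShard := groups                         // 32 blocks to merge
	symbolsLoaded := blocksPerGroup * blocksPerShard // 160: all source blocks

	fmt.Println(blocksPerGroup, l2Blocks, withoutOpt, withOpt,
		blocksPerShard, symbolsLoaded, compactors)
}
```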
@kolesnikovae
Collaborator Author

kolesnikovae commented Apr 5, 2024

Another problem is high memory consumption at the split step:

[image: memory consumption at the split step]

However, it should be possible to mitigate this by configuring the stage size.
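A minimal sketch of why bounding the stage size caps memory: process the blocks in fixed-size batches instead of opening them all at once, so that at most stageSize blocks' symbols are resident at a time. All names here (`Block`, `mergeSymbols`, `compactInStages`) are hypothetical, not the actual Pyroscope API or configuration.

```go
package main

import "fmt"

type Block struct{ ID int }

// mergeSymbols stands in for opening a batch of blocks, merging their
// symbols, and releasing them before the next batch is loaded.
func mergeSymbols(batch []Block) {
	fmt.Printf("merging symbols of %d blocks\n", len(batch))
}

// compactInStages keeps at most stageSize blocks' symbols in memory.
func compactInStages(blocks []Block, stageSize int) {
	for start := 0; start < len(blocks); start += stageSize {
		end := start + stageSize
		if end > len(blocks) {
			end = len(blocks)
		}
		mergeSymbols(blocks[start:end])
	}
}

func main() {
	blocks := make([]Block, 5)
	compactInStages(blocks, 2) // batches of 2, 2, 1
}
```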

@kolesnikovae added the backend (Mostly go code) label on Apr 11, 2024