
Compactor is unable to properly compact blocks #4677

Closed
@shybbko


Describe the bug
I'm running two config-identical Cortex clusters, let's say: prod & nonprod.
Nonprod looks fine.
In prod, the compactor seems unable to compact blocks properly, which leaves prod with ~11,000 blocks versus ~600 in nonprod (the data volume on prod is nowhere near 20x bigger, so this is unexpected).
Having 11k blocks causes problems for the store gateway pods, which take a long time to load blocks; until all blocks are loaded, the cluster does not work well.
Why do I assume that compaction does not work in prod? After a successful compaction I would expect log entries such as compacted blocks and marking compacted block for deletion. There are none, only endless entries like the ones in my logs. I'm also running various dashboards, e.g. https://github.com/monitoring-mixins/website/blob/master/assets/cortex/dashboards/cortex-compactor-resources.json, which shows literally no compacted blocks for prod (and some for nonprod). Sharing my logs below.
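
The log check described above can be roughly automated. The following is a minimal sketch, not part of Cortex: it assumes compactor logs have been saved as plain text lines, and it treats the two phrases quoted above as substrings to search for. The exact wording of real compactor log messages may differ, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Phrases quoted in the report above. Whether real compactor log lines
# contain exactly these substrings is an assumption.
COMPACTION_MARKERS = (
    "compacted blocks",
    "marking compacted block for deletion",
)


def count_compaction_markers(lines: Iterable[str]) -> Counter:
    """Count how many log lines contain each marker (case-insensitive)."""
    counts = Counter({marker: 0 for marker in COMPACTION_MARKERS})
    for line in lines:
        lowered = line.lower()
        for marker in COMPACTION_MARKERS:
            if marker in lowered:
                counts[marker] += 1
    return counts


def compaction_seen(lines: Iterable[str]) -> bool:
    """Return True if at least one line contains any compaction marker."""
    return any(count_compaction_markers(lines).values())
```

Running `compaction_seen` over the saved log lines of each cluster would give a quick yes/no comparison between prod and nonprod, under the assumption that the markers match the real messages.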

I am aware of at least a few issues that could be preventing the compactor from working, such as #4453 or #3569, but I'd very much welcome any hints that would let me unblock compaction, as the current number of blocks makes the cluster prone to not working properly (which is obviously not great for production usage).

Expected behavior
Compaction works, and the number of blocks in prod is not ~20x the number of blocks in nonprod (more like ~3-4x at most).

Environment:
K8s in GKE v1.21, deployed by the official cortex Helm chart v1.4.0

Storage Engine
Blocks

Additional Context
Two sets of logs are here:
