fix: Self Monitor storage volume fill-up despite retention rules #1373

hisarbalik · 2024-08-22T12:30:13Z

Description

Changes proposed in this pull request (what was done and why):

Recently, on several clusters, we observed self-monitor Pod instances being evicted with the error message Usage of EmptyDir volume exceeds the limit "100Mi", despite retention rules of 2 hours or 50MB of size.

TSDB size-based retention counts the WAL, checkpoints, m-mapped chunks, and persistent blocks toward the configured limit. However, the WAL, checkpoints, and m-mapped chunks are required for normal TSDB operation, so only persistent blocks are ever deleted, even when the total size of all this data exceeds the configured retention size. WAL segments can grow up to 128MB before compaction, and Prometheus keeps at least 3 WAL files.

To ensure that Self Monitor is not evicted because the volume exceeds its storage limit, the volume size should be increased to at least 3 * WAL segment size, plus some space for the other data types.
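The sizing rule in the description can be sketched as a small calculation. This is only an illustrative estimate, not the value chosen in this PR: the function name `min_volume_size_mb` and the `headroom_mb` parameter are hypothetical, and using the 50MB retention size as the allowance for persistent blocks is an assumption.

```python
# Back-of-the-envelope sizing for the Self Monitor storage volume.
# The three constants come from the PR description; everything else is illustrative.
WAL_SEGMENT_MB = 128     # a WAL segment can grow up to 128MB before compaction
MIN_WAL_SEGMENTS = 3     # Prometheus keeps at least 3 WAL files
RETENTION_SIZE_MB = 50   # configured size-based retention


def min_volume_size_mb(headroom_mb: int = 0) -> int:
    """Estimate a lower bound for the volume size in MB.

    Assumption: persistent blocks stay within the retention size, and
    `headroom_mb` (hypothetical) covers checkpoints and m-mapped chunks.
    """
    # Space that retention can never reclaim: the WAL segments that are always kept.
    wal_floor_mb = WAL_SEGMENT_MB * MIN_WAL_SEGMENTS
    return wal_floor_mb + RETENTION_SIZE_MB + headroom_mb


if __name__ == "__main__":
    print(min_volume_size_mb())
```

Even with no extra headroom, the estimate is 434MB, roughly four times the old "100Mi" limit, which would explain the evictions despite the 50MB retention rule.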

Changes refer to particular issues, PRs or documents:

Traceability

The PR is linked to a GitHub issue.
The follow-up issues (if any) are linked in the Related Issues section.
If the change is user-facing, the documentation has been adjusted.
The feature is unit-tested.
The feature is e2e-tested.

update self monitor storage size

8f9de79

hisarbalik requested a review from a team as a code owner August 22, 2024 12:30

kyma-bot added the cla: yes and size/XS labels Aug 22, 2024

hisarbalik added the area/manager and do-not-merge/hold labels Aug 22, 2024

hisarbalik added this to the 1.22.0 milestone Aug 22, 2024

hisarbalik added the kind/bug label Aug 22, 2024

hisarbalik added 3 commits August 22, 2024 21:06

Merge branch 'main' into fix-self-monitor-pod-eviction

2df40cd

Merge branch 'main' into fix-self-monitor-pod-eviction

5d7be15

Merge branch 'main' into fix-self-monitor-pod-eviction

ffef9b0

hisarbalik removed the do-not-merge/hold label Aug 23, 2024

skhalash assigned rakesh-garimella Aug 23, 2024

Merge branch 'main' into fix-self-monitor-pod-eviction

fc1f7c9

rakesh-garimella approved these changes Aug 23, 2024

kyma-bot added the lgtm label Aug 23, 2024

kyma-bot merged commit ddbc154 into kyma-project:main Aug 23, 2024
49 checks passed
