fix: Self Monitor storage volume fill-up despite retention rules #1373

Merged
merged 5 commits into from
Aug 23, 2024


hisarbalik
Contributor

Description

Changes proposed in this pull request (what was done and why):

  • On several clusters, we have recently observed self-monitor pod instances being evicted with the error message 'Usage of EmptyDir volume exceeds the limit "100Mi".' This happens despite retention rules of 2 hours or 50MB of size. The TSDB size-based retention counts the WAL, checkpoints, memory-mapped (m-mapped) chunks, and persistent blocks towards the configured size. However, it only ever deletes persistent blocks, because the WAL, checkpoints, and m-mapped chunks are required for normal TSDB operation. As a result, total storage can stay above the configured retention size even after all persistent blocks have been deleted. Each WAL segment can grow up to 128MB, and Prometheus keeps at least 3 WAL segments. To ensure that Self Monitor doesn't get evicted because the volume exceeds its storage limit, we should increase the volume size to at least 3 × the WAL segment size (3 × 128MB = 384MB), plus some space for the other data types.
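The following Go sketch is a simplified, hypothetical model of the retention behavior described above. It is not Prometheus code. All data types count towards the size limit, but only persistent blocks (oldest first) are deleted. The sketch also computes the minimum volume size as 3 WAL segments plus headroom. The 128 MiB segment size and the three-segment minimum come from the description. The type and function names, the example data sizes, the 50 MiB limit (used instead of 50MB for simpler arithmetic), and the 64 MiB headroom are illustrative assumptions, not values taken from this PR.

```go
package main

import "fmt"

const (
	MiB = int64(1024 * 1024)
	// Size of one WAL segment (128 MiB), as stated in the description.
	walSegmentSize = 128 * MiB
	// Prometheus keeps at least three WAL segments on disk.
	minWALSegments = 3
)

// tsdbUsage is a hypothetical, simplified model of the TSDB data directory.
type tsdbUsage struct {
	walAndCheckpoint int64   // WAL segments plus checkpoint: never deleted by retention
	headChunks       int64   // m-mapped head chunks: never deleted by retention
	blocks           []int64 // persistent blocks, oldest first: the only deletable data
}

// total returns the size that size-based retention compares against its limit.
func (u tsdbUsage) total() int64 {
	t := u.walAndCheckpoint + u.headChunks
	for _, b := range u.blocks {
		t += b
	}
	return t
}

// applySizeRetention deletes the oldest persistent blocks while the total
// exceeds the limit. WAL, checkpoint, and head chunks are counted but kept.
func applySizeRetention(u tsdbUsage, limit int64) tsdbUsage {
	for len(u.blocks) > 0 && u.total() > limit {
		u.blocks = u.blocks[1:]
	}
	return u
}

// minVolumeSize returns 3 WAL segments plus headroom for the other data types.
func minVolumeSize(headroom int64) int64 {
	return minWALSegments*walSegmentSize + headroom
}

func main() {
	// Worst case: three full WAL segments on disk (illustrative sizes).
	u := tsdbUsage{walAndCheckpoint: 3 * walSegmentSize, headChunks: 10 * MiB, blocks: []int64{20 * MiB, 15 * MiB}}
	after := applySizeRetention(u, 50*MiB)
	// prints "after retention: 394 MiB (blocks left: 0)"
	fmt.Printf("after retention: %d MiB (blocks left: %d)\n", after.total()/MiB, len(after.blocks))
	// prints "minimum volume with 64 MiB headroom: 448 MiB"
	fmt.Printf("minimum volume with 64 MiB headroom: %d MiB\n", minVolumeSize(64*MiB)/MiB)
}
```

Even after every block is gone, the model still uses 394 MiB against a 50 MiB retention size. A 100Mi EmptyDir limit is therefore too small regardless of the retention settings.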

Changes refer to particular issues, PRs or documents:

Traceability

  • The PR is linked to a GitHub issue.
  • The follow-up issues (if any) are linked in the Related Issues section.
  • If the change is user-facing, the documentation has been adjusted.
  • The feature is unit-tested.
  • The feature is e2e-tested.

@hisarbalik requested a review from a team as a code owner August 22, 2024 12:30
@kyma-bot added the cla: yes and size/XS (changes 0-9 lines, ignoring generated files) labels Aug 22, 2024
@hisarbalik added the area/manager and do-not-merge/hold labels Aug 22, 2024
@hisarbalik added this to the 1.22.0 milestone Aug 22, 2024
@hisarbalik added the kind/bug label Aug 22, 2024
@hisarbalik removed the do-not-merge/hold label Aug 23, 2024
@kyma-bot added the lgtm label Aug 23, 2024
@kyma-bot merged commit ddbc154 into kyma-project:main Aug 23, 2024
49 checks passed