
[kube-prometheus-stack] Retention problems #4869

Open
brancomrt opened this issue Sep 20, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@brancomrt

Describe the bug (a clear and concise description of what the bug is):

I am experiencing issues with the configuration of retention policies in the kube-prometheus-stack when installed via Helm chart version 61.7.1.

I set the parameter prometheus.prometheusSpec.retention to a value of 10m or 1h for testing data rotation purposes, but the storage PVC keeps growing and does not clean up the data.

What's your helm version?

version.BuildInfo{Version:"v3.14.4", GitCommit:"81c902a123462fd4052bc5e9aa9c513c4c8fc142", GitTreeState:"clean", GoVersion:"go1.21.9"}

What's your kubectl version?

Client Version: v1.27.10 Kustomize Version: v5.0.1 Server Version: v1.28.12+rke2r1

Which chart?

kube-prometheus-stack

What's the chart version?

61.7.1

What happened?

I am experiencing issues with the configuration of retention policies in the kube-prometheus-stack when installed via Helm chart version 61.7.1.

I set the parameter prometheus.prometheusSpec.retention to a value of 10m or 1h for testing data rotation purposes, but the storage PVC keeps growing and does not clean up the data.

What you expected to happen?

Automatic cleanup of Prometheus storage data on the PVC

How to reproduce it?

Wait for the retention period defined in values.yaml, then check the storage size of the PVC prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0 to see whether it decreases.
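One way to check the reproduction step above directly, rather than watching the PVC, is to measure the TSDB directory inside the pod (pod and namespace names taken from this issue; adjust for your release):

```shell
# Total TSDB usage vs. WAL usage inside the Prometheus container
kubectl -n monitoring exec prometheus-kube-prometheus-stack-prometheus-0 \
  -c prometheus -- du -sh /prometheus /prometheus/wal

# List persisted blocks; their timestamps show what retention could delete
kubectl -n monitoring exec prometheus-kube-prometheus-stack-prometheus-0 \
  -c prometheus -- ls -lh /prometheus
```

If `du` shows the growth is almost entirely under /prometheus/wal, retention is not the mechanism at fault, since retention only removes persisted blocks.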

Enter the changed values of values.yaml?

prometheus.prometheusSpec.retention
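For reference, a minimal excerpt of the corresponding values.yaml (values as reported in this issue):

```yaml
prometheus:
  prometheusSpec:
    retention: 10m   # also tried 1h for testing data rotation
```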

Enter the command that you execute that is failing/misfunctioning.

helm upgrade kube-prometheus-stack -n monitoring ./

Local values.yaml chart.

Anything else we need to know?

No response

@brancomrt brancomrt added the bug Something isn't working label Sep 20, 2024
@brancomrt
Author

I am using a storage class that stores data on NFS.

storageSpec:
  volumeClaimTemplate:
    spec:
      storageClassName: "nfs-client"
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 200Gi

kubectl get storageclasses.storage.k8s.io

NAME         PROVISIONER                                      RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-client   cluster.local/nfs-subdir-external-provisioner    Delete          Immediate           true                   131d

@chanakya-svt

@brancomrt I am also facing the same issue with retention. I set my retention to 15m; the metrics are cleared, but the WAL size keeps increasing, consuming my disk to the point that I am missing metrics because of "no space left on device".

Were you able to resolve this?

TIA

Below are my args in the statefulset passed to prometheus v2.54.1

--web.console.templates=/etc/prometheus/consoles    
--web.console.libraries=/etc/prometheus/console_libraries 
--config.file=/etc/prometheus/config_out/prometheus.env.yaml                       
--web.enable-lifecycle                                     
--web.external-url=https://redacted.com/prometheus-metrics
--web.route-prefix=/prometheus-metrics                                                                
--log.level=debug                                                              
--storage.tsdb.retention.time=15m
--storage.tsdb.path=/prometheus
--storage.tsdb.wal-compression
--web.config.file=/etc/prometheus/web_config/web-config.yaml

@chanakya-svt

It was mentioned here in a comment that it's resolved in v2.21, but I am using v2.54 and the issue still persists.

@DrFaust92
Contributor

I can't find an exact reference to this, but because the default block is compacted every 2 hours, you cannot set retention below that value without changing several other parameters as well.
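To make that concrete, here is a toy sketch (not Prometheus source code) of why a sub-2h retention appears to do nothing: Prometheus deletes whole persisted TSDB blocks, and only those whose newest sample is older than the retention window; samples still in the in-memory head block and WAL are never removed by retention.

```python
# Toy model of block-based retention (illustration only, not Prometheus code).

def expired_blocks(block_end_times, now, retention):
    """Return end times (seconds) of persisted blocks eligible for deletion."""
    return [end for end in block_end_times if now - end > retention]

HOUR = 3600

# Before the first 2h compaction, nothing has been persisted yet, so a
# 10m (600s) retention frees no disk space while the WAL keeps growing.
print(expired_blocks([], now=2 * HOUR, retention=600))          # []

# Only once a block exists and has fully aged out does it get deleted.
print(expired_blocks([1 * HOUR], now=4 * HOUR, retention=600))  # [3600]
```

This is why the PVC appears to grow unbounded with a 10m or 1h retention: the deletion unit (a 2h block) is larger than the retention window.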

Regardless, this ticket is relevant for the upstream Prometheus/prometheus-operator repos, not the chart repo.

@brancomrt
Author

Thank you @DrFaust92

@rouke-broersma
Contributor

This should be closed: it is not a bug but rather a limitation of the default Prometheus configuration.

@chanakya-svt

With the following args configuration, I am seeing that max-block-duration is set to 6m and min-block-duration is set to 2h (see the attached screenshot). The durations look backwards, retention is not happening, and the WAL keeps growing.

But when I pass storage.tsdb.min-block-duration set to 1h and storage.tsdb.max-block-duration set to 2h as additional args, I see the WAL is compacted every 1h or when it reaches 256MB (in my case, its size limit).
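If helpful, one way to pass those flags through the chart (a sketch, assuming a prometheus-operator version that supports the `prometheusSpec.additionalArgs` field) would be:

```yaml
prometheus:
  prometheusSpec:
    additionalArgs:
      - name: storage.tsdb.min-block-duration
        value: 1h
      - name: storage.tsdb.max-block-duration
        value: 2h
```

Note these are not exposed as first-class chart values, so the operator simply appends them to the Prometheus container args.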

I am not sure if the chart is defaulting these values or it's an upstream Prometheus issue.

--web.console.templates=/etc/prometheus/consoles    
--web.console.libraries=/etc/prometheus/console_libraries 
--config.file=/etc/prometheus/config_out/prometheus.env.yaml                       
--web.enable-lifecycle                                     
--web.external-url=https://redacted.com/prometheus-metrics
--web.route-prefix=/prometheus-metrics                                                                
--log.level=info                                                              
--storage.tsdb.retention.time=1h
--storage.tsdb.retention.size=256MB
--storage.tsdb.path=/prometheus
--storage.tsdb.wal-compression
--web.config.file=/etc/prometheus/web_config/web-config.yaml

[Screenshot attached (2024-10-07, 10:21 AM) showing the reported block-duration values]

@rouke-broersma
Contributor

@chanakya-svt a minimum block duration that is longer than the maximum block duration doesn't make sense.

@chanakya-svt

@rouke-broersma I tried to look into the chart to see if it is passing any args that could cause this, but I couldn't pinpoint anything. Can you confirm whether this is an upstream Prometheus issue? If so, I can create an issue in the Prometheus repo. Thank you.

@mehrdadpfg

We have the same issue with v2.51.

@zeritti zeritti changed the title kube-prometheus-stack - Retention problems [kube-prometheus-stack] Retention problems Oct 17, 2024