Thanos Receive Memory Usage very high OOMKilled #6100
Comments
Maybe you could try out the newest version?
It is hard to say whether this is a problem without knowing how much data you are sending to receivers. TSDB not being ready is highly unlikely to be the culprit.
We have the OTEL Collector set up as a DaemonSet across 38 nodes (spanning multiple clusters), collecting metrics from pods, EKS nodes, and the Kubernetes API server. Unfortunately I don't have exact numbers on the volume.
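For context, a DaemonSet of collectors all pushing to Receive means one remote-write stream per node. A minimal sketch of this kind of pipeline (the endpoint, port, and scrape job below are assumptions, not the actual config):

```yaml
# Hypothetical OTEL Collector pipeline pushing scraped metrics to Thanos
# Receive over Prometheus remote write. Service name and port are placeholders.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
            - role: pod

exporters:
  prometheusremotewrite:
    # Thanos Receive's remote-write endpoint (default port 19291)
    endpoint: http://thanos-receive:19291/api/v1/receive

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
```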
It seems it was related to the amount of data being received. I changed the OTEL Collector deployment from a DaemonSet to a ReplicaSet with 5 replicas and the memory issues seem to have resolved themselves. Thanks for helping :)
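For anyone hitting the same thing, the change amounts to running a fixed pool of collectors instead of one collector per node. A minimal sketch (names and image are illustrative, not the actual manifests):

```yaml
# Hypothetical Deployment replacing the per-node DaemonSet of collectors.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 5                     # fixed pool instead of one pod per node
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest
```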
That makes sense, thanks for providing the reason 👍
Hi, I've been experiencing a lot of problems with Thanos Receive getting OOMKilled and going into a CrashLoopBackOff every time it tries to restart.
Thanos is deployed via the Bitnami Helm chart on EKS t3.2xlarge VMs (8 vCPU, 32GB RAM) across 6 nodes. The Thanos Receive deployment is configured to autoscale from 3 to 6 replicas, each replica has a 20Gi memory limit, and it is set to scale out when memory usage hits 70%.
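For clarity, the scaling behaviour described above corresponds roughly to an HPA like this (a sketch only; object names and the StatefulSet target are assumptions):

```yaml
# Illustrative HorizontalPodAutoscaler: 3-6 replicas, scaling on 70% memory
# utilisation. Not copied from the actual deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: thanos-receive
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: thanos-receive
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
```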
Almost instantly, Thanos Receive scales out to 6 pods and shortly after begins a CrashLoopBackOff with OOMKilled as the termination reason.
In addition, when scraping pod metrics with the OTEL Collector and sending them to Thanos Receive, after Thanos Receive scales up to 6 pods the logs show issues with the TSDB not being ready, which also seems to be a result of the OOMKills.
OTEL Collector
Thanos Receive
Please find below the values.yaml config for Thanos Receive using the Bitnami Helm chart.
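(The full file isn't reproduced here; the snippet below is only a minimal sketch of the receive-related settings, with parameter names that are assumptions based on the Bitnami chart's conventions rather than a copy of the actual values.yaml.)

```yaml
# Hypothetical excerpt of the kind of settings involved; receive.* parameter
# names are assumptions, and the values mirror the description above.
receive:
  enabled: true
  resources:
    limits:
      memory: 20Gi
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 6
    targetMemory: 70
```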