Store: consumes lots of memory at startup and restart loop causes OOM #6643
Comments
I have experienced exactly the same issue. I closed my ticket (#6644) as this one covers the same issue.
Same here. Can't get it to start up with 6 GB of RAM when it used to run with ~3 GB.
I can only think of #6509 as a change that might affect the startup memory usage of the store gateway, but ideally that change should improve memory usage. Do you have the lazy index header enabled? Can you share the store gateway config?
No, I use a pretty vanilla config, I'd say. Failing config (formatted for readability):
This is currently running with …
@yeya24 I deploy the Thanos Store using kube-thanos. It starts with the following args in the manifest. Currently on v0.31.0.
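For readers without the manifest at hand, a minimal sketch of a vanilla Store invocation along the lines of what kube-thanos generates; flag values are illustrative placeholders, not the poster's actual args:

```sh
# Illustrative placeholders only; adjust the data dir, object storage config,
# and addresses to your own manifest.
thanos store \
  --data-dir=/var/thanos/store \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --grpc-address=0.0.0.0:10901 \
  --http-address=0.0.0.0:10902
# The lazy index header reader asked about earlier is toggled via
# --store.enable-index-header-lazy-reader.
```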
I have the same issue and upgraded to v0.31; all the other components work perfectly on version 0.32, but Store consumes 1 TB.
Maybe it would be possible for you to take a pprof memory profile during bootup of Thanos Store and share it here? Thanos Store exposes pprof endpoints on its HTTP port under /debug/pprof.
The same as you.
@GiedriusS I'm sorry, but I can't actually try this out and get a pprof from the production environment. Can someone share a pprof?
@GiedriusS I have Thanos running as a container in Kubernetes using kube-thanos. How can I find this endpoint?
@rgarcia89 Can you port-forward one of your store gateway pods using its HTTP port (for example, 8080)?
You can get the heap profile by running commands along the lines sketched below. Make sure to do it when the store gateway starts...
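A minimal sketch of those commands, assuming a kube-thanos deployment and the default Thanos HTTP port 10902; the pod name is a placeholder:

```sh
# Forward the store gateway's HTTP port to localhost
# (use whatever --http-address your manifest sets, e.g. 8080).
kubectl port-forward pod/thanos-store-0 10902:10902

# Save a heap profile while the store gateway is starting up.
curl -o heap.pprof http://localhost:10902/debug/pprof/heap

# Or inspect it interactively.
go tool pprof http://localhost:10902/debug/pprof/heap
```

The saved file can also be opened later with `go tool pprof heap.pprof`.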
@yeya24 Here you go, including the heap of a running 0.31.0 Thanos Store vs. the heap of a just-started 0.32.0 Thanos Store, which then crashed.
Oh, that's entirely the cuckoo filter.
Do you have some figures on your cardinality?
@MichaHoffmann Anything specific you are looking for? Thanos is currently managing about 1,792,044 time series.
This filter should scale (theoretically) with the number of label names you have, but for 128 GB it would need something like hundreds of millions of them (again theoretically; maybe there is a bug somewhere).
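For anyone trying to put a number on that, a rough sketch of counting distinct label names through Thanos Query's Prometheus-compatible labels API; the host and port are placeholders:

```sh
# The cuckoo filter in question is keyed on label names, so counting them
# gives a feel for how large it should be.
curl -s 'http://thanos-query:9090/api/v1/labels' | jq '.data | length'
```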
Good question. I can see everything running smoothly with v0.31.0 and also haven't seen any alerts or issues regarding high cardinality.
That cuckoo filter was introduced in 0.32.0, so that makes sense.
Maybe something is counting wrong in that filter. Otherwise I am not sure how 128 GB can be justified.
#6669 should address this.
How does it look with v0.32.1?
Very good. The issue is gone on my clusters.
Unfortunately still crashing with …
It looks like Store uses around 50% more memory than before, but there is no escalation.
That's not an OOM crash!
Works for me now; I can check memory metrics later.
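If it helps anyone comparing before and after, a rough sketch of pulling those memory metrics from Prometheus, assuming cAdvisor metrics are scraped; the Prometheus address and pod selector are placeholders:

```sh
# Current working-set memory of the store gateway pods.
curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=container_memory_working_set_bytes{pod=~"thanos-store.*"}'
```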
Works perfectly. Thank you, guys, for working hard to deliver the best to us.
@chalut01 Can we close the issue?
Fixed in v0.32.2!
Thanos version: v0.32.0
Everything is set up the same way; only the version is different. Is anyone else seeing the same thing?