Memory issues: objects used constantly increasing, shutdown appears not to be graceful #2083
Comments
Thanks for your feedback @conorevans. You have really done a great job laying out what is going wrong and you provided all the needed context. (And a bonus point for investigating it using profiles in Phlare 🙂) What you are seeing is mostly expected given how Phlare v0.1 works right now: profiles are received and held in memory until either:
Then the profiles are written into a block on disk. As the estimated size is hardcoded at present, you will need to assign a memory limit of at least 4-8 times that estimated size. We realize this is not good enough for everyone. We are planning to work on #2115 for the next release, which will significantly cut the memory consumption. As a workaround you can lower the
Let me know how that goes and I will keep you updated in #2115.
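For illustration, here is a minimal sketch of how that sizing guidance could be applied as Kubernetes resource limits on the Phlare StatefulSet. The phlare.resources values key and the absolute figures are assumptions (the hardcoded estimated block size is not stated in this thread), so treat the numbers as placeholders for "4-8x the estimated block size" rather than recommended values.

```yaml
# Sketch only: memory limits sized at roughly 4-8x an *assumed* ~1 GiB
# estimated block size, per the guidance above. The `phlare.resources`
# key and the figures are assumptions, not values from the chart docs.
phlare:
  resources:
    requests:
      memory: 4Gi   # ~4x the assumed estimated block size
    limits:
      memory: 8Gi   # ~8x, to leave headroom before the block is cut
```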
Hey @simonswine, ah, I see - thank you. Since I last looked, the docs have already been fleshed out nicely, and I can see that
Might be worth noting above the
Thanks!
This is a very good idea, and a PR from you for that would be very welcome as well 👍
Firstly, really cool software!
Describe the bug
Phlare doesn't seem to free any of the objects it uses, leading to a constantly growing memory profile (I've included some other StatefulSets like Prometheus/Loki within the test env for context) - values in MiB:
You can also observe that every time I tried to stop the Phlare StatefulSet, the node it was running on died (gaps in metrics due to node exporter dying) -- I would find kubelet to no longer be responsive. There were no reported events of any kind, and the instance had plenty of free memory beforehand (~1GiB), even with this Phlare issue. I had to reboot the machine to resolve.
To Reproduce
Run Phlare with a standard set-up
Expected behaviour
Memory can fluctuate but is freed appropriately
Environment
Default scrape_configs - the only values I passed were structuredConfig.storage.s3 to persist the data to S3.
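For reference, a minimal sketch of what that values override could look like, assuming deployment via the Grafana Phlare Helm chart (implied by structuredConfig) and the usual Grafana-style S3 storage schema; the bucket, region, endpoint, and credentials are placeholders, not the actual values used here.

```yaml
# values.yaml sketch (placeholders, not the reporter's actual values)
phlare:
  structuredConfig:
    storage:
      backend: s3
      s3:
        bucket_name: "my-phlare-blocks"
        region: "eu-west-1"
        endpoint: "s3.eu-west-1.amazonaws.com"
        access_key_id: "<access-key-id>"
        secret_access_key: "<secret-access-key>"
```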
Additional Context
In the image above (I'll duplicate it below), there was only one pod (in addition to Phlare itself) that Phlare was scraping during the first two or three lifecycles of the Phlare deployment; the final one had ~40 pods to scrape. So the problem existed even with just one pod. The last lifecycle ran for more than an hour because I was waiting to see if there was some sort of head block, à la Prometheus, that needed to be uploaded and by default is uploaded once per hour (I couldn't find docs on it), but that didn't seem to happen.
Thankfully, I have heard of this cool new software called Phlare, which can help us debug 😉
Goroutines show there was no real fluctuation in the work Phlare had to do
Alloc objects vs inuse objects:
As you can see, almost all of it is in convertSamples.