Smoketest setup can OOM #869
Revisiting this, I also haven't been able to reproduce another OOM since I opened this issue. The memory usage is still very close to the 536 MB limit though, so I can increase the limit to 768 MB. After I started two batches of recordings and archived them at ~4:43:30, my recording shows Cryostat's heap slowly increasing even though no further API calls were made. Is this expected behaviour due to the additional recording information being stored in Cryostat?
That shouldn't be causing any real memory consumption increase. The graph is a little suspect-looking with such a smooth and consistent upward slope, and there is no data on the other charts either - is there just missing data overall between those points? Either way, the increased memory usage over time could be allocations from background tasks that Cryostat is processing, like periodic rule archivers or JDP discovery. If there is still enough free memory available to the process, then you won't see that heap usage drop and the memory reclaimed until the garbage collector decides to run.
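As a side note, one quick way to tell whether a slow climb like this is mostly reclaimable garbage or genuinely retained data is to compare heap usage before and after an explicit GC hint. The sketch below uses only the standard `java.lang.management` API and is not anything Cryostat-specific; in practice the same `MemoryMXBean` numbers could be read remotely over JMX against the running container.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();

        MemoryUsage before = memoryBean.getHeapMemoryUsage();
        System.out.printf("Heap used before GC: %d MiB%n", before.getUsed() / (1024 * 1024));

        // Hint to the JVM to run a collection. If the "used" figure drops sharply
        // afterwards, the upward slope was mostly reclaimable garbage from
        // background tasks rather than retained recording metadata.
        memoryBean.gc();

        MemoryUsage after = memoryBean.getHeapMemoryUsage();
        System.out.printf("Heap used after GC:  %d MiB%n", after.getUsed() / (1024 * 1024));
    }
}
```

If the "after" number falls back near the earlier baseline, the slope was deferred collection of background-task allocations rather than a leak.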
Yeah, there are no other data points on that slope. The rest of the Memory charts show normal behaviour too.
Okay. Sounds like something for us to keep an eye on, at least. It would also be good if we could spend a bit more time doing some heap analysis before release to make sure we aren't leaking memory or doing anything silly that generates a lot of garbage.
I started a profiling recording for ~1 hr after making a few requests with GraphQL and auto rules. Besides the spikes from my requests, the heap usage stays at ~48 MiB. Heap usage looks normal to me. Most of the allocations are coming from …

When comparing memory usage with ParallelGC vs. G1, I'm seeing similar CPU and memory usage with similar GC frequency and pause times. Not sure if there are any other incentives for switching to ParallelGC.

Sometimes I get this automated analysis result for TLABs. Is there anything we can do about this?
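For reference, one way to put numbers on the "similar GC frequency and pause times" comparison, rather than eyeballing the charts, is to read the per-collector counters that HotSpot already exposes. The sketch below is generic `java.lang.management` code, not part of Cryostat; the same program would be run once under `-XX:+UseG1GC` and once under `-XX:+UseParallelGC` and the output compared.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcComparison {
    public static void main(String[] args) {
        // Allocate some short-lived garbage so the collector has work to do.
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            byte[] junk = new byte[1024];
            total += junk.length;
        }
        System.out.println("Allocated roughly " + (total >> 20) + " MiB of short-lived data");

        // Each registered collector (e.g. "G1 Young Generation" under G1, or
        // "PS Scavenge" under ParallelGC) reports its cumulative collection count
        // and approximate accumulated pause time in milliseconds since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total pause time%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

As for the TLAB rule, the JDK does expose knobs such as `-XX:TLABSize` and `-XX:-ResizeTLAB`, and `-Xlog:gc+tlab` can show whether allocations are escaping TLABs; whether any of that is worth changing for this workload is an open question rather than something the analysis result settles on its own.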
Revisiting this, I think you can trigger an OOM if you open the web client on localhost:9000, start an automated rule that periodically archives recordings, then use a GraphQL query to start a recording on all targets at the moment when the auto rule is in the middle of archiving several recordings. Is it worth increasing the container memory limit to 768M for this scenario?
Probably. That isn't a very far-fetched scenario in a real deployment that actually sees some use. If that sequence can take it down, then I'm sure there are many other similar sequences that can too, at similarly low traffic levels.
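For anyone trying to reproduce the scenario described above, the timing-sensitive part is making several start-recording requests land while the automated rule is mid-archive. A rough harness along these lines can help hit that window; note that the endpoint path and the GraphQL query body below are placeholders rather than Cryostat's actual API, and would need to be filled in from the API documentation for the version under test.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class OomRepro {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical endpoint and query body: substitute the real Cryostat
        // GraphQL path and a query that starts a recording on all targets.
        String graphqlUrl = "http://localhost:8181/api/graphql"; // placeholder path
        String startRecordingsQuery = "{ \"query\": \"...start a recording on all targets...\" }";

        // Fire a burst of identical requests while the automated rule is busy
        // archiving, so several archive and start operations overlap in time.
        List<CompletableFuture<HttpResponse<String>>> inFlight = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(graphqlUrl))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(startRecordingsQuery))
                    .build();
            inFlight.add(client.sendAsync(request, HttpResponse.BodyHandlers.ofString()));
        }

        // Wait for every response and report the status codes.
        inFlight.forEach(f -> System.out.println(f.join().statusCode()));
    }
}
```

Watching the container's memory stats while a burst like this runs against the smoketest limit should show whether the overlap is what pushes the heap over the edge.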
Originally posted by @jan-law in https://github.com/cryostatio/cryostat/pull/868#discussion_r830231817