http.send possible memory leak? #5381
I briefly checked your code. Throwing out a random idea:
The entire input object to http.send is used as the cache key.
Yes. We could do that. It would have to be added in here. You mentioned:
We added this fix in
Is this issue valid for your setup?
There are two parts to the problem: the cache key is the entire input object passed to http.send, and the size of those keys is not counted against the configured cache size limit.
This means it is easier than one might think to be affected by this issue. Consider the following setup: each http.send call includes roughly 1KB of per-user input (for example a user-specific token or header), while the responses themselves are small.
If OPA in this scenario processes 100 active users, OPA would insert 100KB worth of keys into the cache, none of which counts toward the limit. It would be great if OPA considered the size of cache keys 😄
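For illustration only (a minimal sketch; the endpoint, header, and input field are hypothetical and not taken from this thread), a call along these lines shows the mechanics described above:

```rego
package example

import future.keywords.if

# Sketch: the whole object passed to http.send acts as the inter-query cache
# key. A ~1KB per-user token in the headers therefore adds ~1KB of key data
# per distinct user, none of which is counted against the configured cache
# size limit; only the (small) response is.
entitlements := resp.body if {
	resp := http.send({
		"method": "GET",
		"url": "https://entitlements.internal/api/check", # hypothetical endpoint
		"headers": {"Authorization": input.user_token},   # ~1KB, differs per user
		"force_cache": true,
		"force_cache_duration_seconds": 60,               # arbitrary interval for the sketch
		"raise_error": false
	})
	resp.status_code == 200
}
```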
Byte-wise it might easily be 50% overhead in our case; the values we load over http.send are quite small.
It makes sense to me. Removing just enough items to free cache space for this particular item seems like a fragile solution. Anyway, ultimately this would help a lot in our case #5320 (because we use a small TTL of only around 10s).
Request 500M, limit 1000M - having it bigger or smaller only affects how long it takes to hit the issue.
Very probably it is. I also noticed that multiple concurrent requests are sent when there is a cache miss, etc. I will check this issue in more detail later.
@asleire Thanks for the example. I think it is actually pretty much our situation, which causes the memory problems and unnecessarily high memory usage. I would consider these things to solve it:
What do you think?
I think this alone would solve the problem entirely
This would be nice when OPA shares memory with other applications. Otherwise, I don't think it matters if cache keys are taken into consideration
I don't think this would do any good. OPA would still need as much memory as it does when the cache is near-full. In comparison, the second bullet point might prevent OPA's cache from ever getting full to begin with.
It should solve memory size predictability and OOMs, but we would still run into the situation where OPA's memory is maxed out and constant (never getting any lower because of the missing (periodic) cache cleanup).
I guess it always matters if you pay for e.g. 1G of memory for instances holding stale cache entries.
Yeah, probably overengineering. The other two improvements should be enough 👍
I've created this issue for including the cache key in the cache size calculation.
@ashutosh-narkar Hi, any updates on this one? Even after the bump to v0.47.3 (which can be seen on the graph below), which fixes some cache-related problems, we are still experiencing this ever-rising memory usage that does not respect any cache limit (a 100M cache limit is configured here). Thanks!
💭 Do we have a more specific cache-size metric? The one here includes all heap allocations. Just wondering.
Not likely, judging by https://www.openpolicyagent.org/docs/latest/monitoring/#prometheus? However, it would be nice to have one. I hope that this fix will help us see memory usage stop rising at around 130MB.
Ah, I missed that one fix. I will bump OPA and keep it running for a few days, so we will know if it helped! Thanks a lot.
@asleire So unfortunately, 0.48.0 didn't solve our issue. Cache size: 100M, growing way over that until the OOM kill. I think the key size would have to be taken into account in the cache size limit to fix it. Thoughts?
Yep, that is the fix. The workaround, however, is to simply lower your cache limit. If, for instance, your average key size is 1KB and your average response size is 200B, a cache limit of 100MB would mean 100MB of responses along with 500MB worth of keys, for a total of 600MB of data.
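Spelling out that arithmetic (using the example figures above, roughly 1KB per key and 200B per response):

$$
\frac{100\ \text{MB}}{200\ \text{B/entry}} = 500{,}000\ \text{entries},\qquad 500{,}000 \times 1\ \text{KB} = 500\ \text{MB of keys},\qquad 100\ \text{MB} + 500\ \text{MB} = 600\ \text{MB}.
$$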
Using #5385 to keep track of the changes needed.
Short description
Our policy sends many cached http.send requests with a low max-age of 10s. The OPA server is configured with a 100M max cache size.
I noticed that the memory usage reported by OPA (go_memstats_alloc_bytes) keeps rising steadily; it is similar to what k8s reports to us, and the gc reports from GODEBUG show the same.
It doesn't look like the max cache size has any effect, and eventually the OPA pod will crash on OOM.
We do not experience any errors; everything works as expected, except that memory keeps growing until the pod gets killed.
The http.send call looks like this:
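(Roughly; a minimal sketch only, assuming the upstream responds with Cache-Control: max-age=10 and that the URL varies per request. The URL and input field below are placeholders, not the actual policy.)

```rego
package policy

import future.keywords.if

# Sketch of a cached http.send call: "cache": true uses the inter-query cache
# and honours the response's Cache-Control: max-age (10s here), so entries
# expire quickly and new ones are inserted on every cache miss.
backend_response := resp if {
	resp := http.send({
		"method": "GET",
		"url": sprintf("https://backend.internal/check?user=%s", [input.user]), # placeholder
		"cache": true,
		"raise_error": false
	})
}
```

The 100M limit from the description would then be set on the server side, presumably via caching.inter_query_builtin_cache.max_size_bytes in the OPA configuration.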
Version:
Expected behavior
Memory usage should stay consistent around 100M-150M
Any ideas what could be wrong here? Thanks!
Possibly related: #5320