OPA out of memory #6753
Comments
Thanks for the detailed issue @itayhac. I tried to reproduce this by running OPA in Docker with a 4 GB memory limit. I increased the number of goroutines in your script to send more concurrent requests to OPA. The maximum amount of memory consumed by OPA did not exceed 200 MB. Is there something different in your actual setup vs. the mock bundle you've provided here? I would expect the CPU usage to spike while OPA handles these requests, but it's still unclear why OPA runs out of memory.
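A minimal sketch of that kind of repro setup, assuming the official openpolicyagent/opa image and the bundle attached below (the exact flags used in the original attempt aren't shown in this thread):

```shell
# Run OPA in Docker with a hard 4 GB memory limit, serving the attached bundle.
docker run --rm --memory=4g -p 8181:8181 \
  -v "$(pwd)/itay_kenv_files:/bundles" \
  openpolicyagent/opa:latest \
  run --server --addr :8181 --pprof --log-level=info --bundle /bundles/test_15mb.tar.gz
```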
Hi @ashutosh-narkar, thank you so much for your fast and detailed reply. Please retry and it should reproduce.
One thing I noticed in the policy is you're using the […]
Any further thoughts?
The problem reproduces with our own OPA image (we compile latest) and with both of the latest public images (static and non-static).
This could be related to #5946. In your policy you're referring to a large object, and this can be replicated if you modify the policy to refer to the object without using the […]
@ashutosh-narkar, the work in #6040 focused solely on the CPU time aspect and did not look at how memory usage was affected.
The data has some objects and arrays, and I wonder whether, when they are referenced inside the policy, the interface-to-AST conversions are impacting performance in terms of both CPU and memory.
We're looking to implement something like what's discussed in #4147. This should probably help with performance, as we'll avoid the interface-to-AST conversion during eval.
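As a rough illustration of the conversion being discussed, here is a sketch using OPA's public ast package; it is not the actual eval code path, and the data is a hypothetical stand-in for the bundle's data.json:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/open-policy-agent/opa/ast"
)

func main() {
	// Hypothetical stand-in for a large base document such as the ~15 MB data.json.
	raw := []byte(`{"users": [{"name": "alice", "roles": ["admin", "dev"]}]}`)

	var doc interface{}
	if err := json.Unmarshal(raw, &doc); err != nil {
		panic(err)
	}

	// Converting raw Go values into AST values builds a full AST copy of the data.
	// If a policy dereferences a large object and this conversion happens during
	// evaluation, it adds CPU and memory pressure; avoiding the repeated
	// conversion is what the work referenced in #4147 is aiming for.
	value, err := ast.InterfaceToValue(doc)
	if err != nil {
		panic(err)
	}

	fmt.Println(value)
}
```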
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use case this issue attempts to address, the value provided by completing it, or possible solutions to resolve it would help to prioritize the issue.
@itayhac, are you able to repro this with OPA v0.67.0? I was unable to repro it, so it would be good to verify in case I missed something.
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use case this issue attempts to address, the value provided by completing it, or possible solutions to resolve it would help to prioritize the issue.
Original issue description
We are working with OPA as our policy agent.
We deploy multiple instances of OPA as Docker containers on Kubernetes.
Each OPA instance has a k8s memory limit of 4 GB.
Also, each OPA instance loads a bundle with a data.json file of about ~15 MB.
Recently we noticed that some of our OPA instances have been restarted due to OOM.
After further investigation we found that this happens when OPA receives frequent requests and memory is not freed fast enough, which in turn results in OOM very quickly (within 3 seconds).
Disclaimer:
The bundle I share here is mock data that best mimics our use case.
I will share the heap dump that we got for the mock data and for the actual production data (both with the same Rego code).
Please note, these functions are taking almost 90 percent of the memory, and the service gets OOM-killed within seconds.
This is also true for our production memory profile.
Steps To Reproduce
Run the following command to start OPA:
opa run --bundle itay_kenv_files/test_15mb.tar.gz --server --pprof --log-level=info
Run the code (shared below) to trigger requests against OPA.
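Since OPA is started with --pprof, a heap profile can be captured while the requests are in flight (a sketch assuming the default listening address localhost:8181):

```shell
# Show the top heap allocators while the load test is running.
go tool pprof -top http://localhost:8181/debug/pprof/heap

# Or save the raw profile for later inspection.
curl -o heap.pprof http://localhost:8181/debug/pprof/heap
```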
Expected behavior
Memory usage should remain low, or at least be freed shortly after the requests are made.
Code that sends 100 requests to OPA (see the illustrative sketch below):
test_15mb.tar.gz
memory profile.zip
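The original load-generation script is not reproduced above; the following is a minimal, hypothetical sketch along the same lines, assuming a decision is queried at /v1/data with a simple JSON input (the concrete path and input shape in the real script may differ):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"sync"
)

func main() {
	// Hypothetical decision endpoint and input; adjust to the real policy path.
	const url = "http://localhost:8181/v1/data"
	input := []byte(`{"input": {"user": "alice", "action": "read"}}`)

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Post(url, "application/json", bytes.NewReader(input))
			if err != nil {
				fmt.Println("request failed:", err)
				return
			}
			defer resp.Body.Close()
			// Drain the response so the connection can be reused by the client pool.
			io.Copy(io.Discard, resp.Body)
		}()
	}
	wg.Wait()
}
```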
If further information regarding our production setup is required, I'll be happy to provide it.