OPA deadlock with decision logs enabled #3722
Comments
Thanks for reporting this! I'll try to have a closer look this week, but maybe someone beats me to it. 🤔 0.28.0 is a bit old; would you be able to repeat your tests easily with a more recent version? (Anyhow, skimming the release notes, no deadlock fix jumps out at me; only one related to bundle downloads and shutdown...)
Thanks, @srenatus!
Have you had a chance to bring these up here? There have been cache-related fixes since 0.29.0, perhaps that issue has even been resolved? Outside of this concrete problem reported here, it would be unfortunate to be stuck on an old version.
Yes, there is already an open ticket for the same. #2978
@amitkatyal thanks for filing a detailed issue! It does indeed look like there's a bug in the opa-envoy-plugin. I believe the issue is that the opa-envoy-plugin isn't keeping the policy query transaction open when it logs the decision. The bundle plugin should not be able to commit until the decision is logged. Assuming this is correct, we'll get this fixed ASAP.
@tsandall thanks for confirming the root cause. Will wait for the fix.
@amitkatyal we are investigating this and we should have a fix out by next week.
@ashutosh-narkar thanks for the update.
@ashutosh-narkar, any update?
Hey @amitkatyal we've started to look into this in this change. We should be able to make progress on it this week. Thanks for your patience.
@ashutosh-narkar Thanks for the update!
This commit attempts to fix the deadlock that happens when bundle and decision logging are both enabled.

The opa-envoy plugin creates a new transaction during query evaluation and closes it once eval is complete. Then when it attempts to log the decision, the decision log plugin grabs the mask mutex and calls the PrepareForEval function in the rego package, which tries to open a new read transaction on the store since the log plugin does not provide one. This call gets blocked if the bundle plugin concurrently has a write transaction open on the store. This write invokes the decision log plugin's callback and tries to grab the mask mutex. This call gets blocked because the decision log plugin is already holding onto it for the mask query.

To avoid this, we keep the transaction open in the opa-envoy plugin till we log the decision.

Fixes: open-policy-agent/opa#3722
Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>
We are running openpolicyagent/opa version 0.28.0-envoy in the k8s cluster and are facing a deadlock issue after enabling the decision logs.
Steps to reproduce
After a couple of days of stress testing, we are seeing OPA get stuck. We captured the call stacks of all the goroutines and found a deadlock between the policy bundle download goroutine and the policy evaluation goroutine.
The policy evaluation goroutine acquires maskMutex inside the log plugin and attempts to take the read lock (rmu) inside the inmem storage plugin, but gets blocked because rmu is already held by the download goroutine.
The download goroutine acquires rmu inside inmem Commit() and tries to acquire maskMutex inside compilerUpdated of the log plugin, but gets blocked because maskMutex is already held by the policy evaluation goroutine.
The above sequence of events results in the deadlock.
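The interleaving described above is a classic lock-order inversion. The following sketch replays it deterministically on a single goroutine, using `TryLock` and `TryRLock` (available since Go 1.18) so that a blocked acquisition is reported instead of hanging. The `locks` type and `replayInterleaving` function are hypothetical names for illustration; only `maskMutex` and `rmu` come from the report, and the real OPA code paths are more involved.

```go
package main

import (
	"fmt"
	"sync"
)

// locks is a hypothetical stand-in for the two locks named in the report.
type locks struct {
	maskMutex sync.Mutex   // held by the log plugin during the mask query
	rmu       sync.RWMutex // held by the inmem store during Commit()
}

// replayInterleaving walks through the reported sequence of events and
// returns whether each side's second acquisition would block.
func replayInterleaving() (evalBlocked, downloadBlocked bool) {
	var l locks

	// Step 1: the policy evaluation goroutine takes maskMutex.
	l.maskMutex.Lock()
	// Step 2: the download goroutine takes the store write lock.
	l.rmu.Lock()

	// Step 3: evaluation now wants the store read lock.
	if l.rmu.TryRLock() {
		l.rmu.RUnlock()
	} else {
		evalBlocked = true
	}
	// Step 4: the download goroutine now wants maskMutex.
	if l.maskMutex.TryLock() {
		l.maskMutex.Unlock()
	} else {
		downloadBlocked = true
	}

	// Release both locks so the replay itself terminates cleanly.
	l.rmu.Unlock()
	l.maskMutex.Unlock()
	return evalBlocked, downloadBlocked
}

func main() {
	evalBlocked, downloadBlocked := replayInterleaving()
	fmt.Println("evaluation blocked on rmu:", evalBlocked)
	fmt.Println("download blocked on maskMutex:", downloadBlocked)
}
```

Both acquisitions fail at the same time, which is the condition under which two real goroutines using blocking `Lock` and `RLock` calls would wait on each other forever.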
Download Goroutine call stack
Policy Evaluation Goroutine call stack