-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock when bundle and decision logging enabled #279
Fix deadlock when bundle and decision logging enabled #279
Conversation
This change was tested by running the opa-envoy plugin with both the decision log and bundle plugin enabled. The max upload frequency and the max bundle download frequency was 1 sec. The test ran close to 6 hours. No deadlock was observed. |
The original deadlock as documented in open-policy-agent/opa#3722 was reproduced as below:
The decision log plugin would grab the mask mutex and sleep. Concurrently the bundle plugin would open a write transaction on the store and lock the read mutex. It would then try to grab the mask mutex which the decision log plugin is holding on to and get blocked. The decision log plugin would wake up and then try to open a read transaction on the store and get blocked. With the fix in this PR, this deadlock was not observed. |
internal/internal.go
Outdated
if txnErr != nil { | ||
p.Store().Abort(ctx, txn) | ||
} else { | ||
err = p.Store().Commit(ctx, txn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's any reason to commit here; the check handler is read-only so we can always just abort the transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
envoyauth/response.go
Outdated
} | ||
|
||
return &er, stop, nil | ||
} | ||
|
||
// WithTxn sets the transaction on the evaluation result | ||
func (result *EvalResult) WithTxn(txn storage.Transaction) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably just drop this function since the Txn is exported on the EvalResult anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
envoyauth/evaluation.go
Outdated
} | ||
|
||
result.TxnID = txn.ID() | ||
err := getRevision(ctx, evalContext.Store(), result.Txn, result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the Eval()
function is exported now (instead of being hidden under internal/
), someone could be calling. If someone is calling it, this change is going to break them because the function doesn't open a transaction anymore. I could see two approaches here:
- Change the function signature so that at least callers break and can't compile
- Make the change backwards compatible so that the code opens a txn if needed
I think that (2) wouldn't be too difficult with a bit of refactoring to avoid duplication, but I'd have to write the code myself to find out. Could you see if it's doable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored the code a bit to make changes backwards compatible. LMKWYT.
This commit attempts to fix the deadlock that happens when bundle and decision logging are both enabled. The opa-envoy plugin creates a new transaction during query evaluation and closes it once eval is complete. Then when it attempts to log the decision, the decision log plugin grabs mask mutex and calls the PrepareForEval function in the rego package which tries to open a new read transaction on the store since the log plugin does not provide one. This call gets blocked if concurrently the bundle plugin has a write transaction open on the store. This write invokes the decision log plugin's callback and tries to grab the mask mutex. This call gets blocked because the decision log plugin is already holding onto it for the mask query. To avoid this, we keep the transaction open in the opa-envoy plugin till we log the decision. Fixes: open-policy-agent/opa#3722 Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>
f3b5dc6
to
67911b9
Compare
This commit attempts to fix the deadlock that happens
when bundle and decision logging are both enabled.
The opa-envoy plugin creates a new transaction during
query evaluation and closes it once eval is complete.
Then when it attempts to log the decision, the decision
log plugin grabs mask mutex and calls the PrepareForEval
function in the rego package which tries to open a new
read transaction on the store since the log plugin does
not provide one. This call gets blocked
if concurrently the bundle plugin has a write transaction
open on the store. This write invokes the decision log plugin's callback
and tries to grab the mask mutex. This call gets blocked because
the decision log plugin is already holding onto it for the mask query.
To avoid this, we keep the transaction open in the opa-envoy plugin
till we log the decision.
Fixes: open-policy-agent/opa#3722
Signed-off-by: Ashutosh Narkar anarkar4387@gmail.com