Performance issue when using optimize level 2 #6909
Hi there! And thanks for filing another issue :) The result of function calls should be cached by OPA for the scope of a single evaluation, so as long as the input arguments are the same, repeating the call should be cheap. To the extent that is true (and perhaps there are exceptions I'm not aware of), you're not decoding the token twice even if it looks like that's the case. If you're seeing evaluation time increase linearly with the number of identical decode calls... yeah, that sounds like a bug. Either way, at least in this case the second call is redundant, and it would be good if the optimizer recognized that.
Hi, but in my benchmarks the time is increasing linearly :D and sometimes even worse: in the case where we were using glob.match, there was a 10x to 100x increase in latency. I will investigate further :) but it correlates exactly with the number of repetitions, so it could be related to the caching mechanism.
Sounds good. Let us know what you find!
@anderseknert I finally found time to take another look at it. The cache that you are referring to is nd_caching, which is disabled by default, is poorly located in the documentation, and is not mentioned in the performance section. So it's safe to assume that someone using optimize level 2 doesn't know about this configuration, resulting in poor performance. I think improving the documentation would be a great step, but if there is a reason the default behavior of nd_caching is to be disabled, we can talk about fixing the optimizer instead.
No, that's something else :) Perhaps I'm wrong about caching for built-in function calls; I honestly don't remember. I know that evaluation of rules is cached for sure, but you've made me uncertain about built-in functions. @ashutosh-narkar might be able to provide some insights here, as I don't do a whole lot of work on OPA internals myself.
I actually read the code and ran the benchmark again.
Yep, I tested it. The cache on built-in function calls (other than HTTP calls, which have their own separate caching) is nd caching, and JWT decode counts as non-deterministic; enabling the cache boosts the performance significantly.
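For anyone landing here later, this is a minimal sketch of what enabling that cache could look like, assuming the top-level `nd_builtin_cache` configuration key described in OPA's configuration docs (verify the key name against your OPA version):

```yaml
# opa-config.yaml
# Cache results of non-deterministic builtins (e.g. io.jwt.decode)
# within the scope of a single policy evaluation.
nd_builtin_cache: true
```

Then start the server with something like `opa run --server --config-file opa-config.yaml bundle.tar.gz`.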
If you take a look at the code here, you'll see the non-deterministic builtins are marked with the `Nondeterministic` flag.
Feel free to submit a PR if you think the docs can be improved. Thanks!
@ashutosh-narkar what do you think about the optimizer creating duplicate function calls? Is that intentional?
@srenatus @johanfylling any ideas about the repeated function calls in the optimized policy? If that's contributing to an increased eval latency, then we should investigate this. |
I think this can be worth investigating. When the output from a non-deterministic built-in is reused, optimization is definitely not being optimal when it repeats the call rather than reusing its output.
Hi, I took some time to investigate and these are my findings. I would like to solve the problem with some help: do you by any chance know why we use partialEval in optimize level 2 instead of the support version? What is the difference? I'd appreciate it :D
Update: I'm currently working on a fix. The problem is at optimization level 2: the compiler tries to inline as much as it can, but this results in repeated inlinings because of Rego's document-based format. I'm trying to cache the inlinings throughout the evaluation and fetch them from the cache when they already exist, instead of re-evaluating.
Hi again, so I'm a bit stuck. I'll try to explain the situation:

```rego
package envoy.authz

import data.users
import input.attributes.request.http as http_request
import input.parsed_path as input_parsed_path
import rego.v1

default allow := false
default result := {}

# METADATA
# entrypoint: true
result := {
	"allowed": allow,
	"request_headers_to_remove": request_headers_to_remove,
	"response_headers_to_add": response_headers_to_add,
}

allow if {
	is_permitted
	is_whitelisted
}

is_permitted if {
	route_allowed
	user.role in ["ADMIN", "OPERATOR"]
}

is_whitelisted if {
	user.white_listed
}

request_headers_to_remove := ["x-user-id"]

response_headers_to_add := {
	"x-user-id": user_id,
	"x-user-status": user.status,
	"x-route-permissions": route_allowed.permissions,
}

user_id := http_request.headers["x-user-id"]

user := users[user_id]

route_allowed := {"permissions": ["GET_SOME"]} if {
	http_request.method == "POST"
	input_parsed_path == [input_parsed_path[0], "get", "smth"]
	object.subset(user.permissions, ["GET_SOME"])
}

route_allowed := {"permissions": ["SOME_OTHER"]} if {
	http_request.method == "PUT"
	input_parsed_path == [input_parsed_path[0], "some", "other", input_parsed_path[3]]
	object.subset(user.permissions, ["SOME_OTHER"])
}

route_allowed := {"permissions": ["FOO", "BAR"]} if {
	http_request.method == "PUT"
	input_parsed_path == [input_parsed_path[0], "foo", "bar", input_parsed_path[3]]
	object.subset(user.permissions, ["FOO", "BAR"])
}
```

Note that because of the inlining and the three `route_allowed` bodies, there should be three inlinings. In our approach, `result` tries to inline `allow` but can't because we need the default value, so it first evaluates `allow`, generating a new head ref for it. But since it saves the term binding and tries to reuse it, `allow` no longer gets evaluated again: `allow` is generated once, with only one of the possible `route_allowed` values, while `result` gets generated correctly. So the problem arises from the fact that I cache the inlining and `allow` doesn't get evaluated again, though I think there may be another flaw in how the multiple-value inline is generated. I might need some help here: up until now I've been trying to fix the inlining design, but I may be missing something about that design, which is why I decided to share and ask for some pointers or insight (a little push would suffice).
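To illustrate the "three inlinings" point, here is a hand-written sketch of what inlining `allow` across the three `route_allowed` bodies effectively produces (an assumption for illustration, not actual `opa build` output):

```rego
# Hypothetical expansion: one copy of `allow` per `route_allowed` body.
# Each copy repeats the user lookup and any builtin calls inside it.
allow if {
	http_request.method == "POST"
	input_parsed_path == [input_parsed_path[0], "get", "smth"]
	object.subset(user.permissions, ["GET_SOME"])
	user.role in ["ADMIN", "OPERATOR"]
	user.white_listed
}

allow if {
	http_request.method == "PUT"
	input_parsed_path == [input_parsed_path[0], "some", "other", input_parsed_path[3]]
	object.subset(user.permissions, ["SOME_OTHER"])
	user.role in ["ADMIN", "OPERATOR"]
	user.white_listed
}

allow if {
	http_request.method == "PUT"
	input_parsed_path == [input_parsed_path[0], "foo", "bar", input_parsed_path[3]]
	object.subset(user.permissions, ["FOO", "BAR"])
	user.role in ["ADMIN", "OPERATOR"]
	user.white_listed
}
```

If a saved term binding is reused here, only one of these bodies survives, which would match the behavior described above where `allow` ends up bound to just one of the possible `route_allowed` values.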
@johanfylling do you have any hints or ideas?
I would need to put some time into familiarizing myself a bit more with this issue to be able to give informed pointers. Let me get back to you on that. Some initial thoughts, though:
No. 4 is probably the most salient point/question here.
story time (you can skip this)
So long story short, our company has been working on an API gateway authorization solution for our APIs using OPA. Last week the project finally got to handle some real-world requests and we saw 100ms latency... It's CNCF, we must have configured it wrong. We played with the configurations, turned off the logs, etc.
As a last resort I updated the version to v0.67.0. We used glob.match and things got much worse (#6908). So I ditched glob.match, used code generation to generate the rules instead, and got the response times down. But there has been an itch in my mind ever since. Over the last two weekends I've been playing with the codebase and commands, trying to understand the problem, and I've finally got some results.
Short description
When using
opa build --optimize 2
to build a bundle with a rule that decodes a JWT, the optimized output repeats the decode call. As you can see, this results in decoding the JWT twice and consequently doubling the latency. I dug further to find the exact location of the problem, in this section: I put some breakpoints and found out that the optimizer treats the io package as a head ref, assuming that dereferencing is always done quickly and isn't an expensive task.
In our case the optimization generated 3 sequential JWT decodes, causing a 3x time penalty compared to no optimization.
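To make the duplication concrete, here is a hypothetical minimal sketch (the rule names and token handling are illustrative, not the actual policy from this issue):

```rego
package example

import rego.v1

# Source policy: the token is decoded once and the claims are reused.
claims := payload if {
	[_, payload, _] := io.jwt.decode(input.token)
}

allowed if claims.role == "ADMIN"

role := claims.role
```

After inlining at optimization level 2, the module can effectively become something like this sketch:

```rego
package example

import rego.v1

# Sketch of the inlined result: one decode call per use site, so without
# nd_builtin_cache the token is decoded twice per evaluation.
result := {"allowed": true, "role": r} if {
	[_, p0, _] := io.jwt.decode(input.token) # first decode
	p0.role == "ADMIN"
	[_, p1, _] := io.jwt.decode(input.token) # duplicated decode
	r := p1.role
}
```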
Possible Solutions
I'm enthusiastic and would like to contribute and learn more :D I'm open to ideas and could open a PR.
Thanks for the great product, and sorry about the rant :p