rego policy has poor performance #2281
Comments
The data.json format can be described with this piece of Go code:
|
ping @tsandall for help |
@WeiZhang555 Hi! Thanks for filing this with info to help reproduce. One comment and one question right off the bat:
|
@tsandall Thanks very much for your quick response!
Nope, in the
ExactMatch uses a map (an Object in Rego) with 2 elements; since map lookups are efficient, this won't be the bottleneck. For each element in map, To calculate the conditions, each In the whole ======== For each condition:
and example json:
It means (a OR b OR (NOT c) OR (NOT d)), and among ===== |
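The (a OR b OR (NOT c) OR (NOT d)) semantics can be sketched in Go as follows. This is only an illustration: the `Condition` type, its field names, and the header names are hypothetical and are not taken from the original data.json or control.rego.

```go
package main

import "fmt"

// Condition is a hypothetical shape for one entry in a rule's condition list.
// Negate marks an exclusive ("NOT") condition.
type Condition struct {
	Key    string // request attribute to inspect, e.g. a header name
	Value  string // value to compare against
	Negate bool   // true for an exclusive (NOT) condition
}

// matchAny reports whether at least one condition holds, i.e.
// (a OR b OR (NOT c) OR (NOT d)). A missing attribute compares as the
// empty string, so a NOT condition on a missing attribute holds.
func matchAny(conds []Condition, attrs map[string]string) bool {
	for _, c := range conds {
		hit := attrs[c.Key] == c.Value
		if hit != c.Negate {
			return true
		}
	}
	return false
}

func main() {
	conds := []Condition{
		{Key: "x-env", Value: "prod"},
		{Key: "x-region", Value: "eu"},
		{Key: "x-debug", Value: "1", Negate: true},
	}
	fmt.Println(matchAny(conds, map[string]string{"x-env": "prod", "x-debug": "1"}))
	fmt.Println(matchAny(conds, map[string]string{"x-env": "dev", "x-debug": "1"}))
}
```

The loop returns on the first condition that holds, so the worst case (no condition holds) is the one that touches every entry.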
Is this problem solvable? If so, do you have plans to improve the performance of WASM-compiled policies? @tsandall |
The current eval latency on my machine with provided policy/data/input is ~145ms...
A lot of this just comes from the interpretive overhead. Simply iterating over all
# $ opa bench -d control.rego -d data.json -i input.json 'data.control.search'
# +-------------------------------------------+------------+
# | samples | 100 |
# | ns/op | 10699421 |
# | B/op | 6713139 |
# | allocs/op | 205417 |
# | histogram_timer_rego_query_eval_ns_75% | 11261398 |
# | histogram_timer_rego_query_eval_ns_90% | 11844537 |
# | histogram_timer_rego_query_eval_ns_95% | 12487026 |
# | histogram_timer_rego_query_eval_ns_99% | 14215295 |
# | histogram_timer_rego_query_eval_ns_99.9% | 14227405 |
# | histogram_timer_rego_query_eval_ns_99.99% | 14227405 |
# | histogram_timer_rego_query_eval_ns_count | 100 |
# | histogram_timer_rego_query_eval_ns_max | 14227405 |
# | histogram_timer_rego_query_eval_ns_mean | 10686602 |
# | histogram_timer_rego_query_eval_ns_median | 10957291 |
# | histogram_timer_rego_query_eval_ns_min | 9358892 |
# | histogram_timer_rego_query_eval_ns_stddev | 986429 |
# +-------------------------------------------+------------+
search {
data.exact["test0.com"][i].conditions[j]
}
Adding inclusive and exclusive statements on conditions increases the latency
However, with 5,000 conditions to check, we're looking at roughly 225ms to check
In theory wasm could help here however the older benchmarks (only) showed a speedup
Given this, any solution with OPA will have to be able to leverage rule indexing.
Note that rule indexing relies on the conditions being encoded in the policy not inside
How many conditions do you expect to have to load into a single OPA instance? E.g.,
Thanks for the detailed writeup and for providing insight into the use case. If you |
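To illustrate why indexing matters at this scale, here is a minimal Go sketch of the general idea: look up candidate rules by the values present in the request instead of scanning every rule. It is not OPA's rule indexer (which operates on rules written in the policy), and the `Rule` type, `buildIndex`, and `candidates` are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Rule is a hypothetical rule that fires when attribute Key equals Value.
type Rule struct {
	ID    string
	Key   string
	Value string
}

// attr is the (key, value) pair used as the index key.
type attr struct{ key, value string }

// buildIndex groups rule IDs by the (key, value) pair they test.
func buildIndex(rules []Rule) map[attr][]string {
	idx := make(map[attr][]string, len(rules))
	for _, r := range rules {
		k := attr{r.Key, r.Value}
		idx[k] = append(idx[k], r.ID)
	}
	return idx
}

// candidates returns the sorted IDs of rules matched by the request
// attributes, using one map lookup per attribute instead of one
// comparison per rule.
func candidates(idx map[attr][]string, attrs map[string]string) []string {
	var out []string
	for k, v := range attrs {
		out = append(out, idx[attr{k, v}]...)
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := buildIndex([]Rule{
		{ID: "r1", Key: "host", Value: "test0.com"},
		{ID: "r2", Key: "host", Value: "test1.com"},
		{ID: "r3", Key: "x-env", Value: "prod"},
	})
	fmt.Println(candidates(idx, map[string]string{"host": "test0.com", "x-env": "prod"}))
}
```

With an index like this, the work per request depends on the number of request attributes rather than on the total number of rules, which is the property rule indexing is meant to provide.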
I appreciate your time on this issue so much! @tsandall
Nope, I believe this is the worst case. In the real world, I guess only a few rules would match.
In our production system, we have more than 5000 hosts. In my previous test, when I put 5000 hosts in a single data file, the file was too large, and data loading and JSON parsing took too long. So we decided to put only 1 host in each data.json, and we will have 5000 data.json files in separate packages.
Recently I've been trying to introduce OPA to our internal system; this is our first attempt to push OPA to production. We have several nginx servers acting as ingress traffic routers between the public network and our web service, so the setup is very sensitive to security and performance. The performance requirement is that >90% of requests must be handled within 1ms, so this actually leaves less than 1ms (maybe 300us) for OPA. In my preliminary test, a WASM-compiled Rego policy can handle 10~20 rules in less than 100us, which is encouraging, but as the number of rules increases, its performance degrades very quickly. So now we're really stuck on the performance issue 😢 |
Today I made some progress: after I changed the initial instance memory of the policy WASM to a much larger number, the latency quickly decreased from 1.6s to 20ms. As a contrast, It's closer to our target now 😄 |
@WeiZhang555 that sounds promising! Let us know if the performance is good enough with the larger heap size. If you need to reduce the time further, there is probably quite a bit of performance optimization that could happen in the query planner, wasm backend, and C code that would help. We'd have to do some profiling and analysis first. |
@tsandall I will keep trying to bring the latency down to 1ms. Thanks for your suggestion; I will read the code and investigate more. 😄 |
@WeiZhang555 I'm going to close this for now. Also, in #2328 we improved set/object implementations to use hashtables (they were using linked lists before) so the performance should be more in line with expectations (e.g., object lookup is constant-time now as it should be.) |
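The difference between a linked-list-style lookup and a hashtable lookup can be sketched in Go as below. This is not OPA's implementation; `pair`, `listLookup`, and `toMap` are hypothetical names, and the step counter exists only to make the linear cost visible.

```go
package main

import "fmt"

// pair is one key/value entry of an object.
type pair struct{ key, value string }

// listLookup scans entries in order, as a linked list would.
// It returns the value, the number of entries visited, and whether
// the key was found. Cost grows linearly with the number of entries.
func listLookup(pairs []pair, key string) (string, int, bool) {
	steps := 0
	for _, p := range pairs {
		steps++
		if p.key == key {
			return p.value, steps, true
		}
	}
	return "", steps, false
}

// toMap builds a hashtable over the same entries, so a lookup no
// longer depends on how many entries there are.
func toMap(pairs []pair) map[string]string {
	m := make(map[string]string, len(pairs))
	for _, p := range pairs {
		m[p.key] = p.value
	}
	return m
}

func main() {
	var pairs []pair
	for i := 0; i < 5000; i++ {
		pairs = append(pairs, pair{fmt.Sprintf("test%d.com", i), fmt.Sprint(i)})
	}
	_, steps, _ := listLookup(pairs, "test4999.com")
	fmt.Println("entries visited by linear scan:", steps)

	m := toMap(pairs)
	fmt.Println("hashtable lookup:", m["test4999.com"])
}
```

With 5000 hosts, the linear scan visits every entry to reach the last host, while the map lookup goes straight to it.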
@tsandall OK. Thanks for your help anyway 😄 |
Expected Behavior
We're trying to implement some network traffic routing functions in our system with Rego policies. For the best performance, we chose to compile Rego to WASM and run it with the wasmer project.
We chose WASM because our preliminary tests showed it performs better. We hope to use Rego to match HTTP headers, cookies, queries, etc., and expect it to handle 5000 rules in less than 1ms, but the performance doesn't meet that expectation.
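To put the target in perspective, a quick back-of-the-envelope calculation helps. The Go sketch below assumes the rules are evaluated one after another with no indexing; `perRuleBudgetNs` is a hypothetical helper, and the 300us figure is the OPA share of the budget mentioned elsewhere in this thread.

```go
package main

import "fmt"

// perRuleBudgetNs returns the average time available per rule, in
// nanoseconds, when totalBudgetNs must cover ruleCount rules that are
// evaluated sequentially.
func perRuleBudgetNs(totalBudgetNs, ruleCount int64) int64 {
	return totalBudgetNs / ruleCount
}

func main() {
	const rules = 5000
	fmt.Println(perRuleBudgetNs(1_000_000, rules)) // 1ms total: prints 200
	fmt.Println(perRuleBudgetNs(300_000, rules))   // 300us total: prints 60
}
```

A budget of 60 to 200ns per rule leaves very little room for a full scan, which suggests the target is only realistic if most rules are never evaluated for a given request.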
Actual Behavior
time opa eval ... only costs 1229ms, which is less than the wasm format. So I think some bug or problem is introduced while generating the WASM binary.
Steps to Reproduce the Problem
I'll paste the rego file here:
===========================
control.rego:
===========================
data.json:
data.json.txt
===========================
input.json:
The integration code with wasmer is too long, so I won't paste it here unless it's really necessary.
I want to know whether it's possible to finish evaluation in less than 1ms, or whether that's actually impossible. Is it a bug that WASM runs slower than opa eval ...?