Comprehension Indexing / Optimization with Bundles and Rego Policies #349

msvechla · 2023-02-23T15:28:36Z

Hi everyone,

I'm still a bit new to OPA, but already getting into the depths of performance optimization.
After noticing a big difference in Rego query evaluation time between my tests and the OPA server at runtime, I did some profiling and found a function that is a bottleneck.

The problem is similar to the issue outlined in this Styra blog post (slow iteration over lots of elements vs. lookup of object keys): https://www.styra.com/blog/how-to-shape-opa-data-for-policy-performance/

A working example (with a reduced dataset) can be found below.

The original dataset (role_permission_mappings below) contains a mapping of role_urns to a list of permissions.
The method with the bottleneck originally was role_urns_for_permission, which returned a set of all role_urns that contain a given permission (so basically the inverse of the original dataset).

The code below already contains the optimized comprehensions map_permission_to_role_urns_bundle and map_permission_to_role_urns_inline, which create a more efficient data structure for returning all role URNs for a given permission.

Ideally this map should be computed only once, and the query role_urns_for_permission should then be a simple key lookup.

package iam

# original data structure
role_permission_mappings := {
    "urn:iam:role:global::accounting": {
        "permissions":["A","B","C","D","E","F","G","H"]
    },
    "urn:iam:role:global::operations":{
        "permissions":["A","B","C","D"]
    }
   #... lots more, large dataset
}

role_urns_for_permission_bundle(requiredPermission) := roleUrns {
    roleUrns := map_permission_to_role_urns_bundle[requiredPermission]
}

role_urns_for_permission_inline(requiredPermission) := roleUrns {
    roleUrns := map_permission_to_role_urns_inline[requiredPermission]
}

# map_permission_to_role_urns_inline is a map of permissions to the set of role URNs that have that permission
# this is a precomputed map to avoid having to iterate over all roles for every permission check
map_permission_to_role_urns_inline := {perm: roleURNs |
    some i
    perm := role_permission_mappings[_].permissions[i]

    roleURNs := {roleURN |
        some j
        role_permission_mappings[roleURN].permissions[j] == perm
    }
}

# same comprehension as above, but this time accessing bundle data
# 
# map_permission_to_role_urns_bundle is a map of permissions to the set of role URNs that have that permission
# this is a precomputed map to avoid having to iterate over all roles for every permission check
map_permission_to_role_urns_bundle := {perm: roleURNs |
    some i
    perm := data.role_permission_mappings[_].permissions[i]

    roleURNs := {roleURN |
        some j
        data.role_permission_mappings[roleURN].permissions[j] == perm
    }
}
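
For readers less familiar with Rego, the transformation that both comprehensions perform can be sketched in Python. This is only an illustration of the data structure, not of how OPA evaluates the policy: the function names `role_urns_by_scan` and `invert` are made up for this example, and the sample data is the reduced dataset from above.

```python
from collections import defaultdict

# Reduced sample taken from the post; the real dataset is much larger.
role_permission_mappings = {
    "urn:iam:role:global::accounting": {
        "permissions": ["A", "B", "C", "D", "E", "F", "G", "H"]
    },
    "urn:iam:role:global::operations": {
        "permissions": ["A", "B", "C", "D"]
    },
}


def role_urns_by_scan(mappings, required_permission):
    """Scan every role on each call (the shape of the original bottleneck)."""
    return {
        role_urn
        for role_urn, role in mappings.items()
        if required_permission in role["permissions"]
    }


def invert(mappings):
    """Build permission -> set of role URNs in a single pass over the data."""
    index = defaultdict(set)
    for role_urn, role in mappings.items():
        for permission in role["permissions"]:
            index[permission].add(role_urn)
    return dict(index)


permission_to_role_urns = invert(role_permission_mappings)

# Both approaches agree; once the index exists, each check is a key lookup.
assert role_urns_by_scan(role_permission_mappings, "E") == permission_to_role_urns["E"]
print(sorted(permission_to_role_urns["A"]))
```

The scan costs time proportional to the whole dataset on every call, while the inverted map pays that cost once and then answers each permission check with a single lookup.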

When I profile this Rego policy using the data defined inline in the policy.rego file (role_permission_mappings above), it performs well:

opa eval -b /tmp/bundles/static-data-bundle.tar.gz -d rego/policy.rego 'data.iam.role_urns_for_permission_inline("A")' --format pretty --profile --count 10

+------------------------------+----------+----------+----------------+------------------------+---------------+
|            METRIC            |   MIN    |   MAX    |      MEAN      |          90%           |      99%      |
+------------------------------+----------+----------+----------------+------------------------+---------------+
| timer_rego_data_parse_ns     | 509917   | 902750   | 612196.3       | 898279.3               | 902750        |
| timer_rego_load_bundles_ns   | 777083   | 1308250  | 910758.3       | 1.2932834000000001e+06 | 1.30825e+06   |
| timer_rego_load_files_ns     | 6749750  | 8551000  | 7.5021625e+06  | 8.541625e+06           | 8.551e+06     |
| timer_rego_module_compile_ns | 26194583 | 30113542 | 2.68986626e+07 | 2.98035211e+07         | 3.0113542e+07 |
| timer_rego_module_parse_ns   | 6627250  | 8442750  | 7.3845084e+06  | 8.435e+06              | 8.44275e+06   |
| timer_rego_query_compile_ns  | 36959    | 79041    | 54712.6        | 77599.40000000001      | 79041         |
| timer_rego_query_eval_ns     | 12376500 | 13529959 | 1.28141918e+07 | 1.34962423e+07         | 1.3529959e+07 |
| timer_rego_query_parse_ns    | 37708    | 116541   | 49141.6        | 109632.80000000002     | 116541        |
+------------------------------+----------+----------+----------------+------------------------+---------------+
+------------+------------+------------+------------+------------+----------+----------+----------------------------------------------------+
|    MIN     |    MAX     |    MEAN    |    90%     |    99%     | NUM EVAL | NUM REDO |                      LOCATION                      |
+------------+------------+------------+------------+------------+----------+----------+----------------------------------------------------+
| 4.366466ms | 4.763146ms | 4.58778ms  | 4.758933ms | 4.763146ms | 1        | 3586     | rego/policy.rego:34                                |
| 3.974064ms | 4.554346ms | 4.241923ms | 4.543697ms | 4.554346ms | 3586     | 3586     | rego/policy.rego:32                                |
| 2.774088ms | 3.773265ms | 3.033539ms | 3.700024ms | 3.773265ms | 1        | 3586     | rego/policy.rego:30                                |
| 22.042µs   | 44.708µs   | 28.283µs   | 43.358µs   | 44.708µs   | 1        | 1        | rego/policy.rego:3                                 |
| 10.292µs   | 26.875µs   | 14.745µs   | 25.933µs   | 26.875µs   | 1        | 1        | data.iam.role_urns_for_permission_inline("A")      |
| 7.332µs    | 11.751µs   | 10.021µs   | 11.688µs   | 11.751µs   | 2        | 2        | rego/policy.rego:28                                |
| 4µs        | 5.291µs    | 4.52µs     | 5.249µs    | 5.291µs    | 1        | 1        | rego/policy.rego:11                                |
+------------+------------+------------+------------+------------+----------+----------+----------------------------------------------------+

However, when I profile the same query against the data stored in the bundle, the performance is a lot worse:

opa eval -b /tmp/bundles/static-data-bundle.tar.gz -d rego/policy.rego 'data.iam.role_urns_for_permission_bundle("A")' --format pretty --profile --count 10

+--------------------------------+----------+----------+----------------+------------------------+---------------+
|             METRIC             |   MIN    |   MAX    |      MEAN      |          90%           |      99%      |
+--------------------------------+----------+----------+----------------+------------------------+---------------+
| timer_rego_data_parse_ns       | 502542   | 1134916  | 666458.4       | 1.1051662000000002e+06 | 1.134916e+06  |
| timer_rego_external_resolve_ns | 208      | 291      | 241.5          | 286.90000000000003     | 291           |
| timer_rego_load_bundles_ns     | 727375   | 1844208  | 1.0684084e+06  | 1.8091622000000002e+06 | 1.844208e+06  |
| timer_rego_load_files_ns       | 6567958  | 9518500  | 7.3844582e+06  | 9.33e+06               | 9.5185e+06    |
| timer_rego_module_compile_ns   | 25138667 | 27425834 | 2.57508333e+07 | 2.72994006e+07         | 2.7425834e+07 |
| timer_rego_module_parse_ns     | 6449583  | 9058667  | 7.2420292e+06  | 8.906767e+06           | 9.058667e+06  |
| timer_rego_query_compile_ns    | 40417    | 66666    | 48625          | 65478.600000000006     | 66666         |
| timer_rego_query_eval_ns       | 67324000 | 70243500 | 6.84484957e+07 | 7.02426208e+07         | 7.02435e+07   |
| timer_rego_query_parse_ns      | 34666    | 54833    | 40670.6        | 54545.5                | 54833         |
+--------------------------------+----------+----------+----------------+------------------------+---------------+
+-------------+-------------+-------------+-------------+-------------+----------+----------+----------------------------------------------------+
|     MIN     |     MAX     |    MEAN     |     90%     |     99%     | NUM EVAL | NUM REDO |                      LOCATION                      |
+-------------+-------------+-------------+-------------+-------------+----------+----------+----------------------------------------------------+
| 31.630525ms | 32.917451ms | 32.434213ms | 32.900568ms | 32.917451ms | 1        | 3586     | rego/policy.rego:22                                |
| 28.705949ms | 31.229429ms | 29.708367ms | 31.127831ms | 31.229429ms | 1        | 3586     | rego/policy.rego:18                                |
| 4.774106ms  | 6.582454ms  | 5.294152ms  | 6.468503ms  | 6.582454ms  | 3586     | 3586     | rego/policy.rego:20                                |
| 9.208µs     | 237.792µs   | 35.091µs    | 215.733µs   | 237.792µs   | 1        | 1        | data.iam.role_urns_for_permission_bundle("A")      |
| 8.668µs     | 15.75µs     | 11.087µs    | 15.708µs    | 15.75µs     | 1        | 1        | rego/policy.rego:7                                 |
| 3.5µs       | 11.209µs    | 5.595µs     | 10.938µs    | 11.209µs    | 2        | 2        | rego/policy.rego:16                                |
+-------------+-------------+-------------+-------------+-------------+----------+----------+----------------------------------------------------+

Compare the MEAN of the slowest expression in each run: 4.58778ms with inline data versus 32.434213ms with bundle data. The overall timer_rego_query_eval_ns mean rises from about 12.8ms to about 68.4ms.
The bundle contains exactly the same data as inlined in the .rego file (see role_permission_mappings above, just with a lot more entries).
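
To put the two profiler runs side by side, the slowdown can be worked out from the MEAN columns. The short Python sketch below only does that arithmetic; the numbers are copied from the tables above and the variable names are made up for this example.

```python
# Mean values copied from the two profiler runs in the post.
inline_expr_ms = 4.58778         # slowest expression, inline data
bundle_expr_ms = 32.434213       # slowest expression, bundle data
inline_eval_ns = 1.28141918e+07  # timer_rego_query_eval_ns, inline data
bundle_eval_ns = 6.84484957e+07  # timer_rego_query_eval_ns, bundle data

NS_PER_MS = 1_000_000

# Ratio of bundle time to inline time for each measurement.
expr_slowdown = bundle_expr_ms / inline_expr_ms
eval_slowdown = bundle_eval_ns / inline_eval_ns

print(f"slowest expression: {expr_slowdown:.1f}x slower with bundle data")
print(
    f"query eval: {inline_eval_ns / NS_PER_MS:.1f}ms -> "
    f"{bundle_eval_ns / NS_PER_MS:.1f}ms ({eval_slowdown:.1f}x)"
)
```

The slowest expression is roughly seven times slower and the whole query evaluation roughly five times slower when the data comes from the bundle.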

Am I missing something here? Why is there such a big difference in evaluation time?
As mentioned at the beginning, shouldn't the result of the comprehension map_permission_to_role_urns_bundle be cached or indexed, since it only has to be computed once?

One possible reason I can think of is that OPA tries to optimize the Rego policies and data upfront, and that this is only possible when the policies and the data are combined (instead of being split across different bundles, for example). If this is the case, what could be done to fix it?

We split the policy rules and the static data into separate bundles, because they are distributed by different services during runtime. Is this the reason why we are seeing bad query evaluation performance during runtime with opa server?
I would expect OPA to still do these optimizations after all bundles have been loaded.

If you can share any insights here it would be greatly appreciated. I'm still learning.
Thanks for your help already!

anderseknert (Maintainer) · 2023-02-24T11:36:08Z

That's an interesting observation! Thanks for sharing 👍 Certainly is a huge bump in timer_rego_query_eval_ns. We'll need to look into it.

msvechla (Author) · 2023-02-27T13:45:08Z

Thanks a lot @anderseknert, can you already share some insights?

Also: is my assumption correct that map_permission_to_role_urns_bundle should only be evaluated once on the OPA server and from then on be served entirely from the cache? Even when I evaluate this query on the OPA server multiple times, every request takes a long time and I cannot see any difference in evaluation time. Is there any way I can determine whether the caching works as expected?

EDIT: Looking at this issue, it seems that OPA will not cache map_permission_to_role_urns_bundle across requests. Is that true? Since map_permission_to_role_urns_bundle is simply a one-time transformation of existing data, what would be the best way to implement this efficiently? Do I have to pre-generate it during bundle generation?

anderseknert Feb 27, 2023
Maintainer

No, I don't believe we cache the results across queries today, although that could potentially be an improvement to consider for future optimizations. @srenatus may correct me here if I'm wrong. If you have static data (i.e. not depending on input) that you want to transform, you could do that as part of the bundle creation process, and include the transformed structure in the bundle to avoid having it recomputed. I have added an "indexing" step to the build process for some bundles in the past (using just opa eval --format json) and stored the output as part of a bundle.
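
As a rough sketch of such an "indexing" step, the Python below pulls the computed value out of `opa eval --format json` output so it can be stored as data. The sample output is an illustrative stand-in, trimmed to the fields used here, and `extract_value` is a hypothetical helper name. Note that Rego sets are serialized as JSON arrays, so the role URNs come back as lists.

```python
import json

# Illustrative stand-in for what `opa eval --format json <query>` prints;
# the real output would contain the full map for the whole dataset.
sample_eval_output = """
{
  "result": [
    {
      "expressions": [
        {
          "value": {
            "A": [
              "urn:iam:role:global::accounting",
              "urn:iam:role:global::operations"
            ],
            "E": ["urn:iam:role:global::accounting"]
          },
          "text": "data.iam.map_permission_to_role_urns_bundle"
        }
      ]
    }
  ]
}
"""


def extract_value(eval_output: str):
    """Return the value of the first expression in the first result."""
    parsed = json.loads(eval_output)
    results = parsed.get("result", [])
    if not results:
        raise ValueError("query was undefined; nothing to index")
    return results[0]["expressions"][0]["value"]


permission_to_role_urns = extract_value(sample_eval_output)

# Hypothetical next step: store this document as a data.json in the bundle.
data_json = json.dumps(permission_to_role_urns, indent=2, sort_keys=True)
print(data_json)
```

With the transformed structure shipped as data, the policy can look up a permission directly instead of rebuilding the map on every query.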

msvechla Feb 27, 2023
Author

Thanks for the response. Using opa eval --format json does look like a feasible workaround. However, it seems like an unnecessary extra manual step that could easily be implemented / cached directly in OPA. I think this feature would benefit lots of users.

anderseknert Feb 27, 2023
Maintainer

Yeah, tbh I'm not 100% sure if / how the indexer works across queries, so I'm hoping @srenatus can bring some enlightenment, but I agree that it could be useful to (perhaps optionally) persist the result of "static" evaluations across requests.

msvechla Jul 27, 2023
Author

Hi, do you have any update on this?

For now I went with pre-generating different structures of the same data during the bundle generation process. However, this seems avoidable if the aforementioned caching worked, or if OPA performed this performance-improving restructuring of the data itself during bundle load.
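
A minimal sketch of what such a bundle generation step could look like in Python, assuming the usual bundle layout in which a data.json file is loaded under the data path given by its directory. The helper `build_data_bundle`, the path `permission_to_role_urns`, and the output location are hypothetical choices for this example, not the setup used in this thread.

```python
import io
import json
import os
import tarfile
import tempfile


def build_data_bundle(documents, out_path):
    """Write each document as <path>/data.json into a gzipped tarball.

    `documents` maps a slash-separated directory path (chosen by the caller)
    to a JSON-serializable value.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for path, value in sorted(documents.items()):
            payload = json.dumps(value, sort_keys=True).encode("utf-8")
            info = tarfile.TarInfo(name=f"{path}/data.json")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


# Reduced sample data: the original structure and its pre-generated inverse.
role_permission_mappings = {
    "urn:iam:role:global::accounting": {"permissions": ["A", "E"]},
    "urn:iam:role:global::operations": {"permissions": ["A"]},
}
permission_to_role_urns = {
    "A": ["urn:iam:role:global::accounting", "urn:iam:role:global::operations"],
    "E": ["urn:iam:role:global::accounting"],
}

# Written to a temporary directory so the sketch has no side effects elsewhere.
out_path = os.path.join(tempfile.mkdtemp(), "static-data-bundle.tar.gz")
build_data_bundle(
    {
        "role_permission_mappings": role_permission_mappings,
        "permission_to_role_urns": permission_to_role_urns,
    },
    out_path,
)
print(out_path)
```

Under that layout assumption, a policy could then read data.permission_to_role_urns[requiredPermission] directly.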

anderseknert Jul 31, 2023
Maintainer

@srenatus is on vacation. Perhaps @ashutosh-narkar knows more.
