Comprehension Indexing / Optimization with Bundles and Rego Policies #349
-
That's an interesting observation! Thanks for sharing 👍 Certainly is a huge bump in |
-
Thanks a lot @anderseknert , can you already share some insights? Also: Is my assumption correct that EDIT: Looking at this issue it seems like opa will not cache |
-
Hi everyone,
I'm still a bit new to OPA, but already getting into the depths of performance optimization.
I noticed a big difference in Rego query evaluation time between test runs and the OPA server at runtime, so I did some profiling and found a function that is a bottleneck.
The problem is similar to the one outlined in this Styra blog post (slow iteration over lots of elements vs. lookup of object keys): https://www.styra.com/blog/how-to-shape-opa-data-for-policy-performance/
A working example (with a reduced dataset) can be found below.
The original dataset (`role_permission_mappings` below) contains a mapping of `role_urns` to a list of `permissions`.
The method with the bottleneck originally was `role_urns_for_permission`, which returned a set of all `role_urns` that contain a given `permission` (so basically the inverse of the original dataset).
The code below already contains the optimized comprehensions `map_permission_to_role_urns_bundle` and `map_permission_to_role_urns_inline`, which create a more efficient data structure for returning all role URNs for a given permission.
This map should only be computed once, and the query `role_urns_for_permission` should then be a simple key lookup.
When I profile this Rego policy, it performs well using the data defined inline in the policy.rego file (`role_permission_mappings` above):
However, when I profile the same query against the data stored in the bundle, the performance is a lot worse:
You can see the `MEAN` difference of e.g. `4.58778ms` via `inline` data and `32.434213ms` via `bundle` data.
The bundle contains exactly the same data as inlined in the `.rego` file (see `role_permission_mappings` above, just lots more data).
Should `map_permission_to_role_urns_bundle` not be cached or indexed, as it only has to be computed once?
One possible reason I could think of is that OPA tries to do some optimization with the Rego policies and data upfront, and that this can only be done when the policies and the data are combined (instead of split into different bundles, for example). If this is the case, what could be done to fix this?
We split the policy rules and the static data into separate bundles because they are distributed by different services at runtime. Is this the reason why we are seeing bad query evaluation performance at runtime with the OPA server?
I would expect OPA to still do these optimizations after all bundles have been loaded.
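To make the pattern concrete, here is a minimal sketch of the two approaches described above: a linear scan per call versus a precomputed inverse map with a key lookup. It is only an illustration, not the original policy. The package name, the sample roles and permissions, the `role_urns_for_permission_scan` name, the `all_permissions` helper, and the `data.role_permission_mappings` bundle path are all assumptions.

```rego
package example

import rego.v1

# Hypothetical reduced dataset: role URN -> list of permissions.
role_permission_mappings := {
	"urn:role:admin": ["read", "write", "delete"],
	"urn:role:editor": ["read", "write"],
	"urn:role:viewer": ["read"],
}

# Slow variant: every call iterates over all roles and their permissions.
role_urns_for_permission_scan(permission) := {role_urn |
	some role_urn, permissions in role_permission_mappings
	permission in permissions
}

# Helper (assumed name): the set of all distinct permissions in the inline data.
all_permissions := {p | some p in role_permission_mappings[_]}

# Inverse map built from the inline data: permission -> set of role URNs.
map_permission_to_role_urns_inline := {permission: role_urns |
	some permission in all_permissions
	role_urns := {role_urn |
		some role_urn, permissions in role_permission_mappings
		permission in permissions
	}
}

# Same shape, but reading from bundle data. The path
# data.role_permission_mappings is an assumption about the bundle layout.
map_permission_to_role_urns_bundle := {permission: role_urns |
	some permission in {p | some p in data.role_permission_mappings[_]}
	role_urns := {role_urn |
		some role_urn, permissions in data.role_permission_mappings
		permission in permissions
	}
}

# Fast variant: a plain key lookup into the inverse map.
role_urns_for_permission(permission) := map_permission_to_role_urns_inline[permission]
```

With this sample data, `role_urns_for_permission("write")` evaluates to the set containing `urn:role:admin` and `urn:role:editor`, and it is undefined for a permission that no role has.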
If you can share any insights here it would be greatly appreciated. I'm still learning.
Thanks for your help already!