Support faster loading of dependency-free bundles #6166
Comments
+1 on this and the direct link to OPA Dependency Manager (ODM) (taken from #3371 mentioned above)
Depending on where all components are located, compression may also have an impact on speed...
Just to add some context from a real-world OPA customer: I was the customer who originally raised this issue. Our use case is a bundle per tenant (customer), as each customer has a discrete, independent Rego policy plus data lists, which they change infrequently. We have 5000+ customers. Looking further at the performance metrics OPA provides, we could see that compile time increased as each bundle was loaded. Our bundles are per tenant and completely independent of each other. They have no inter-dependencies, so any collision detection or dependency checking across bundles, which may be the cause of the slowdown, is unnecessary in our use case. I wonder if we could have a flag to disable such functionality if that is the root cause. At the moment this is a blocker for us: we are having to load all customers into a single large bundle and rebuild that bundle every time a single customer makes a policy change.
Hi @deezkay and @hpvd! And thanks for raising this 👍 It's an interesting use case for sure, although not really one that OPA is currently built to deal with, at least not in the scope of a single instance. Setting the performance issue aside, there aren't any guarantees, or even attempts made, to isolate policy or data between "tenants", as OPA never considered the bundle model for the purpose of multi-tenancy. You could easily have one tenant (i.e. …). Not to mention: would you actually have OPA poll 5000 different remote endpoints for bundle updates? 😅 You'd face some challenges we haven't really accounted for. So while I think the use case is valid, I don't think even solving the performance issue reported here would get us anywhere near a scenario where I'd be comfortable having a single instance (or a few) try to meet these requirements.

Given that all tenants run independently of each other anyway, what do you see as the benefit of having just a single OPA serve all of them? A single OPA per tenant would obviously be ideal for isolation, but even some partitioning scheme, say 10 tenants per OPA, would go a long way toward solving the problems outlined here. OPA was ultimately built to be a distributed component, and it's not uncommon for organizations to run hundreds or thousands of instances inside their clusters.
@deezkay thanks for the details and for sharing the state of your investigation! I was looking at the same problem but from another perspective:
Not really, as you'd probably have OPAs running all over the place.
...and so on. The distributed model comes with its own set of challenges, that's for sure, but so does the "one large instance" model, and OPA has ultimately been built primarily to solve the challenges of the former. That doesn't mean it can't be used in other configurations, but as I've tried to elaborate on, doing so is likely to come with a whole lot of challenges, many of which we haven't even thought about.
Yep, sure, you can split it. The reason we are considering the all-in-one approach is "continuous audit readiness".
Thanks @anderseknert and @hpvd for your input and feedback. I totally agree our use case isn’t typically what OPA is used for, but we have been impressed by OPA as a pure policy engine and by how easy it is to implement new rules to support our application use case. To provide some more context, we are using OPA as a policy engine to evaluate our custom application policy and return an action to be taken. Regardless of the fact that we have 5000+ customers (and yes, we are fully aware we may need to shard the policy across numerous OPA clusters to get the performance we need), the fundamental question we are trying to answer is why “the current compiler implementation re-runs all stages on all modules each time the compiler is invoked”.
In highly scaled, multi-tenant deployments of OPA, operators might load many bundles into OPA containing policy from a large number of end users (e.g. O(5000)).
This poses challenges to OPA's performance today, because OPA compiles new modules from bundles alongside the existing modules in the store. It does this to check for function references and path collisions, and perhaps other factors (related: #3841). In theory, some of these checks could be skipped if we knew the bundle was self-contained.
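To make "self-contained" concrete, here is a minimal sketch of the two properties the paragraph above mentions, modelled outside OPA: a bundle's references all resolve under its own manifest roots, and its roots do not collide with roots of bundles already loaded. The Bundle type and its Refs field are hypothetical simplifications (extracting references from Rego, and ignoring input and built-in calls, is assumed to happen elsewhere); OPA's real checks run on parsed ASTs inside its compiler and bundle activation code.

```go
package main

import (
	"fmt"
	"strings"
)

// Bundle is a hypothetical, simplified view of a bundle: the slash-separated
// roots declared in its manifest and the data paths its policies reference.
type Bundle struct {
	Name  string
	Roots []string // e.g. "tenants/acme"
	Refs  []string // e.g. "tenants/acme/lists/blocked"
}

// segments splits a slash-separated path into its non-empty segments.
func segments(p string) []string {
	var out []string
	for _, s := range strings.Split(p, "/") {
		if s != "" {
			out = append(out, s)
		}
	}
	return out
}

// hasPrefix reports whether path a starts with prefix, segment by segment.
func hasPrefix(a, prefix []string) bool {
	if len(prefix) > len(a) {
		return false
	}
	for i := range prefix {
		if a[i] != prefix[i] {
			return false
		}
	}
	return true
}

// rootsOverlap reports whether one root contains the other. Comparing whole
// segments means "tenants/a" and "tenants/ab" do not overlap, while an empty
// root overlaps everything.
func rootsOverlap(x, y string) bool {
	xs, ys := segments(x), segments(y)
	return hasPrefix(xs, ys) || hasPrefix(ys, xs)
}

// SelfContained reports whether every reference in b falls under one of b's
// own roots, i.e. compiling b should not need modules from other bundles.
func SelfContained(b Bundle) bool {
	for _, ref := range b.Refs {
		covered := false
		for _, root := range b.Roots {
			if hasPrefix(segments(ref), segments(root)) {
				covered = true
				break
			}
		}
		if !covered {
			return false
		}
	}
	return true
}

// Conflicts returns the names of loaded bundles whose roots overlap b's roots.
func Conflicts(b Bundle, loaded []Bundle) []string {
	var names []string
	for _, other := range loaded {
		for _, x := range b.Roots {
			if anyRootOverlaps(x, other.Roots) {
				names = append(names, other.Name)
				break
			}
		}
	}
	return names
}

// anyRootOverlaps reports whether root x overlaps any of the given roots.
func anyRootOverlaps(x string, roots []string) bool {
	for _, y := range roots {
		if rootsOverlap(x, y) {
			return true
		}
	}
	return false
}

func main() {
	acme := Bundle{Name: "acme", Roots: []string{"tenants/acme"}, Refs: []string{"tenants/acme/lists/blocked"}}
	globex := Bundle{Name: "globex", Roots: []string{"tenants/globex"}, Refs: []string{"shared/helpers/is_admin"}}
	fmt.Println(SelfContained(acme), SelfContained(globex)) // true false
	fmt.Println(Conflicts(acme, []Bundle{globex}))          // []
}
```

The root-overlap test only compares short path lists, so it stays cheap even with thousands of loaded bundles; the expensive part the issue describes is recompiling modules, which a check like SelfContained would aim to avoid.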
The relevant code is in opa/bundle/store.go, lines 765 to 773 at commit f05ebba.
As an aside, this also applies to the REST API: opa/server/server.go, lines 2136 to 2142 at commit abb6cf2.
I have done some experiments and found that 500 modules is around 50x slower, and 1000 bundles around 130x slower, than a single-bundle baseline measurement.
One solution might be to allow bundles to be labelled as not needing other modules before compilation and updating of the store. Another might be to make this the default behaviour for a bundle that uses dependency management but declares no dependencies (#3371), once that functionality is available.
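The growth in these measurements is consistent with the behaviour described in the thread, where each activation re-runs compilation over all modules loaded so far. The sketch below is a simple counting model under that assumption: it tallies module compilations for loading b bundles one at a time, compared with compiling each bundle only against its own modules. It is not meant to reproduce the measured 50x and 130x figures, since real compile time also includes per-activation overhead and compiler stages whose cost is not proportional to module count.

```go
package main

import "fmt"

// recompileAllCost counts module compilations when bundles are activated one
// at a time and each activation recompiles every module loaded so far.
// With m modules per bundle this equals m*b*(b+1)/2.
func recompileAllCost(bundles, modulesPerBundle int) int {
	total := 0
	for loaded := 1; loaded <= bundles; loaded++ {
		total += loaded * modulesPerBundle
	}
	return total
}

// isolatedCost counts compilations if each bundle were compiled only against
// its own modules (the hypothetical "dependency-free" path).
func isolatedCost(bundles, modulesPerBundle int) int {
	return bundles * modulesPerBundle
}

func main() {
	for _, b := range []int{1, 500, 1000, 5000} {
		all, iso := recompileAllCost(b, 1), isolatedCost(b, 1)
		fmt.Printf("bundles=%4d recompile-all=%8d isolated=%4d ratio=%.1f\n",
			b, all, iso, float64(all)/float64(iso))
	}
}
```

In this model the ratio is (b+1)/2, so total work grows quadratically with the number of bundles while the isolated path grows linearly.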
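The labelling idea can be sketched as a decision about which modules are handed to the compiler when a new bundle is activated. This is a minimal model, assuming a hypothetical SelfContained label on the bundle (not an existing OPA manifest field) and a stand-in Module type; how the isolated result would then be merged into the store's compiler state is left open here.

```go
package main

import "fmt"

// Module is a hypothetical stand-in for a parsed Rego module.
type Module struct{ Path string }

// Bundle carries a hypothetical SelfContained label; OPA manifests have no
// such field today.
type Bundle struct {
	Name          string
	SelfContained bool
	Modules       []Module
}

// modulesToCompile returns the modules passed to the compiler when activating
// next: only its own modules if it is labelled self-contained, otherwise every
// loaded module plus the new ones (the current behaviour described above).
func modulesToCompile(loaded []Bundle, next Bundle) []Module {
	if next.SelfContained {
		return next.Modules
	}
	var all []Module
	for _, b := range loaded {
		all = append(all, b.Modules...)
	}
	return append(all, next.Modules...)
}

func main() {
	loaded := []Bundle{
		{Name: "acme", Modules: []Module{{Path: "tenants/acme/policy.rego"}}},
		{Name: "globex", Modules: []Module{{Path: "tenants/globex/policy.rego"}}},
	}
	next := Bundle{Name: "initech", Modules: []Module{{Path: "tenants/initech/policy.rego"}}}
	fmt.Println(len(modulesToCompile(loaded, next))) // 3: everything is recompiled
	next.SelfContained = true
	fmt.Println(len(modulesToCompile(loaded, next))) // 1: only the new bundle
}
```

Even with such a label, a cheap check that the new bundle's roots do not overlap those of already-loaded bundles would presumably still be needed, since path collisions are one of the things the full compilation currently guards against.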
In the meantime, users could consider sharding OPA instances or using larger, aggregated bundles.
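For the aggregated-bundle workaround, OPA does not isolate tenants from one another, so a build step that assembles the single bundle can at least enforce one namespace per tenant. The sketch below assumes a hypothetical layout of one directory per tenant (tenants/<id>/policy.rego) with packages under tenants.<id>, and that tenant IDs are valid Rego identifiers. The package detection is a naive line scan; a real tool should use OPA's parser.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// packageOf returns the package path declared in a Rego module, using a naive
// line scan rather than OPA's parser.
func packageOf(src string) (string, bool) {
	for _, line := range strings.Split(src, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "package ") {
			return strings.TrimSpace(strings.TrimPrefix(line, "package ")), true
		}
	}
	return "", false
}

// aggregate lays out one directory per tenant in a single bundle and rejects
// any module whose package escapes its tenant namespace (tenants.<id>).
func aggregate(tenants map[string]string) (map[string]string, error) {
	files := make(map[string]string, len(tenants))
	for id, src := range tenants {
		want := "tenants." + id
		pkg, ok := packageOf(src)
		if !ok || (pkg != want && !strings.HasPrefix(pkg, want+".")) {
			return nil, fmt.Errorf("tenant %q: package %q is outside %s", id, pkg, want)
		}
		files["tenants/"+id+"/policy.rego"] = src
	}
	return files, nil
}

func main() {
	files, err := aggregate(map[string]string{
		"acme":   "package tenants.acme\n\nallow := true\n",
		"globex": "package tenants.globex.rules\n\ndeny := false\n",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths) // [tenants/acme/policy.rego tenants/globex/policy.rego]
}
```

This does not remove the need to rebuild the whole bundle when one tenant changes; it only guards against one tenant's policy landing in another tenant's part of the data tree.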