
Investigate storing AST values in in-memory store instead of maps/slices/etc. #4147

Closed
tsandall opened this issue Dec 15, 2021 · 15 comments
Labels
optimization, requires-investigation (Issues not under active investigation but requires one)

Comments

@tsandall
Member

Currently the in-memory store uses maps and slices to store base documents. In some cases, this can result in high latency during policy evaluation if a large amount of data has to be read and evaluated (because the evaluator has to convert the interface{} types into AST values). We should run some experiments on different data sets to measure the performance improvement we'd receive by simply storing AST values inside the in-memory store. We will need to monitor the memory usage, because the AST values have a fixed cost on top of the standard map/slice types.
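For context, a minimal sketch of the conversion the evaluator has to perform today when reading base documents out of the store (ast.InterfaceToValue is the real OPA helper; the sample data is made up):

package main

import (
    "fmt"

    "github.com/open-policy-agent/opa/ast"
)

func main() {
    // Base documents come out of the default in-memory store as plain Go
    // maps and slices.
    data := map[string]interface{}{
        "users": []interface{}{
            map[string]interface{}{"name": "alice", "roles": []interface{}{"admin"}},
        },
    }

    // The evaluator has to perform this conversion for every value it reads
    // from storage; with large data sets it can dominate query latency.
    value, err := ast.InterfaceToValue(data)
    if err != nil {
        panic(err)
    }
    fmt.Println(value)
}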

@schneefisch

👍

@mjungsbluth
Contributor

On top of that, the transaction's Read method does a deep copy of maps, at least.

@stale

stale bot commented Jan 15, 2022

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

stale bot added the inactive label on Jan 15, 2022
@mjungsbluth
Contributor

From looking at the code, the evaluation code can already handle AST values returned from storage.

Instead of rewriting large parts, would it work to implement the AST value interfaces for maps and arrays in either a copy-on-write or read-only fashion, backed by efficient Go maps? I am not sure whether the calling code actually tries to write to data that originates in storage.

My assumption is that large data blocks need to be optimized for lookup speed anyway, which will cut down the expected data size after each lookup step.
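A rough sketch of that idea, assuming a read-only wrapper backed by a native Go map (the lazyObj type and its Get method are hypothetical; a real implementation would have to satisfy the full ast.Object interface):

package main

import (
    "fmt"

    "github.com/open-policy-agent/opa/ast"
)

// lazyObj is a hypothetical read-only object backed by a native Go map.
// Lookups are answered from the map and converted to ast.Value on demand,
// instead of converting the whole document up front.
type lazyObj struct {
    native map[string]interface{}
}

func (o *lazyObj) Get(key string) (ast.Value, error) {
    raw, ok := o.native[key]
    if !ok {
        return nil, fmt.Errorf("undefined key %q", key)
    }
    return ast.InterfaceToValue(raw)
}

func main() {
    o := &lazyObj{native: map[string]interface{}{"region": "eu-west-1"}}
    v, _ := o.Get("region")
    fmt.Println(v)
}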

stale bot removed the inactive label on Feb 9, 2022
@tsandall
Member Author

@mjungsbluth Read() only does a deep copy if it's a write transaction. Policy queries are read-only so no deep copy is performed inside the in-memory store.

My thinking for this issue was that we could simply have the server convert data into AST types before inserting into the store. This would apply to the bundle plugin as well as writes on the v1/data API. This way there would be no need for interface{} to ast.Value conversion during eval.
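A rough sketch of where the conversion would move to under that approach (writeData is a hypothetical helper; the default store would first need to accept ast.Value on writes):

package server

import (
    "context"

    "github.com/open-policy-agent/opa/ast"
    "github.com/open-policy-agent/opa/storage"
)

// writeData converts a document to an AST value once, at write time, so that
// no interface{}-to-ast.Value conversion is needed later during evaluation.
func writeData(ctx context.Context, store storage.Store, path storage.Path, doc interface{}) error {
    value, err := ast.InterfaceToValue(doc) // one-time conversion
    if err != nil {
        return err
    }
    // Assumes the in-memory store is taught to accept ast.Value here.
    return storage.WriteOne(ctx, store, storage.AddOp, path, value)
}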

@mjungsbluth
Contributor

mjungsbluth commented Feb 10, 2022

Overlooked that the deep copy only happens for write transactions…

Would this approach have the potential to break storage implementations? Otherwise this sounds like a fair approach to try.

@tsandall you were also concerned about memory consumption, but since the conversion via the ast.InterfaceToValue method already happens anyway, this would at least make memory usage more predictable, or am I wrong?

@tsandall
Member Author

tsandall commented Feb 11, 2022

Would this approach have the potential to break storage implementations? Otherwise this sounds like a fair approach to try.

It would, but we can work around that in OPA by just checking the type of the store... something like:

package inmem

import "github.com/open-policy-agent/opa/storage"

// Is reports whether s is the default in-memory store implementation.
func Is(s storage.Store) bool {
    _, ok := s.(*store)
    return ok
}

With this in place, the server handler and bundle plugin can ask if the store they're about to transact on is the in-memory one...
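A hypothetical caller side of that check (prepareForWrite is made up and builds on the Is sketch above, which was not part of the API at the time; it only illustrates how the server handler or bundle plugin could gate the pre-conversion on the store type):

package server

import (
    "github.com/open-policy-agent/opa/ast"
    "github.com/open-policy-agent/opa/storage"
    "github.com/open-policy-agent/opa/storage/inmem"
)

// prepareForWrite pre-converts data only when the target is the default
// in-memory store; other storage implementations keep receiving plain Go
// values, so nothing breaks for them.
func prepareForWrite(s storage.Store, doc interface{}) (interface{}, error) {
    if inmem.Is(s) { // Is as sketched above, not a released API at the time
        return ast.InterfaceToValue(doc)
    }
    return doc, nil
}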

@tsandall you were also concerned about memory consumption, but since the conversion via the ast.InterfaceToValue method already happens anyway, this would at least make memory usage more predictable, or am I wrong?

I think it'll depend on the access pattern. Today, if you load a bunch of external data into OPA it's stored as maps/slices/etc. With this change, we'd still be using maps/slices/etc. but there's the additional overhead of the ast.Value types. If the policy queries load the entirety of the store on every query, then yes, it would make the memory usage more predictable, but if that doesn't happen, then we'd be paying the price all the time.

Which begs the question: perhaps the inmem store should just convert the interface{} values into AST values lazily... we already do this in the evaluator, but the problem is that we don't have a place to store them across queries. I'm not sure how much work that would be, and it would only be valuable if the steady-state memory usage was significantly higher.
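A rough sketch of that lazy approach (the astCache type is hypothetical; invalidation on writes and transaction handling are omitted):

package inmem

import (
    "sync"

    "github.com/open-policy-agent/opa/ast"
)

// astCache sketches the lazy idea: convert a value the first time it is read
// and remember the result across queries. A real version would need to be
// invalidated whenever the underlying data is written.
type astCache struct {
    mu    sync.RWMutex
    cache map[string]ast.Value // keyed by storage path, e.g. "/users"
}

func (c *astCache) get(path string, raw interface{}) (ast.Value, error) {
    c.mu.RLock()
    v, ok := c.cache[path]
    c.mu.RUnlock()
    if ok {
        return v, nil
    }
    converted, err := ast.InterfaceToValue(raw)
    if err != nil {
        return nil, err
    }
    c.mu.Lock()
    if c.cache == nil {
        c.cache = map[string]ast.Value{}
    }
    c.cache[path] = converted
    c.mu.Unlock()
    return converted, nil
}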

😅 this is why I wrote "investigate storing ..." because it requires some investigation/experimentation to determine the best path forward. Hope this helps!

@tsandall
Member Author

@srenatus @anderseknert I'm adding this to the backlog because I think it's worth at least exploring/experimenting with to see how much of a win it could be. It's not immediately obvious how many repercussions there could be on different parts of OPA; anything reading out of the store would be impacted if the inmem implementation just started returning AST values all of a sudden. That said, I wonder if we could introduce this gradually so that store users could opt in to receiving AST values back from the store. The store could then either return the AST or convert it into interface{} for the caller. This way we wouldn't break anything, but topdown would still benefit from the speedup. Thoughts?
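One possible shape for that opt-in, sketched with hypothetical names (ReadOption and ReturnASTValues are made up; ast.JSON is the real helper for converting an AST value back to interface{}):

package inmem

import "github.com/open-policy-agent/opa/ast"

// ReadOption is a hypothetical per-read option: callers that can consume AST
// values opt in explicitly; everyone else keeps getting interface{} back.
type ReadOption struct {
    ReturnAST bool
}

func ReturnASTValues(enabled bool) ReadOption {
    return ReadOption{ReturnAST: enabled}
}

// convertForCaller returns the stored AST value as-is for opted-in callers,
// and converts it back to plain Go values for everyone else.
func convertForCaller(v ast.Value, opt ReadOption) (interface{}, error) {
    if opt.ReturnAST {
        return v, nil // topdown can use this directly, skipping conversion
    }
    return ast.JSON(v) // legacy callers still see maps/slices/etc.
}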

ashutosh-narkar added the requires-investigation label on Mar 31, 2023
@szuecs

szuecs commented Jan 24, 2024

@tsandall if you want to see what a bored data plane with a single core does on bundle activation, see #6533 (comment). The 10m schedule was a mistake: the control plane created a JSON file with an updated timestamp every 10m. The flame graph clearly shows DeepCopy as the bottleneck of the in-memory storage.
A list of structs would be much faster, and if you use sync.Pool you might be able to speed up allocation even more.
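A generic sketch of that suggestion (the user struct is made up; it only shows the sync.Pool reuse pattern, not OPA's actual data structures):

package main

import (
    "fmt"
    "sync"
)

// user stands in for one record of bundle data; the suggestion is to reuse
// such structs via sync.Pool instead of re-allocating maps on every
// bundle activation.
type user struct {
    Name  string
    Roles []string
}

var userPool = sync.Pool{
    New: func() interface{} { return new(user) },
}

func main() {
    u := userPool.Get().(*user)
    u.Name = "alice"
    u.Roles = append(u.Roles[:0], "admin") // reuse the backing array if any
    fmt.Println(u.Name, u.Roles)
    userPool.Put(u) // return to the pool for the next activation
}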

ashutosh-narkar moved this to In Progress in Open Policy Agent on Aug 5, 2024
johanfylling added a commit that referenced this issue Oct 30, 2024
A new optimized read mode has been added to the default in-memory store, where data written to the store is eagerly converted to AST values (the data format used during evaluation). This pre-converted data is faster to read and won't cause memory spikes during load, but it comes with slower data writes (affecting startup and bundle load/update time) and a larger baseline memory footprint for OPA. It can be enabled for `opa run`, `opa eval`, and `opa bench` by setting the `--optimize-store-for-read-speed` flag. See https://www.openpolicyagent.org/docs/edge/policy-performance/#storage-optimization.

Implements: #4147

Signed-off-by: Johan Fylling <johan.dev@fylling.se>
Co-authored-by: Ashutosh Narkar <anarkar4387@gmail.com>
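For reference, a typical way to enable the mode described in the commit message above (the bundle file name is only an example): `opa run --server --optimize-store-for-read-speed --bundle bundle.tar.gz`.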
@ashutosh-narkar
Member

@johanfylling can this be closed?

@szuecs

szuecs commented Nov 2, 2024

Looking at the flame graphs, the answer is no.

@anderseknert
Member

@szuecs did you intend to attach those graphs? 🙂

@johanfylling
Contributor

Yes, I think this can be closed.

Unless those flame graphs @szuecs is referring to show that there is some unintentional case where AST values aren't written into the store despite the configured flags...
It should be noted that this feature doesn't guarantee better performance across all use cases; among other things, it depends on how much data is written during bundle updates, and on the size and number of unique pieces of data read per eval.

@szuecs

szuecs commented Nov 4, 2024

You can see the flame graph in #6533.
I closed the other issue because the regular update was not necessary and was fixed on our side. However, the issue likely remains, or did you fix it?

Ah, I see you created a cache-like mechanism, so I think you are completely right to close this.

Thanks for the work!

@anderseknert
Member

Yes, the fix is in https://github.com/open-policy-agent/opa/releases/tag/v0.70.0

github-project-automation bot moved this from In Progress to Done in Open Policy Agent on Nov 4, 2024