-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for adding objects to storage without round-tripping through JSON #4708
Comments
The benchmark I'm using: https://gist.github.com/willbeason/9c8e2a570baeeee946f9f9c7fd26359e |
For callers who have objects with a DeepCopy-like method defined, the savings are ~90% instead of ~99% if the caller is not able to guarantee the object will remain unmutated and so need to deep copy the object before calling
(This benchmark is for a function that adds a |
Thinking further a "good enough" solution for us would be to do the round-tripping validation without needing to acquire the write lock on the storage. That would enable both the validation and reduce the time that the lock needs to be held by 99%. The implementation for that might be more complex, though. |
Add option to inmem.store which allows disabling the round-tripping through JSON when adding data to the store. This option is intended for callers who can guarantee the objects they pass to Write are JSON objects, and have properly ensured the object will be only be accessed by store once added. See open-policy-agent#4708 Signed-off-by: Will Beason <willbeason@google.com>
IIRC we're doing the roundtrip to support feeding arbitrary structs into the storage and having them converted to native golang types (so, a Another approach to address the problem would be to make switch x.(type) {
case string, int, float: // ... more native types
// nothing
case map[string]interface{}, []interface{}:
// recurse for keys
default: // must be struct
// do the json roundtrip
} If what comes in is a native type, I don't think (but we should verify) that the overhead would be as bad as what we have now. I suppose that way, we'd improve the situations also for those folks unable to disable the roundtrip on write (#4709), namely OPA-as-server users. (Update: Uhr realized that this point is moot, the server runtime could disable the roundtrip, too, as there's no need for it.) |
This ticket is also interesting on the topic of JSON serialization/deserialization performance. Although obviously, nothing beats skipping it in the first place :) Do you have any experience working with that library, @willbeason? Would be interesting to see what difference it makes to the numbers you presented! |
The other side effect you get of round-tripping through JSON is an implicit deepcopy of the entire struct. So if we only round-tripped subfields which were not native JSON types, we would no longer be deepcopying objects put into OPA storage. (The rest of this exclusively deals with JSON-native types) My guess is what srenatus@ proposes would have similar performance gains to the DeepCopy. Whether we actually copy the data into a new object likely has little CPU performance impact (compared to not doing it but still traversing the tree) - but of course you'd just be holding on to more memory after. In either case, it's much faster/cheaper to do this than passing everything through JSON. |
From looking at open PRs and Issues - unfortunately json-iterator doesn't look like it is actively maintained. |
Confirmed it is ~10x faster to do what @srenatus suggests:
|
With the JSONify deepcopy, it is only 4x faster. It is possible that won't fully resolve our issue so I'll send a follow-up to reduce the time the write-lock is held. |
Thanks @willbeason, I had not checked that library since the ticket was created. Not sure to what extent it matters, unless there's some known issue that would e.g. be different from what we have today. I mean, the JSON library in the stdlib is probably not seeing a ton of updates either. Definitely agree that it would be nicer to see it actively maintained though. |
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. |
It didn't make it for 0.43.0, but I haven't forgotten about it. Let's aim for 0.44.0. |
Thanks! |
Add option to inmem.store which allows disabling the round-tripping through JSON when adding data to the store. This option is intended for callers who can guarantee the objects they pass to Write are JSON objects, and have properly ensured the object will be only be accessed by store once added. Fixes open-policy-agent#4708. This is continuance of open-policy-agent#4709, adding these bits: * storage/inmem: backwards-compat nitpicks, test adaptations I might have overshot here, but adding variable-length function parameters is not a backwards-compatible move. Concretely, if you had been using code like var x func() storage.Store = inmem.New going from New() to New(...Opts) would break it. * storage/inmem: use it where possible without roundtrip * storage/inmem: deal with nil map It looks like this is something the roundtrip had guarded us from. Now, we'll explicitly check this. This came up when running the bundle tests with roundtripping disabled. * loader: add StoreWithOpts convenience method Co-authored-by: Will Beason <willbeason@google.com> Co-authored-by: Philip Conrad <conradp@chariot-chaser.net> Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
Add option to inmem.store which allows disabling the round-tripping through JSON when adding data to the store. This option is intended for callers who can guarantee the objects they pass to Write are JSON objects, and have properly ensured the object will be only be accessed by store once added. Fixes #4708. This is continuance of #4709, adding these bits: * storage/inmem: backwards-compat nitpicks, test adaptations I might have overshot here, but adding variable-length function parameters is not a backwards-compatible move. Concretely, if you had been using code like var x func() storage.Store = inmem.New going from New() to New(...Opts) would break it. * storage/inmem: use it where possible without roundtrip * storage/inmem: deal with nil map It looks like this is something the roundtrip had guarded us from. Now, we'll explicitly check this. This came up when running the bundle tests with roundtripping disabled. * loader: add StoreWithOpts convenience method Co-authored-by: Will Beason <willbeason@google.com> Co-authored-by: Philip Conrad <conradp@chariot-chaser.net> Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
@maxsmythe If no unexpected regressions come up, this is expected to come out with 0.44.0. |
Thanks! |
What part of OPA would you like to see improved?
Motivating issue: open-policy-agent/gatekeeper#2060
When adding data to the in-memory OPA storage object,
inmem.store.Write
round-trips the incoming object through JSON to ensure the object may be validly added to OPA storage. Adding an object which cannot be serialized to JSON could cause all sorts of misbehaviors when running OPA policies.From
inmem.store.Write
:Unfortunately, this round-tripping is very slow compared to the rest of what needs to be done to store/update the new data.
Benchmark with round-tripping:
Benchmark without round-tripping (with the check commented out):
In the benchmark >99% of the time spend adding new objects is in round-tripping through JSON.
Describe the ideal solution
Ideally, there would be some way for callers to add data to
inmem.store
without needing to do this round-tripping. Callers may be able to do this validation themselves more cheaply, or are operating in an environment where all objects by definition are serializable to JSON (e.g. the incoming request was JSON, and that is the object to be put into OPA storage) and so this validation is unnecessary.This option could take many forms:
inmem.store
at creation time to not round-trip objects through JSONWrite()
to skip this checkWriteNoRoundTrip()
which is the same asWrite()
but does not include the checkAdditional Context
Per the motivating bug - the issue which (at least) one user of Gatekeeper is having is that the speed of these writes means that it takes a long time for queries to run since it takes too long to acquire a read lock for the Storage object (which is required for Query to run).
The text was updated successfully, but these errors were encountered: