
Slow http.send cache deserialization #3599

Closed
lukyer opened this issue Jun 30, 2021 · 10 comments · Fixed by #3649
@lukyer (Contributor) commented Jun 30, 2021

Expected Behavior

Cache lookups for an http.send response with an application/json body are as fast as for one stored as text/plain.

Actual Behavior

The application/json deserialization makes every cache lookup much slower.

Steps to Reproduce the Problem

OPA version 0.29.4, macos, run as standalone server

package app.rbac

slow {
    # Load and cache 370K json file
    http.send({"method": "get", "url": "http://127.0.0.1:8000/dataexport.json", "force_cache": true, "force_cache_duration_seconds": 60000})
}

fast {
    # Load and cache 370K text file
    http.send({"method": "get", "url": "http://127.0.0.1:8000/dataexport.txt", "force_cache": true, "force_cache_duration_seconds": 60000})
}
ab -n 1000 -c 10 -p input.json 127.0.0.1:8181/v1/data/app/rbac/slow

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       5
Processing:    40   83  25.1     78     195
Waiting:       39   83  25.0     78     195
Total:         40   83  25.1     78     195



ab -n 1000 -c 10 -p input.json 127.0.0.1:8181/v1/data/app/rbac/fast

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       7
Processing:     6   12   5.2     11      37
Waiting:        5   12   5.2     11      36
Total:          6   13   5.2     11      37

Additional Info

Hi all,
I am using http.send with caching. Caching works fine, but the problem is that I'm loading a bigger (~300 kB) JSON file, and it looks like the cache is storing the serialized version (raw_body) instead of the AST (body). Thus every cache lookup causes a JSON deserialization, which is way too slow for my use case (~80 ms instead of ~10 ms when served as text/plain).

Is it possible to store the deserialized data in the cache instead of the serialized form?

@ashutosh-narkar I think this json.Unmarshal call might be causing those lags:
ashutosh-narkar@445d79f#diff-4a137f284e9009d157ac4553842cb1c6e9260173fa29d3ed0d779f79e5b7a316R710

Or am I doing something completely wrong? Thanks for any ideas!

@anderseknert (Member)

Hi @lukyer!

Serializing cached items was a feature added quite recently. The reason for this is that storing the raw structs in memory has an overhead of something like 20x compared to the size of the JSON-serialized data. Each ~300 kB item in your example would hence come with a memory overhead of about 6 MB. As you can probably imagine, this quickly adds up with an increasing number of distinct items, and we had many reports complaining about this.

Possible improvements to help in your case could be to look into faster serialization, as suggested in #3254, or perhaps to allow an option to decide whether cache entries are stored serialized or not.

@lukyer (Contributor, Author) commented Jun 30, 2021

@anderseknert thanks for the quick reply and for confirming this issue. I think it would be best to add an http.send option to store serialized/deserialized data. Is it feasible? If so, how can I help push it forward?

@tsandall (Member)

@lukyer can you provide a bit more detail on your use case? E.g., how many distinct http.send calls do your policies make? Are you fetching external data per-user or per-resource? Or is there just some global data that is needed by the policy that happens to be fetched via http.send?

@lukyer (Contributor, Author) commented Jun 30, 2021

@tsandall the main idea is to load RBAC user-role-permission data in a dynamic way (not just a few static mappings). So basically the idea is to have a memoized http.send call to an endpoint providing this JSON for some time, e.g. 60 s. I know there is a way to do it in the opposite direction (push policy data through the API to all opa-agents), but it seems like overkill for us (discovery, figuring out what data each opa-agent will need, ...) when a simple http.send cache could solve it too.
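The memoized-fetch pattern described above can be sketched in Rego; the endpoint URL, the response shape, and the input fields below are assumptions for illustration, not taken from this thread:

```rego
package app.rbac

# Memoized fetch: served from the inter-query cache for up to 60 seconds.
user_roles := http.send({
    "method": "get",
    "url": "http://127.0.0.1:8000/user-roles.json",
    "force_cache": true,
    "force_cache_duration_seconds": 60,
}).body

allow {
    user_roles[input.user][_] == input.permission
}
```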

@tsandall (Member)

@lukyer okay, that's helpful.

re: "the main idea is to load RBAC user-role-permission data in dynamic way (not just few static mappings)":

Is this data global or per user? It sounds like these are the role definitions but I just want to check. For example, if you have two roles (reader and admin) and your app has permissions "write.widgets", "read.widgets", "manage.users" then the data would look something like:

{
   "admin": ["write.widgets", "read.widgets", "manage.users"],
   "reader": ["read.widgets"]
}

Importantly, there's nothing user-specific in here, so your policies are just making one distinct http.send call.

@lukyer (Contributor, Author) commented Jun 30, 2021

@tsandall
It is not exactly per user, and it is not a single call either (the data might get too big to store in memory). It is per something like a group of users, so we can keep it within reasonable limits. It would be nice to decide for myself whether I prefer better latency or lower memory consumption. I understand it is not something people want to reason about, but for some of us it would be really helpful to have an http.send configuration option for that. In general, RAM is not the biggest problem for me; we can spin up more instances.
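A sketch of that per-group caching pattern in Rego; the endpoint layout and the input.group field are assumptions for illustration:

```rego
package app.rbac

# One cache entry per group: each distinct URL is its own cache key,
# re-fetched at most once per 60 seconds.
group_rbac := http.send({
    "method": "get",
    "url": sprintf("http://127.0.0.1:8000/rbac/%s.json", [input.group]),
    "force_cache": true,
    "force_cache_duration_seconds": 60,
}).body
```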

@tsandall (Member) commented Jul 2, 2021

@ashutosh-narkar this would be a good one for you to pick up.

@tsandall (Member) commented Jul 2, 2021

@lukyer that's fine. Thanks for explaining.

@ashutosh-narkar if you don't have time to work on this, you could spec out how the solution would be implemented and perhaps someone else could pick it up.

@ashutosh-narkar (Member)

We can add a new request parameter to http.send which controls whether data should be serialized before being inserted into the inter-query cache. By default, we would serialize the data, which is the current behavior. If the user does not want the cached data to be serialized (e.g. serializeCacheData: false), we could simply take the result returned by this function and insert it into the cache, thereby avoiding the whole serializing/deserializing of the data. We need to be careful about how we calculate the size of the response, as the deserialized JSON body could end up consuming as much as 20x more memory than the JSON-encoded version of the same data. We could say that for serializeCacheData: false we ignore the user-defined cache size limit, as tracking memory consumption is not the primary goal in this mode.
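From the policy side, such an option might look like the sketch below. The parameter name serializeCacheData is the one proposed in the comment above; the name that actually shipped in #3649 may differ:

```rego
package app.rbac

fast_lookup {
    # Hypothetical: store the decoded body in the cache,
    # trading memory for lower lookup latency.
    http.send({
        "method": "get",
        "url": "http://127.0.0.1:8000/dataexport.json",
        "force_cache": true,
        "force_cache_duration_seconds": 60000,
        "serializeCacheData": false,
    })
}
```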

@tsandall (Member) commented Jul 7, 2021

Moving this issue to planned. We should be able to get this into the next release (v0.31.0).

ashutosh-narkar added a commit to ashutosh-narkar/opa that referenced this issue Jul 15, 2021
This commit adds a new parameter to http.send to
control how items are added to the inter-query cache.
Currently two modes are supported which allow users
to decide if they prefer cache memory conservation or low
latency during cache lookups.

Fixes open-policy-agent#3599

Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>
ashutosh-narkar added a commit that referenced this issue Jul 15, 2021
dolevf pushed a commit to dolevf/opa that referenced this issue Nov 4, 2021