Feature request: TTL cache #1751

Closed
Solverj opened this issue Sep 12, 2019 · 6 comments

@Solverj
Solverj commented Sep 12, 2019

Wanted feature

A TTL cache for OPA (the only thing I see as missing for OPA to fully conform to ABAC, specifically the PIP).

I was thinking along the lines of:

  1. The policy receives a request to evaluate
  2. The attributes needed for the JWT content are looked up in the cache first. If the cache has the attribute, retrieve it and go to step 4; otherwise go to step 3
  3. Call http.send and cache the response
  4. Evaluate the policy against the request
  5. End

A possible implementation could use data (not updatable through PUT calls) as the cache, with a plain JSON-structured map and a background thread that handles TTLs and deletion.

And of course the above would only be enabled through, e.g., opa run -s --cache-enabled --cache-ttl=3600 (in seconds) or something like that.

Use-case

A REST service handles a lot of customers, and each customer is divided into sub-users stationed in various locations. Each of these sub-users needs fine-grained access control because of different payment plans for different data from the REST service. All the attribute data together is several GB in size and can't be stored in memory, so OPA instead uses a TTL cache for the attributes it needs, keyed by the specific JWT attribute. If the cache doesn't have the needed attributes, OPA issues an http.send to the external resource and stores the response in the cache under the JWT attribute used. With that, OPA would fully support the ABAC paradigm.

@tsandall
Member

We've been talking about improving http.send to cache across queries. I just realized we did not have an issue filed to track that. This issue is related to it.

Ref #1753

@krotscheck
Contributor

A syntax I'm thinking of implementing in a plugin; open to comments.

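# cache.has, cache.get and cache.put are the proposed plugin built-ins;
# value and ttl stand for the computed result and its time-to-live.
default myval = "foo"

myval = v {
    cache.has("key")
    v = cache.get("key")
}

myval = v {
    not cache.has("key")
    # do expensive things.
    cache.put("key", value, ttl)
    v = cache.get("key")
}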

@tsandall
Member

hey @krotscheck, here are some thoughts.

Implementing this kind of caching as a set of built-in functions (cache.has, cache.put, etc.) would be problematic because statements in the rule body would then have side effects, which makes the order of execution significant.

If the expensive step is a call-out to an external system via http.send or some other custom built-in function, the caching could be implemented inside the built-in function. For example:

myval = v {
  # "ttl" is the parameter proposed here for caching the response
  response := http.send({
    "method": "get",
    "url": "https://example.com",
    "ttl": "5m"})
  v := response.body.some_value
}

If a more general-purpose caching mechanism is required (e.g., perhaps there's some compute-intensive operation happening on the result of http.send), then we could explore caching at the rule level, for example:

cache myval = 5*60  # "cache the value of myval for no more than 5 minutes"

myval = v {
   # do expensive thing to get 'v'
}

Do you think the first approach would work for your use cases?

@krotscheck
Contributor

I think that'd work - as long as the cache can (also?/optionally?) be somehow keyed to the input. For instance, if I use a JWT as an input, it's unlikely that the policy response - no matter how expensive its calculation is - will change for the lifetime of the JWT. Any suggestions on that?

@tsandall
Member

I think that'd work - as long as the cache can (also?/optionally?) be somehow keyed to the input.

Yes, this would be up to the built-in function. We actually have some built-in functions whose outputs are cached for the duration of the top-level policy query (e.g., time.now_ns() and http.send work this way). We do this to ensure that calls are deterministic (they return the same output given the same input). E.g., you would not want time.now_ns() to return different times when it is invoked multiple times inside the policy. For time.now_ns() it's trivial because there are no parameters. For http.send we just use ALL of the input parameters as the cache "key".

If the expensive operation is implemented as some custom built-in function, it's up to that built-in function's implementation to do the right thing. Today we don't have a framework for caching across policy queries, but it's something we're interested in (#1753).
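
A small sketch of the per-query determinism described above. The package and rule names are made up for illustration, and `import rego.v1` assumes a reasonably recent OPA release; the rule should hold because both calls run inside the same top-level query and share one cached result.

```rego
package example.determinism

import rego.v1

# Both calls happen within a single top-level query, so the second call
# is served from the per-query cache and returns the same timestamp.
same_timestamp if {
	t1 := time.now_ns()
	t2 := time.now_ns()
	t1 == t2
}
```

The same reasoning applies to http.send within one query: two calls with identical request parameters yield one outbound request and one shared response.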

@tsandall
Member

tsandall commented Aug 6, 2020

Now that http.send supports caching across queries (#1753), I think we can close this. We can revisit other caching options in the future if needed.
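
For readers landing here, a minimal sketch of how the use case from this issue might look with cross-query caching in http.send. The `force_cache` and `force_cache_duration_seconds` request fields are http.send options in recent OPA releases; the package name, the URL, the `input.token` field, and the `plan` attribute are all hypothetical, and `import rego.v1` assumes a recent OPA version.

```rego
package example.authz

import rego.v1

default allow := false

# Fetch subject attributes from a hypothetical external attribute service.
# The response is kept in the cross-query cache for up to 5 minutes,
# regardless of the caching headers the server sends back.
attributes := resp.body if {
	resp := http.send({
		"method": "GET",
		"url": "https://attributes.example.com/v1/subject",
		"headers": {"Authorization": concat(" ", ["Bearer", input.token])},
		"force_cache": true,
		"force_cache_duration_seconds": 300,
	})
	resp.status_code == 200
}

# Hypothetical attribute check on the cached (or freshly fetched) response.
allow if attributes.plan == "premium"
```

Because the request object differs per token (the Authorization header is part of it), responses should end up cached separately for each JWT, which is roughly the "keyed to the input" behavior asked about earlier in the thread.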

@tsandall tsandall closed this as completed Aug 6, 2020