Entity Slicing using Data Tries #74

oflatt · 2024-07-29T18:06:12Z

Signed-off-by: oflatt <oflatt@gmail.com>

oflatt · 2024-07-30T19:57:01Z

Code is now available: cedar-policy/cedar#1102

Signed-off-by: oflatt <oflatt@gmail.com>

cdisselkoen

Overall looks good, and I think this is a good idea. Some comments, particularly about the subset of Cedar that this RFC works with.

text/0073-entity-slicing.md

Co-authored-by: Craig Disselkoen <craigdissel@gmail.com> Signed-off-by: oflatt <oflatt@gmail.com>

Signed-off-by: oflatt <oflatt@gmail.com>

jeffsec-aws · 2024-08-02T15:33:32Z

What would it take to answer in Cedar schema?
Can we transform the following answers:

Request type: (User, Read, Document)
Data required:
resource.readers
resource.metadata.owner
principal.ancestors (where ancestors are the ancestors in the entity hierarchy)

On the other hand, if the User would like to Edit a Document, we only need one piece of data:

Request type: (User, Edit, Document)
Data required:
resource.metadata.owner

in the following proposal:

 Request type: (User, Read, Document)
 Data required:
    [
        {
             "resource": "entity Document = { readers: Set<User>, }"
        },
        {
             "resource": "entity Document = { metadata: Metadata, },
              "entities": [
                   "entity Metadata = { owner: User }",
                   "entity User = { }"
              ]
        },
        {
             "principal": "entity User in [UserGroup]",
              "entities": [
                   "entity UserGroup= {  }",
              ]
        },
    ]

 Request type: (User, Edit, Document)
    [
        {
             "resource": "entity Document = { metadata: Metadata, },
              "entities": [
                   "entity Metadata = { owner: User }",
                   "entity User = { }"
              ]
        }
    ]

Response should be able to handle sub element of type: principal, action, entities, and context

jeffsec-aws · 2024-08-02T15:36:58Z

Also remphasing that:

entity User in [User];

for :

// allow read if the user is a global admin
permit(
  principal in User::"GlobalAdmin",
  action in [Action::"Read"],
  resource
);

is a cumbersome example. one would prefer using UserGroup for a notion such as GlobalAdmin in the form of:

entity User in [UserGroup];

with policy shaped as:

// allow read if the user is a global admin
permit(
  principal in UserGroup::"GlobalAdmin",
  action in [Action::"Read"],
  resource
);

jeffsec-aws · 2024-08-02T15:40:59Z

I want to see an example with a when clause having multiple conditions connected with OR or AND operand, i.e.:

// allow documents to be read by anyone in the readers set
permit(
  principal,
  action == Action::"Read",
  resource
)
when
{
  resource.readers.contains(principal) &&
  resource.clearance < 7
};

or

// allow documents to be read by anyone in the readers set
permit(
  principal,
  action == Action::"Read",
  resource
)
when
{
  principal.title == "Security" ||
  principal.clearance >= resource.clearance
};

jeffsec-aws · 2024-08-02T15:42:05Z

I want to see an example with an unless clause

D-McAdams · 2024-08-06T00:26:39Z

As currently proposed, high-scale systems would struggle to use this approach. It has a couple of drawbacks. The first is the necessity for the cache of manifests, which is difficult to maintain at scale when policies change frequently and the volume of policies could potentially be huge. The second is that the manifests are course grained, such as requiring the loading all ancestors of an entity, which can be impractical when the ancestor set is unbounded. Third, it requires extra work on the client-side to piece together the instructions in the manifest with the actual principal/resource in the request in order to generate a concrete plan for data retrieval.

For the situations that I'm personally most familiar with, I'd prefer a different approach. I'd rather give Cedar a policy slice, and the parameters of specific request (principal, action, resource), and Cedar emits a document that tells exactly which data to load for the request.

In addition, this document would be in a well-defined interoperable representation like JSON that we can use as a standard interface to a Policy Information Point (PIP). We give this document to the PIP, and it returns JSON-formatted entity data.

An example manifest may look like this:

  {
    "EntityType": "User",
    "EntityId": "12345",
    "AttributesToLoad": ["name","email","manager"]
    "MembershipsToCheck": [
      {"EntityType": "UserGroup",
       "EntityId": "987654"},
      {"EntityType": "UserGroup",
       "EntityId": "987654"}
    ]
  },
  {
    "EntityType": "File",
    "EntityId": "image.jpg",
    "AttributesToLoad": ["size","type","lastUpdated"]
    "MembershipsToCheck": [
      {"EntityType": "Folder",
       "EntityId": "2902982"},
      {"EntityType": "Folder",
       "EntityId": "6297822"}
    ]
  }
]

If policies have multiple levels of references, like principal.manager.manager.location, I'd want to iterate multiple times, having Cedar tell me everything to load for Level 1, and then everything for Level 2, and so on, until complete.

To tie it all together, I would definitely use an API that returned an enum of either (1) the concrete authorization result, or (2) the necessary data to arrive at a concrete result. I would loop on this API and gather data until a concrete result is achieved, or the max level is exceeded.

jeffsec-aws · 2024-08-06T11:42:23Z

That is a better proposition of what I tried to propose with #74 (comment) @D-McAdams . Going a Cedar formatted answer is the way. Now:

I think we are still missing which part falls where. With your proposal the example is clear of what is the Principal and what is the Resource. But it might be less implicit in plenty of other cases, therefore we need to have hints onto that.
On the Iteration, I am not sure Iteration is a good approach. Slice is extrapolated from one policy that could apply. In this case:

permit(
  principal is User,
  action == Action::"ReadFile",
  resource is File
) when {
  principal.manager.manager.location == "Minas Tirith"
};

Therefore the slice answer should ask to resolve the whole equation. Having a first response only solving the Level 1 of condition, will make the next request for a slice more complex to evaluate. And in any cases, the isAuthorized() request will have to be perform in one call to match this policy.

The return slice might not have an entity id to specify and the format needs to encompass for that.

khieta

Nice write up! I think the overall idea makes sense, although I do wonder whether this should be in the Cedar library vs. in a separate tool. Will take a look at the implementation PR next.

text/0074/0074-entity-slicing.md

khieta · 2024-08-06T17:16:35Z

text/0074/0074-entity-slicing.md

+
+As written, entity manifests in this RFC do not support loading only parts of a Cedar Set, or only some of the ancestors in the entity hierarchy. This is because sets are loaded on the leaves of the access trie, with no way to specify which elements are requested.
+
+To support this feature, we recommend that we take a constraint-based approach. Constraints would be attached to nodes in the access trie, specifying which elements of sets or the parent hierarchy are needed. Constraints form a small query language, so this would increase the complexity of the entity manifest.


RFC 76 provides some additional ideas about how to support partial loading of ancestors. Do you still think the constraint-based approach is the right one?

Good question. I'm not a big fan of the @ancestors annotation proposed- I think a more automatic solution like constraint annotations would be better.
As for partial evaluation, it would still make entity manifests more precise.

khieta · 2024-08-06T17:18:31Z

text/0074/0074-entity-slicing.md

+Action::"CreateDocument"
+[
+ context.is_authenticated
+]


Since we expect entity manifests to be written and consumed by machines (right?), it might make more sense to use a structured JSON format. Our experience with writing parsers for human-readable formats that support comments has shown that they are easy to mess up 😅

I've implemented both a JSON format and a human-readable format. The human-readable format re-uses the parser for cedar expression so it's not super error prone.

Personally, I've found the human-readable format useful for understanding what's going on. For example, a human can diff a new entity manifest with an old one to understand how a new policy changes the manifest.

D-McAdams · 2024-08-06T18:29:24Z

That is a better proposition of what I tried to propose with #74 (comment) @D-McAdams . Going a Cedar formatted answer is the way. Now:

I think we are still missing which part falls where. With your proposal the example is clear of what is the Principal and what is the Resource. But it might be less implicit in plenty of other cases, therefore we need to have hints onto that.

On the Iteration, I am not sure Iteration is a good approach. Slice is extrapolated from one policy that could apply. In this case:
permit(
  principal is User,
  action == Action::"ReadFile",
  resource is File
) when {
  `principal.manager.manager.location` == "Minas Tirith"
};
Therefore the slice answer should ask to resolve the whole equation. Having a first response only solving the Level 1 of condition, will make the next request for a slice more complex to evaluate. And in any cases, the isAuthorized() request will have to be perform in one call to match this policy.

The return slice might not have an entity id to specify and the format needs to encompass for that.

Couple reasons I think the simple approach is preferable and iteration is OK.

One is that no matter what we design here, any real-world data gatherer will be looping and iterating. For example, given a policy that refers to principal.manager.manager.location, the machinery loading the data from a database has to ping back and forth with the database 3 times. First to look up the UID value for principal.manager. Then, to look up the UID value for principal.manager.manager, then to look up the string value of principal.manager.manager.location. As currently proposed, the RFC makes the customer work harder to figure out the execution plan from paths. I don't want to work hard.

It's going to be much easier for me as a customer of Cedar to say "Here's everything I know, tell me exactly what to load", and then repeat until there's nothing left to load. It's a dirt-simple code path I can run in a loop.

Such an approach also makes it consistent on how to load entity information, regardless of whether the entities are the principal/resource, or some other entity referenced in a policy, or some other entity literal. As a data gatherer, I simply don't care. Cedar tells me which entities to load, and I just go load them. This makes it dirt-simple to make a generic data loader.

shaobo-he-aws · 2024-08-08T16:42:04Z

One is that no matter what we design here, any real-world data gatherer will be looping and iterating. For example, given a policy that refers to principal.manager.manager.location, the machinery loading the data from a database has to ping back and forth with the database 3 times. First to look up the UID value for principal.manager. Then, to look up the UID value for principal.manager.manager, then to look up the string value of principal.manager.manager.location.

I agree that iterating works for most if not all DBs. That being said, this can be done via one query in a relational DB with JOINs?

jeffsec-aws · 2024-08-08T16:55:45Z

Also while I agree that getting data out of the PIP might need round trips, getting out of the policy store of the PDP like here does not. The information needed is stored in single dedicated objects (policies) which are extrapolated into what is missing.

shaobo-he-aws · 2024-08-08T18:01:57Z

For the situations that I'm personally most familiar with, I'd prefer a different approach. I'd rather give Cedar a policy slice, and the parameters of specific request (principal, action, resource), and Cedar emits a document that tells exactly which data to load for the request.

I think such document format should merit a RFC.

An example manifest may look like this:

  {
    "EntityType": "User",
    "EntityId": "12345",
    "AttributesToLoad": ["name","email","manager"]
    "MembershipsToCheck": [
      {"EntityType": "UserGroup",
       "EntityId": "987654"},
      {"EntityType": "UserGroup",
       "EntityId": "987654"}
    ]
  },
  {
    "EntityType": "File",
    "EntityId": "image.jpg",
    "AttributesToLoad": ["size","type","lastUpdated"]
    "MembershipsToCheck": [
      {"EntityType": "Folder",
       "EntityId": "2902982"},
      {"EntityType": "Folder",
       "EntityId": "6297822"}
    ]
  }
]

IIUC, entities in the MembershipsToCheck are the RHS of in operations. If so, the executor of this document essentially evaluate these operations, which are later evaluated again by the Cedar evaluator. Moreover, we have to perform a cheap analysis on the policy slice to have this kind of information, right?

patjakdev

I generally like this idea as it provides a neat way to do a just-in-time batched retrieval of entity data relevant to a particular type of request.

I'm curious why such a similar approach for finding the entities to load given a fully formed PARC wasn't included in the proposal.

patjakdev · 2024-08-12T22:49:13Z

text/0074/0074-entity-slicing.md

+pub struct RequestType {
+    pub principal: EntityType,
+    pub action: EntityUID,
+    pub resource: EntityType,
+}


Interesting. Why not allow EntityUIDs to be passed for the principal and resource?

The RequestType is used to index the entity manifest. When it is computed, the actual principal and resource are not known (but the types are). Action ids are known since there are a finite number.

text/0074/0074-entity-slicing.md

oflatt · 2024-08-26T16:53:08Z

As currently proposed, high-scale systems would struggle to use this approach. It has a couple of drawbacks....

Hi @D-McAdams, sorry for the late reply (I was out for a few weeks). You make some good points about the drawbacks of this proposal that I hope to address.

I agree that maintaining an entity manifest when policies are numerous and frequently change is a problem. Luckily, I think this proposal is compatible with the solution you outlined. Given a policy slice, the Entity Manifest can be computed dynamically with a single pass over all the policies. Then, you can use it as a "document" that tells exactly what data to load for the request.
Yes, as proposed in this RFC manifests are course-grained in terms of the ancestors required and elements of sets. I'll be working on this problem for the next few weeks. My guess is that it could be solved by annotating the entity manifest further, treating ancestors specially in the API.
I think the concrete plan of how to load the entities from the database is solved by my SimplifiedEntityLoader API. As you said, there is a loop that tells the user the necessary data to arrive at a concrete result. When all the data is retrieved, it stops and the authorization can proceed. I also think that we could expand on this API to make it more precise, giving details on which fields are required for each entity. We've also been thinking about designing algorithms that efficiently make use of this API to batch as many entities together as possible.

As for a JSON format, entity manifests map easily to JSON objects, and they look similar to what you have written.

Let me know what you think- perhaps I've missed something. In my mind, the entity manifests as proposed in this RFC are the most general form of entity slicing, and other solutions can be implemented on top.

oflatt · 2024-08-26T16:59:37Z

@jeffsec-aws asks:

What would it take to answer in Cedar schema?

That's a good question. Cedar schemas and entity manifest look quite similar, but entity manifests represent data paths through multiple types. While I think it's possible to represent an entity manifest as a cedar schema as you showed, it requires encoding the data somehow. In other words, any encoding using Cedar schemas would abuse the format.

Co-authored-by: Kesha Hietala <khieta@amazon.com> Signed-off-by: oflatt <oflatt@gmail.com>

Signed-off-by: oflatt <oflatt@gmail.com>

oflatt · 2024-08-28T18:59:45Z

@patjakdev asks:

I'm curious why such a similar approach for finding the entities to load given a fully formed PARC wasn't included in the proposal.

That's a good question. Having a fully formed PARC and having all the policies on hand does let you be a little more precise. You can do partial evaluation of the policies, get an entity manifest based on that, then do entity loading.

oflatt · 2024-09-16T19:49:42Z

@D-McAdams This PR implements an API that is similar to what you have outlined above:
cedar-policy/cedar#1208

Users can compute an entity manifest from a policy slice, then implement an EntityLoader.

oflatt force-pushed the oflatt-entity-slicing branch 2 times, most recently from 3eaa6db to f53f2d2 Compare July 29, 2024 18:12

add entity slicing rfc

0778bef

Signed-off-by: oflatt <oflatt@gmail.com>

oflatt force-pushed the oflatt-entity-slicing branch from f53f2d2 to 0778bef Compare July 29, 2024 18:14

khieta added the pending This RFC is pending; for definitions see the README label Jul 30, 2024

oflatt mentioned this pull request Jul 30, 2024

Add Entity Manifests to Cedar cedar-policy/cedar#1102

Merged

11 tasks

Update 0073-entity-slicing.md

d33585d

Signed-off-by: oflatt <oflatt@gmail.com>

oflatt force-pushed the oflatt-entity-slicing branch from 5f579f4 to d33585d Compare July 30, 2024 21:05

cdisselkoen reviewed Aug 1, 2024

View reviewed changes

oflatt force-pushed the oflatt-entity-slicing branch 2 times, most recently from 56b7f6f to ee696f1 Compare August 1, 2024 17:43

oflatt and others added 11 commits August 1, 2024 10:43

Update text/0073-entity-slicing.md

0b89ff3

Co-authored-by: Craig Disselkoen <craigdissel@gmail.com> Signed-off-by: oflatt <oflatt@gmail.com>

Update text/0073-entity-slicing.md

d634187

Co-authored-by: Craig Disselkoen <craigdissel@gmail.com> Signed-off-by: oflatt <oflatt@gmail.com>

Update text/0073-entity-slicing.md

01905a4

Co-authored-by: Craig Disselkoen <craigdissel@gmail.com> Signed-off-by: oflatt <oflatt@gmail.com>

Update text/0073-entity-slicing.md

e20d942

Co-authored-by: Craig Disselkoen <craigdissel@gmail.com> Signed-off-by: oflatt <oflatt@gmail.com>

Update text/0073-entity-slicing.md

31903b7

Co-authored-by: Craig Disselkoen <craigdissel@gmail.com> Signed-off-by: oflatt <oflatt@gmail.com>

Update text/0073-entity-slicing.md

afbb92a

Co-authored-by: Craig Disselkoen <craigdissel@gmail.com> Signed-off-by: oflatt <oflatt@gmail.com>

Update text/0073-entity-slicing.md

da15abc

Co-authored-by: Craig Disselkoen <craigdissel@gmail.com> Signed-off-by: oflatt <oflatt@gmail.com>

Update text/0073-entity-slicing.md

3d6b22d

Co-authored-by: Craig Disselkoen <craigdissel@gmail.com> Signed-off-by: oflatt <oflatt@gmail.com>

Changes based on great feedback from @cdisselkoen

3d0365e

Signed-off-by: oflatt <oflatt@gmail.com>

fix image

08727fc

Signed-off-by: oflatt <oflatt@gmail.com>

avoid synchronous

9ba753c

Signed-off-by: oflatt <oflatt@gmail.com>

oflatt force-pushed the oflatt-entity-slicing branch from ee696f1 to 9ba753c Compare August 1, 2024 17:43

oflatt mentioned this pull request Aug 1, 2024

entity slice validation #76

Merged

re-name to 0074

4dcbff9

Signed-off-by: oflatt <oflatt@gmail.com>

khieta approved these changes Aug 6, 2024

View reviewed changes

patjakdev reviewed Aug 12, 2024

View reviewed changes

oflatt and others added 2 commits August 26, 2024 10:16

Update text/0074/0074-entity-slicing.md

7b63a12

Co-authored-by: Kesha Hietala <khieta@amazon.com> Signed-off-by: oflatt <oflatt@gmail.com>

generalized version

d7ff80f

Signed-off-by: oflatt <oflatt@gmail.com>

oflatt force-pushed the oflatt-entity-slicing branch from eef7b00 to d7ff80f Compare August 26, 2024 17:17

oflatt added 4 commits August 26, 2024 10:20

add links to examples

bf5d525

Signed-off-by: oflatt <oflatt@gmail.com>

clarify option 2

f97fcdd

Signed-off-by: oflatt <oflatt@gmail.com>

nit

ca306d5

Signed-off-by: oflatt <oflatt@gmail.com>

move load fn out

5ab6a62

Signed-off-by: oflatt <oflatt@gmail.com>

khieta added pre-acceptance-experimental This RFC is pre-acceptance experimental; for definitions see the README and removed pending This RFC is pending; for definitions see the README labels Sep 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Entity Slicing using Data Tries #74

Entity Slicing using Data Tries #74

oflatt commented Jul 29, 2024 •

edited

Loading

oflatt commented Jul 30, 2024

cdisselkoen left a comment

jeffsec-aws commented Aug 2, 2024 •

edited

Loading

jeffsec-aws commented Aug 2, 2024

jeffsec-aws commented Aug 2, 2024 •

edited

Loading

jeffsec-aws commented Aug 2, 2024

D-McAdams commented Aug 6, 2024

jeffsec-aws commented Aug 6, 2024

khieta left a comment

khieta Aug 6, 2024

oflatt Aug 26, 2024

khieta Aug 6, 2024

oflatt Aug 26, 2024

D-McAdams commented Aug 6, 2024

shaobo-he-aws commented Aug 8, 2024

jeffsec-aws commented Aug 8, 2024

shaobo-he-aws commented Aug 8, 2024

patjakdev left a comment

patjakdev Aug 12, 2024

oflatt Aug 26, 2024

oflatt commented Aug 26, 2024 •

edited

Loading

oflatt commented Aug 26, 2024

oflatt commented Aug 28, 2024

oflatt commented Sep 16, 2024


		As written, entity manifests in this RFC do not support loading only parts of a Cedar Set, or only some of the ancestors in the entity hierarchy. This is because sets are loaded on the leaves of the access trie, with no way to specify which elements are requested.

		To support this feature, we recommend that we take a constraint-based approach. Constraints would be attached to nodes in the access trie, specifying which elements of sets or the parent hierarchy are needed. Constraints form a small query language, so this would increase the complexity of the entity manifest.

Entity Slicing using Data Tries #74

Are you sure you want to change the base?

Entity Slicing using Data Tries #74

Conversation

oflatt commented Jul 29, 2024 • edited Loading

oflatt commented Jul 30, 2024

cdisselkoen left a comment

Choose a reason for hiding this comment

jeffsec-aws commented Aug 2, 2024 • edited Loading

jeffsec-aws commented Aug 2, 2024

jeffsec-aws commented Aug 2, 2024 • edited Loading

jeffsec-aws commented Aug 2, 2024

D-McAdams commented Aug 6, 2024

jeffsec-aws commented Aug 6, 2024

khieta left a comment

Choose a reason for hiding this comment

khieta Aug 6, 2024

Choose a reason for hiding this comment

oflatt Aug 26, 2024

Choose a reason for hiding this comment

khieta Aug 6, 2024

Choose a reason for hiding this comment

oflatt Aug 26, 2024

Choose a reason for hiding this comment

D-McAdams commented Aug 6, 2024

shaobo-he-aws commented Aug 8, 2024

jeffsec-aws commented Aug 8, 2024

shaobo-he-aws commented Aug 8, 2024

patjakdev left a comment

Choose a reason for hiding this comment

patjakdev Aug 12, 2024

Choose a reason for hiding this comment

oflatt Aug 26, 2024

Choose a reason for hiding this comment

oflatt commented Aug 26, 2024 • edited Loading

oflatt commented Aug 26, 2024

oflatt commented Aug 28, 2024

oflatt commented Sep 16, 2024

oflatt commented Jul 29, 2024 •

edited

Loading

jeffsec-aws commented Aug 2, 2024 •

edited

Loading

jeffsec-aws commented Aug 2, 2024 •

edited

Loading

oflatt commented Aug 26, 2024 •

edited

Loading