Add any and all operators #21

Merged
merged 7 commits on Nov 8, 2023
190 changes: 190 additions & 0 deletions text/0021-any-and-all-operators.md
# Add `all` and `any` operators

## Related issues and PRs

- Reference Issues:
- Implementation PR(s):

## Timeline

- Start Date: 2023-07-14
- Date Entered FCP: 2023-10-27
- Date Accepted: 2023-11-08
- Date Landed:

## Summary

This RFC proposes extending the language with `all` and `any` operators that allow checking if all elements or any element in a set satisfies a given predicate. The `all` operator returns `true` if the provided predicate is true for all elements in the set. The `any` operator returns `true` if the predicate is true for any element in the set.

## Basic example

Consider writing a policy that allows access when all port numbers in the `context` are between 8000 and 8999. This can't be expressed in Cedar today.

With the `all` operator, we can write this policy as follows:

```
permit(principal, action, resource)
when {
context.portNumbers.all(it >= 8000 && it <= 8999)
};
```

The same policy can be written using the `any` operator instead:

```
permit(principal, action, resource)
unless {
context.portNumbers.any(it < 8000 || it > 8999)
};
```

The two operators behave analogously (though [not equivalently](#extending-the-semantics)) to `&&` and `||`. Given negation, each can be written in terms of the other. We include both for ergonomics.

## Motivation

The `all` and `any` operators support a common use case: iterating over a set and checking if a predicate holds for any or all members of the set.

Cedar currently lacks a general way to express this use case. The only alternative is to expand the `all` or `any` checks into conjunctions or disjunctions. For example:

```
// Using all
["a", "ab", "abc"].all(it like "a*")

// Using &&
("a" like "a*") &&
("ab" like "a*") &&
("abc" like "a*")
```

This manual expansion works for literal sets but cannot be applied to non-literal sets like `context.portNumbers`.

The new operators provide a built-in way to concisely express these universal and existential quantifications over a set.

## Detailed design

The Cedar syntax, semantics, and type system are all extended to support the new operators. These extensions are backward compatible, and existing policies won't be affected. In particular, we will keep the `.containsAny` and `.containsAll` operators on sets, even though they can be expressed in terms of `any` and `all`.

### Extending the syntax

This RFC proposes extending the Cedar grammar as follows:

```
Access :=
'.' QUANTIFIER '(' Predicate ')'
| '.' IDENT ['(' [ExprList] ')']
| '[' STR ']'
QUANTIFIER := 'all' | 'any'
Predicate := ...
```

The `Predicate` expression differs from `Expr` in two ways:

1. Predicates can use the keyword `it`. A `Predicate` defines the body of a function with a single parameter, and we use `it` as the parameter name. This is similar to [Kotlin's `it` keyword](https://kotlinlang.org/docs/lambdas.html#it-implicit-name-of-a-single-parameter).

2. Predicates cannot contain nested `any` or `all` expressions. We disallow nested quantifiers for both performance and analyzability reasons. Nested quantifiers allow writing expressions that run in $O(2^n)$ time, where $n$ is the number of nested quantifiers. For example:

```
[0,1].all(
[0,1].all(
...
[0,1].all(true)...))
```

Because nested quantifiers are prohibited, there is no need to support predicates with arbitrary parameter names. The `it` keyword sufficiently extends the grammar.

### Extending the semantics

The semantics of `all` and `any` are straightforward when there are no errors:
- `E.all(P)` evaluates to true when `P[e/it]` is true for every element `e` in `E`.
- `E.any(P)` evaluates to true when `P[e/it]` is true for some element `e` in `E`.

But what happens if evaluating `P` errors on some element `e`? In this case, the evaluation of `all` or `any` terminates abruptly with a distinguished `QuantifierError`.

A `QuantifierError` may include additional diagnostic information, as long as that information is bounded in size and computed deterministically. For example, it could include the source location, or a fixed-length description of the smallest erroring value in the underlying set according to some fixed total order on Cedar values. This error handling approach ensures that evaluating `all` and `any` is _deterministic_: the result is the same regardless of iteration order. It also ensures that evaluation is _space efficient_: the size of the evaluator output remains constant regardless of policy and input size.
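
As an illustration only, here is a minimal Rust sketch of what such a bounded, deterministic payload could look like; the type and field names are hypothetical and not taken from the Cedar implementation:

```rust
/// Hypothetical shape of the error produced when a predicate errors inside
/// `all`/`any`. Every field is bounded in size and computed deterministically,
/// so the payload never depends on iteration order and never grows with the
/// size of the policy or the input.
#[derive(Debug, Clone, PartialEq)]
pub struct QuantifierError {
    /// Source span of the whole `E.all(P)` / `E.any(P)` expression.
    pub source_span: Option<(usize, usize)>,
    /// Truncated (fixed maximum length) description of the smallest erroring
    /// element, chosen according to a fixed total order on Cedar values.
    pub offending_element: Option<String>,
}
```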

For example, consider the expression `[1, true].all(it like "foo")`. Regardless of the order in which the predicate is applied, the expression returns the same `QuantifierError`.

Compare this to the [alternative semantics](#non-deterministic-error-behavior) that simply propagates whichever error is discovered first. Under that semantics, evaluating this expression could produce the error "a Long was found where a String was expected" or the error "a Boolean was found where a String was expected." Which one you get depends on evaluation order, but no canonical order exists because Cedar sets are unordered.

The proposed semantics lets us think about `all` and `any` in terms of their expansion to conjunctions and disjunctions. The quantified expression errors if _any_ expansion would error.

For example:
```
// Throws QuantifierError
[1, true].all(it like "foo")

// Throws a TypeError: "like" can't be applied to an integer
1 like "foo" && true like "foo"

// Throws a TypeError: "like" can't be applied to a boolean
true like "foo" && 1 like "foo"
```

The only difference is the error being thrown: `QuantifierError` for the quantified expression versus an element-specific error for the explicit expansion.
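
To make this concrete, here is a hedged Rust sketch of one way an evaluator could implement the proposed rule (the expression errors if the predicate errors on any element; otherwise it behaves like the expanded conjunction or disjunction). The generic element and error types and the predicate closure are stand-ins, not the actual Cedar evaluator code:

```rust
/// Hypothetical stand-in for the evaluator's error type.
#[derive(Debug, PartialEq)]
enum EvalError {
    /// Distinguished error returned by `all`/`any` when the predicate errors
    /// on any element, regardless of iteration order.
    Quantifier,
}

/// `E.all(P)` under the proposed semantics: errors with a single
/// `QuantifierError` if `P` errors on *any* element; otherwise true iff `P`
/// holds for every element.
fn eval_all<T, E>(
    set: &[T],
    predicate: impl Fn(&T) -> Result<bool, E>,
) -> Result<bool, EvalError> {
    let mut all_true = true;
    for element in set {
        match predicate(element) {
            // Short-circuiting on the first error is safe: the same
            // `Quantifier` error results no matter which element erred.
            Err(_) => return Err(EvalError::Quantifier),
            // A `false` result cannot short-circuit, because a later element
            // might still error, and errors take precedence over `false`.
            Ok(b) => all_true = all_true && b,
        }
    }
    Ok(all_true)
}

/// `E.any(P)` is analogous: errors dominate; otherwise true iff some element
/// satisfies `P`.
fn eval_any<T, E>(
    set: &[T],
    predicate: impl Fn(&T) -> Result<bool, E>,
) -> Result<bool, EvalError> {
    let mut any_true = false;
    for element in set {
        match predicate(element) {
            Err(_) => return Err(EvalError::Quantifier),
            Ok(b) => any_true = any_true || b,
        }
    }
    Ok(any_true)
}
```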

### Extending the type system

The type system needs to be extended with rules for type checking quantified expressions `E.all(P)` and `E.any(P)`. This should be fairly standard: we know the type of `it` from the type of the set `E`, and we can use this to typecheck the predicate `P`. We also need to propagate effects through predicates.
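
As an informal illustration (not part of the RFC text), the core typing rule might look roughly like this, ignoring effects and writing $\mathsf{Set}\langle\tau\rangle$ for the type of the set `E`; the rule for `any` is identical:

$$
\frac{\Gamma \vdash E : \mathsf{Set}\langle\tau\rangle \qquad \Gamma,\ \mathit{it} : \tau \vdash P : \mathsf{Bool}}
     {\Gamma \vdash E.\mathsf{all}(P) : \mathsf{Bool}}
$$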

For example, the following should typecheck when `context.limit` is an optional integer attribute, and `resource.things` is a set of entities with an optional integer attribute `.size`:

```
context has limit &&
resource.things.all(
it has size && it.size < context.limit
)
```

While the effect prior to an `any` or `all` is clear, what about the effect after one? The recommended solution is the simplest one: ignore the effects of `any` and `all`. But we could be more precise if needed.

For example, consider the following expression:

```
resource.things.all(context has limit && it has size && it.size < context.limit) &&
context.limit > 5
```

Here, the redundant `context has limit` check inside the `all` could carry an effect outward to the `>` expression.

This extra precision isn't necessary in the sense that we can always rewrite expressions to avoid relying on it. For example, we can rewrite the above expression as follows:

```
context has limit &&
resource.things.all(it has size && it.size < context.limit) &&
context.limit > 5
```



## Drawbacks

- This proposal requires significant development effort. We'll need to modify the parser, the CST to AST conversion, CST/AST/EST, evaluator, validator, models, proofs, and DRT.
- The `any` and `all` operators may encourage writing complex expressions with poor performance. For example, `context.portNumbers.any(it == 8010)` is $O(n)$ versus $O(1)$ for `context.portNumbers.contains(8010)`.
- Supporting these operators will likely have a negative impact on analysis performance, especially for complex predicates.

## Alternatives

Comment on lines +165 to +166
**Contributor:**

What about this alternative:

You evaluate the predicate on every member of the set and collect all of the errors, to be returned in a deterministic order? So in your example `[1, true].all(it like "foo")` you'd return the equivalent of

[(true, TypeError("Boolean found where String expected")),
 (1, TypeError("Long found where String expected"))]

The order in the list is based on a global order defined on Cedar values, e.g., that all Booleans come before all Longs.

**Contributor:**

Alternatively, you could sort based on error type, and ignore the contents of the error message and the expression that accompanies it. I believe that matches the level of abstraction we have today when considering Cedar error messages in the model.

**Contributor Author:**

I don't think that using a list is a viable alternative because it forces the specification to impose an (arbitrary) order on values and/or error types. Then, every implementation has to enforce the same order.

If we want to go this route, how about returning a set of error messages instead?

Note that this is a non-trivial change because it changes the return type of the evaluator from `Result<Value, Error>` to `Result<Value, Set<Error>>`.

Another alternative is to package this error set into the `QuantifierError`.

**Contributor:**

Specifying the result as a set feels like kicking the can; oftentimes sets are implemented as search trees, which require an ordering.

I feel that the loss in error specificity of the proposal is a potentially significant hit to the user experience. Without coupling errors with the actual expressions, we are still short of the best possible debugging experience.

**Contributor Author (@emina, Jul 17, 2023):**

We can preserve the mapping: use a map from values to errors (instead of a set). That still avoids having to impose an arbitrary ordering in the specification, and allows implementations to use a HashMap if desired.

**Contributor:**

Debug mode is probably another evaluator implementation, which we would have to prove/test is equivalent to the production one.

**Contributor:**

It seems that this is still the sticking point of this RFC. To recap:

The proposal in the RFC is that any error that results from `foo.any(...)` or `foo.all(...)` is emitted from the evaluator as a `QuantifierError` with the location of the whole expression. The pro is a simple, efficient semantics, which admits short-circuiting when the first error occurs and enjoys O(1) output size. The con is a lack of specific information about which element of `foo` was the source of the error and what the error actually is (e.g., a `TypeError`). @cdisselkoen and I are worried about the impact on user experience.

To overcome this con we could elect to package more details in the `QuantifierError`, e.g., a map from each element of `foo` to the associated error. Doing this adds a bit more complexity to the implementation, but not much. It eliminates the ability to short-circuit evaluation once an error occurs, and makes the output size O(n) in the worst case. But these downsides are minimal, to my mind. Using a map does not impose an order on Cedar values (which would have added a lot of pain/complexity). The worst case (every element in the set produces an error) is unlikely to happen in practice, and it won't affect the performance of non-erroring evaluation. All of this is worth it, to my mind, because it values customer experience. Good errors are extremely important.

So: I propose we change the proposal to the above, and then accept it.

**Contributor:**

This sounds reasonable to me.

As an implementation note, the natural choice for the map from `foo` element to error would probably be a map from `Expr` to error. (Note that `Expr` implements `Hash`, so it can be put in an unordered hashmap.) But I'm a little less clear on what type the keys should be if you are calling `.any()` or `.all()` on a set that originates from the entity data or `Context`. `Expr` is still one option (we could easily convert the `RestrictedExpr`s to `Expr` and return them in that form), but RFC 34 proposes not keeping the `RestrictedExpr` around by the time we get to evaluation. At that point you'd have to have the key be `Value`, but those are not `Hash`. `Value`s are totally ordered, so you can put them in a treemap, but then your concern above about appearing unordered is kind of a fiction. Moreover, if `Value`s are indeed the key type, they'd have to be made public, something which is also not currently the case but is also proposed in RFC 34.

**Contributor:**

Perhaps an acceptable workaround to the above would be to have the key be the string representation of the Value; the string representation would be trivially total-ordered, hashable, and useful for debugging, while dodging the issues about whether Value is made public.

**Contributor Author:**

I think we need to keep the size of the evaluator output constant. Letting it grow linearly with input size (policies + request + entities) means that the authorizer's output is worst-case quadratic in input size. This is undesirable even if rare.

I've revised the RFC to relax the specification of QuantifierError so that it can include more information in the error message, as long as the additional information is computed deterministically and is bounded in size.

We considered two alternative semantics for these operators.

### Non-deterministic error behavior

Given `E.any(P)` or `E.all(P)`, we apply `P` to elements of `E` in arbitrary order, terminating on the first error and returning that error as the result. This alternative provides more precise error messages than the proposed approach.

But the downside is that the same policy could produce different errors depending on iteration order. For example, `[1, true].all(it like "foo")` may result in a type error saying that `like` cannot be applied to an integer, or one saying it cannot be applied to a boolean, depending on which element is encountered first. In other words, Cedar semantics would become non-deterministic.

We decided against this semantics for three reasons:
* It violates Cedar's design as a deterministic language; error non-determinism would be visible to `is_authorized` callers.
* Applications may also become non-deterministic if they examine errors and act on them.
* Non-determinism complicates models and proofs.

Keeping deterministic semantics is important for Cedar as an authorization language.

### Treating errors as `false`

Given `E.any(P)` or `E.all(P)`, we apply `P` to elements of `E` in any order, treating errors as false and continuing evaluation on the remaining elements. Note that this is how `is_authorized` handles policy evaluation errors: they are turned into false and ignored.

With this approach, `any` and `all` become total functions: they never error. For example, `["one", 1].any(it < 2)` evaluates to true because there is one element for which the predicate holds. And `["one", 1].all(it < 2)` is simply false, because `"one" < 2` is treated as false.
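
For comparison with the evaluator sketch given earlier, this hedged Rust fragment shows how `all` would behave under the errors-as-`false` variant (again with hypothetical stand-in types); the operator becomes total and never errors:

```rust
/// Errors-as-false variant: a predicate error on an element is treated as
/// `false` for that element, so `all` (and likewise `any`) never errors.
fn eval_all_errors_as_false<T, E>(
    set: &[T],
    predicate: impl Fn(&T) -> Result<bool, E>,
) -> bool {
    // Under this variant, `["one", 1].all(it < 2)` is simply false: the
    // erroring comparison `"one" < 2` counts as false. The explicit expansion
    // `("one" < 2) && (1 < 2)` still errors, which is the discrepancy with
    // the expanded form discussed below.
    set.iter().all(|element| predicate(element).unwrap_or(false))
}
```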

The key advantage of this semantics is that it is more amenable to SMT-based analysis than either of the above alternatives. Both error-propagating alternatives need more complex and larger encodings.

We decided against this semantics because it breaks the interpretation of `all` and `any` as shorthands for conjunctions and disjunctions. If we expand `["one", true].all(it < 2)` to a conjunction, the result always errors rather than evaluating to false. This discrepancy between the quantified and expanded forms might be confusing and undesirable.