# Add resource constraints for CEL (#3144)
### Resource constraints

For Beta, per-request execution time will be constrained via a context-passed timeout; we are considering a range between 100 milliseconds and 1 second. We want to see how CEL performs in benchmarks once our Beta optimizations are in place before deciding what specific duration to set the timeout to. If the timeout is exceeded, the error message will include how long each expression took to evaluate.

We will provide time and space benchmarks to show what can be expected in typical and worst-case scenarios for CEL expressions, and to show that for most use cases the timeout alone will be sufficient. The alternatives listed below are aimed more at malformed or malicious expressions.

If the context timeout proves to be insufficient, we have other alternatives to explore, such as limiting nested list comprehensions to a depth of 2. According to the performance section of [the CEL spec](https://github.com/google/cel-spec/blob/master/doc/langdef.md#performance), the time complexity of these operations is O(n), so by limiting nesting to a depth of 2 we limit expression complexity to O(n<sup>2</sup>). We can limit regular expression complexity as well.

CEL also provides a [cost subsystem](https://github.com/google/cel-go/blob/dfef54b359b05532fb9695bc88937aa8530ab055/cel/program.go#L309) that could be used in the future, but the cost subsystem would need to know the length of any relevant lists in order to be useful. That information can be supplied using `maxLength`, but this is an optional field, and if it is not set, CEL would not be able to provide a useful figure.

CEL expressions have the potential to consume unacceptable amounts of API server resources. We intend to constrain resource utilization in two ways:

- Proactively, by static analysis of the CEL expressions to compute a "cost" and limiting that cost
- Reactively, by enforcing CEL expression evaluation time limits using Go context cancellation

We intend to enforce the limits on both a per-expression and a per-request basis.

Per-expression limits are primarily targeted at CEL expression authors and are intended to provide actionable feedback about expression cost. They are insufficient for ensuring that requests will be processed in a bounded amount of time. For example, a validation rule for a field within a list of objects might be well within bounds each time it is evaluated, but the sum of the costs across the list of objects may be unacceptable. This is why we also have a per-request limit.

#### Cost limits

We will use CEL's [cost subsystem](https://github.com/google/cel-go/blob/dfef54b359b05532fb9695bc88937aa8530ab055/cel/program.go#L309) to provide our proactive limits. This protects against both accidental and malicious misuse. If an expression is analyzed and has a cost greater than a predefined limit, it will not be allowed. If an expression is rejected for this reason, the error message will include by how much the limit was exceeded.

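To make the rejection flow concrete, here is a minimal sketch of a per-expression cost check at CRD write time. The `estimateCost` helper, the limit constant, and the error wording are illustrative placeholders, not the actual apiextensions implementation or the cel-go API.

```
package main

import "fmt"

// perExpressionCostLimit is the illustrative per-expression limit discussed
// later in this document (120,000); the real constant lives in the apiserver.
const perExpressionCostLimit = 120000

// estimateCost is a placeholder for the static estimate produced by CEL's
// cost subsystem combined with the worst-case list sizing described below.
func estimateCost(rule string) uint64 {
	// ... compile the expression and walk the checked AST ...
	return 90000 // e.g. a cost-3 function over a worst-case list of 30k elements
}

// validateRuleCost rejects a rule whose estimated cost exceeds the limit,
// reporting by how much the limit was exceeded to aid debugging.
func validateRuleCost(rule string) error {
	cost := estimateCost(rule)
	if cost > perExpressionCostLimit {
		return fmt.Errorf("rule %q has estimated cost %d, exceeding the limit of %d by %d",
			rule, cost, perExpressionCostLimit, cost-perExpressionCostLimit)
	}
	return nil
}

func main() {
	rule := `list.all(element, element.startsWith("prefix:"))`
	if err := validateRuleCost(rule); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("accepted")
}
```
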
A major problem with the cost system is assigning the cost of list iteration. The cost of a CEL expression is computed statically, without any knowledge of the data that will be validated. Only the CEL expression is available when computing cost. This means that the sizes of variable length data types (lists, maps, sets, strings, byte arrays) are entirely unknown, which limits our options for computing the cost of CEL expressions that iterate across these types. Setting high costs for iteration is a major problem because, in practice, lists are short. For example, a pod has a list of containers, but the list is usually quite short.

There are a couple of ways to address this problem:

- We require CRD authors to provide `maxLength` on all variable length data types that they iterate across in CEL expressions, so we know exactly how to compute cost
- We do not attempt to statically compute a cost and instead rely entirely on time based limits
- We prove that, in practice, cost limits can be found that are operationally safe while still not being overly restrictive

We have explored the cost system and realized (thanks @liggitt!) that O(n) iteration is bounded by custom resource size limits (specifically the 3MB request limit), and we can use this to establish more manageable worst case sizes.

If a list contains small elements (e.g. an int64), then the worst case list size can be 3MB/4bytes=~786k elements, but as the element size grows, the worst case list size quickly becomes manageable. We ran a [series of experiments](https://docs.google.com/document/d/1yR746Rf-rw-_zoq36Ypzu8LTqOa-LaCk1pv3e0kjF3A/edit?usp=sharing) to test the resource utilization of CEL expressions that iterate across lists.

> Review note: since custom resources are stored as JSON, the worst case for an integer list is closer to 2 bytes per element (roughly 1.5M elements); the 4 byte figure reflects the in-memory representation that was being assumed when this was written.

We found that:

- O(n) iteration of a worst case list (list length == 786k) is <150ms
- O(n) iteration of 10 byte data elements (list length == 300k) is < 70ms
- O(n) iteration of 100 byte data elements (list length == 30k) is < 7ms

> Review discussion: the worst case integer list (roughly 1.5M elements at 2 bytes each) would likely not fit within the proposed 100 millisecond per-expression limit on the hardware tested; its estimate is closer to 300ms. Optimizing the CEL type integration code may bring this case under the limit; if not, we would either make a special exception (since real lists are rarely this bad) or require `maxLength` on a list before CEL iteration over it is allowed, and one reviewer would be comfortable requiring `maxLength` for any expression that touches items within a list. The concrete times are also heavily dependent on the hardware and load of the server running them, so a safety factor (around 5x, i.e. assuming CEL gets roughly 20% of the available compute) should be built into the translation from abstract CEL cost to bounded run time and documented, with the wall-clock timeout as the backstop for severely resource-starved servers.

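As a rough cross-check of these figures, the sketch below derives worst case list lengths from the 3MB request limit and a minimum per-element size. The request size constant and the rounding are assumptions taken from the discussion above, not exact serialization math.

```
package main

import "fmt"

// maxRequestBytes approximates the 3MB custom resource request limit that
// bounds how long any list in a single object can be.
const maxRequestBytes = 3 * 1024 * 1024

// worstCaseLen is the longest list of elements of a given minimum size that
// can fit in one request; it is the multiplier used for O(n) iteration cost.
func worstCaseLen(minElementBytes int) int {
	return maxRequestBytes / minElementBytes
}

func main() {
	fmt.Println(worstCaseLen(4))   // ~786k (int64 counted at 4 bytes; ~1.5M at 2 bytes)
	fmt.Println(worstCaseLen(10))  // ~314k, rounded to 300k above
	fmt.Println(worstCaseLen(100)) // ~31k, rounded to 30k above
}
```
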
Consider an example:

```
list.all(element, element.startsWith("prefix:"))
```

The cost of this expression is:

```
cost('startsWith()' function) * worst_case_length(list)
```

If the `worst_case_length()` is 30k for the list (e.g. the list elements are objects where the minimum size of each object can be computed statically to be 100 bytes) and the cost of `startsWith()` is, say, 3, we know that this takes < 21ms to evaluate.

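A short sketch of that arithmetic, with the per-call cost of `startsWith()` and the measured per-pass latency treated as assumptions carried over from the preceding text:

```
package main

import (
	"fmt"
	"time"
)

func main() {
	const startsWithCost = 3      // assumed per-call cost of startsWith()
	const worstCaseLength = 30000 // list of 100 byte objects within a 3MB request

	// Static cost estimate: per-element cost times worst case list length.
	fmt.Println("estimated cost:", startsWithCost*worstCaseLength) // 90000

	// Rough latency bound using the measured <7ms per 30k-element O(n) pass.
	perPass := 7 * time.Millisecond
	fmt.Println("estimated time:", perPass*time.Duration(startsWithCost)) // ~21ms
}
```
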
Note that we could also have a much lower `worst_case_length()` if the CRD author provided a `maxLength` limit on their list size.

This implies that we can use cost to:

- Set limits that ensure CEL expressions evaluate within known time bounds
- Allow O(n) operations so long as the per-element cost is moderate (e.g. 100 cost)
  - And again, allow more expensive per-element cost when `maxLength` is set
- Allow many (100s of) CEL expressions in a CRD that are within these complexity bounds
- Disallow O(n<sup>2</sup>) or worse operations based purely on their cost
  - But still allow O(n<sup>2</sup>) or worse so long as the CRD author adds a `maxLength` limit to the lists and the resulting cost is within cost bounds (e.g. O(n<sup>3</sup>) is still fine for a list size of 3; see the sketch below)

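To make the last point concrete, a small sketch of how estimated cost scales with nesting depth once `maxLength` bounds the lists; the body cost and lengths are illustrative:

```
package main

import "fmt"

// nestedCost estimates the cost of a comprehension nested depth levels deep
// over lists bounded to maxLen elements, with a per-element body cost.
func nestedCost(bodyCost, maxLen, depth int) int {
	cost := bodyCost
	for i := 0; i < depth; i++ {
		cost *= maxLen
	}
	return cost
}

func main() {
	// O(n^3) over lists with maxLength=3 stays tiny: 4 * 3^3 = 108.
	fmt.Println(nestedCost(4, 3, 3))
	// The same nesting over unbounded lists (~30k elements of 100 bytes each)
	// would blow far past any reasonable per-expression limit.
	fmt.Println(nestedCost(4, 30000, 3))
}
```
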
Also, not requiring `maxLength` on most O(n) CEL expressions keeps the cost system low friction; the majority of CEL expressions can be written and used without bumping into cost limits or needing to set `maxLength`.

For the cases where a `maxLength` is needed to ensure cost is acceptable, encouraging CRD authors to include `maxLength` on variable length data elements is generally considered good hygiene anyway, so a cost system that incentivizes developers to set this field has other benefits.

> Review discussion: CEL validation currently runs in addition to OpenAPI schema validation. If the cost calculation relies on `maxLength` (or on required fields that informed a computed minimum element size), we may need to skip CEL evaluation when the relevant OpenAPI validations have already failed. This is non-blocking for the KEP; we may first see whether the runtime cost bounds alone are sufficient before deciding at implementation time.

We will use the minimum possible size of list elements to estimate the size (e.g. objects with many fields will have much larger minimum sizes). This helps us pick smaller worst case list sizes than we would have to pick if we assumed all lists contain 1 byte elements.

> Review discussion: there are a couple of schema edge cases to pay attention to when computing minimum element sizes; they do not need to be reflected in the KEP but should be remembered at implementation time.

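A minimal sketch of deriving a minimum element size from a schema, using a deliberately simplified, hypothetical schema struct rather than the real apiextensions types; the per-type byte counts are placeholders, and a real implementation would count only required fields and handle the edge cases noted above.

```
package main

import "fmt"

// schema is a simplified stand-in for a structural schema node.
type schema struct {
	Type       string
	Properties map[string]schema // for objects
}

// minSize returns a conservative lower bound, in bytes, on the serialized
// size of a value conforming to s. The constants are placeholders; this
// version assumes every listed property is required.
func minSize(s schema) int {
	switch s.Type {
	case "object":
		size := 2 // {}
		for name, p := range s.Properties {
			size += len(name) + 3 + minSize(p) // quoted key, colon, value
		}
		return size
	case "string":
		return 2 // ""
	case "array":
		return 2 // []
	default:
		return 1 // numbers, booleans, null
	}
}

func main() {
	// An object with two string properties, as in the schema example below.
	element := schema{Type: "object", Properties: map[string]schema{
		"y": {Type: "string"},
		"z": {Type: "string"},
	}}
	fmt.Println("minimum element size:", minSize(element), "bytes")
	fmt.Println("worst case list length:", 3*1024*1024/minSize(element))
}
```
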
For example, consider the element size for the list `x` in this schema:

```
spec:
  type: object
  properties:
    x:
      type: array
      x-kubernetes-validation:
        - rule: "self.all(element, element.y.startsWith('prefix:'))"
      items:
        type: object
        properties:
          y:
            type: string
          z:
            type: string
```

The worst case list size of `x` would be based on the size of an object with two strings (`y` and `z`).

We will also account for where in a schema a CEL expression is placed when calculating its cost. For example:

```
spec:
  type: object
  properties:
    x:
      type: array
      items:
        type: object
        properties:
          y:
            type: string
            x-kubernetes-validation:
              - rule: "self.startsWith('prefix:')"
```

In this example the validation rule is contained within a list, so its expression cost would be `cost(startsWith) * worst_case_length(x)`. Note that this means some of the cost calculation will happen outside of the CEL cost function (which would only calculate the `startsWith` cost).

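A small sketch of that outer multiplication: the rule's own cost (what the CEL cost function reports) is scaled by the worst case length of each array that encloses the schema node carrying the rule. The function and figures are illustrative, not the apiextensions implementation.

```
package main

import "fmt"

// scaledCost multiplies a rule's base cost by the worst case length of each
// enclosing array, reflecting how many times the rule may be evaluated.
func scaledCost(baseCost uint64, enclosingArrayLengths []uint64) uint64 {
	cost := baseCost
	for _, n := range enclosingArrayLengths {
		cost *= n
	}
	return cost
}

func main() {
	// self.startsWith('prefix:') on items of x: base cost ~1, multiplied by
	// the worst case length of x (30k for 100 byte elements).
	fmt.Println(scaledCost(1, []uint64{30000}))
}
```
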
During CRD validation, when an expression exceeds the cost limit, the error message will include both the limit and the cost of the expression to facilitate debugging. The cost calculation rules will be documented.

> Review discussion: how stable are the cost calculations (both CEL's and the layer we add on top)? We must avoid accepting a CRD in one release and rejecting it in a later one because the cost heuristics were tweaked. A fuzzer that searches for the most expensive expression allowed under a given cost limit would help build confidence that there are no large discrepancies between the heuristic and reality; regex matching is the main remaining worry. Publishing the limit may encourage expressions that squeak in just under it, but most cost problems are expected to be solved by setting `maxLength` to a value higher than will ever be hit in practice.

#### Time limits

We believe CEL cost will be sufficient to bound the resource utilization of the vast majority of CEL expressions. As a backstop we will use timeouts. We will set these timeouts high enough that they are not disruptive except on the most extremely resource-starved API servers. We don't believe this will cause problems in practice because there are other timeouts that will trigger under those conditions.

> Review discussion: on a heavily loaded server that is context-switching between requests, artificially failing CEL evaluation at 100 milliseconds (or all expressions at 1 second) of wall time may not actually help; counting CPU time rather than wall time would be better, but there is no obviously easy way to do that.

Per-request and per-expression execution time will be constrained via Go context cancellation and a context-passed timeout.
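
A minimal sketch of how those two timeouts could be layered with the standard library. The `evaluate` function stands in for evaluating one compiled CEL expression and is assumed to observe context cancellation; the timeout values are the proposed limits, and everything else is illustrative.

```
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

const (
	perRequestTimeout    = 1 * time.Second        // proposed per-request limit
	perExpressionTimeout = 100 * time.Millisecond // proposed per-expression limit
)

// evaluate stands in for evaluating a single compiled CEL expression; it is
// assumed to check ctx periodically and stop when the context is cancelled.
func evaluate(ctx context.Context, rule string) error {
	select {
	case <-time.After(10 * time.Millisecond): // simulated evaluation work
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func validate(requestCtx context.Context, rules []string) error {
	// The per-request budget derives from the request context, so it also
	// respects the request's own deadline if that is shorter.
	reqCtx, cancel := context.WithTimeout(requestCtx, perRequestTimeout)
	defer cancel()

	for _, rule := range rules {
		exprCtx, exprCancel := context.WithTimeout(reqCtx, perExpressionTimeout)
		err := evaluate(exprCtx, rule)
		exprCancel()
		if errors.Is(err, context.DeadlineExceeded) {
			return fmt.Errorf("validation rule %q exceeded its time limit", rule)
		}
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	err := validate(context.Background(), []string{`self.startsWith("prefix:")`})
	fmt.Println("result:", err)
}
```
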
> Review discussion: at minimum, evaluation should be constrained to the request lifetime via the request context; the fixed 100ms/1s targets are less certain without knowledge of server power and load. It is also unclear how CEL would implement cancellation (e.g. how often a loop checks whether the context is cancelled). Because wall-clock limits make behavior depend on how busy the API server is, a reproducible, runtime-counted cost limit is strongly preferred if CEL can support it: rather than computing a cost per request, give CEL an evaluation budget and halt when it is exhausted. A relevant case is a cheap rule (cost ~1) nested inside a list that might be huge, so it is evaluated many times; the suggestion is to keep two budgets, an overall per-request budget and a per-expression budget, run each expression with the minimum of its per-expression budget and the remaining overall budget, and decrement whatever it actually used from the overall budget afterwards (assuming CEL can support this).

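A minimal sketch of that two-budget scheme, assuming the evaluator can enforce a runtime cost limit and report the cost actually consumed; the names, numbers, and `evalWithBudget` helper are hypothetical.

```
package main

import (
	"errors"
	"fmt"
)

var errBudgetExhausted = errors.New("cel evaluation budget exhausted")

// budgets tracks the remaining per-request budget; each expression runs with
// the minimum of the per-expression budget and what remains for the request.
type budgets struct {
	remainingRequest uint64
	perExpression    uint64
}

// evalWithBudget stands in for evaluating one expression under a runtime cost
// limit and reporting the cost it actually consumed.
func evalWithBudget(rule string, limit uint64) (actualCost uint64, err error) {
	cost := uint64(40) // pretend runtime cost of this evaluation
	if cost > limit {
		return limit, errBudgetExhausted
	}
	return cost, nil
}

func (b *budgets) run(rule string) error {
	limit := b.perExpression
	if b.remainingRequest < limit {
		limit = b.remainingRequest
	}
	used, err := evalWithBudget(rule, limit)
	b.remainingRequest -= used // decrement whatever the evaluation consumed
	return err
}

func main() {
	b := &budgets{remainingRequest: 12_000_000, perExpression: 120_000}
	fmt.Println(b.run(`self.y.startsWith("prefix:")`), "remaining:", b.remainingRequest)
}
```
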
If the per-expression timeout is exceeded, the error message will include how long the expression in question took to execute. If the per-request timeout is exceeded, the error message will instead include how long every expression took to execute, up to and including the one where the timeout was exceeded.

We will continue to refine and improve this cost data. For example, optimizations to how CEL integrates with Kubernetes data may allow us to increase our cost bounds.

> Review discussion: changing the permitted cost bounds over time has compatibility implications. Tightening them would break existing CRDs; relaxing them means new CRDs that use the expanded capacity will not work on older versions. This portability concern applies to essentially anything we change about CEL.

Based on our current data, we believe the limits should be:

- per-expression cost: 120,000
- per-request cost: 12,000,000
- per-expression timeout limit: 100 milliseconds
- per-request timeout limit: 1 second

> Review discussion: (1) In a write request with both CEL expressions and webhooks, CEL validation runs once per internal write attempt to etcd: mutating admission (including webhooks) runs first, then schema and CEL validation, then validating admission, then the write; if an etcd conflict allows a retry, the whole chain runs again. (2) The cost unit is believed to map roughly to the number of calls to `Interpretable.Eval` made while evaluating an expression. (3) The per-request cost could be fed into API Priority and Fairness to throttle requests accordingly; this is noted as follow-up work, and APF will be taken into account when the bounds are refined. (4) The limits are intended to be fixed rather than tunable by cluster admins, primarily for portability: a CRD should behave the same on different clusters, and it is easier to go from fixed to configurable later than the reverse. Because CEL evaluation is CPU intensive (unlike time spent waiting on webhooks), the per-request limit may also require APF configuration changes, which a runtime cost calculation would mitigate.

This is based on our evaluation of cost. The limits allow O(n) iteration with a loop body cost of 4 (enough for a basic regex). Since basic CEL expressions have a cost of 1, this should be plenty in most cases, but `maxLength` can be set on any lists where more computation needs to be done.

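A quick consistency check of those numbers, using the loop body cost and worst case list lengths assumed earlier in this section:

```
package main

import "fmt"

func main() {
	const perExpressionCostLimit = 120_000
	const loopBodyCost = 4 // enough for a basic regex per the text above

	// Longest list an O(n) rule with this body cost can iterate while staying
	// under the per-expression limit.
	fmt.Println(perExpressionCostLimit / loopBodyCost) // 30000

	// That matches the worst case length of a list of ~100 byte elements in a
	// 3MB request, so typical O(n) rules fit without setting maxLength.
	fmt.Println(3 * 1024 * 1024 / 100) // ~31457
}
```
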
> Review discussion: boolean short-circuiting means a truly adversarial worst case analysis starts to resemble an n-SAT problem, which is NP-complete. The CEL runtime we use is heavily constrained, though: at most 32 levels of expression nesting, at most 32 terms separated by `||` in a row, no user-defined functions, and the expression has to fit within a CRD. The open question is what the worst case expression looks like under those constraints; if it is still expensive, some bounds may need to be tightened.

### Test Plan