-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unbounded length for identifiers #413
Comments
Is there no bound at all currently? I can't find any in the spec 🤔 |
I can't find one in the spec or the |
It looks like protobuf doesn't have a max identifier length, does json? |
JSON doesn't, but I would consider this a bug. |
Neither proto nor JSON define limits, though there are practical limits within different implementations. The guidance about operator counts is intended mostly to establish rough guidance on what parsers should reasonably be expected to support, though they may support more. I don't see a reason to be proscriptive about the identifier lengths at a CEL level since users could add custom validations that impose such limits per use-case. That said, maybe I am misunderstanding the nature of the request? @hudlow could you elaborate a bit more about how having a limit would be helpful? |
@TristonianJones As I develop a parser, I have three goals in mind:
These are principles I apply to the design of any API, and I would have thought them to be consistent with CEL's values. |
CEL can, at times, inherit the limitations of the data types it is provided. Since CEL isn't a data definition language, I'm reluctant to place limits that alternative type systems might hit as it's not just used with JSON and proto. CEL is also not a self-contained runtime. It' not possible to prevent all failure modes, nor it is possible to prevent running out of memory (at least not at the syntactic level) because CEL has no concept of its memory limits. These limits must be configured and CEL offers runtime controls for failing early based on the limitations of the host process. Since CEL is flexible in its use, it would typically be up to the application to determine which practices beyond those described as part of CEL which are required to use the language and to be portable with their desired use case. Networking would have totally different limitations than an expert system, for example. While I do like being able to describe limitations, and make guarantees about behavior, I feel as though what's described is a set of limitations for a particular use case which may not cover all uses of CEL. |
A few thoughts in response to that:
But it is possible to define all failure modes, and the failure mode of "if you try to use an identifier with a terabyte's worth of characters something will probably break" is undefined.
I think a key question is where do we specify what these runtime controls are such that (1) they offer sufficient control and (2) implementations provide them uniformly? |
Neither Java, Go, nor C++ declare an identifier character limit (neither do they seem to have practical limits on parsable numeric representations), but you are likely to encounter practical limits that trigger a blow-up with sufficiently long names. Conventional wisdom says this limit is 64KB in Java, but it's not a rule. In terms of portability, CEL focuses on runtime portability of the parsed and type-checked expression meaning that expressions will evaluate the same way no matter which stack parsed them. I wouldn't recommend passing unparsed expressions between processes. For parsing and type-checking, we tend to have fewer controls, but the ones that exist are intended to prevent breakages due to pathological inputs. The Golang and Java CEL libraries also provide validators which can be used to sanity test the AST structure prior to deploying the expression. Most applications have a practical size limit either in terms of the number of expression nodes in the AST total program byte size, but there isn't a practical limit to what this might be from CEL's perspective. We've seen some 500KB expressions, for example. Personally, I think 64KB is much larger than most CEL expressions should ever be. For runtime controls, most are configurable through options provided to the runtime builders. Most are similarly named, though there are some with slight behavior differences due to implementation choices that make exact parity difficult. Over time, I expect the name and behavior of these controls to become more and more standardized. If CEL were to introduce parse-time limits for identifiers and numbers, it would have to be quite large, something like 64KB only because we'd be imposing a limit beyond what languages tend to define. Would that work for you? It's large, but not large enough to cause any practical problems on any platform. |
I'd definitely take 64KB over no limit, but I'd still like a standard option supported by every implementation. I am not sure I understand the assumptions that (1) Java, Go, and C++ got this right and (2) the decision for them is the right decision for CEL which is very intentionally constrained in other ways that they are not. Anyway, I do appreciate the dialogue!
I hope I can convince you that there is great utility to end-to-end portability. 🙂 |
I'd like to propose that identifiers be, with whatever caveats are necessary for backward compatibility, bounded to a reasonable length. The 63-character limit for DNS names would be great if we can get away with it — otherwise maybe 255?
The text was updated successfully, but these errors were encountered: