-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EQL: changes to strings and regular expressions #61659
Comments
Pinging @elastic/es-ql (:Query Languages/EQL) |
Related, I think we should also support |
Ross, thanks for the detailed analysis. quotingConsidering the numbers, my preference is to keep only double quotes regexpSince regular expression declaration occurs only within The vast majority of regular expressions do not use |
Regarding quoting, from what I understood, one advantage of standardizing only on |
This doesn't immediately free up ' for quoting field names, though. The reason for that is because we should avoid reinterpreting existing syntax with new semantics. Otherwise currently valid EQL will remain syntactically valid but will work drastically differently. If we want to reuse ', we should first deprecate it, then after we are confident that it's no longer used, we can reintroduce meaning to ' strings. So in the near term, this doesn't solve that problem. But I'm also not convinced that ' is a good choice for quoting field names, and haven't heard an argument for choosing ' over ` or vice versa. |
If/when we reintroduce ', I personally think the best usage is for literal I escaped strings. I expect this use case to be very common, as doubling up backslashes for file paths gets annoying. Ruby, YAML, and TOML are a few languages that come to mine with those semantics. If we remove ?" and ?', then we'll have no syntax left to allow for unescaped strings, so we should keep that user experience in mind. We could reintroduce ' with that meaning later, but that would mean there's a gap time frame with no way to express an unescaped string. |
After discussing this issue today, we concluded to move forward with using |
It's easy to log a deprecation message for the usage of |
We shouldn't change the meaning of existing syntax overnight. I think we need to go to a deprecation path first before we assign new semantics. That way nobody ends up surprised and confusing by breaking changes. I still think we can avoid that hassle by going with ` over '. |
Keep in mind that a beta release locks the language, meaning |
The usage of single quotes to wrap a string literal is forbidden and an error encouraging the user to user double quotes is returned. Tests are properly adjusted. Related to elastic#61659
The usage of single quotes to wrap a string literal is forbidden and an error encouraging the user to user double quotes is returned. Tests are properly adjusted. Relates to #61659
Use triple doulbe quotes enclosing a string literal to interpret it as unescaped, in order to use `?` for marking query params and avoid user confusion. Relates to elastic#61659
Since `'` is no longer supported for defining string literals, we replace the backquote [`] with single quote ['] for the definition of identifiers. The backquote is not a commonly used symbol in languages and many times, especially with some fonts, can easily be mistaken and cause confusion. Relates to elastic#61659
Use triple double quotes enclosing a string literal to interpret it as unescaped, in order to use `?` for marking query params and avoid user confusion. `?` also usually implies regex expressions. Any character inside the `"""` beginning-closing markings is considered raw and the only thing that is not permitted is the `"""` sequence itself. If a user wants to use that, needs to resort to the normal `"` string literal and use proper escaping. Relates to #61659
…3174) Use triple double quotes enclosing a string literal to interpret it as unescaped, in order to use `?` for marking query params and avoid user confusion. `?` also usually implies regex expressions. Any character inside the `"""` beginning-closing markings is considered raw and the only thing that is not permitted is the `"""` sequence itself. If a user wants to use that, needs to resort to the normal `"` string literal and use proper escaping. Relates to #61659 (cherry picked from commit d87c2ca)
Bringing discussion from the 2020-08-27 meeting here
We've talked about changes to the EQL syntax, particularly around quoting (escaping fields: #57194 and #51443). And if we want to make changes to the language, we should do it now while we're experimental.
In EQL, there are four options for quoting literal strings:
"using double quotes"
which is an escaped string, and requires"
to be escaped'using single quotes'
which is an escaped string, and allows for literal"
?"using literal double quotes"
which is a raw string, no literal"
allowed?'using literal single quotes'
which is a raw string, no literal'
allowedI did some analysis from our internal repository of EQL rules to breakdown our usage of different types of strings. Almost 98% percent of our strings are in the
"
form."
'
?"
?'
Of those, all usages of
?
were used in the match function. There were no other instances of those string types but in regular expressions. If we want to remove three of the four string permutations (keeping"
), now seems like right time. Although, I do think it is very convenient to have an option for unescaped strings. Escaping can be cumbersome, and I think an unescaped form of a string would help our users. I don't know if?"
is the ideal form, or if something else is bettter.For regular expressions, I think that a new syntax could make a lot of sense with our users, and we could better align with Lucene and Javascript syntax, using
/.../
. I believe a grammar would look like this/([^/\\]|\\/|\\.)*/
. This was well received by Intelligence and Analytics team (owners of detection-rules and the EQL rules for Endgame).If we change the accepted syntax for regular expressions, we need to decide on the appropriate contexts. I think this is the most obvious:
match(some.field, /some.regular.expression/)
We could potentially allow these as well, but I can see arguments for and against. I'm not sure where I lean.
some.field == /some.regular.expression/
equivalent to the abovesome.field != /some.regular.expression/
inverse of the aboveThe text was updated successfully, but these errors were encountered: