EQL: Replace ?" and ?' with """ #62645
Pinging @elastic/es-ql (:Query Languages/EQL)
Thanks for starting this discussion on an issue, @costin. I'm optimistic that we'll agree on something. Instead of reading this issue narrowly as Replace ?" and ?' with """, I'm hoping to address the underlying question: what syntax is ideal for representing raw strings? Most of my thoughts boil down to three high-level questions:
A handful of alternatives that we've discussed:
How do we handle these scenarios?
For consistency with the rest of the language, and to keep symbol usage to a minimum (extra symbols add complexity and can confuse users), using For the other four suggested alternatives:
I think this change is quite safe, even for existing users, and the chance that the unescaped string literal is used.
It's the same behaviour in every language out there and for every unescaped syntax. What if the unescaped syntax is marked with an IMHO, we should proceed with this breaking change.
@paulewing and @MikePaquette, can you comment on the above? I'm going to add my perspective on the points below:
Any. If
Yes. This should be an empty string.
This should fail, since otherwise we would need to make assumptions about user intent. I think we need to avoid this and require users to be very explicit.
This should be interpreted as a string:
Fail. Same reasoning as for 7 quotes.
Parse to
Parse as
Folks, before getting lost in the details let's clarify one thing: this ticket is not about changing the raw string semantics.
I believe we're on the same page. I asked these questions to make this proposal crystal clear.
I recapped four alternatives and added reasons for why most of them were inadequate:
If my understanding is correct, most of the confusion revolves around escaping
👍
This is where we have different goals. The aim of this ticket is to simplify the grammar by 7.10, not to change the semantics of unescaped string declarations in any way. Whether what we have is useful or not is outside the scope of this ticket and of 7.10, as far as I'm concerned.
Is the current syntax for
I'm not sure what's unclear. This is not a new concept; the only real change in the grammar is the defining string, from The same rules for newlines, tabs, unprintable characters, etc. that apply for
I'm referring to user behavior, not syntax or semantics. For the grammar, we agree on both semantics and syntax: it's a string with no escape sequences between the If we create a new syntax for raw strings but it's awful at capturing regular expressions, then we probably did a bad job. I think For example, if we want to use
My questions about how many quotes we allow were, again, meant to make sure that this proposal accomplishes its goals and use cases. I think the only logical regular expression is this: I don't think that is simply an implementation detail; it's part of the proposal for
It looks like we're reaching consensus. I'm removing the team-discuss label but will keep the ticket open in case any details come up regarding the grammar itself.
For Python, when you have something enclosed in
One big downside of that is that you can't have trailing double quotes at all, which is problematic. Also, this is most comparable to
>>> r""""abc""""
File "<stdin>", line 1
r""""abc""""
^
SyntaxError: EOL while scanning string literal
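The Python behavior described above can be reproduced without a REPL. This sketch uses only the built-in compile(); the helper name parses is made up for illustration. It shows that a leading double quote inside r"""...""" is accepted while a trailing one is not.

```python
def parses(src: str) -> bool:
    """Return True if src compiles as a single Python expression."""
    try:
        compile(src, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# A leading double quote inside r"""...""" is fine: the value is '"abc'.
print(parses('r""""abc"""'))     # True
# A trailing one closes the literal early and leaves a stray, unterminated quote.
print(parses('r""""abc""""'))    # False
# Switching to triple single quotes sidesteps the problem: the value is '"abc"'.
print(parses("r'''\"abc\"'''"))  # True
```

The exact error message differs between Python versions (newer releases report an unterminated string literal), but the result is a SyntaxError in each case.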
I tend to lean towards the greedy approach, so that once you have the leading
That means you can have at most one
Yes, so as many
Maybe these examples better illustrate the problem that I see with greedy matching. What strings are in each expression?
Non-greedy:
Greedy:
Also, the analogous Python syntax is
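A small sketch with Python's re module makes the trade-off concrete. It assumes "greedy" means the body runs to the last """ on the line and "non-greedy" means it stops at the first one. The expression below is a hypothetical example, not one taken from the thread.

```python
import re

# Hypothetical expression comparing two raw strings on one line.
expr = '"""a""" == """b"""'

# Greedy body: runs to the LAST closing """ on the line.
greedy = re.compile(r'"""(.*)"""')
# Non-greedy body: stops at the FIRST closing """.
lazy = re.compile(r'"""(.*?)"""')

print(greedy.findall(expr))  # ['a""" == """b']: one string spanning both literals
print(lazy.findall(expr))    # ['a', 'b']: two separate strings

# With a trailing quote inside the value, the two readings flip.
print(greedy.findall('""""abc""""'))  # ['"abc"']
print(lazy.findall('""""abc""""'))    # ['"abc']: the last quote is left over
```

Neither reading is satisfactory on its own. The greedy body keeps trailing quotes but can swallow neighbouring literals, and the lazy body keeps literals apart but drops a trailing quote, which mirrors Python's limitation.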
No, it's not like that; the greedy approach works within the expression,
@rw-access Please check the tests I've added: https://github.com/elastic/elasticsearch/pull/62539/files#diff-8b3f6645d4cf54ff461c3e337b4bbe84R122
But what if you want a I still think """[^\r\n\f\v]*?["]{3,} is the safest and clearest specification. It forces the string onto a single line, allows it to start or end with a double quote, and has no escape sequences.
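The single-line specification proposed in that comment can be exercised directly with Python's re module. The pattern is the one quoted there. The helper parse_raw is hypothetical, and so is the rule that the value is the matched token minus exactly three quotes on each side. That rule is one reading of "can end or start with a double quote", not necessarily what the merged grammar in #62539 does.

```python
import re
from typing import Optional

# Pattern quoted in the thread: a """ opener, a lazy single-line body,
# then a greedy run of three or more closing quotes.
RAW_STRING = re.compile(r'"""[^\r\n\f\v]*?["]{3,}')


def parse_raw(text: str) -> Optional[str]:
    """Hypothetical helper: return the value of a raw string at the start of
    text, or None if there is none. Assumes the value is the matched token
    minus exactly three quotes on each side, so any extra quotes in the
    closing run belong to the value."""
    m = RAW_STRING.match(text)
    if m is None:
        return None
    return m.group(0)[3:-3]


print(parse_raw('"""a\\d+"""'))         # a\d+ (no escape processing)
print(parse_raw('""""abc""""'))         # "abc"
print(parse_raw('"""ab"cd"""'))         # ab"cd
print(parse_raw('"""a""" == """b"""'))  # a (stops at the first closing run)
print(parse_raw('"""ab\ncd"""'))        # None (must stay on one line)
```

Combining a lazy body with a greedy closing run keeps separate literals apart and still lets the value start or end with a double quote. It also rejects strings that span lines.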
Closed by #62539
As part of #61659, the topic of raw strings was brought up.
Currently ?' and ?" are used for defining a raw/non-escaped string literal. This is problematic because:
- ' … and thus ?' should …
- ?" makes it hard to define regex/patterns containing double quotes
- ? is a valid regex character and, while it does not conflict, it can be confusing to the user: ?"?ab"
The proposal is to replace the raw-string declaration with triple quotes, """, since the same character (the double quote ") is used and the risk of clashes (and thus of needing escaping) is minimal.
Based on the info in #61659, there are 21 rules (about 0.88%) that use ?" and 8 rules (about 0.33%) that use ?' out of a total of 2389 rules. That's a total of 29 rules, or about 1.21%, that will be affected by this change.
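Rewriting the 29 affected rules is mostly mechanical. The sketch below shows a hypothetical migration helper for doing that. The names migrate and OLD_RAW are invented for illustration. It assumes old raw bodies never contain their own delimiter, and it does not tokenize EQL, so a ?" inside an ordinary string literal would also be rewritten. Treat it as a starting point, not a tool.

```python
import re

# Hypothetical sketch: a naive text scan, not an EQL tokenizer, so a ?" or ?'
# that appears inside an ordinary string literal would also be rewritten.
# Assumes old raw bodies never contain their own delimiter (no escapes).
OLD_RAW = re.compile(r"""\?"([^"]*)"|\?'([^']*)'""")

# A single-line """ literal cannot span these characters.
LINE_BREAKS = "\r\n\f\v"


def migrate(query: str) -> str:
    """Rewrite ?"..." and ?'...' raw literals as triple-quoted literals."""

    def replace(m):
        body = m.group(1) if m.group(1) is not None else m.group(2)
        # Be conservative: refuse bodies that could close the new literal
        # early or that span several lines.
        if '"""' in body or any(c in body for c in LINE_BREAKS):
            raise ValueError(f"cannot convert {m.group(0)!r} safely")
        return '"""' + body + '"""'

    return OLD_RAW.sub(replace, query)


print(migrate(r'process where process.executable == ?"C:\Windows\System32\cmd.exe"'))
print(migrate("process where process.command_line == ?'say \"hi\"'"))
```

The second example ends with four closing quotes. Whether that reads back as say "hi" depends on how the final grammar treats a run of more than three quotes, which is the question debated in the comments.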