Parsing of backslashes in custom string literals seems inconsistent #22926
Comments
I think it should work like normal backslash escapes, but only allowing backslash and quote to be escaped. |
Ok, so that would be option 2 in effect? |
I thought that the point of raw strings was that backslashes don't need to be escaped either, e.g. i.e. essentially option 1 |
It's pretty important not to require escaping of backslashes in custom string literals, as otherwise regex literals become much messier. (There is also the LaTeXStrings package example.) |
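To illustrate the point about regex literals, here is a small sketch comparing a pattern written with the r"..." string macro against the same pattern built from an ordinary string, where every backslash has to be doubled. The variable names are illustrative only.

```julia
# Pattern for a simple decimal number, written as a regex literal:
# the backslashes reach the regex engine unchanged.
re_literal = r"\d+\.\d+"

# The same pattern built from an ordinary string literal:
# each backslash must first be escaped for the string parser.
re_escaped = Regex("\\d+\\.\\d+")

# Both spellings describe the same pattern text.
println(re_literal.pattern == re_escaped.pattern)
println(occursin(re_literal, "pi is about 3.14"))
```

If custom string literals required backslashes to be escaped, the first spelling would have to look like the second.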
Your observations are correct of course. Do you have a preference as to what to do? |
I don't understand option 3. If quotes aren't escaped, how can you tell where the end of the string is? e.g. how can you distinguish |
I just ran into this in bramtayl/BibTeX.jl#3: it seems to be extremely hard to type
julia> "\\\"" # the desired string, manually escaped
"\\\""
julia> raw"\"" # wrong string, no backslash
"\""
julia> raw"\\"" # does not parse
^C
julia> raw"\\\"" # wrong string, too many backslashes
"\\\\\""
|
The semantics of raw string literals are not general in the sense that there are strings which cannot be input that way. The string with contents |
I think we should fix this. If raw strings support any kind of escaping with |
Re-reading the thread, I see allowing lots of backslashes in strings is part of the goal of raw strings, so we shouldn't require escaping them. Option 1 seems good, but then it's impossible to write a string ending in a backslash. Option 4 is also pretty good: we could have no escaping at all in raw strings, and you can include quote characters using |
The real issue is picking a parsing strategy for custom string literals such that you can both implement normal string behavior and raw string behavior. |
I'm not sure that's possible. |
Then I think that supporting normal strings needs to take priority since most string literals are closer to normal than they are like raw strings. Not sure there's anything to be done here. |
I propose making only the sequence |
Another possibility: convert |
Coming full circle here, the current leading proposal is that |
Wouldn't that mean that you would need |
|
|
@vtjnash's proposal should be spelled out explicitly here if it wants to be considered. |
@vtjnash 's full proposal is that any number of backslashes followed by a quote is special --- if the number of backslashes is even, you get n/2 backslashes in the string followed by end-of-string. If the number of backslashes is odd, you get a quote character instead of end-of-string. |
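The rule as stated in this comment can be sketched as a small scanner. This is only an illustration of the proposal as described here, not the parser's actual implementation: `scan_literal` is a hypothetical name, and it is assumed to receive the source text that follows the opening quote.

```julia
# Hypothetical helper, not part of Base. Scans `src` (the source text just after
# the opening quote) and returns the decoded contents together with the
# character index of the closing quote.
function scan_literal(src::AbstractString)
    chars = collect(src)
    n = length(chars)
    out = IOBuffer()
    i = 1
    while i <= n
        c = chars[i]
        if c == '\\'
            # Measure the run of consecutive backslashes starting at i.
            j = i
            while j <= n && chars[j] == '\\'
                j += 1
            end
            run = j - i
            if j <= n && chars[j] == '"'
                # A run followed by a quote is special: emit half of it.
                print(out, "\\"^(run ÷ 2))
                if isodd(run)
                    print(out, '"')   # odd run: the quote is literal content
                    i = j + 1
                else
                    return String(take!(out)), j   # even run: the quote ends the string
                end
            else
                # A run not followed by a quote passes through verbatim.
                print(out, "\\"^run)
                i = j
            end
        elseif c == '"'
            return String(take!(out)), i
        else
            print(out, c)
            i += 1
        end
    end
    error("unterminated string literal")
end
```

Under this sketch, three backslashes followed by two quotes decode to one backslash and a quote, with the second quote closing the literal, while backslashes that are not followed by a quote are left untouched.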
Preceded by how many backslashes? |
|
That makes more sense – previous explanations made it seem like it would be
|
That's correct |
Here's a summary of the options:
Current behavior
Drawback: cannot express strings containing
Jeff’s proposal
Drawback: cannot express trailing backslash in string.
Jameson’s proposal
Drawback: kinda weird. Examples:
Keno’s proposal
Pass string literal content through verbatim. Drawback: massively breaking; requires calling the helper function in every string macro. |
Yeah, I suppose Jameson's proposal is probably the way to go here. I can't think of another approach that has a better combination of passing things through as literally as possible and allowing any possible string to be expressed somehow. |
Can I add an argument for option 4, since the current solution seems unfortunately too complex? We could extend the triple-quote syntax to n quotes, where n >= 3, as in GFM: a block string literal can start with an arbitrary number of quotes as long as the closing fence is as long as the opening one. That way we can have no escaping at all in raw strings and still express strings that contain 3 or more quotes in a row. A good thing is that, unlike Python or JavaScript, we currently ignore the first line break in block string literals. So if we allow arbitrary numbers of opening quotes, we can still express strings that start with several quotes without fusing them into the opening fence: just add a line break in this edge case. With proper syntax highlighting, the content is easy to identify, since it can be colored differently from the opening and closing quotes. On the other hand, trying to figure out the actual RegExp from |
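A rough sketch of the fence-length rule in this suggestion, assuming the fence only has to be longer than any run of quotes inside the content. `fence_length` is a hypothetical helper, not an existing Julia function, and the sketch deliberately ignores content that begins or ends with a quote.

```julia
# Hypothetical helper: the smallest number of quote characters (at least 3)
# that is longer than every run of quotes occurring inside `content`.
function fence_length(content::AbstractString)
    longest = 0   # longest run of quotes seen so far
    current = 0   # length of the run ending at the current character
    for c in content
        current = c == '"' ? current + 1 : 0
        longest = max(longest, current)
    end
    return max(3, longest + 1)
end

println(fence_length("plain text"))
println(fence_length("has \"\"\" inside"))
```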
Way too late. |
In particular there is a confusion about what backslash actually does. It certainly escapes ", but as we see in the second example, it also escapes itself. However, in the third example it suddenly doesn't appear to escape itself anymore. In particular, it does not seem to be possible to obtain a string literal that parses to @raw_str "\\\"" at the moment.
The options I see are
1. raw"\\"" parse to @raw_str "\\\"" (rather than an error)
2. raw"\\\"" parse to @raw_str "\\\"".
3. raw"\"" parse to @raw_str "\\\"" and raw"\\\"" parse to @raw_str "\\\\\\\""
4. [...] " in custom string literals entirely.
FWIW, I think I prefer option 3.
Whatever the decision with respect to parsing, I think we should adjust raw_str to have the invariant that whenever print(r"<x>") is a valid expression for some sequence of characters <x>, the output of that expression is <x>. That's currently not the case in the above example:
This discrepancy was noted when raw_str was originally introduced and is documented, but it bothers me.