Unified String Literals #3475
I really like this approach. However, the format placeholder escaping seems like a substantial layering violation. I don't see an obvious implementation of this that doesn't turn a language concern (string lexing) into a library concern (format string placeholders). And the standard library is not the only thing in the ecosystem that handles placeholders. |
an idea for how string literals with format placeholders would work with proc-macros and
|
Adding a hash to escaping is a good idea! If in normal string we have format!("Hello, {}!", "world"); // => Hello, world!
// these are good brackets
format!(#"Hello, #{}#!"#, "world"); // => Hello, world!
// not these!
format!(#"Hello, #{}!"#, "world"); // => Hello, world! |
Looks like, string lexing already is also a library concern, given that That said, many macros build on top of edit… That is not to say there aren’t any problems. For example, the RFC is not clear about how/whether println!(concat!(#"output: #{}"#), 42); should work. And how about any of these: fn main() {
let x = 42;
println!(concat!(#"{x} #{x}"#, ""), x = x);
println!(concat!(#"{x} #{x"#, "}"), x = x);
println!(concat!(#"{x} #{"#, "x}"), x = x);
println!(concat!(#"{x} #"#, "{x}"), x = x);
println!(concat!(#"{x} "#, "#{x}"), x = x);
println!(concat!(#"{x}"#, " #{x}"), x = x);
println!(concat!(#"{x"#, "} #{x}"), x = x);
println!(concat!(#"{"#, "x} #{x}"), x = x);
println!(concat!(#""#, "{x} #{x}"), x = x);
} edit2 I’ll be re-reading @programmerjake's ideas on this point again, now that I’ve actually noticed the potential issue myself in the first place. |
@steffahn If following my proposed fn main() {
let x = 42;
println!(concat!(#"{x} #{x}"#, ""), x = x); // prints: {x} 42
println!(concat!(#"{x} #{x"#, "}"), x = x); // prints: {x} 42
println!(concat!(#"{x} #{"#, "x}"), x = x); // prints: {x} 42
println!(concat!(#"{x} #"#, "{x}"), x = x); // prints: {x} #42
println!(concat!(#"{x} "#, "#{x}"), x = x); // prints: {x} #42
println!(concat!(#"{x}"#, " #{x}"), x = x); // prints: {x} #42
println!(concat!(#"{x"#, "} #{x}"), x = x); // prints: {x} #42
println!(concat!(#"{"#, "x} #{x}"), x = x); // prints: {x} #42
println!(concat!(#""#, "{x} #{x}"), x = x); // prints: 42 #42
} |
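For comparison, here is a minimal sketch of how `concat!` and format placeholders interact in today's Rust, without the proposed guarded syntax. The helper names `split_placeholder` and `mixed_literals` are made up for illustration. `concat!` joins the processed values of its literals, and the formatting macro only sees the merged string, so a placeholder may straddle two pieces.

```rust
/// A placeholder split across two ordinary literals.
/// After `concat!`, the format string is `{{x}} {x}`.
fn split_placeholder(x: i32) -> String {
    format!(concat!("{{x}} {x", "}"), x = x)
}

/// A raw literal and an ordinary literal mixed together.
/// Raw strings do not change how braces are treated by `format!`.
fn mixed_literals(x: i32) -> String {
    format!(concat!(r"{{x}} ", "{x}"), x = x)
}

fn main() {
    println!("{}", split_placeholder(42)); // prints: {x} 42
    println!("{}", mixed_literals(42)); // prints: {x} 42
}
```

The named argument is passed explicitly (`x = x`) because implicit capture is not available when the format string comes from a macro expansion.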
Essentially, my plan was the following regarding format placeholders: Format placeholders are not a string lexing question at all. The concat question is interesting. I am inclined to define that When a macro or
When first introduced, this new syntax without the
Personally, I think it makes the most sense for |
Oh, I think there is some confusion between "formatted strings" (like |
@VitWW we can use the exact same macros because macros can see the original literal's syntax |
I think it would be cleaner in terms of layering for |
Probably easier to store the placeholder indices in metadata somewhere. But I'd still prefer to just not involve the lexer at all with placeholders, if possible. |
After the proposed new syntax additions, the RFC is also proposing that the current syntax for raw string literals would be removed in a future edition, and the discussed drawbacks are largely about that. Is that removal necessary or even desirable? As far as I can see, the new syntax does not add a perfect replacement for raw string literals: A literal that doesn't and cannot contain escape sequences within it. The proposed syntax only has literals where the possible escape sequences are made arbitrarily long. Pathological example: let current_raw = r"\######n";
let proposed_raw = #######"\######n"#######;
let escaped = "\\######n"; The problem with I believe the current It seems to me like the two, "guarded" and "raw" could just be considered independent features of string literals: the |
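As a quick check of the two forms that exist today, the sketch below compares the current raw literal with its escaped equivalent from the pathological example. The function names are illustrative only; the proposed guarded form is left out because it is not valid Rust at present.

```rust
/// Current raw literal: the backslash and the hashes are plain content.
fn raw_form() -> &'static str {
    r"\######n"
}

/// Ordinary literal: the backslash has to be escaped.
fn escaped_form() -> &'static str {
    "\\######n"
}

fn main() {
    // Both literals denote the same eight characters.
    assert_eq!(raw_form(), escaped_form());
    println!("{} ({} bytes)", raw_form(), raw_form().len());
}
```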
it seems to me that you could just write it: |
Another option is to use a prefix that makes the difference in quantity more obvious: let proposed_raw = ##########"\######n"##########; |
I don't really like this. As it is, it can be easy to use one too few |
Agreed, an escape sequence followed by an
That helps the specific case if the code was written with that in mind, but most likely many literals would only have as many guards as are necessary, and no more. It's not hard to imagine e.g. clippy would point that out, (goes to check), and it turns out it does: needless_raw_string_hashes (There's some discussion on whether it should be a warning in PR#112373 adding an equivalent to the compiler, which was closed in favor of the clippy lint.) That also doesn't change the core issue that while you can make the escape sequence bigger to make it more obvious that there aren't any, you still need to scan the string contents for possible escapes. I can see the case for not adding For that reason it seems better to focus the proposal on allowing non-raw literals to use guarded escapes. I still believe that would unify the literals in a simple way: The prefixed |
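To make "only as many guards as are necessary" concrete, here is a small sketch that computes the minimum number of `#` a current raw string literal needs for a given content. The function `min_raw_hashes` is hypothetical and is not how clippy's `needless_raw_string_hashes` lint is implemented; it only applies the rule that the closing delimiter, a quote followed by N hashes, must not occur inside the content.

```rust
/// Smallest number of `#` guards so that `"` followed by that many `#`
/// never appears inside `content`. Returns 0 when there is no `"` at all.
fn min_raw_hashes(content: &str) -> usize {
    let bytes = content.as_bytes();
    let mut needed = 0;
    for (i, &b) in bytes.iter().enumerate() {
        if b == b'"' {
            // Count the run of hashes directly after this quote.
            let run = bytes[i + 1..].iter().take_while(|&&c| c == b'#').count();
            needed = needed.max(run + 1);
        }
    }
    needed
}

fn main() {
    println!("{}", min_raw_hashes("hello")); // prints: 0
    println!("{}", min_raw_hashes(r#"say "hi""#)); // prints: 1
    println!("{}", min_raw_hashes(r##"a"#b"##)); // prints: 2
}
```

A literal written with exactly this many guards gives no visual margin, which is the concern raised above: the reader still has to scan the contents.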
@programmerjake The easiest way - to have "alternative" macros, like In this way format!("{}", x) == format2!("{}", "", x) == format2!("%{}", "%", x) == format2!("#{}", "#", x) So |
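A rough sketch of the "alternative macro" idea, written as a plain function rather than a macro. The name `format_with_prefix` and its signature are assumptions for illustration, it supports only the bare `{}` placeholder, and it takes string arguments only. The prefix is passed as a separate argument, so the string literal syntax plays no part in finding placeholders.

```rust
/// Replace each occurrence of `prefix` followed by `{}` with the next argument.
/// Braces that are not preceded by the prefix are copied through unchanged.
fn format_with_prefix(template: &str, prefix: &str, args: &[&str]) -> String {
    let placeholder = format!("{prefix}{{}}");
    let mut out = String::new();
    let mut rest = template;
    let mut next = args.iter();
    while let Some(pos) = rest.find(&placeholder) {
        out.push_str(&rest[..pos]);
        out.push_str(next.next().expect("too few arguments"));
        rest = &rest[pos + placeholder.len()..];
    }
    out.push_str(rest);
    out
}

fn main() {
    // An empty prefix behaves like the ordinary `{}` placeholder.
    println!("{}", format_with_prefix("Hello, {}!", "", &["world"]));
    // With a `%` prefix, unprefixed braces need no doubling.
    println!("{}", format_with_prefix("the set {%{}, %{}, ...}", "%", &["1", "2"]));
}
```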
imho |
Another thing that could be done, instead of assert_eq!(
format!(#"{(#)}The natural numbers, denoted "N", are the set {#{}, #{}, ...}."#, 1, 2),
r#"The natural numbers, denoted "N", are the set {1, 2, ...}."#,
); This, too, could support custom prefixes, and would be independent of string literal syntax assert_eq!(
format!("{(%)}The natural numbers, denoted \"N\", are the set {%{}, %{}, ...}.", 1, 2),
r#"The natural numbers, denoted "N", are the set {1, 2, ...}."#,
); but it saves the need for a new alternative macro, and keeps compatibility with |
iirc all of the |
Testing the macro expansion of |
Clarify behavior of format placeholders
Specify behavior of concat on guarded strings
Further address the removal of raw strings
Add alternatives for `concat` and `#\`
Updated the RFC to discuss the concerns brought up so far, including
That said, I'm not going to die on this hill. That could be left to a future RFC, but I'm going to leave it in the RFC unless there's heavy consensus otherwise. |
I think this RFC should say more about the proposed behaviour in different editions.
That suggests that the new syntax shouldn't be introduced until the 2024 edition. |
That's a good point. My initial thoughts are:
We could introduce a different prefix so this could be used on previous editions, but I don't think it's worth it. |
not sure how to express this without some form of parameterization on the number of prefix `#`s
Note that format_args looks at the processed string to find placeholders. It doesn't look for literal // This works fine.
let a = 1;
println!("\x7ba\x7d"); // This is just "{a}". This prints: 1 So if you really want to propose using |
I think we should keep format_args!() independent of how the strings are represented in source code. Otherwise you get inconsistency like this: let a = 1;
println!( "{a}" ); // prints: 1
println!( #"{a}"#); // prints: {a}
println!(r#"{a}"#); // prints: 1 |
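The current behaviour described in the last two comments can be checked with a short sketch. The function names are illustrative. Today the formatting macros look at the processed string value, so an ordinary literal, a raw literal, and a literal that spells the braces with `\x7b` and `\x7d` all yield the same placeholder.

```rust
/// Ordinary literal with an implicitly captured argument.
fn plain(a: i32) -> String {
    format!("{a}")
}

/// Raw literal: the braces are still seen as a placeholder.
fn raw(a: i32) -> String {
    format!(r#"{a}"#)
}

/// Braces written as escapes; the processed value is still `{a}`.
/// The argument is named explicitly here to keep the sketch conservative.
fn escaped(a: i32) -> String {
    format!("\x7ba\x7d", a = a)
}

fn main() {
    println!("{} {} {}", plain(1), raw(1), escaped(1));
}
```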
(Tip: If you do end up changing the RFC to remove the format_args part, it might make sense to open a new PR to start a clean github thread, as this thread is pretty much entirely focussed on format_args.) |
What's magical about it? Macros have always worked at the syntax level, and even have to extract the contents of the string manually.
Is this supposed to be a difficult issue? Seems pretty simple to expose an API for it. Or they can just continue to look at the span like they already are.
Would keeping raw strings satisfy this desire? I don't see why we should add yet another syntax when we can use the guarding prefix for this. In my eyes,
If we introduce this new syntax without the changes to formatting macros, there's probably no way to achieve the same ergonomics. You'd either need a new set of macros, a special metadata placeholder, or wait for |
In the current proposal, all
I don't think that's inconsistent. I'll let everyone know right now that I am highly invested in keeping the formatting changes in this RFC, because my perception of the ergonomic benefits in case of literal If there was a way to split this up I would, but since the formatting behavior is tied to the guarding, it's imperative that they be introduced at the same time. |
So that would mean that |
Yes, just like |
Rust's literal syntax is already quite complicated[1] and it seems to me that this makes it even more complicated. [1] I'm currently trying to write some code that, for Reasons, needs to reimplement string quote matching. This is super hard right now (so many corner cases with suffixes and prefixes) and this proposal would make it harder. |
Think about it from the perspective of a rust developer who isn't super familiar with how proc macros work. Having the type of string you use impacting how you type placeholders is a little surprising.
It complicates the API and is another thing macro authors need to worry about. I wouldn't characterize it as a "difficult issue", but it is something to keep in mind.
Somewhat. But consider something like: eprintln!(#"Error: "#{}"\#n#{}"#, error.msg, error.stack_trace); If I use raw strings, I can't escape the newline, and the following line can't be indented. If I use an unguarded string, I have to escape the double quotes. If I use a guarded string, I have to put "#" in front of the placeholders. Perhaps this is a little contrived, but I don't think overly so.
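For reference, the two options available in today's Rust for this message look roughly like the sketch below. The function names and parameters are made up for illustration. The ordinary literal must escape the quotes; the raw literal keeps the quotes but needs a real line break, and the continuation line cannot be indented.

```rust
/// Ordinary literal: quotes are escaped, `\n` is available.
fn with_escapes(msg: &str, trace: &str) -> String {
    format!("Error: \"{}\"\n{}", msg, trace)
}

/// Raw literal: quotes are literal, but the newline has to be a real
/// line break and the next line must start at column zero.
fn with_raw(msg: &str, trace: &str) -> String {
    format!(r#"Error: "{}"
{}"#, msg, trace)
}

fn main() {
    let a = with_escapes("disk full", "at main.rs:1");
    let b = with_raw("disk full", "at main.rs:1");
    assert_eq!(a, b);
    eprintln!("{a}");
}
```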
Thinking about this some more, I think that my disagreement with this statement is the crux of my dislike for the format string changes proposed. I don't see the placeholder {} as a kind of escape sequence, but as something distinct and using the number of guard "#"s for two different purposes feels wrong to me. But that is just my personal opinion.
That's only true after this feature has stabilized. But yes, I agree. If this were stabilized without the changes to format strings, that would limit the options for that going forward. All that said, I don't absolutely love it, but I am not entirely opposed to the idea anymore. Finally, I'd like to suggest an alternative to the |
This can't be parsed. The #"Error: "#
{}
"\#n#{}"
# So it is basically impossible to put a formatting placeholder in double quotes, unless you escape them with |
Which is another great argument to require a leading backslash, making it an actual escape sequence. No ambiguity. |
That's why the sub-proposal from @steffahn and me, a custom formatting prefix that is independent of the string type, is a much better alternative: format!("{(%)}The natural numbers, denoted \"N\", are the set {%{}, %{}, ...}.", 1, 2) |
How about something like this: println!( {#"e is "{{}}""#}, e) Where you can put braces around the format string to require additional braces for placeholders. |
Alright I've changed the placeholder syntax from Still not sure about the lexing notation but I figure this is ready for review. |
I agree that this approach by Swift seems nice, but...
So TL;DR
Both make some sense. I am somewhat surprising myself by leaning towards the latter. All that said, I think that if we're going to go diving in this area, it would feel better to me if we decided on some kind of set of changes all at once so that we can say "Rust has new string literals that are way better" (these could be distinct RFCs, though, as smaller, targeted RFCs generally feel better). |
If we go this route I would much prefer an alternative syntax like That being said, I would like there to be a solution for not having to double up |
I thought more about this. I was thinking that, if we really wanted to "dare to ask for more", at least from my perspective, I would want all multiline strings to strip indentation by default (with some way to opt out). I realize this is more about the "code strings" RFC, but I'm commenting here because opting out of that sort of thing seems like it may be a remaining role for raw strings. (I would also want rustfmt to indent inside strings by default, as a result.) Obviously making this change would require an edition. |
That would also be my preferred solution. It just doesn't happen very often that you want I've been thinking about how formatting could be simplified. If f-strings are added, that would be an opportunity to change how formatting works. For example, If we want something like this, it indeed doesn't make sense to special-case escaping in However, I don't think f-strings should return a println(f"foo: {foo}"); |
@nikomatsakis thanks for taking a look.
The RFC did originally include removal of raw strings. Because it seemed a little controversial, and such an RFC could be done independently later, I left it for future possibilities (maybe I should strengthen the wording there). I would like these strings to replace raw strings altogether (thus "unified" in the title). In my view, the primary feature of raw strings is that they are copy-pasteable. As long as you have the right amount of wrapping
I think the best argument against this is that rb-strings and raw rc-strings are practically useless the moment you need to include non-utf8 bytes. Even with
Yes I agree. The composability of this syntax is one of its strongest features.
I don't think this is possible as an edition change, since it would change the semantics of string literals silently between editions. We allow someone to change the edition in |
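The point above about raw byte strings can be illustrated with current Rust. This is a small sketch with illustrative function names: an ordinary byte string can express arbitrary bytes through `\x` escapes, while a raw byte string has no escapes and keeps the backslash and hex digits as literal ASCII characters.

```rust
/// Ordinary byte string: `\xff` and `\x00` are single bytes.
fn escaped_bytes() -> &'static [u8] {
    b"\xff\x00"
}

/// Raw byte string: the same source text is eight ASCII bytes.
fn raw_bytes() -> &'static [u8] {
    br"\xff\x00"
}

fn main() {
    println!("{:?}", escaped_bytes()); // prints: [255, 0]
    println!("{}", raw_bytes().len()); // prints: 8
}
```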
It could be done for just the new types of multi-line strings (so not the existing raw strings). |
@tmccombs but that wouldn't even require an edition change |
No, we require running |
Just realized I wrote "automatically" when I meant "manually". Not sure if that changes your reply.
I was not aware of that. Where is this requirement documented? It was my impression that changing semantics like this isn't allowed (except in rare cases) because changing the edition manually is explicitly supported. |
We try not to churn code, but the only ironclad rule so far is that there is only one stdlib, so that stdlib must work with all editions, which largely prevents adding edition-dependent hacks in std. We could rewrite almost all of how Rust parses in the next edition, hypothetically. |
Pushed a minor update that strengthens some of the wording around removing raw strings and fixes some places I missed when changing where the guarding goes in placeholders. |
Rendered
This RFC proposes to unify the syntax of the existing string literal and raw string literal forms, supporting both the use of escape sequences and avoiding the need to escape backslashes and quotation marks. This proposal also uses the new syntax to improve format string ergonomics, reducing the need for double-brace escapes.