The language reference doesn't explain anything about string literals containing newlines #19399
Comments
Sorry, I overlooked that Section 3.4, Whitespace, already explains the peculiarity of Rust's definition of "whitespace". Still, I believe most of the above comment remains valid. In particular, the rejection of some character literals disagrees with the EBNF definition of the syntax:
I don't think we should try to be too formal with something like EBNF from the beginning. Illustrative examples will be much more useful for day-to-day use. But the language reference must at least refer to all the features of Rust in some ways.
We have a real grammar now, so I'm considering this closed. Thanks!
@steveklabnik I assume you had
I thought the grammar was actually tested, so it's accurate, which was the primary complaint, right?
denotes a Unicode string U+0061 U+0062 U+000a U+0063 U+0064.
also denotes the same Unicode string.
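The equivalence between a raw line break inside a string literal and the `\n` escape can be checked with a small program. This is a minimal sketch, assuming a current stable rustc; the helper names `code_points`, `literal_newline`, and `escaped_newline` are made up for illustration and are not the literals originally posted in this issue.

```rust
/// Formats each code point of `s` as "U+xxxx" (hypothetical helper).
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04x}", c as u32)).collect()
}

/// A string literal that contains a raw line break in the source text.
fn literal_newline() -> &'static str {
    "ab
cd"
}

/// The same string written with the `\n` escape.
fn escaped_newline() -> &'static str {
    "ab\ncd"
}

fn main() {
    // Both spellings produce the same five code points.
    assert_eq!(literal_newline(), escaped_newline());
    println!("{}", code_points(literal_newline()).join(" "));
}
```

Printing the code points rather than the string itself makes the embedded U+000a visible.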
On the other hand,
and
denotes a Unicode string U+0061 U+0062 U+0063 U+0064. The Rust lexer ignores an "escaped newline" optionally followed by a sequence of "whitespace" characters. (Update: the following complaint about the lack of a definition of Rust's "whitespace" was incorrect and I retract it.
defined by the function below in libsyntax/parser/lexer/mod.rs
)
This predicate follows neither the traditional definition of "space" (from the C language) nor Unicode's definition of "whitespace". So if we use the Unicode ideographic space (colloquially known in Japan as the "full-width space"), the space-munching logic doesn't work. For example,
denotes a Unicode string U+0061 U+0062 U+3000 U+0063 U+0064. Of course, such a decision is entirely up to the language designers, but it is desirable to give a clear explanation of it.
As for character literals, it's interesting that the lexer rejects some kinds of "space" characters:
For example, this Rust code is rejected:
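Separately from the rejected character literal, the string-continuation behaviour described in this report can be checked on its own. The following is a minimal sketch, assuming a current stable rustc, whose lexer no longer lives in libsyntax and may differ from the compiler this issue was filed against. The function names are hypothetical. The second literal places a U+3000 ideographic space after the ASCII indentation, and a recent compiler is expected to warn that this character is not skipped.

```rust
/// Backslash-newline followed by ASCII spaces: the line break and the
/// indentation on the next line are both dropped from the string.
fn continued() -> &'static str {
    "ab\
     cd"
}

/// Same shape, but a U+3000 ideographic space follows the ASCII
/// indentation. Skipping stops at that character, so it stays in the string.
fn continued_ideographic() -> &'static str {
    "ab\
     　cd"
}

fn main() {
    for s in [continued(), continued_ideographic()] {
        // Print each code point so invisible characters can be seen.
        let points: Vec<String> = s
            .chars()
            .map(|c| format!("U+{:04x}", c as u32))
            .collect();
        println!("{}", points.join(" "));
    }
}
```

Only the ASCII whitespace after the escaped newline is consumed in this sketch, which matches the U+3000 result reported above.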