Description
```rust
"ab\ncd"
```
denotes the Unicode string U+0061 U+0062 U+000A U+0063 U+0064.
```rust
"ab
cd"
```
also denotes the same Unicode string.
On the other hand, both
```rust
"ab\
cd"
```
and
```rust
"ab\
    cd"
```
denote the Unicode string U+0061 U+0062 U+0063 U+0064. The Rust lexer ignores an "escaped newline" optionally followed by a sequence of "whitespace" characters, where "whitespace" is defined by the function below in libsyntax/parser/lexer/mod.rs. (Update: my earlier complaint that Rust lacks a definition of "whitespace" was incorrect, and I retract it.)
```rust
pub fn is_whitespace(c: Option<char>) -> bool {
    match c.unwrap_or('\x00') { // None can be null for now... it's not whitespace
        ' ' | '\n' | '\t' | '\r' => true,
        _ => false
    }
}
```
This predicate follows neither the traditional C definition of "space" (isspace(), which also accepts \v and \f) nor Unicode's definition of "whitespace". So if we use the Unicode ideographic space U+3000 (colloquially called "full-width space" in Japanese), the space-munching logic doesn't apply. For example,
```rust
"ab\
　cd"
```
denotes the Unicode string U+0061 U+0062 U+3000 U+0063 U+0064: the ideographic space is kept. Of course, such a decision is entirely up to the language designers, but it would be good to explain it clearly.
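To make the rule concrete, here is a minimal sketch that applies it to the text between the quotes of a string literal. It is an illustration under stated assumptions, not libsyntax's actual code: resolve_continuations and lexer_is_whitespace are hypothetical names (the latter a char-taking copy of the quoted is_whitespace predicate), and escapes other than backslash-newline are simply copied through unchanged.

```rust
/// Copy of the quoted `is_whitespace` predicate, taking a plain `char`.
fn lexer_is_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\n' | '\t' | '\r')
}

/// Hypothetical helper: resolves backslash-newline continuations in `body`,
/// the text between the quotes of a string literal. Any other escape
/// sequence is copied through unchanged (it is NOT unescaped here).
fn resolve_continuations(body: &str) -> String {
    let mut out = String::new();
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('\n') => {
                // Escaped newline: drop it, then skip following characters
                // only while the lexer's predicate accepts them.
                while let Some(&next) = chars.peek() {
                    if !lexer_is_whitespace(next) {
                        break;
                    }
                    chars.next();
                }
            }
            Some(other) => {
                // Not a continuation: keep the escape sequence verbatim.
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    out
}

fn main() {
    // Body text: ab\<newline>    cd  (ASCII indentation is skipped)
    assert_eq!(resolve_continuations("ab\\\n    cd"), "abcd");
    // Body text: ab\<newline><U+3000>cd  (the ideographic space is kept)
    assert_eq!(resolve_continuations("ab\\\n\u{3000}cd"), "ab\u{3000}cd");
    // U+3000 is whitespace for Unicode (char::is_whitespace), not for the lexer.
    assert!('\u{3000}'.is_whitespace());
    assert!(!lexer_is_whitespace('\u{3000}'));
}
```

Because the predicate also accepts '\n' and '\r', this sketch skips blank lines after the continuation as well; only characters outside that four-character set, such as U+3000, stop the skipping.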
As for character literals, it's interesting that the lexer rejects some kinds of unescaped "space" characters (tab, newline, and carriage return), as well as an unescaped single quote:
```rust
'\t' | '\n' | '\r' | '\'' if delim == '\'' => {
    let last_pos = self.last_pos;
    self.err_span_char(
        start, last_pos,
        if ascii_only { "byte constant must be escaped" }
        else { "character constant must be escaped" },
        first_source_char);
    return false;
}
```
For example, this Rust code is rejected, because the character literal contains a raw newline:
```rust
println!("{}", '
');
```
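The quoted match arm amounts to a simple membership test on the character. A minimal sketch, where must_escape_in_char_literal is a hypothetical name and only this one arm is modeled (the byte-literal case and the rest of the lexer are left out):

```rust
/// Hypothetical helper mirroring the quoted match arm: inside a character
/// literal (delimiter `'`), these characters must be written as escapes.
fn must_escape_in_char_literal(c: char) -> bool {
    matches!(c, '\t' | '\n' | '\r' | '\'')
}

fn main() {
    // Every character covered by the arm has an escaped spelling that compiles.
    for &c in ['\t', '\n', '\r', '\''].iter() {
        assert!(must_escape_in_char_literal(c));
    }
    // Other "space" characters such as U+0020 and U+3000 are not covered.
    assert!(!must_escape_in_char_literal(' '));
    assert!(!must_escape_in_char_literal('\u{3000}'));
    // The rejected example, written with an escape instead of a raw newline:
    println!("{}", '\n');
}
```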