The language reference doesn't explain anything about string literals containing newlines #19399

Closed
@nodakai

"ab\ncd"

denotes a Unicode string U+0061 U+0062 U+000a U+0063 U+0064.

"ab
cd"

also denotes the same Unicode string.
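The equivalence of the two spellings above can be checked directly. The following is a minimal sketch; `code_points` is a hypothetical helper introduced only for this illustration, not part of any Rust library.

```rust
/// Hypothetical helper: lists the code points of `s` in `U+XXXX` notation.
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    // Newline written as an escape sequence.
    let escaped = "ab\ncd";
    // Newline written as a raw line break inside the literal.
    let literal = "ab
cd";

    // Both literals denote the same five-character string.
    assert_eq!(escaped, literal);
    println!("{}", code_points(escaped).join(" "));
}
```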

On the other hand,

"ab\
cd"

and

"ab\
    cd"

both denote the Unicode string U+0061 U+0062 U+0063 U+0064. The Rust lexer ignores an "escaped newline" optionally followed by a sequence of "whitespace" characters, as defined by the function below in libsyntax/parser/lexer/mod.rs. (Update: the following complaint about the lack of a definition of "whitespace" in Rust was incorrect and I retract it.)

pub fn is_whitespace(c: Option<char>) -> bool {
    match c.unwrap_or('\x00') { // None can be null for now... it's not whitespace
        ' ' | '\n' | '\t' | '\r' => true,
        _ => false
    }
}

This predicate follows neither the traditional definition of "space" (from the C language) nor Unicode's definition of "whitespace". So if we use the Unicode ideographic space (colloquially known in Japan as the "full-width space"), the space-munching logic doesn't work. For example,

"ab\
 cd"

denotes a Unicode string U+0061 U+0062 U+3000 U+0063 U+0064. Of course, such a decision is totally up to language designers, but it is desirable to give a clear explanation about it.
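To make the gap concrete, here is a small sketch that contrasts an ASCII-only predicate with the standard library's Unicode-aware `char::is_whitespace`. `is_lexer_whitespace` is a hypothetical stand-in that mirrors the four characters in the quoted function; it is not the compiler's own code.

```rust
/// Hypothetical stand-in for the lexer predicate quoted above:
/// only four ASCII characters count as whitespace.
fn is_lexer_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\n' | '\t' | '\r')
}

fn main() {
    // An escaped newline and the ASCII whitespace after it are dropped.
    let continued = "ab\
        cd";
    assert_eq!(continued, "abcd");

    // U+3000 (ideographic space) is whitespace according to Unicode,
    // but not according to the ASCII-only predicate.
    let ideographic_space = '\u{3000}';
    assert!(ideographic_space.is_whitespace());
    assert!(!is_lexer_whitespace(ideographic_space));
}
```

The ideographic space is written here as the escape `\u{3000}` so that the example does not depend on an invisible character surviving copy and paste.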

As for a character literal, it's interesting that the lexer rejects some kinds of "space" characters:

            '\t' | '\n' | '\r' | '\'' if delim == '\'' => {
                let last_pos = self.last_pos;
                self.err_span_char(
                    start, last_pos,
                    if ascii_only { "byte constant must be escaped" }
                    else { "character constant must be escaped" },
                    first_source_char);
                return false;
            }

For example, this Rust code is rejected:

println!("{}", '
');
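For contrast, the escaped spellings of the same characters are accepted. The sketch below assumes nothing beyond standard escape sequences; `code_point` is a hypothetical helper used only to print the values.

```rust
/// Hypothetical helper: formats a `char` in `U+XXXX` notation.
fn code_point(c: char) -> String {
    format!("U+{:04X}", c as u32)
}

fn main() {
    // Each character the lexer refuses in raw form has an escaped spelling.
    let newline = '\n';
    let tab = '\t';
    let carriage_return = '\r';
    let quote = '\'';

    for c in [newline, tab, carriage_return, quote] {
        println!("{}", code_point(c));
    }
}
```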
