
four-byte unicode characters confuse ' #28851


Closed
steveklabnik opened this issue Oct 5, 2015 · 7 comments
Labels
A-diagnostics Area: Messages for errors, warnings, and lints

Comments

@steveklabnik
Member

This Rust program:

fn main() {
    let len = 'ஶ்ரீ'.len_utf8();
}

contains TAMIL SYLLABLE SHRII (śrī), aka U+0BB6 U+0BCD U+0BB0 U+0BC0. When trying to compile this program, I get this error:

2:20: 2:22 error: unterminated character constant: '.
2     let len = 'ஶ்ரீ'.len_utf8();

I know that it isn't a copy-paste issue, because I used vim's C-V u to type in the four code points manually.

@steveklabnik steveklabnik added the A-parser Area: The lexing & parsing of Rust source code to an AST label Oct 5, 2015
@arielb1
Contributor

arielb1 commented Oct 5, 2015

Aren't char-s Unicode scalar values?

@steveklabnik
Member Author

@arielb1 yes. am I doing something wrong here? I am bad at encodings, so this is likely.

@arielb1
Contributor

arielb1 commented Oct 5, 2015

@steveklabnik

Unicode scalar value != character.
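
To illustrate the distinction, here is a minimal sketch, assuming only the standard library. A Rust `char` holds exactly one Unicode scalar value, so a single code point fits, while a surrogate value is rejected outright. The helper name `utf8_len_of` is made up for this example.

```rust
/// Hypothetical helper: returns how many UTF-8 bytes one code point needs,
/// or `None` if the value is not a Unicode scalar value (for example, a surrogate).
fn utf8_len_of(code_point: u32) -> Option<usize> {
    char::from_u32(code_point).map(|c| c.len_utf8())
}

fn main() {
    // U+0BB6 (the first code point of the syllable) is one scalar value, three bytes in UTF-8.
    assert_eq!(utf8_len_of(0x0BB6), Some(3));
    // ASCII 'A' is one byte; U+1F600 is four bytes.
    assert_eq!(utf8_len_of(0x41), Some(1));
    assert_eq!(utf8_len_of(0x1F600), Some(4));
    // A surrogate is a code point but not a scalar value, so it cannot be a `char`.
    assert_eq!(utf8_len_of(0xD800), None);
}
```

What a reader sees as one "character" can be built from several such scalar values, which is why it does not fit in a single `char` literal.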

@steveklabnik
Member Author

Yes, but it's four bytes, no?

@steveklabnik
Member Author

Ahhh this isn't actually four bytes. Sigh. Thanks.

@steveklabnik
Member Author

(basically, I thought that those four things were four bytes, but they're four codepoints themselves)
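
The mix-up between bytes and code points can be checked directly. The sketch below writes the syllable as a string literal using the four code points listed in the report. The constant `SHRII` and the helper `code_points` are names invented for this illustration.

```rust
/// The syllable from the report, spelled out as its four code points.
const SHRII: &str = "\u{0BB6}\u{0BCD}\u{0BB0}\u{0BC0}";

/// Hypothetical helper: lists the scalar values in a string as numbers.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    // Four scalar values, so it cannot be written as a single `char` literal.
    assert_eq!(SHRII.chars().count(), 4);
    // Each of them takes three bytes in UTF-8, twelve bytes in total.
    assert_eq!(SHRII.len(), 12);
    for c in SHRII.chars() {
        println!("U+{:04X} takes {} bytes", c as u32, c.len_utf8());
    }
    println!("{:X?}", code_points(SHRII));
}
```

Calling `len_utf8()` on each `char` of the string works, whereas calling it on the whole syllable as a `char` literal does not compile.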

@steveklabnik steveklabnik reopened this Oct 5, 2015
@steveklabnik steveklabnik added A-diagnostics Area: Messages for errors, warnings, and lints and removed A-parser Area: The lexing & parsing of Rust source code to an AST labels Oct 5, 2015
@steveklabnik
Member Author

Actually, I am re-opening, because this diagnostic message is really bad. It should say that you're putting something that's larger than a single Unicode scalar value (USV) into a char literal.

steveklabnik added a commit to steveklabnik/rust that referenced this issue Nov 5, 2015
If you try to put something that's bigger than a char into a char
literal, you get an error:

    fn main() {
        let c = 'ஶ்ரீ';
    }

    error: unterminated character constant:

This is a very compiler-centric message. Yes, it's technically
'unterminated', but that's not what you, the user, did wrong.

Instead, this commit changes it to

    error: character literal may only contain one codepoint

As this actually tells you what went wrong.

Fixes rust-lang#28851
bors added a commit that referenced this issue Nov 5, 2015
If you try to put something that's bigger than a char into a char
literal, you get an error:

    fn main() {
        let c = 'ஶ்ரீ';
    }

    error: unterminated character constant:

This is a very compiler-centric message. Yes, it's technically
'unterminated', but that's not what you, the user, did wrong.

Instead, this commit changes it to

    error: character literal that's larger than a char:

As this actually tells you what went wrong.

Fixes #28851