
Don't allow "UTF-16 surrogate codepoints" in char or str #8319

Closed
jruderman opened this issue Aug 5, 2013 · 4 comments

Comments

@jruderman
Contributor

I think

> "\uD800"
"\ud800"

should be disallowed at compile time and

> 0xD800 as char
'\ud800'

should assert. Or, even better, the incompletely checked `int as char` cast could be removed entirely.
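This is, in effect, what modern Rust settled on: `as` casts to `char` are limited to `u8`, and the checked conversion `char::from_u32` refuses the surrogate range. A minimal sketch on a current toolchain (not the 2013 compiler discussed above):

```rust
fn main() {
    // The checked conversion refuses the surrogate range 0xD800..=0xDFFF
    // (and anything above 0x10FFFF), so no surrogate can reach a `char`.
    assert_eq!(char::from_u32(0xD800), None);
    assert_eq!(char::from_u32(0x0041), Some('A'));

    // The only remaining `as` cast to char is from u8, which cannot
    // produce a surrogate.
    let c = 0x41u8 as char;
    assert_eq!(c, 'A');
}
```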

@jruderman
Contributor Author

At least it's consistent now:

> let mut a = ~"hi"; a.push_char(0xD800 as char); a
~"hi\ud800"

@jruderman
Contributor Author

Because UTF-8 encodings of these surrogates are not considered valid UTF-8 (https://en.wikipedia.org/wiki/UTF-8#Invalid_code_points).

And because OS APIs often do weird/unsafe things if you give them mismatched surrogates, or surrogates where they aren't expecting surrogates.
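The first point is straightforward to check on a modern toolchain, where Rust's UTF-8 validation rejects the three-byte sequence that would encode U+D800. A minimal sketch using today's `String`/`str` APIs:

```rust
fn main() {
    // ED A0 80 is what a three-byte UTF-8 encoding of U+D800 would look
    // like; Rust's UTF-8 validation rejects it as invalid.
    let bytes = vec![0xED, 0xA0, 0x80];
    assert!(String::from_utf8(bytes).is_err());
    assert!(std::str::from_utf8(&[0xED, 0xA0, 0x80]).is_err());
}
```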

@thestinger
Contributor

Fixed by 62a3434 (char) and b153219 (str). Please open another issue if there are still any holes (the string index-assign one has already been reported).

@jruderman
Contributor Author

Both cases in this bug report are now fixed:

> \uD800
error: illegal numeric character escape

and

> (0xD800 as char)
error: only `u8` can be cast as `char`, not `<VI0>`

Thanks, @thestinger!
