overlong utf-8 sequences should be treated as invalid utf-8 #3787
Comments
Changes made in Eclipse or an Eclipse-like IDE are saved in the current OS charset, which causes the error above.
There are some other invalid UTF-8 sequences that the |
Fix is_utf8 and UTF-8 char width functions to deny non-canonical 'overlong encodings' in UTF-8. We address the function is_utf8 to make it more strict and correct, but no changes are made to the handling of invalid UTF-8. Fixes issue #3787
Overlong encodings are fixed; surrogate characters remain. Anything else?
This is fixed, and surrogate characters are tracked in issue #8319.
The current UTF-8 implementation contains some assertions for the validity of UTF-8 bytes. Specifically, passing a sequence such as "\x80\xae" to a string function will fail with `Assertion is_utf8(v) failed`. However, overlong encodings are accepted without any such error. So a sequence such as "\xC0\xAE" (an overlong encoding of \x2E, a dot) will be accepted and will appear in the final Rust string.
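To make the problem concrete, here is a minimal sketch of how a two-byte decoder ends up accepting "\xC0\xAE" when it skips the minimum-value check. The helper names `naive_decode_2` and `strict_decode_2` are hypothetical and are not the functions in the Rust runtime; they only illustrate the missing check.

```rust
/// Hypothetical helper: decode a 2-byte UTF-8 sequence without checking
/// whether the result could have been encoded in a single byte.
fn naive_decode_2(b0: u8, b1: u8) -> Option<u32> {
    // Lead byte must look like 110xxxxx, continuation byte like 10xxxxxx.
    if b0 & 0xE0 != 0xC0 || b1 & 0xC0 != 0x80 {
        return None;
    }
    Some((((b0 & 0x1F) as u32) << 6) | (b1 & 0x3F) as u32)
}

/// Hypothetical helper: same decoding, but reject overlong forms.
/// Any code point below 0x80 fits in one byte, so a 2-byte encoding
/// of it is non-canonical and must be treated as invalid.
fn strict_decode_2(b0: u8, b1: u8) -> Option<u32> {
    naive_decode_2(b0, b1).filter(|&cp| cp >= 0x80)
}
```

With these definitions, `naive_decode_2(0xC0, 0xAE)` yields `Some(0x2E)` (a dot), while `strict_decode_2(0xC0, 0xAE)` yields `None`. A legitimate two-byte sequence such as `0xC3 0xA9` (U+00E9) passes both.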
This raises some security concerns as described in RFC3629 Section 10:
https://tools.ietf.org/html/rfc3629#section-10
Short example: when a program allows a user to access files, but wants to restrict access to "../", it must not be possible to circumvent this check by using an overlong encoding of a dot, and the author of the program shouldn't have to rely on the OS to perform any such check either.
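The scenario above can be sketched as follows. This assumes a current standard library, where `std::str::from_utf8` rejects overlong encodings; the function name `is_safe_relative_path` and the "/"-separated path format are assumptions made for illustration only.

```rust
/// Hypothetical check: accept a user-supplied path only if it is valid
/// UTF-8 and contains no ".." component.
fn is_safe_relative_path(raw: &[u8]) -> bool {
    // Validate first. If overlong forms were accepted here, the bytes
    // C0 AE C0 AE could later be interpreted as ".." by another layer
    // while slipping past the comparison below.
    let path = match std::str::from_utf8(raw) {
        Ok(s) => s,
        Err(_) => return false,
    };
    // Only after strict validation is a textual comparison meaningful.
    !path.split('/').any(|component| component == "..")
}
```

For example, `is_safe_relative_path(b"docs/readme.txt")` is accepted, while both `b"../etc/passwd"` and `b"\xC0\xAE\xC0\xAE/etc/passwd"` are rejected: the former by the component check, the latter by UTF-8 validation.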