read(io, Char) doesn't match collect(string) for malformed UTF-8 #50532
Comments
stevengj added the io (Involving the I/O subsystem: libuv, read, write, etc.) and strings ("Strings!") labels on Jul 12, 2023
This definitely seems wrong: we should be consistent about what we consider to be a character, even for invalid data. I can dig into why there's a discrepancy...
Yeah, ok,
StefanKarpinski added a commit that referenced this issue on Jul 14, 2023
StefanKarpinski added a commit that referenced this issue on Jul 17, 2023
Fixes #50532. The `read(io, Char)` method didn't correctly handle the case where the lead byte starts with too many leading ones; this fix makes it handle that case correctly, which makes `read(io, Char)` match `collect(s)` in its interpretation of what a character is in all invalid cases. Also fix and test `read(::File, Char)` which has the same bug.
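To illustrate the case the commit message describes, here is a minimal sketch. It is not the code in Base, and `lead_byte_kind` is a hypothetical helper introduced only for this example. It classifies a byte by its number of leading one bits: a UTF-8 lead byte for a multi-byte sequence has 2 to 4 leading ones, so `0xfc`, which has 6, cannot start a multi-byte character and should be treated as a malformed character on its own.

```julia
# Hypothetical helper (not part of Base): classify a byte by its leading one bits.
function lead_byte_kind(b::UInt8)
    n = leading_ones(b)
    n == 0 && return :ascii          # 0xxxxxxx
    n == 1 && return :continuation   # 10xxxxxx
    n <= 4 && return :lead           # 110xxxxx, 1110xxxx, 11110xxx
    return :invalid_lead             # 5 or more leading ones, e.g. 0xfc
end

for b in (0x41, 0xa8, 0xc3, 0xf0, 0xfc)
    println(string(b, base = 16), " => ", leading_ones(b), " leading ones, ", lead_byte_kind(b))
end
```

Note that this sketch only looks at the bit pattern; it does not check other validity rules such as overlong encodings or out-of-range code points.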
KristofferC pushed a commit that referenced this issue on Jul 17, 2023
Fixes #50532. The `read(io, Char)` method didn't correctly handle the case where the lead byte starts with too many leading ones; this fix makes it handle that case correctly, which makes `read(io, Char)` match `collect(s)` in its interpretation of what a character is in all invalid cases. Also fix and test `read(::File, Char)` which has the same bug. (cherry picked from commit ffe1a07)
The following mismatch seems undesirable to me: the same data `"\xfc\xa8"` is treated as 2 (malformed) characters for `collect` but as only 1 character for `read`.

cc @StefanKarpinski, the guru of malformed `Char`, who wrote this `read` code and the string iteration in #24999. Is this intentional?
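A minimal way to compare the two code paths, as a sketch: `chars_via_read` is a hypothetical helper name, and the result of the `read` path depends on the Julia version (one character before the fix referenced in this thread, matching `collect` after it), so no output is assumed for it here.

```julia
# Hypothetical helper: decode a string by repeatedly calling read(io, Char).
function chars_via_read(s::AbstractString)
    io = IOBuffer(s)
    chars = Char[]
    while !eof(io)
        push!(chars, read(io, Char))
    end
    return chars
end

s = "\xfc\xa8"                   # malformed UTF-8: invalid lead byte, then a continuation byte
via_collect = collect(s)         # string iteration yields 2 malformed characters
via_read = chars_via_read(s)     # count depends on the Julia version

println("collect: ", length(via_collect), " chars")
println("read:    ", length(via_read), " chars")
println("consistent: ", via_collect == via_read)
```

If the last line prints `false`, the running Julia version still has the discrepancy reported above.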