Request: standard library function/type/whatever to parse utf-8 from an iterator #90643
Comments
Do you mean
I mean the first one. Also, you can't presume that the person has the full slice available, which is exactly the situation I found myself in.
We had
@rustbot label +T-libs-api +C-feature-request
I read through the linked issues but, unless I missed it, the incongruity of the API doesn't seem to have been discussed. I'm not sure why … AFAICT, the closest version of … is:

```rust
pub fn char_decode_from_utf8(bytes: &[u8]) -> Option<char> {
    let decoded = std::str::from_utf8(bytes).ok()?;
    let mut chars = decoded.chars();
    let result = chars.next()?;
    match chars.next() {
        None => Some(result),
        Some(_) => None, // `bytes` contains more than 1 codepoint!
    }
}
```

The last bit, ensuring the byte slice decodes to exactly one char and no more, is an important part that a first attempt might overlook; maybe it is worth including for that reason alone. (Ensuring input is <= 4 bytes as well before calling …)

But really, this code is much too bloated for what it does, and you'd be relying on the compiler to first inline the UTF-8 decoding routine and then remove the duplicate checks to get reasonable output out of this. It would be much better if the std library exposed the internal UTF-8 decoder directly behind this API.
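For comparison, here is a minimal sketch of decoding exactly one scalar value straight from the bytes, without the round trip through `str::from_utf8` and `chars()`. The name `decode_single_utf8` is hypothetical (it is not a std API), and this is not std's internal decoder; it only assumes the standard UTF-8 rules: lead-byte ranges, continuation bytes of the form `10xxxxxx`, and rejection of overlong forms, surrogates and values above U+10FFFF.

```rust
/// Hypothetical helper (not in std): decode `bytes` as exactly one UTF-8
/// encoded scalar value, returning `None` for anything else.
pub fn decode_single_utf8(bytes: &[u8]) -> Option<char> {
    let (&first, rest) = bytes.split_first()?;
    // The lead byte determines the sequence length and its payload bits.
    let (len, init) = match first {
        0x00..=0x7F => (1, first as u32),
        0xC2..=0xDF => (2, (first & 0x1F) as u32),
        0xE0..=0xEF => (3, (first & 0x0F) as u32),
        0xF0..=0xF4 => (4, (first & 0x07) as u32),
        // Continuation byte, overlong lead (C0/C1) or out-of-range lead (F5..FF).
        _ => return None,
    };
    // One length check covers both "at most 4 bytes" and "exactly one char".
    if bytes.len() != len {
        return None;
    }
    let mut cp = init;
    for &b in rest {
        if b & 0xC0 != 0x80 {
            return None; // not a continuation byte
        }
        cp = (cp << 6) | (b & 0x3F) as u32;
    }
    // Reject overlong encodings (shortest form is mandatory).
    let min = match len {
        1 => 0,
        2 => 0x80,
        3 => 0x800,
        _ => 0x10000,
    };
    if cp < min {
        return None;
    }
    // `from_u32` rejects surrogates (U+D800..=U+DFFF) and values above U+10FFFF.
    char::from_u32(cp)
}
```

Like the `from_utf8`-based version, it returns `None` unless the slice is exactly one complete, well-formed sequence; comparing the slice length against the length implied by the lead byte replaces both the separate size check and the "no second char" check.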
There's `char::decode_utf16`, and there's an iterator adapter used by `str::chars` that turns a byte iterator over UTF-8 into a char iterator, but there's nothing like `decode_utf8` that decodes an iterator over bytes as it goes.