Request: A way to collect a String from an impl Iterator<Item=u8> #65538

Closed
Lokathor opened this issue Oct 18, 2019 · 8 comments

Comments

@Lokathor
Contributor

As much as Rust seems to love iterators and UTF-8, there doesn't seem to be a way to collect a String from an iterator of bytes. Everything forces you to collect into a Vec<u8> first and then re-parse that into a String.

There is a FromIterator<char> impl for String, but there also doesn't appear to be a way to get a char from 1 to 4 bytes, or any iterator adapter that turns a stream of UTF-8 bytes into a stream of char values.
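A minimal sketch of the missing adapter, using only std. The names DecodeUtf8, InvalidUtf8 and string_from_bytes are hypothetical and not part of std. It reads the lead byte to find the sequence length (1 to 4 bytes), pulls the remaining bytes from the iterator, and lets std::str::from_utf8 do the actual validation; the resulting chars then feed the existing FromIterator<char> impl through a Result collect.

```rust
/// Hypothetical error type (not in std): an invalid or truncated UTF-8 sequence.
#[derive(Debug, PartialEq)]
pub struct InvalidUtf8;

/// Hypothetical adapter (not in std): decodes an iterator of UTF-8 bytes into chars.
/// Strict: yields Err(InvalidUtf8) for malformed input.
pub struct DecodeUtf8<I> {
    bytes: I,
}

impl<I: Iterator<Item = u8>> Iterator for DecodeUtf8<I> {
    type Item = Result<char, InvalidUtf8>;

    fn next(&mut self) -> Option<Self::Item> {
        let first = self.bytes.next()?;
        // The lead byte says how many bytes the whole sequence should have.
        let len = match first {
            0x00..=0x7F => return Some(Ok(char::from(first))), // ASCII fast path
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            0xF0..=0xF7 => 4,
            _ => return Some(Err(InvalidUtf8)), // stray continuation or invalid lead byte
        };
        let mut buf = [first, 0, 0, 0];
        for slot in &mut buf[1..len] {
            match self.bytes.next() {
                Some(b) => *slot = b,
                None => return Some(Err(InvalidUtf8)), // input ended mid-sequence
            }
        }
        // std rejects bad continuation bytes, overlong forms, surrogates
        // and code points above U+10FFFF.
        Some(match std::str::from_utf8(&buf[..len]) {
            Ok(s) => Ok(s.chars().next().expect("a valid non-empty str has a char")),
            Err(_) => Err(InvalidUtf8),
        })
    }
}

/// Hypothetical strict conversion: stops at the first malformed sequence.
pub fn string_from_bytes<I: IntoIterator<Item = u8>>(bytes: I) -> Result<String, InvalidUtf8> {
    // Result<String, E> implements FromIterator<Result<char, E>>.
    DecodeUtf8 { bytes: bytes.into_iter() }.collect()
}
```

Decoding each char and letting String re-encode it is simple but not free; a production version would more likely validate and push bytes directly.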

@Mark-Simulacrum
Member

I'm also not sure this is possible: you would need to implement FromIterator for Result<String, Utf8Error> or similar, right? That seems like it would overlap with the general impl for Result.
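For comparison, outside std the orphan rule forbids implementing the foreign trait FromIterator for the foreign type Result<String, FromUtf8Error>, so a downstream crate needs a local newtype to make .collect() work. A sketch, assuming a hypothetical Utf8Collect wrapper that buffers into a Vec<u8> and validates once with String::from_utf8:

```rust
use std::iter::FromIterator;
use std::string::FromUtf8Error;

/// Hypothetical local newtype (not in std); being local, it may implement
/// FromIterator<u8> without violating the orphan rule.
pub struct Utf8Collect(pub Result<String, FromUtf8Error>);

impl FromIterator<u8> for Utf8Collect {
    fn from_iter<I: IntoIterator<Item = u8>>(iter: I) -> Self {
        // Buffer the bytes, then validate once; String::from_utf8 takes the Vec by value.
        let bytes: Vec<u8> = iter.into_iter().collect();
        Utf8Collect(String::from_utf8(bytes))
    }
}
```

Usage: let c: Utf8Collect = "hi".bytes().collect(); then inspect c.0.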

@passcod
Contributor

passcod commented Oct 18, 2019

Adding some context (from the community Discord): this is about avoiding the extra allocation when going from bytes to String, not necessarily about going through FromIterator (though that would be nice).

@Lokathor
Contributor Author

It doesn't need to be usable with collect. It can be a method on the String type.

let s: String = String::from_lossy_bytes_iterator(it);

let s: Result<String, ???> = String::from_bytes_iterator(it2);

The error case in the non-lossy version is going to be annoying to design. You could probably reduce it to a normal Utf8Error by converting the String that did parse back into a Vec and then extending it with the rest of the iterator.
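With today's std, the closest equivalent to the proposed String::from_bytes_iterator goes through a Vec<u8>, and its error type, FromUtf8Error, already carries what this comment describes: how many bytes were valid, plus the original bytes. A sketch with hypothetical free functions string_from_bytes_iter and split_at_error (neither is a std API) that recovers the parsed prefix and the remaining bytes:

```rust
use std::string::FromUtf8Error;

/// Hypothetical stand-in for the proposed String::from_bytes_iterator.
pub fn string_from_bytes_iter<I: IntoIterator<Item = u8>>(it: I) -> Result<String, FromUtf8Error> {
    // One buffer for the bytes; String::from_utf8 validates it without copying.
    String::from_utf8(it.into_iter().collect())
}

/// Hypothetical helper: split a failed conversion into the valid prefix and the rest.
pub fn split_at_error(e: FromUtf8Error) -> (String, Vec<u8>) {
    // Number of leading bytes that formed valid UTF-8.
    let valid = e.utf8_error().valid_up_to();
    // Take the original bytes back out of the error.
    let mut bytes = e.into_bytes();
    let rest = bytes.split_off(valid);
    // bytes[..valid] is valid UTF-8 by definition of valid_up_to.
    let prefix = String::from_utf8(bytes).expect("prefix was validated");
    (prefix, rest)
}
```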

@Mark-Simulacrum
Member

I am confused. What extra allocation do you want to avoid? The case of non-UTF-8 data?

@Lokathor
Contributor Author

Suppose you have a value of type impl Iterator<Item=u8> and want to convert it to a String, safely.

As far as I can tell, there's no way to do this in core/alloc/std without first collecting into Vec<u8> (allocation 1) and then separately trying to convert that into a String (allocation 2).

@Lokathor
Contributor Author

Lokathor commented Oct 18, 2019

There doesn't even appear to be a char::from_utf8_bytes(&[u8]) -> Result<char, Error> or similar that would decode bytes into individual characters, which you could then build into an iterator adapter for use with the FromIterator<char> impl. (EDIT: that's probably going to be pretty slow, though, since you decode the bytes into chars and then immediately re-encode them.)
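A minimal sketch of such a helper, written as a free function because char::from_utf8_bytes does not exist in std; the name and the Option return type are assumptions. It relies on std::str::from_utf8 for validation and accepts the input only if it encodes exactly one char:

```rust
/// Hypothetical helper (not in std): decode exactly one char from 1 to 4 UTF-8 bytes.
pub fn char_from_utf8_bytes(bytes: &[u8]) -> Option<char> {
    if bytes.is_empty() || bytes.len() > 4 {
        return None;
    }
    // std handles continuation bytes, overlong encodings and surrogates.
    let s = std::str::from_utf8(bytes).ok()?;
    let mut chars = s.chars();
    let c = chars.next()?;
    // Reject input that holds more than one char, such as b"ab".
    if chars.next().is_some() {
        return None;
    }
    Some(c)
}
```

Feeding its output into FromIterator<char> makes String re-encode every char, which is the overhead the EDIT anticipates.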

@scottmcm
Member

and then separately trying to convert that into a String (allocation 2)

With https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8 that's not a separate allocation, though?
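One way to check this: String::from_utf8 takes the Vec<u8> by value and, per its documentation, does not copy the vector, so the resulting String should point at the same heap buffer. A small sketch (the function name is illustrative only):

```rust
/// Illustrative helper: true if String::from_utf8 kept the Vec's heap buffer
/// instead of copying it.
pub fn from_utf8_reuses_buffer(bytes: Vec<u8>) -> bool {
    let before = bytes.as_ptr();
    match String::from_utf8(bytes) {
        Ok(s) => s.as_ptr() == before,
        Err(_) => false,
    }
}
```

In the non-lossy path, then, the only allocation is the one made while collecting into the Vec (plus any regrowth as it fills).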

@Lokathor
Contributor Author

It is in the lossy case.
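The lossy case differs because String::from_utf8_lossy takes &[u8] and returns a Cow<str>: it borrows when the input is valid and allocates a new String only when it must insert U+FFFD replacement characters, so turning a borrowed result into a String costs a copy. A sketch that avoids the copy for valid input (string_from_bytes_lossy and lossy_borrows are hypothetical names) tries String::from_utf8 first and falls back to the lossy path only on error:

```rust
use std::borrow::Cow;

/// Hypothetical lossy conversion that reuses the Vec's buffer when the bytes are valid.
pub fn string_from_bytes_lossy<I: IntoIterator<Item = u8>>(it: I) -> String {
    let bytes: Vec<u8> = it.into_iter().collect();
    match String::from_utf8(bytes) {
        // Valid UTF-8: the Vec's buffer becomes the String, no copy.
        Ok(s) => s,
        // Invalid: from_utf8_lossy builds a new String with U+FFFD substitutions.
        Err(e) => String::from_utf8_lossy(e.as_bytes()).into_owned(),
    }
}

/// Hypothetical helper: true if from_utf8_lossy borrowed instead of allocating.
pub fn lossy_borrows(bytes: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(bytes), Cow::Borrowed(_))
}
```

Even so, invalid input still pays for a second buffer, which is the allocation this comment refers to.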

Closed in favor of #64727, since a solution there would solve this issue too.


4 participants