io::IoError should carry info on the invalid byte sequence on non-utf8 InvalidInput

If you feed in a byte stream that is almost UTF-8 but contains errors, a loop of calls to `fn read_char` will eventually return an `IoError` with `kind == InvalidInput`.

Unfortunately, the returned `IoError` does not say which bytes were invalid, nor does it report how many bytes were read from the input before the error was encountered.

It seems like it would not be that bad to change `IoError` so that its `detail` field is an `Option<Either<~str, ~[u8]>>`, or something along those lines. In this scenario, `InvalidInput` would imply that one can inspect the `detail` field to determine which byte sequence caused the problem, and client code would then have the option of substituting a different character sequence specific to the byte sequence that failed.
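To make the shape of this proposal concrete, here is a minimal sketch in present-day Rust rather than the `~str`/`~[u8]` syntax used above. Every name in it (`Detail`, `Kind`, `DecodeError`, and this free-standing `read_char`) is hypothetical and chosen for illustration; it is not the actual `std::io` API. It assumes the error should carry both the offending bytes and the number of bytes consumed before the failure.

```rust
/// Hypothetical stand-in for the proposed `detail` payload: either a
/// human-readable message or the raw bytes that failed to decode.
#[derive(Debug, Clone, PartialEq)]
pub enum Detail {
    Message(String),
    InvalidBytes(Vec<u8>),
}

/// Hypothetical, C-like error kind (no payload on any variant).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    InvalidInput,
    Other,
}

/// Hypothetical error type illustrating the proposal.
#[derive(Debug, Clone, PartialEq)]
pub struct DecodeError {
    pub kind: Kind,
    pub detail: Option<Detail>,
    /// Number of bytes consumed from the input before the error.
    pub bytes_read: usize,
}

/// Decode one `char` starting at `offset`; on success return the char
/// and the offset of the next one.
pub fn read_char(input: &[u8], offset: usize) -> Result<(char, usize), DecodeError> {
    let rest = input.get(offset..).unwrap_or(&[]);
    // A UTF-8 encoded char is at most 4 bytes long.
    let window = &rest[..rest.len().min(4)];
    let valid = match std::str::from_utf8(window) {
        Ok(s) => s,
        // The window starts with at least one valid char; keep that prefix.
        Err(e) if e.valid_up_to() > 0 => {
            std::str::from_utf8(&window[..e.valid_up_to()]).unwrap()
        }
        Err(e) => {
            // `error_len` is `None` when the input ends mid-sequence.
            let bad_len = e.error_len().unwrap_or(window.len());
            return Err(DecodeError {
                kind: Kind::InvalidInput,
                detail: Some(Detail::InvalidBytes(window[..bad_len].to_vec())),
                bytes_read: offset,
            });
        }
    };
    match valid.chars().next() {
        Some(c) => Ok((c, offset + c.len_utf8())),
        None => Err(DecodeError {
            kind: Kind::Other,
            detail: Some(Detail::Message("end of input".to_string())),
            bytes_read: offset,
        }),
    }
}
```

With this shape, a caller that sees `Kind::InvalidInput` can match on `detail` to recover the failing bytes and use `bytes_read` to resume decoding after them.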

(Alternatively, we could change `IoErrorKind` so that the `InvalidInput` variant carries an `Option<~[u8]>`, but then `IoErrorKind` would no longer be a C-like enum.)
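The trade-off in this alternative can be sketched as follows, again in present-day Rust and with hypothetical names (`PlainKind`, `RichKind`) that do not correspond to the real `IoErrorKind`. A fieldless enum can derive `Copy` and be cast to an integer; once a variant owns a byte vector, neither is possible, but the bytes become reachable by pattern matching.

```rust
/// Hypothetical C-like kind: fieldless, so it is `Copy` and castable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlainKind {
    EndOfFile,
    InvalidInput,
    Other,
}

/// Hypothetical kind whose `InvalidInput` variant carries the bad bytes.
/// It cannot derive `Copy` (it owns a `Vec`) and cannot be cast with `as`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RichKind {
    EndOfFile,
    InvalidInput(Option<Vec<u8>>),
    Other,
}

/// Works only because `PlainKind` has no payloads.
pub fn plain_code(kind: PlainKind) -> u8 {
    kind as u8
}

/// Extract the offending bytes, if the variant recorded any.
pub fn invalid_bytes(kind: &RichKind) -> Option<&[u8]> {
    match kind {
        RichKind::InvalidInput(Some(bytes)) => Some(bytes.as_slice()),
        _ => None,
    }
}
```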

I believe that this is strictly more expressive than just mapping every invalid sequence to a single replacement character, as is done by `from_utf8_lossy` (#12062).
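The difference in expressiveness can be illustrated with today's standard library, where `std::str::from_utf8` reports the position and length of each invalid sequence through `Utf8Error::valid_up_to` and `Utf8Error::error_len`. The helper `decode_with` below is a hypothetical name, not a standard function; it is a sketch of the kind of client-side substitution the proposal would enable, under the assumption that the caller wants to choose a replacement per failing byte sequence.

```rust
/// Decode `input` as UTF-8, calling `substitute` with each invalid byte
/// sequence and splicing its result into the output.
pub fn decode_with<F>(input: &[u8], mut substitute: F) -> String
where
    F: FnMut(&[u8]) -> String,
{
    let mut out = String::new();
    let mut rest = input;
    loop {
        match std::str::from_utf8(rest) {
            Ok(s) => {
                out.push_str(s);
                return out;
            }
            Err(e) => {
                let (valid, after) = rest.split_at(e.valid_up_to());
                // The prefix up to `valid_up_to` is guaranteed valid UTF-8.
                out.push_str(std::str::from_utf8(valid).unwrap());
                // `error_len` is `None` when the input ends mid-sequence;
                // treat the whole remainder as the invalid sequence then.
                let bad_len = e.error_len().unwrap_or(after.len());
                out.push_str(&substitute(&after[..bad_len]));
                rest = &after[bad_len..];
            }
        }
    }
}
```

For the input `b"a\xFFb\xC3"`, `String::from_utf8_lossy` yields `"a\u{FFFD}b\u{FFFD}"`, whereas `decode_with` lets the caller distinguish `0xFF` from the truncated `0xC3`, for example by rendering each as hex.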


io::IoError should carry info on the invalid byte sequence on non-utf8 InvalidInput #12113
