When writing through `io::Write` we need to convert to bytes, thus losing encoding information. When writing to the Windows console, we recover this information by doing a UTF-8 check on the bytes. This check should be redundant in many cases, because the caller often started from a `str` that was already known to be valid UTF-8.
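To make the redundancy concrete, here is a minimal sketch of the kind of check a byte-only sink has to perform. The function `console_text` is hypothetical and is not the actual std implementation; it only illustrates that text passed through `as_bytes()` gets validated a second time.

```rust
use std::str;

/// Hypothetical stand-in for a console writer that only receives bytes.
/// It has to re-validate the buffer before it can treat it as text.
fn console_text(buf: &[u8]) -> Result<&str, str::Utf8Error> {
    // Linear scan over the whole buffer, even when the caller started
    // from a `&str` that was already guaranteed to be valid UTF-8.
    str::from_utf8(buf)
}
```

Calling `console_text("héllo".as_bytes())` succeeds, but only after rescanning bytes whose validity was already known at the call site.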
Requiring bytes also means that users often have to pepper their code with `as_bytes()` calls when writing strings, unless they use `write_fmt`. Though this is admittedly more of a minor annoyance than a serious issue.
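For illustration, the two styles look roughly like this with today's `io::Write`. The function `greet` is a made-up example, not part of any proposal.

```rust
use std::io::{self, Write};

/// Writes the same name twice: once through `write_all`, which needs an
/// explicit `as_bytes()`, and once through `writeln!`, which expands to a
/// `write_fmt` call and accepts the `str` directly.
fn greet<W: Write>(out: &mut W, name: &str) -> io::Result<()> {
    out.write_all(name.as_bytes())?; // explicit conversion to bytes
    out.write_all(b"\n")?;
    writeln!(out, "{}", name)?; // goes through `write_fmt`
    Ok(())
}
```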
Admittedly `write_utf8` does have the issue of what to do when a partial write does not end on a code point boundary. This could be addressed in an implementation-defined manner or just by always using `write_all` semantics. The only difference between the two is that the first option allows short writes that happen to fall on a boundary.
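A rough sketch of the two options, under the assumption that the method takes a `&str`. Both function names are hypothetical: `boundary_at_or_before` shows how a short write count could be checked against code point boundaries, and `write_utf8_all` shows the simpler `write_all` semantics.

```rust
use std::io::{self, Write};

/// Largest index `<= n` that lies on a code point boundary of `s`.
/// A short write of `n` bytes "happens to fall on a boundary" exactly
/// when this returns `n` unchanged.
fn boundary_at_or_before(s: &str, n: usize) -> usize {
    let mut n = n.min(s.len());
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(n) {
        n -= 1;
    }
    n
}

/// `write_all` semantics: either the whole string is written or an error
/// is returned, so a code point is never split from the caller's view.
fn write_utf8_all<W: Write>(out: &mut W, s: &str) -> io::Result<usize> {
    out.write_all(s.as_bytes())?;
    Ok(s.len())
}
```

For `"aé"` (three bytes), a short write of 2 bytes would split `é`, so `boundary_at_or_before("aé", 2)` steps back to 1, while a short write of 1 or 3 bytes is already on a boundary.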
Alternatives
Incomplete UTF-8 writes
```rust
pub trait Write {
    // [u8] buffer is assumed to be UTF-8.
    // However, it may start with a partial UTF-8 sequence if it completes a
    // previously written incomplete sequence.
    // Otherwise it's an error.
    unsafe fn write_utf8(&mut self, buf: &[u8]) -> io::Result<usize>;
}
```
This doesn't fully solve the issue (it still needs `.as_bytes()`!) but it lets the implementation do whatever it likes, under the assumption that the bytes really are a `str` that's in the process of being written.
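As a sketch of what accepting incomplete sequences might involve, the hypothetical `Utf8Sink` below carries the trailing bytes of an unfinished code point over to the next call. Unlike the `unsafe` signature above, this version validates its input rather than assuming it, purely so the behaviour can be demonstrated safely; the type and method names are assumptions, not part of the proposal.

```rust
use std::io;
use std::str;

/// Hypothetical sink that accepts UTF-8 split at arbitrary byte positions.
#[derive(Default)]
struct Utf8Sink {
    text: String,     // text decoded so far
    pending: Vec<u8>, // bytes of an incomplete trailing sequence, if any
}

impl Utf8Sink {
    fn write_chunk(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Prepend whatever was left over from the previous call.
        // On error these leftover bytes are discarded.
        let mut joined = std::mem::take(&mut self.pending);
        joined.extend_from_slice(buf);

        let valid_len = match str::from_utf8(&joined) {
            Ok(s) => s.len(),
            // `error_len() == None` means the input ended mid-sequence,
            // which is allowed here: keep the tail for the next call.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            // Anything else is genuinely invalid UTF-8.
            Err(_) => {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8"));
            }
        };

        let (valid, rest) = joined.split_at(valid_len);
        self.text.push_str(str::from_utf8(valid).expect("prefix was validated"));
        self.pending = rest.to_vec();
        Ok(buf.len())
    }
}
```

Writing `[b'a', 0xC3]` followed by `[0xA9]` yields the text `aé`: the first call stores `0xC3` as pending, and the second call completes the sequence.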
EDIT: Remove references to ascii.
After thinking about this some more, I also opened rust-lang/rust#116871, which is about not erroring when given invalid Unicode (the write is lossy instead). Although the two issues are related, I think either would be useful whether or not the other is accepted.
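As a general illustration of what "lossy" means here (this uses the existing `String::from_utf8_lossy`, and is not a claim about how that issue is implemented), invalid bytes are replaced with U+FFFD instead of producing an error. The helper name `lossy_text` is made up for the example.

```rust
/// Converts arbitrary bytes to text, substituting U+FFFD for anything
/// that is not valid UTF-8 instead of returning an error.
fn lossy_text(buf: &[u8]) -> String {
    String::from_utf8_lossy(buf).into_owned()
}
```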
This was discussed in the libs-api meeting. An important point that was raised is that, to be most useful, the "is valid UTF-8" property would need to be preserved by intermediaries (e.g. buffer types). This means that all existing types (in std and the wider crate ecosystem) would need updating, otherwise it'd be of very limited use. Tbh, I do agree that this is a strong argument against this proposal.
That aside, I can look at how this would affect performance as that would provide an argument for it. Though for the above reason I'm minded to close.
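The intermediary problem can be sketched like this. Everything here is hypothetical: `Utf8Write` stands in for the proposed method as an extension trait with a byte-path default, `Console` is a sink that counts how often it has to validate, and `Passthrough` is a wrapper written without knowledge of `write_utf8`.

```rust
use std::io::{self, Write};

/// Hypothetical extension trait standing in for the proposed method.
trait Utf8Write: Write {
    fn write_utf8(&mut self, s: &str) -> io::Result<usize> {
        // Default: fall back to the byte path, dropping the UTF-8 guarantee.
        self.write(s.as_bytes())
    }
}

/// Hypothetical console-like sink that counts its UTF-8 validations.
#[derive(Default)]
struct Console {
    validations: usize,
    out: String,
}

impl Write for Console {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.validations += 1;
        let s = std::str::from_utf8(buf)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        self.out.push_str(s);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Utf8Write for Console {
    fn write_utf8(&mut self, s: &str) -> io::Result<usize> {
        self.out.push_str(s); // already known to be valid, no check needed
        Ok(s.len())
    }
}

/// An intermediary that only forwards bytes and never overrides `write_utf8`.
struct Passthrough<W>(W);

impl<W: Write> Write for Passthrough<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.write(buf)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.0.flush()
    }
}

// Uses the default `write_utf8`, so the inner fast path is never reached.
impl<W: Write> Utf8Write for Passthrough<W> {}
```

Calling `write_utf8` on a `Console` directly skips validation, but calling it on `Passthrough(Console::default())` ends up in `Console::write` and validates again, which is why every wrapper would need updating for the method to pay off.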
ChrisDenton changed the title from "Add write_utf8 and write_ascii to io::Write" to "Add write_utf8 to io::Write" on Oct 25, 2023
Closing this as per the above. I'm now convinced this is too much churn and complexity for implementers of the `Write` trait. And when writing a largish buffer (which people who care about perf will do), this doesn't really help, as the time taken to print dominates the performance.
Solution sketch
Add `write_utf8` to `io::Write`. It is so named to avoid conflict with any `write_str` function that may be implemented on a type.