-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor low-level UTF-16 decoding and move Borrow(Mut) to libcore. #27808
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @pnkfelix (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
} | ||
let c = (((n1 - 0xD800) as u32) << 10 | | ||
(n2 - 0xDC00) as u32) + 0x1_0000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we’re only dealing with a single pair of u16
s here, I felt this was simpler than creating a one-shot iterator. But that’s still possible.
I’d like re-exporting |
☔ The latest upstream changes (presumably #27684) made this pull request unmergeable. Please resolve the merge conflicts. |
32e7928
to
f404fbe
Compare
Should be |
f404fbe
to
a268c3f
Compare
@bluss fixed |
Thanks for the PR @SimonSapin! I like the reduction in API surface area here, and I think that this can actually go even farther and remove the lossy decoder (as it's just a simple Could you leave behind the old support, but mark it all as |
a268c3f
to
bde706f
Compare
I’m not so sure about that last change. What do you think? |
@SimonSapin sorry, I was on PTO for much of the time this PR was open. I'll skim over it but I'm not certain I'm the best person to review. (I assume the Travis failure is due to a timeout, though I was under a rough impression that timeouts were not supposed to end up being marked as fatal checks on the PRs...) |
bde706f
to
d27e40a
Compare
No worries @pnkfelix, @alexcrichton took over. After a chat with @bluss on IRC I undid the move to |
Nice. I think borrow should move to libcore anyway, since it is becoming a commonly used trait, not just for collections. I think the docs are clear enough for Borrow, to make its important guarantees clear. |
We also found out free functions make sense to construct iterators because the focus is not at all on the struct carrying the iteration, that's almost an implementation detail. And it's consistent with libstd's precedents (std::ascii::escape_default) |
(Before moving just |
I think it's fine to move the |
I'm also ok with a free function for now to construct this iterator instead of |
I don’t care so much about generalizing If you feel this is too much magic I can reverse to just |
I definitely agree that |
Alright, I’ll revert the use of |
I'm fine either way on that, it'll probably happen eventually regardless. |
d27e40a
to
d9011d2
Compare
☔ The latest upstream changes (presumably #27871) made this pull request unmergeable. Please resolve the merge conflicts. |
a1a936a
to
2b70890
Compare
Rebased. |
* Rename `utf16_items` to `decode_utf16`. "Items" is meaningless. * Move it to `rustc_unicode::char`, exposed in `std::char`. * Generalize it to any `u16` iterable, not just `&[u16]`. * Make it yield `Result` instead of a custom `Utf16Item` enum that was isomorphic to `Result`. This enable using the `FromIterator for Result` impl. * Add a `REPLACEMENT_CHARACTER` constant. * Document how `result.unwrap_or(REPLACEMENT_CHARACTER)` replaces `Utf16Item::to_char_lossy`.
2b70890
to
6174b8d
Compare
LGTM! |
We talked about this at the libs meeting today and decided to move forward, thanks @SimonSapin! |
* Rename `Utf16Items` to `Utf16Decoder`. "Items" is meaningless. * Generalize it to any `u16` iterator, not just `[u16].iter()` * Make it yield `Result` instead of a custom `Utf16Item` enum that was isomorphic to `Result`. This enable using the `FromIterator for Result` impl. * Replace `Utf16Item::to_char_lossy` with a `Utf16Decoder::lossy` iterator adaptor. This is a [breaking change], but only for users of the unstable `rustc_unicode` crate. I’d like this functionality to be stabilized and re-exported in `std` eventually, as the "low-level equivalent" of `String::from_utf16` and `String::from_utf16_lossy` like #27784 is the low-level equivalent of #27714. CC @aturon, @alexcrichton
Utf16Items
toUtf16Decoder
. "Items" is meaningless.u16
iterator, not just[u16].iter()
Result
instead of a customUtf16Item
enum that was isomorphic toResult
. This enable using theFromIterator for Result
impl.Utf16Item::to_char_lossy
with aUtf16Decoder::lossy
iterator adaptor.This is a [breaking change], but only for users of the unstable
rustc_unicode
crate.I’d like this functionality to be stabilized and re-exported in
std
eventually, as the "low-level equivalent" ofString::from_utf16
andString::from_utf16_lossy
like #27784 is the low-level equivalent of #27714.CC @aturon, @alexcrichton