Incorrect `size_hint()` on `EncodeUtf16` #113897
Labels: A-iterators (Area: Iterators), C-bug (Category: This is a bug.), T-libs (Relevant to the library team, which will review and decide on the PR/issue.)
I tried this code:
I expected to see this happen:
Instead, this happened:
Meta
`rustc --version --verbose`:
The reason is that the `EncodeUtf16` iterator calculates its size hint in terms of the contained `Chars` iterator's size hint, assuming that each character can correspond to either 1 or 2 code units.
In the case that the iterator is NOT in the middle of a surrogate pair, this leads to lower bounds that are too low and upper bounds that are too high.
In the case that the iterator IS in the middle of a surrogate pair, the remaining code unit is not taken into account as the iterator has advanced past this point.
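The two cases above can be illustrated with a small model. This is only a sketch of the calculation as this issue describes it, not the actual standard-library source: `described_size_hint` and `actual_units` are hypothetical helpers, and the model assumes the hint simply keeps the lower bound of the inner `Chars` hint and doubles its upper bound.

```rust
/// Hypothetical model of the calculation described in this issue: reuse the
/// lower bound of the inner `Chars` hint and double its upper bound. A pending
/// low surrogate is not considered at all.
fn described_size_hint(chars_hint: (usize, Option<usize>)) -> (usize, Option<usize>) {
    let (low, high) = chars_hint;
    (low, high.and_then(|n| n.checked_mul(2)))
}

/// Number of UTF-16 code units that `s` actually encodes to.
fn actual_units(s: &str) -> usize {
    s.chars().map(char::len_utf16).sum()
}
```

For a string holding a single supplementary-plane character such as U+1D11E, the inner `Chars` iterator is already exhausted once the first code unit has been yielded, so this model reports an upper bound of `Some(0)` while one low surrogate is still pending.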
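The actual calculation should be done in terms of the remaining bytes: the lower bound is `(bytes_remaining + 2) / 3` and the upper bound is `bytes_remaining`. A single code unit accounts for at most 3 bytes (a 3-byte sequence encodes to one code unit, while a 4-byte sequence encodes to two) and for at least 1 byte (ASCII).
In the case of the iterator being positioned in the middle of a surrogate pair, both these values should be increased by 1.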
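A minimal sketch of the proposed calculation, written as a free function rather than as a patch to the standard library. The name `proposed_size_hint` and the `mid_surrogate_pair` flag are assumptions made for illustration; the iterator itself would derive both inputs from its internal state.

```rust
/// Hypothetical helper: size hint for the UTF-16 code units still to be
/// yielded, given the number of UTF-8 bytes not yet decoded and whether a
/// low surrogate is still pending.
fn proposed_size_hint(
    bytes_remaining: usize,
    mid_surrogate_pair: bool,
) -> (usize, Option<usize>) {
    // A pending low surrogate adds exactly one code unit to both bounds.
    let pending = usize::from(mid_surrogate_pair);
    // Fewest code units: each unit covers as many bytes as possible (3).
    // `bytes_remaining + 2` is assumed not to overflow, since a `str` is
    // never longer than `isize::MAX` bytes.
    let low = (bytes_remaining + 2) / 3 + pending;
    // Most code units: every remaining byte is ASCII, one unit per byte.
    let high = bytes_remaining.checked_add(pending);
    (low, high)
}
```

With these bounds, a 4-byte sequence gives `(2, Some(4))` before any code unit is yielded, and `(1, Some(1))` once its high surrogate has been yielded and no bytes remain.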