-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make overflow behaviour more obvious in the iterator module of libcore
#23894
Conversation
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override) |
I am a tiny bit nervous about adding the overhead of a required check+panic (i.e. that shows up in release builds) to the |
If the enumerator checking really does land, please don't use a plain unwrap without error message. |
@pnkfelix It seems that the checked addition has quite a performance penalty, the number of iterations roughly halves (if you do nothing else, of course):
on an empty, 256-element and 65536-element array, respectively. Gist. @bluss Dunno what to do about this |
@tbu- Regarding alternatives to But in any case it sounds like, given the performance hit, we may want this to be tied more closely into |
☔ The latest upstream changes (presumably #24045) made this pull request unmergeable. Please resolve the merge conflicts. |
@pnkfelix I only use About the debug assertions, maybe it would suffice to say that going beyond |
r? @pnkfelix (transferring reviewership, don't have the bandwidth right now.) |
@tbu- yes; I suspect it would be better to not guarantee panic on overflow when debug-assertions are disabled. |
3fd91c4
to
db2a86b
Compare
@pnkfelix @alexcrichton I updated it to only mention the possible overflow in the |
/// # Undefined overflow | ||
/// | ||
/// The method does no guarding against overflows, so enumerating more than | ||
/// `usize::MAX` elements is undefined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "undefined" may be a bit strong here, isn't this defined to panic in debug builds and wrap in optimized builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we call it undefined overflow and don't guarantee anything about it other than it doesn't create unsafe
things. It might panic, might give out seemingly-random numbers etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RFC and I believe the current implementation both define overflow in the case of optimized builds to be the normal 2's complement overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second @alexcrichton 's note here. No reason to be putting these sorts of scary terms here, IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm representing the RFC correctly here:
Change the semantics of the built-in fixed-size integer types from being defined as wrapping around on overflow to it being considered a program error (but not undefined behavior in the C sense). Implementations are permitted to check for overflow at any time (statically or dynamically). Implementations are required to at least check dynamically when debug_assert! assertions are enabled. Add a WrappingOps trait to the standard library with operations defined as wrapping on overflow for the limited number of cases where this is the desired semantics, such as hash functions.
This is the currently taken (conservative) path, and if we'd like to guarantee more, then that shouldn't be done by not mentioning it in the docs, but rather by filing a new RFC, I'd say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also, my reading of the part I quoted is that the result of executing a +
is either the same as the wrapping result, or a panic.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huonw I can't find the intention for overflow to be wrapping in the RFC thread (other than as a consequence of how rustc
implements it currently). We can e.g. see this user suggesting what you said, but nobody telling that this is indeed the RFC's text (so it seems it isn't the intention):
I like this proposal, but I don't like the concept of "unspecified behavior" that isn't really unspecified. To avoid confusion, I'd prefer to say that if the overflow is not diagnosed, the value is guaranteed to wrap.
Alternatively, you could write that it truly is unspecified, and optimize based on that, but I think that's non-ideal.
Now, reading between the lines of the following, it seems that the following assumes that rustc
's implementation will be wrap-on-overflow, but that this isn't meant to be guaranteed.
https://internals.rust-lang.org/t/a-tale-of-twos-complement/1062
Aren't you concerned about de facto lock-in for overflow?
There is certainly a concern that we will not be able to make overflow stricter without breaking code in the wild. We believe that having overflow checking be mandatory in debug builds should at least mitigate that risk, as intentional uses of overflow should be detected long before the code ever comes into widespread use (and redirected to the suitable wrapper type).
The RFC as it stands right now doesn't sound like "wrap or panic" semantics, but rather than "panic or unspecified value" which is a superset of the former, so we don't lock ourselves regarding backward compatiblity if we don't guarantee this right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was intention, e.g.: rust-lang/rfcs#560 (comment) and rust-lang/rfcs#560 (comment) .
It was even in the text in rust-lang/rfcs@6dbbc70#diff-3ade9a15caf1d747ea87d08e3021dc33R126 which was edited rust-lang/rfcs@865eae6#diff-3ade9a15caf1d747ea87d08e3021dc33R129 and then removed in a seemingly unrelated commit rust-lang/rfcs@9f2c7f0 . I'm fairly sure the loss of clarity in the non-as
case in the last commit was an accident.
https://internals.rust-lang.org/t/a-tale-of-twos-complement/1062
NB. that was posted a few weeks before the RFC, and the design was refined/changed in the RFC: the RFC is canonical source.
That said, I'm in favour of writing this documentation as "behaviour for too-long iterators is unspecified", but not in favour of motivating that by a pedantic reading of the RFC. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huonw Thanks for the pointers, apparantly I was wrong.
So, now for the wording, what about putting "behaviour for iterators with more than usize::MAX
elements is unspecified" into # Undefined overflow
? What do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest an # Overflow Behavior
section which explicitly spells out that there is no guarding against overflow and the overflow semantics will depend on whether debug assertions are enabled.
Sigh; if it wasn't already clear from my comments above, I'm pretty torn about this PR. On the one hand I'm not a fan of appealing to arguments that certain cases are "never expected to trip" -- that's the whole problem with edge cases: You don't expect them, and then they get exploited. But on the other hand, we probably need to employ a consistent philosophy for all of the iterator methods. And considering how widely used these methods are, several in program hot spots, maybe the right thing is indeed to just rely on (I.e. would it make sense for a library on crates.io to offer checked variants of all these methods?) @tbu- sorry its taking so long to decide about this. |
I think that offering a crates.io library for checked variants is unrealistic. I don't think people will go an extra mile for safety, and in my opinion it should be exactly the other way around. If we do need the unchecked operations, those should be provided on crates.io, because then the users would make an explicit statement by using those. (For a very far-fetched analogy, see how IE had enabled DNT by default, but that completely destroys the meaning of the flag. Similarly you shouldn't opt-in for safety, but rather for unsafety.) Making |
But the reality is that the vast majority of iterator traversals will be over Or ... hmm. I wonder if we can use a default type param (or associated type?) to effectively dispatch to the appropriate (checked or unchecked) increment routine in each case. The default would be checked, but implementations that never yield more than It's late and I'm just musing ... Update: here's a branch with some ideas I've been playing with: https://github.com/pnkfelix/rust/tree/fsk-sized-bounded-iter |
Or here's a far less baroque variant idea: add a new Update: maybe not such a hot idea; chained iterator adapters would probably need to conservatively assume |
I'm starting to like the |
@tbu- so the team discussed this at this week's Tuesday meeting. The consensus was that we really don't want to change any of the code here; you should just let the arithmetic overflow happen (or not) based on how the addition operator itself gets compiled under the settings passed to the compiler. Even adding special case paths into iterator code that are guarded by If you would like to revise this PR to just focus on documentation changes, then that would be great. (The idea I think is that you would not guarantee a panic in the docs, but rather just say that panic can occur, in accordance with how arithmetic overflow is handled in general.) |
@tbu- do you plan to continue working on this PR? I would be happy to take it over for you (i.e. incorporating the revisions noted above and landing it from my own repository). |
Unless I misunderstand something, it's currently like this: debug assertions are removed when libstd is compiled & distributed in rust releases. Arithmetic checks are not removed, and will stay and are determined by the compiler settings of the user code that links to and inlines parts of libstd. This is great, because it means that user's debug builds will have overflow checking in enumerate if I understand correctly, while it will compile to non-checked code in release builds. Itertools has .enumerate_from which makes it easier to test with an enumeration type of smaller size (i8). It behaves like this, client code will be overflow checked in debug mode and not in release mode. It already seems optimal to me! This is how arithmetic checking is handled in general in Rust now. Your tests should cover it. If they don't, downstream users's tests will also build in debug mode and will give it another chance to cover it. |
@pnkfelix I plan to finish this PR by tomorrow. |
@pnkfelix Updated it. |
@tbu- I think @alexcrichton 's suggestion of an "# Overflow Behavior" section is a good one. |
@pnkfelix Updated it. |
@tbu- okay looks great. r=me after a squash. |
Explicitely spell out behaviour on overflow for `usize`-returning iterator functions. Mention that panics are guaranteed if debug assertions are active, otherwise a wrong result might be returned.
9520929
to
29d7fed
Compare
@pnkfelix |
Added documentation of the panicking overflow to the `count` and `position` methods of `IteratorExt`, also make them more obvious in the code. Also mention that the `Iterator::next` method of the struct returned by `IteratorExt::enumerate` can panic due to overflow.
Added documentation of the panicking overflow to the
count
andposition
methods of
IteratorExt
, also make them more obvious in the code.Also mention that the
Iterator::next
method of the struct returned byIteratorExt::enumerate
can panic due to overflow.