Make overflow behaviour more obvious in the iterator module of `libcore` #23894

tbu- · 2015-03-31T15:35:20Z

Added documentation of the panicking overflow to the count and position
methods of IteratorExt, also make them more obvious in the code.

Also mention that the Iterator::next method of the struct returned by
IteratorExt::enumerate can panic due to overflow.

rust-highfive · 2015-03-31T15:35:36Z

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

pnkfelix · 2015-03-31T16:08:15Z

I am a tiny bit nervous about adding the overhead of a required check+panic (i.e. that shows up in release builds) to the enumerate method.

alexcrichton · 2015-03-31T17:10:43Z

I agree with @pnkfelix as well. @tbu- could you run a few benchmarks and post them here to see the impact?

bluss · 2015-04-03T09:44:40Z

If the enumerator checking really does land, please don't use a plain unwrap without error message.

tbu- · 2015-04-03T22:11:57Z

@pnkfelix It seems that the checked addition has quite a performance penalty, the number of iterations roughly halves (if you do nothing else, of course):

running 6 tests
test empty_dc ... bench:         2 ns/iter (+/- 0)
test empty_nc ... bench:         2 ns/iter (+/- 0)
test first_dc ... bench:       872 ns/iter (+/- 1)
test first_nc ... bench:       355 ns/iter (+/- 0)
test large_dc ... bench:    219554 ns/iter (+/- 774)
test large_nc ... bench:     87670 ns/iter (+/- 155)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured

on an empty, 256-element and 65536-element array, respectively. Gist.

@bluss Dunno what to do about this unwrap. I could potentially replace it with an unreachable!, because it really ought not to happen.

pnkfelix · 2015-04-03T23:03:47Z

@tbu- Regarding alternatives to unwrap, i usually use .expect rather than .unwrap in situations like this... (I'm not really sure why you think this "ought not to happen" -- it seems like it might very well happen in scenarios where one is iterating over something other than elements of a container).

But in any case it sounds like, given the performance hit, we may want this to be tied more closely into debug_assertions ?

bors · 2015-04-04T21:21:24Z

☔ The latest upstream changes (presumably #24045) made this pull request unmergeable. Please resolve the merge conflicts.

tbu- · 2015-04-04T22:59:44Z

@pnkfelix I only use unwrap where the iterator is guaranteed to be an ExactSizeIterator, thus having an amount of elements that fits into a usize (see also the comment above those unwraps).

About the debug assertions, maybe it would suffice to say that going beyond usize::MAX with an enumerate iterator is Rust-integer-undefined behaviour.

huonw · 2015-04-06T04:53:30Z

r? @pnkfelix (transferring reviewership, don't have the bandwidth right now.)

pnkfelix · 2015-04-06T10:50:26Z

@tbu- yes; I suspect it would be better to not guarantee panic on overflow when debug-assertions are disabled.

alexcrichton · 2015-04-06T16:40:08Z

@tbu- thanks for collecting those numbers! These core iterator types are pretty performance critical, so I agree with @pnkfelix that we probably don't want to guarantee a panic-on-overflow here and let debug-assertions take care of it.

tbu- · 2015-04-14T08:24:25Z

@pnkfelix @alexcrichton I updated it to only mention the possible overflow in the Enumerate iterator. Is that fine?

alexcrichton · 2015-04-14T16:27:47Z

src/libcore/iter.rs

+    /// # Undefined overflow
+    ///
+    /// The method does no guarding against overflows, so enumerating more than
+    /// `usize::MAX` elements is undefined.


I think "undefined" may be a bit strong here, isn't this defined to panic in debug builds and wrap in optimized builds?

I believe we call it undefined overflow and don't guarantee anything about it other than it doesn't create unsafe things. It might panic, might give out seemingly-random numbers etc.

The RFC and I believe the current implementation both define overflow in the case of optimized builds to be the normal 2's complement overflow.

I second @alexcrichton 's note here. No reason to be putting these sorts of scary terms here, IMHO

I think I'm representing the RFC correctly here:

Change the semantics of the built-in fixed-size integer types from being defined as wrapping around on overflow to it being considered a program error (but not undefined behavior in the C sense). Implementations are permitted to check for overflow at any time (statically or dynamically). Implementations are required to at least check dynamically when debug_assert! assertions are enabled. Add a WrappingOps trait to the standard library with operations defined as wrapping on overflow for the limited number of cases where this is the desired semantics, such as hash functions.

This is the currently taken (conservative) path, and if we'd like to guarantee more, then that shouldn't be done by not mentioning it in the docs, but rather by filing a new RFC, I'd say.

(Also, my reading of the part I quoted is that the result of executing a + is either the same as the wrapping result, or a panic.)

@huonw I can't find the intention for overflow to be wrapping in the RFC thread (other than as a consequence of how rustc implements it currently). We can e.g. see this user suggesting what you said, but nobody telling that this is indeed the RFC's text (so it seems it isn't the intention):

rust-lang/rfcs#560 (comment):

I like this proposal, but I don't like the concept of "unspecified behavior" that isn't really unspecified. To avoid confusion, I'd prefer to say that if the overflow is not diagnosed, the value is guaranteed to wrap.
Alternatively, you could write that it truly is unspecified, and optimize based on that, but I think that's non-ideal.

Now, reading between the lines of the following, it seems that the following assumes that rustc's implementation will be wrap-on-overflow, but that this isn't meant to be guaranteed.

https://internals.rust-lang.org/t/a-tale-of-twos-complement/1062

Aren't you concerned about de facto lock-in for overflow?

There is certainly a concern that we will not be able to make overflow stricter without breaking code in the wild. We believe that having overflow checking be mandatory in debug builds should at least mitigate that risk, as intentional uses of overflow should be detected long before the code ever comes into widespread use (and redirected to the suitable wrapper type).

The RFC as it stands right now doesn't sound like "wrap or panic" semantics, but rather than "panic or unspecified value" which is a superset of the former, so we don't lock ourselves regarding backward compatiblity if we don't guarantee this right now.

There was intention, e.g.: rust-lang/rfcs#560 (comment) and rust-lang/rfcs#560 (comment) .

It was even in the text in rust-lang/rfcs@6dbbc70#diff-3ade9a15caf1d747ea87d08e3021dc33R126 which was edited rust-lang/rfcs@865eae6#diff-3ade9a15caf1d747ea87d08e3021dc33R129 and then removed in a seemingly unrelated commit rust-lang/rfcs@9f2c7f0 . I'm fairly sure the loss of clarity in the non-as case in the last commit was an accident.

https://internals.rust-lang.org/t/a-tale-of-twos-complement/1062

NB. that was posted a few weeks before the RFC, and the design was refined/changed in the RFC: the RFC is canonical source.

That said, I'm in favour of writing this documentation as "behaviour for too-long iterators is unspecified", but not in favour of motivating that by a pedantic reading of the RFC. :)

@huonw Thanks for the pointers, apparantly I was wrong.

So, now for the wording, what about putting "behaviour for iterators with more than usize::MAX elements is unspecified" into # Undefined overflow? What do you suggest?

I would suggest an # Overflow Behavior section which explicitly spells out that there is no guarding against overflow and the overflow semantics will depend on whether debug assertions are enabled.

pnkfelix · 2015-04-17T11:14:49Z

Sigh; if it wasn't already clear from my comments above, I'm pretty torn about this PR.

On the one hand I'm not a fan of appealing to arguments that certain cases are "never expected to trip" -- that's the whole problem with edge cases: You don't expect them, and then they get exploited.

But on the other hand, we probably need to employ a consistent philosophy for all of the iterator methods. And considering how widely used these methods are, several in program hot spots, maybe the right thing is indeed to just rely on debug-assertions catching the underlying overflows in all these cases, and tell users in documentation that they should use alternative methods if the inputs they are iterating over go beyond usize ?

(I.e. would it make sense for a library on crates.io to offer checked variants of all these methods?)

@tbu- sorry its taking so long to decide about this.

tbu- · 2015-04-17T22:27:00Z

I think that offering a crates.io library for checked variants is unrealistic. I don't think people will go an extra mile for safety, and in my opinion it should be exactly the other way around. If we do need the unchecked operations, those should be provided on crates.io, because then the users would make an explicit statement by using those.

(For a very far-fetched analogy, see how IE had enabled DNT by default, but that completely destroys the meaning of the flag. Similarly you shouldn't opt-in for safety, but rather for unsafety.)

Making enumerate the one exception does not sound so bad to me, as long as there's a section explaining that integer overflow is not handled.

pnkfelix · 2015-04-17T22:49:01Z

But the reality is that the vast majority of iterator traversals will be over usize-bound collections that simply cannot overflow. I guess if we ever add specialization then that would resolve that issue...

Or ... hmm. I wonder if we can use a default type param (or associated type?) to effectively dispatch to the appropriate (checked or unchecked) increment routine in each case. The default would be checked, but implementations that never yield more than usize would override it...

It's late and I'm just musing ...

Update: here's a branch with some ideas I've been playing with: https://github.com/pnkfelix/rust/tree/fsk-sized-bounded-iter

pnkfelix · 2015-04-18T07:25:54Z

Or here's a far less baroque variant idea: add a new fn increment(&self, usize) -> usize method to iterator, where the default just adds 1. Then override that method in impls that might traverse more than usize elementa, and use checked_add there.

Update: maybe not such a hot idea; chained iterator adapters would probably need to conservatively assume checked_add

tbu- · 2015-04-18T08:26:02Z

I'm starting to like the position_unchecked and count_unchecked variant. It captures intent pretty well and doesn't default to unchecked.

pnkfelix · 2015-04-24T08:37:16Z

@tbu- so the team discussed this at this week's Tuesday meeting.

The consensus was that we really don't want to change any of the code here; you should just let the arithmetic overflow happen (or not) based on how the addition operator itself gets compiled under the settings passed to the compiler.

Even adding special case paths into iterator code that are guarded by cfg(debug-assertions) was disliked, since people did not want to set a precedent of saying that every potential arithmetic overflow in libstd needs to have its own specialized checks and error messages.

If you would like to revise this PR to just focus on documentation changes, then that would be great. (The idea I think is that you would not guarantee a panic in the docs, but rather just say that panic can occur, in accordance with how arithmetic overflow is handled in general.)

pnkfelix · 2015-04-30T09:46:30Z

@tbu- do you plan to continue working on this PR? I would be happy to take it over for you (i.e. incorporating the revisions noted above and landing it from my own repository).

bluss · 2015-04-30T10:13:40Z

Unless I misunderstand something, it's currently like this: debug assertions are removed when libstd is compiled & distributed in rust releases. Arithmetic checks are not removed, and will stay and are determined by the compiler settings of the user code that links to and inlines parts of libstd.

This is great, because it means that user's debug builds will have overflow checking in enumerate if I understand correctly, while it will compile to non-checked code in release builds.

Itertools has .enumerate_from which makes it easier to test with an enumeration type of smaller size (i8). It behaves like this, client code will be overflow checked in debug mode and not in release mode. It already seems optimal to me!

This is how arithmetic checking is handled in general in Rust now. Your tests should cover it. If they don't, downstream users's tests will also build in debug mode and will give it another chance to cover it.

tbu- · 2015-04-30T11:30:57Z

@pnkfelix I plan to finish this PR by tomorrow.

tbu- · 2015-05-01T15:32:50Z

@pnkfelix Updated it.

pnkfelix · 2015-05-05T09:53:17Z

@tbu- I think @alexcrichton 's suggestion of an "# Overflow Behavior" section is a good one.

tbu- · 2015-05-06T09:09:47Z

@pnkfelix Updated it.

pnkfelix · 2015-05-06T12:32:40Z

@tbu- okay looks great.

r=me after a squash.

Explicitely spell out behaviour on overflow for `usize`-returning iterator functions. Mention that panics are guaranteed if debug assertions are active, otherwise a wrong result might be returned.

tbu- · 2015-05-06T18:31:01Z

@pnkfelix
Squashed and rebased.

pnkfelix · 2015-05-06T18:50:12Z

@bors r+ 29d7fed

Added documentation of the panicking overflow to the `count` and `position` methods of `IteratorExt`, also make them more obvious in the code. Also mention that the `Iterator::next` method of the struct returned by `IteratorExt::enumerate` can panic due to overflow.

bors · 2015-05-06T20:33:11Z

⌛ Testing commit 29d7fed with merge 6c9f840...

bors · 2015-05-06T22:27:56Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-win-32-nopt-t, auto-win-32-opt, auto-win-64-nopt-t, auto-win-64-opt

rust-highfive assigned huonw Mar 31, 2015

rust-highfive assigned pnkfelix and unassigned huonw Apr 6, 2015

tbu- force-pushed the pr_iter_length_position branch from 3fd91c4 to db2a86b Compare April 14, 2015 08:22

alexcrichton reviewed Apr 14, 2015
View reviewed changes

Make overflow behaviour more obvious in the iterator module of libcore

29d7fed

Explicitely spell out behaviour on overflow for `usize`-returning iterator functions. Mention that panics are guaranteed if debug assertions are active, otherwise a wrong result might be returned.

tbu- force-pushed the pr_iter_length_position branch from 9520929 to 29d7fed Compare May 6, 2015 18:30

bors merged commit 29d7fed into rust-lang:master May 6, 2015

Make overflow behaviour more obvious in the iterator module of libcore #23894

Make overflow behaviour more obvious in the iterator module of libcore #23894

Uh oh!

Conversation

tbu- commented Mar 31, 2015

Uh oh!

rust-highfive commented Mar 31, 2015

Uh oh!

pnkfelix commented Mar 31, 2015

Uh oh!

alexcrichton commented Mar 31, 2015

Uh oh!

bluss commented Apr 3, 2015

Uh oh!

tbu- commented Apr 3, 2015

Uh oh!

pnkfelix commented Apr 3, 2015

Uh oh!

bors commented Apr 4, 2015

Uh oh!

tbu- commented Apr 4, 2015

Uh oh!

huonw commented Apr 6, 2015

Uh oh!

pnkfelix commented Apr 6, 2015

Uh oh!

alexcrichton commented Apr 6, 2015

Uh oh!

tbu- commented Apr 14, 2015

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pnkfelix commented Apr 17, 2015

Uh oh!

tbu- commented Apr 17, 2015

Uh oh!

pnkfelix commented Apr 17, 2015

Uh oh!

pnkfelix commented Apr 18, 2015

Uh oh!

tbu- commented Apr 18, 2015

Uh oh!

pnkfelix commented Apr 24, 2015

Uh oh!

pnkfelix commented Apr 30, 2015

Uh oh!

bluss commented Apr 30, 2015

Uh oh!

tbu- commented Apr 30, 2015

Uh oh!

tbu- commented May 1, 2015

Uh oh!

pnkfelix commented May 5, 2015

Uh oh!

tbu- commented May 6, 2015

Uh oh!

pnkfelix commented May 6, 2015

Uh oh!

tbu- commented May 6, 2015

Uh oh!

pnkfelix commented May 6, 2015

Uh oh!

Make overflow behaviour more obvious in the iterator module of `libcore` #23894

Make overflow behaviour more obvious in the iterator module of `libcore` #23894