Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make overflow behaviour more obvious in the iterator module of libcore #23894

Merged
merged 1 commit into from
May 6, 2015

Conversation

tbu-
Copy link
Contributor

@tbu- tbu- commented Mar 31, 2015

Added documentation of the panicking overflow to the count and position
methods of IteratorExt, also make them more obvious in the code.

Also mention that the Iterator::next method of the struct returned by
IteratorExt::enumerate can panic due to overflow.

@rust-highfive
Copy link
Collaborator

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@pnkfelix
Copy link
Member

I am a tiny bit nervous about adding the overhead of a required check+panic (i.e. that shows up in release builds) to the enumerate method.

@alexcrichton
Copy link
Member

I agree with @pnkfelix as well. @tbu- could you run a few benchmarks and post them here to see the impact?

@bluss
Copy link
Member

bluss commented Apr 3, 2015

If the enumerator checking really does land, please don't use a plain unwrap without error message.

@tbu-
Copy link
Contributor Author

tbu- commented Apr 3, 2015

@pnkfelix It seems that the checked addition has quite a performance penalty, the number of iterations roughly halves (if you do nothing else, of course):

running 6 tests
test empty_dc ... bench:         2 ns/iter (+/- 0)
test empty_nc ... bench:         2 ns/iter (+/- 0)
test first_dc ... bench:       872 ns/iter (+/- 1)
test first_nc ... bench:       355 ns/iter (+/- 0)
test large_dc ... bench:    219554 ns/iter (+/- 774)
test large_nc ... bench:     87670 ns/iter (+/- 155)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured

on an empty, 256-element and 65536-element array, respectively. Gist.

@bluss Dunno what to do about this unwrap. I could potentially replace it with an unreachable!, because it really ought not to happen.

@pnkfelix
Copy link
Member

pnkfelix commented Apr 3, 2015

@tbu- Regarding alternatives to unwrap, i usually use .expect rather than .unwrap in situations like this... (I'm not really sure why you think this "ought not to happen" -- it seems like it might very well happen in scenarios where one is iterating over something other than elements of a container).

But in any case it sounds like, given the performance hit, we may want this to be tied more closely into debug_assertions ?

@bors
Copy link
Contributor

bors commented Apr 4, 2015

☔ The latest upstream changes (presumably #24045) made this pull request unmergeable. Please resolve the merge conflicts.

@tbu-
Copy link
Contributor Author

tbu- commented Apr 4, 2015

@pnkfelix I only use unwrap where the iterator is guaranteed to be an ExactSizeIterator, thus having an amount of elements that fits into a usize (see also the comment above those unwraps).

About the debug assertions, maybe it would suffice to say that going beyond usize::MAX with an enumerate iterator is Rust-integer-undefined behaviour.

@huonw
Copy link
Member

huonw commented Apr 6, 2015

r? @pnkfelix (transferring reviewership, don't have the bandwidth right now.)

@rust-highfive rust-highfive assigned pnkfelix and unassigned huonw Apr 6, 2015
@pnkfelix
Copy link
Member

pnkfelix commented Apr 6, 2015

@tbu- yes; I suspect it would be better to not guarantee panic on overflow when debug-assertions are disabled.

@alexcrichton
Copy link
Member

@tbu- thanks for collecting those numbers! These core iterator types are pretty performance critical, so I agree with @pnkfelix that we probably don't want to guarantee a panic-on-overflow here and let debug-assertions take care of it.

@tbu-
Copy link
Contributor Author

tbu- commented Apr 14, 2015

@pnkfelix @alexcrichton I updated it to only mention the possible overflow in the Enumerate iterator. Is that fine?

/// # Undefined overflow
///
/// The method does no guarding against overflows, so enumerating more than
/// `usize::MAX` elements is undefined.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "undefined" may be a bit strong here, isn't this defined to panic in debug builds and wrap in optimized builds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we call it undefined overflow and don't guarantee anything about it other than it doesn't create unsafe things. It might panic, might give out seemingly-random numbers etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFC and I believe the current implementation both define overflow in the case of optimized builds to be the normal 2's complement overflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second @alexcrichton 's note here. No reason to be putting these sorts of scary terms here, IMHO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm representing the RFC correctly here:

Change the semantics of the built-in fixed-size integer types from being defined as wrapping around on overflow to it being considered a program error (but not undefined behavior in the C sense). Implementations are permitted to check for overflow at any time (statically or dynamically). Implementations are required to at least check dynamically when debug_assert! assertions are enabled. Add a WrappingOps trait to the standard library with operations defined as wrapping on overflow for the limited number of cases where this is the desired semantics, such as hash functions.

This is the currently taken (conservative) path, and if we'd like to guarantee more, then that shouldn't be done by not mentioning it in the docs, but rather by filing a new RFC, I'd say.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, my reading of the part I quoted is that the result of executing a + is either the same as the wrapping result, or a panic.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huonw I can't find the intention for overflow to be wrapping in the RFC thread (other than as a consequence of how rustc implements it currently). We can e.g. see this user suggesting what you said, but nobody telling that this is indeed the RFC's text (so it seems it isn't the intention):

rust-lang/rfcs#560 (comment):

I like this proposal, but I don't like the concept of "unspecified behavior" that isn't really unspecified. To avoid confusion, I'd prefer to say that if the overflow is not diagnosed, the value is guaranteed to wrap.
Alternatively, you could write that it truly is unspecified, and optimize based on that, but I think that's non-ideal.

Now, reading between the lines of the following, it seems that the following assumes that rustc's implementation will be wrap-on-overflow, but that this isn't meant to be guaranteed.

https://internals.rust-lang.org/t/a-tale-of-twos-complement/1062

Aren't you concerned about de facto lock-in for overflow?

There is certainly a concern that we will not be able to make overflow stricter without breaking code in the wild. We believe that having overflow checking be mandatory in debug builds should at least mitigate that risk, as intentional uses of overflow should be detected long before the code ever comes into widespread use (and redirected to the suitable wrapper type).

The RFC as it stands right now doesn't sound like "wrap or panic" semantics, but rather than "panic or unspecified value" which is a superset of the former, so we don't lock ourselves regarding backward compatiblity if we don't guarantee this right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was intention, e.g.: rust-lang/rfcs#560 (comment) and rust-lang/rfcs#560 (comment) .

It was even in the text in rust-lang/rfcs@6dbbc70#diff-3ade9a15caf1d747ea87d08e3021dc33R126 which was edited rust-lang/rfcs@865eae6#diff-3ade9a15caf1d747ea87d08e3021dc33R129 and then removed in a seemingly unrelated commit rust-lang/rfcs@9f2c7f0 . I'm fairly sure the loss of clarity in the non-as case in the last commit was an accident.

https://internals.rust-lang.org/t/a-tale-of-twos-complement/1062

NB. that was posted a few weeks before the RFC, and the design was refined/changed in the RFC: the RFC is canonical source.


That said, I'm in favour of writing this documentation as "behaviour for too-long iterators is unspecified", but not in favour of motivating that by a pedantic reading of the RFC. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huonw Thanks for the pointers, apparantly I was wrong.

So, now for the wording, what about putting "behaviour for iterators with more than usize::MAX elements is unspecified" into # Undefined overflow? What do you suggest?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest an # Overflow Behavior section which explicitly spells out that there is no guarding against overflow and the overflow semantics will depend on whether debug assertions are enabled.

@pnkfelix
Copy link
Member

Sigh; if it wasn't already clear from my comments above, I'm pretty torn about this PR.

On the one hand I'm not a fan of appealing to arguments that certain cases are "never expected to trip" -- that's the whole problem with edge cases: You don't expect them, and then they get exploited.

But on the other hand, we probably need to employ a consistent philosophy for all of the iterator methods. And considering how widely used these methods are, several in program hot spots, maybe the right thing is indeed to just rely on debug-assertions catching the underlying overflows in all these cases, and tell users in documentation that they should use alternative methods if the inputs they are iterating over go beyond usize ?

(I.e. would it make sense for a library on crates.io to offer checked variants of all these methods?)

@tbu- sorry its taking so long to decide about this.

@tbu-
Copy link
Contributor Author

tbu- commented Apr 17, 2015

I think that offering a crates.io library for checked variants is unrealistic. I don't think people will go an extra mile for safety, and in my opinion it should be exactly the other way around. If we do need the unchecked operations, those should be provided on crates.io, because then the users would make an explicit statement by using those.

(For a very far-fetched analogy, see how IE had enabled DNT by default, but that completely destroys the meaning of the flag. Similarly you shouldn't opt-in for safety, but rather for unsafety.)

Making enumerate the one exception does not sound so bad to me, as long as there's a section explaining that integer overflow is not handled.

@pnkfelix
Copy link
Member

But the reality is that the vast majority of iterator traversals will be over usize-bound collections that simply cannot overflow. I guess if we ever add specialization then that would resolve that issue...

Or ... hmm. I wonder if we can use a default type param (or associated type?) to effectively dispatch to the appropriate (checked or unchecked) increment routine in each case. The default would be checked, but implementations that never yield more than usize would override it...

It's late and I'm just musing ...

Update: here's a branch with some ideas I've been playing with: https://github.com/pnkfelix/rust/tree/fsk-sized-bounded-iter

@pnkfelix
Copy link
Member

Or here's a far less baroque variant idea: add a new fn increment(&self, usize) -> usize method to iterator, where the default just adds 1. Then override that method in impls that might traverse more than usize elementa, and use checked_add there.

Update: maybe not such a hot idea; chained iterator adapters would probably need to conservatively assume checked_add

@tbu-
Copy link
Contributor Author

tbu- commented Apr 18, 2015

I'm starting to like the position_unchecked and count_unchecked variant. It captures intent pretty well and doesn't default to unchecked.

@pnkfelix
Copy link
Member

@tbu- so the team discussed this at this week's Tuesday meeting.

The consensus was that we really don't want to change any of the code here; you should just let the arithmetic overflow happen (or not) based on how the addition operator itself gets compiled under the settings passed to the compiler.

Even adding special case paths into iterator code that are guarded by cfg(debug-assertions) was disliked, since people did not want to set a precedent of saying that every potential arithmetic overflow in libstd needs to have its own specialized checks and error messages.

If you would like to revise this PR to just focus on documentation changes, then that would be great. (The idea I think is that you would not guarantee a panic in the docs, but rather just say that panic can occur, in accordance with how arithmetic overflow is handled in general.)

@pnkfelix
Copy link
Member

@tbu- do you plan to continue working on this PR? I would be happy to take it over for you (i.e. incorporating the revisions noted above and landing it from my own repository).

@bluss
Copy link
Member

bluss commented Apr 30, 2015

Unless I misunderstand something, it's currently like this: debug assertions are removed when libstd is compiled & distributed in rust releases. Arithmetic checks are not removed, and will stay and are determined by the compiler settings of the user code that links to and inlines parts of libstd.

This is great, because it means that user's debug builds will have overflow checking in enumerate if I understand correctly, while it will compile to non-checked code in release builds.

Itertools has .enumerate_from which makes it easier to test with an enumeration type of smaller size (i8). It behaves like this, client code will be overflow checked in debug mode and not in release mode. It already seems optimal to me!

This is how arithmetic checking is handled in general in Rust now. Your tests should cover it. If they don't, downstream users's tests will also build in debug mode and will give it another chance to cover it.

@tbu-
Copy link
Contributor Author

tbu- commented Apr 30, 2015

@pnkfelix I plan to finish this PR by tomorrow.

@tbu-
Copy link
Contributor Author

tbu- commented May 1, 2015

@pnkfelix Updated it.

@pnkfelix
Copy link
Member

pnkfelix commented May 5, 2015

@tbu- I think @alexcrichton 's suggestion of an "# Overflow Behavior" section is a good one.

@tbu-
Copy link
Contributor Author

tbu- commented May 6, 2015

@pnkfelix Updated it.

@pnkfelix
Copy link
Member

pnkfelix commented May 6, 2015

@tbu- okay looks great.

r=me after a squash.

Explicitely spell out behaviour on overflow for `usize`-returning iterator
functions.

Mention that panics are guaranteed if debug assertions are active, otherwise a
wrong result might be returned.
@tbu- tbu- force-pushed the pr_iter_length_position branch from 9520929 to 29d7fed Compare May 6, 2015 18:30
@tbu-
Copy link
Contributor Author

tbu- commented May 6, 2015

@pnkfelix
Squashed and rebased.

@pnkfelix
Copy link
Member

pnkfelix commented May 6, 2015

@bors r+ 29d7fed

bors added a commit that referenced this pull request May 6, 2015
Added documentation of the panicking overflow to the `count` and `position`
methods of `IteratorExt`, also make them more obvious in the code.

Also mention that the `Iterator::next` method of the struct returned by
`IteratorExt::enumerate` can panic due to overflow.
@bors
Copy link
Contributor

bors commented May 6, 2015

⌛ Testing commit 29d7fed with merge 6c9f840...

@bors bors merged commit 29d7fed into rust-lang:master May 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants