- 
                Notifications
    You must be signed in to change notification settings 
- Fork 13.9k
Add UnboundedIterator Trait #47082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UnboundedIterator Trait #47082
Conversation
This optimizes UnboundedIterator::fuse to a noop.
| Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @kennytm (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. | 
| The @rust-lang/libs team should be more familiar with this. | 
| Regarding  | 
| 
 I really like this idea.  | 
| The main problem with a safe trait with  
 Concluding going with a safe trait is most likely possible (at least I haven't found any impossible problems just thinking about it). But having two different methods, which in theory should have the same implementation, but are allowed to have different implementations feels like a source for bugs. | 
| Thanks for the PR @oberien! This is a pretty weighty feature though so I think we'll be discussing this as a libs triage meeting at least. In terms of motivation to start out with, you've mentioned  | 
| My original motivation was that I wanted to show off Rust to a friend: Thus, I analyzed the root cause of this issue and found out, that Rust does not "know" about infinite iterators. So I thought about "tagging" unbounded iterators and posted about it on reddit. After digging deeper into a possible  tl;dr: While my original intend was to optimize the specific case of  | 
| Hm ok, so to me that sounds like most of this boils down to creating a vector. It makes sense to me that there's other possible optimizations such as  Along those lines is it possible perhaps to achieve the same goal with vectors a different method? There's already a ton of specializations for creating vectors in via collection, and perhaps a special case is needed for an iterator like  | 
| I don't think that this optimization is possible without marking unbounded iterators.  I do see the heaviness of this PR and in fact I was thinking about adding a drawback "Are these special-case optimizations worth that much code?". But I do think that this optimization is something people expect (at least I and a few friends do) and which would add to rusts promise of zero-cost abstractions. | 
| Ok we discussed this at @rust-lang/libs triage today and the conclusion was that we'd like to avoid the  | 
| #47370 may be related | 
| As @shepmaster says, I think there could be other benefits of an  | 
| The problem this PR wants to tackle is optimizations around unbounded / infinite iterators. This implies that we somehow need to mark infinite iterators in order to deal with them. This marking is required to be done transitively over iterator adapters to catch all (most) cases (and not just manually implemented ones). I decided to go with a new marker trait, because it's minimally invasive. It only changes cases which need to be touched and doesn't influence anything different. Thus, I think it's also the solution with the smallest code diff without changing existing stuff, because it's literally just marking unbounded iterators in libcore transitively. Coming back to my original suggestion using a new  The first alternative is reducing the motivation. Instead of a new trait we could  A different but similar approach is to tackle the same motivation without a new trait, but changing existing behaviour of other traits. The idea is to use  Another idea is to increase the motivation and tackle a greater problem. My suggestion here would be to change the current approach of  enum Size {
  Unknown,
  Bounded(usize, usize),
  BoundedLower(usize),
  LargerThanUsize,
  Unbounded,
}Such a return value can be used for a lot of special cases. For example  This change will allow way more widespread optimizations and possibilities for a lot of other kinds of iterators. But it'll be quite a large change and definitely require an RFC and further discussion. In conclusion I think that we should definitely tackle unbounded iterators and possible optimizations surrounding them. I think that we should mark iterators to allow optimizations of any kind and also leave the option for future special case handling. | 
| 
 I think it's important to remember that these aren't optimizations, these are behaviour changes. Which, from my PoV at least, means they should only be done if they work reliably, consistently, and obviously on a clear class of things. And given the number of iterator adapters we have that take closures, there's definitely no way for that to happen even for the iterators we control, let alone all the iterators that exist in the wild or haven't yet been written. Right now, every fold-like method on Iterator (aka that takes  | 
| 
 I didn't want to move the discussion around #47370 here. I simply wanted to state, that currently if these type of changes are implemented (be it through an RFC in the future, or #47533, or completely new ideas on unbounded iterators), they will only be available to their respective types themselves. They won't be available transitively through iterator adapters. In fact, there isn't currently no way to transitively forward these changes through some adapters for infinite iterators. 
 With specialization, each adapter can be inspected individually, and only changed if all conditions and guarantees hold. For example, given #47533 lands,  
 That's not changed at all by this PR. On the contrary, I suggest to produce a compile warning when a consumer is called on an infinite iterator, as it currently produces an infinite loop, which most likely indicates a bug. @kennytm suggested to perform this as clippy lint, which would fit well IMO. | 
| @alexcrichton What do you think about the alternatives presented 3 comments above? | 
| Ping from triage @alexcrichton! Not sure if this should be kept open. There is also a response at #47082 (comment). | 
| @oberien hm sorry I think I'm getting a little lost now... I thought the motivation of this change was solely  | 
| Why not use TrustedLen for this? It is unstable, and it can be redesigned to accommodate this too. We can't use  | 
| @alexcrichton The main reason why I dug into this topic of unbounded iterators was the  @bluss Good catch. I don't think that we are able to find any other variant of that tuple, which isn't taken yet, so maybe this idea will require changing the behaviour of  | 
| @oberien it seems that the discussion around unbounded iterators is much broader, so maybe this PR could be cut back to just dealing with the immediate vec creation problem? | 
| I guess we should go with my first suggestion in #47082 (comment) then: 
 I implemented this behaiviour test-wise before creating that comment: oberien@806002e Is that genrally a change that you'd merge for enabling the immediate optimization? If that's the case I can go ahead closing this PR and creating a new one for that branch. How should the discussion around unbounded iterators be continued? Or do you think that it doesn't add enough value in general for it to be continued? | 
| Ah yeah that patch looks good to me! Perhaps a thread on internals to discuss the infinite iterator issue? | 
| Closing in favour of #47944 | 
…n, r=bluss Implement TrustedLen for Take<Repeat> and Take<RangeFrom> This will allow optimization of simple `repeat(x).take(n).collect()` iterators, which are currently not vectorized and have capacity checks. This will only support a few aggregates on `Repeat` and `RangeFrom`, which might be enough for simple cases, but doesn't optimize more complex ones. Namely, Cycle, StepBy, Filter, FilterMap, Peekable, SkipWhile, Skip, FlatMap, Fuse and Inspect are not marked `TrustedLen` when the inner iterator is infinite. Previous discussion can be found in rust-lang#47082 r? @alexcrichton
…n, r=bluss Implement TrustedLen for Take<Repeat> and Take<RangeFrom> This will allow optimization of simple `repeat(x).take(n).collect()` iterators, which are currently not vectorized and have capacity checks. This will only support a few aggregates on `Repeat` and `RangeFrom`, which might be enough for simple cases, but doesn't optimize more complex ones. Namely, Cycle, StepBy, Filter, FilterMap, Peekable, SkipWhile, Skip, FlatMap, Fuse and Inspect are not marked `TrustedLen` when the inner iterator is infinite. Previous discussion can be found in rust-lang#47082 r? @alexcrichton
The
TrustedLenPR (tracking issue) is an effort to improve the performance ofFromIterator/Extendcode using specialization when working with iterators reporting a correctsize_hint(i.e. are marked with theTrustedLentrait). Unfortunately, it does not work for infinite iterators, which a fixed number of elements aretaken from. Instead, currently calling e.g..take(n).collect::<Vec<_>>()on an unbounded iterator produces a loop adding one element at a time to the vector with capacity checks on each iteraton. The optimized implementation ofvec!, which should be used here as well, just usescallocas we'd expect when creating a zeroed vector.Using any number different from zero results in the same loop with the iterator approach. The macro produces partially unrolled asm with SIMD instructions instead.
This PR introduces the unsafe
UnboundedIteratortrait, which marks infinite iterators. This marker is used for optimizations.Its contract is described as the following:
Improvements
TrustedLenis implemented whenever.take(n)is called on anUnboundedIterator. This optimizes cases similar torepeat(x).take(n).collect()..fuse()on anUnboundedIteratorbecomes a noop..cycle()on anUnboundedIteratorbecomes a noop.Possible Future Improvements (TBD)
repeat(0).take(n).collect::<Vec<_>>allocates the size on the heap and uses memset to set the values to 0. This could be optimized to use__rustc_alloc_zeroedjust asvec![0; n]is optimized to do..next()call will returnSome). (One problem here is code likelet iter = _; iter.next(); iter.cycle(), as any call to.next()invalidates the property). This can be used for further improvements toCycleas it can be anUnboundedIteratorif the underlying iterator returns at least one value (currently it's only unbounded if the inner iterator is unbounded). Additionally, FlatMap with an outer iterator producing at least one element and an unbounded inner iterator can be unbounded as well.None-branches from code handling return values of.next()calls on unbounded iterators. For now, I wasn't able to find a case where llvm wasn't able to do this already with static linking. With dynamic libraries, though, this is a possible optimization (if we trust the library file).Unbounded Iterator Implementers
Repeat,RangeFromRev,Fuse,Map,Inspect,Zip,Enumerate,Peekable,Cloned,Boxsize_hint:Filter,SkipWhile,FilterMap,StepBy,FlatMap,SkipCycle: If the inner iterator is unbounded,Tis in theory not required to beClone. Unfortunately, removing it from theUnboundedIteratorimplementation results inconflicting implementationin the specialization trait implementation.Chain: In theory, aChainwith the first, second or both iterators being unbounded can be anUnboundedIterator. Unfortunately, afaik it's not possible to express this right now. I tried some auto trait hackery, but ran into Negative impls of Auto Traits (OIBIT) don't take into account Trait Bounds #46813 . Thus, currently it's only implemented if the first one is unbounded.Alternatives
size_hint2onIteratorreturning more information thansize_hint, which can also indicate unbounded iterators. This will be a large public change, which will probably require an RFC first. Additionally, static analyses e.g. to warn on calling consumers on unbounded iterators might become harder. Optimizations would happen inside each function, which will match on the return value ofsize_hint2instead of specialization, increasing the cyclomatic complexity of these functions.fn next(&mut self) -> Self::Item, which expresses the contract ofUnboundedIteratorin the type system.Implementation Notes / Questions
As this is my first contribution to Rust, I'd like to evaluate on some of my design decisions and ask some questions. Also I'd highly appreciate any feedback on my code.
UnboundedIteratorforStepBy,iterator_step_byorunbounded_iter? I decided to go with the former, as that was also done byFusedIterator.Chainas described in Unbounded Iterator Implementers?UnboundedIteratoronCycleremove theClonetrait bound as mentioned in Unbounded Iterator Implementers?.fuse()into a noop, I useFusedIteratoras supertrait:trait UnboundedIterator: FusedIterator. While this results in the desired optimization, I'd rather like to express it usingimpl<I: UnboundedIterator> FusedIterator for I. Unfortunately, that conflicts withimpl<'a, I: FusedIterator + ?Sized> FusedIterator for &'a mut I {}.Additionally, making
FusedIteratora supertrait ofUnboundedIteratorrequiresChain<A, B>to have both A fused and B fused due to the implementation ofFusedIteratorforChain. Is there a better way to express that anyUnboundedIteratoris also aFusedIterator, which passes the compiler?size_hintmethod instead of directly specializingIteratorandUnboundedIteratorbecause it currently breaks type inference: Specialization influnces inference #36262 (comment)Unresolved Questions
UnboundedIteratorallows us to detect some possibly infinite loops, namely calling consumer functions on them. For examplerepeat(0).count()will result in an infinite loop. I'd suggest that a compile warning should be produced in these cases. In most cases, this is a bug. And in the few cases where this is intended behaviour,#[allow(...)]can be used. Consuming methods arecount,last,fold,all,max,min,max_by_key,max_by,min_by_key,min_by,sumandproduct. The functionscollectandpartitionmight fit here as well, but their contract does not actually state that they consume all elements. So there can be an implementation ofFromIteratorwhich only consumes a finite number of elements. Nonetheless, usually this points to a bug, so I think calling these two methods should still produce a warning.When discussing this in the Rust discord guild I got mixed feedback, which is why I'm adding this as part of the Unresolved Questions. What do you think about this?
Also, in which layer would this change be done? As far as I understand, adding a new warning requires a new lint. Would it be possible to add this similar to
#[deprecated]as an annotation which is added to the default implementations of these functions onUnboundedIterator?TrustedLenbe implemented forUnboundedIterators? AnyUnboundedIteratordoes follow the contract, but does that implementation actually add anything useful?UnboundedIteratorspecialize the underlying struct'ssize_hintmethod to ensure returning(usize::MAX, None)? Currently types that don't need specialization do return this with their current implementation. But maybe with a later PR this might change. Specializing all of them better ensures the contract in case of changes not keeping in mind theUnboundedIteratorcontract.size_hintneeding to return(usize::MAX, None)forUnboundedIteratorimplementers. This is both becauseTakehas asize_hintimplementation, which works correctly if the underlying iterator returns these values, and because it makes sense. Should this constrait be removed andTakespecialized instead?