implement fold() on array::IntoIter to improve flatten().collect() perf #87431
r? @kennytm (rust-highfive has picked a reviewer for you, use r? to override)
    
      
        
```
# old
test vec::bench_flat_map_collect ... bench:   2,244,024 ns/iter (+/- 18,903)
# new
test vec::bench_flat_map_collect ... bench:     172,863 ns/iter (+/- 2,141)
```
while this LGTM, shouldn't the original issue be addressed by implementing

The original issue involved
```rust
(&mut self.alive)
    .try_fold::<_, _, Result<_, !>>(init, |acc, idx| {
        // SAFETY: idx is obtained by folding over the `alive` range, which implies the
        // value is currently considered alive but as the range is being consumed each value
        // we read here will only be read once and then considered dead.
        Ok(fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() }))
    })
    .unwrap()
```
can we call fold here instead of try_fold?
```diff
- (&mut self.alive)
-     .try_fold::<_, _, Result<_, !>>(init, |acc, idx| {
-         // SAFETY: idx is obtained by folding over the `alive` range, which implies the
-         // value is currently considered alive but as the range is being consumed each value
-         // we read here will only be read once and then considered dead.
-         Ok(fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() }))
-     })
-     .unwrap()
+ self.alive.fold(init, |acc, idx| {
+     // SAFETY: idx is obtained by folding over the `alive` range, which implies the
+     // value is currently considered alive but as the range is being consumed each value
+     // we read here will only be read once and then considered dead.
+     fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() })
+ })
```
I tried that. array::IntoIter has a Drop impl, so alive can't be moved out, but that would be required to call fold(self). That's why I used try_fold instead.
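To illustrate the restriction described above, here is a minimal sketch. `Live` and `sum_indices` are hypothetical stand-ins, not the real `array::IntoIter` internals; the only assumption carried over from the thread is a struct with a `Drop` impl that owns a `Range<usize>`.

```rust
use std::ops::Range;

// Hypothetical stand-in for a type like array::IntoIter: it owns a range of
// live indices and has a Drop impl.
struct Live {
    alive: Range<usize>,
}

impl Drop for Live {
    fn drop(&mut self) {
        // A real impl would drop the elements that are still alive here.
    }
}

impl Live {
    fn sum_indices(mut self) -> usize {
        // self.alive.fold(0, |acc, i| acc + i)
        // ^ rejected by the compiler: a field cannot be moved out of a type
        //   that implements Drop (error E0509).

        // Borrowing the field compiles, because `&mut Range<usize>` is
        // itself an Iterator.
        (&mut self.alive).fold(0, |acc, i| acc + i)
    }
}

fn main() {
    let it = Live { alive: 2..5 };
    assert_eq!(it.sum_indices(), 9); // 2 + 3 + 4
}
```

This only shows what compiles and what does not; it makes no claim about how well either form optimizes.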
@the8472 oops right.
(&mut self.alive).fold(init, ...) should work though.
Yeah but that would go through impl Iterator for &mut I which is less optimized.
self.alive is a std::ops::Range<usize> and AFAIK there is no special-cased implementation of fold or try_fold for Range<usize> nor &mut Range<usize>.
even if we do the mem::take it will just turn self.alive into 0..0 and then leak everything, which is safe 🙃 (compared with self.alive.clone().fold(...), which will cause a double-free).
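A small sketch of the `mem::take` behaviour mentioned above, on a plain `Range<usize>`. The helper name `take_alive` is hypothetical. Since `Range<usize>` implements `Default` as `0..0`, taking the range leaves an empty one behind, so a `Drop` impl that walks the remaining range would find nothing left to drop.

```rust
use std::mem;
use std::ops::Range;

// Hypothetical helper: move the range out of a borrowed slot, leaving the
// default value (0..0) in its place.
fn take_alive(alive: &mut Range<usize>) -> Range<usize> {
    mem::take(alive)
}

fn main() {
    let mut alive: Range<usize> = 3..7;
    let taken = take_alive(&mut alive);

    // The taken value is the original range.
    assert_eq!(taken, 3..7);
    // The slot now holds an empty range, so nothing is considered alive.
    assert_eq!(alive, 0..0);
    assert!(alive.is_empty());
}
```

With a clone instead of a take, both copies would still describe the same live indices, which is where the double-free concern comes from.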
I have now tried (&mut self.alive).fold instead of try_fold; it undoes all performance gains. I guess somehow the indirection through &mut inhibits optimizations.
Which is due to this not being #[inline]
rust/library/core/src/iter/traits/iterator.rs, lines 3474 to 3478 in 71a6c7c:
```rust
impl<I: Iterator + ?Sized> Iterator for &mut I {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        (**self).next()
    }
```
heh.
wdyt should we just add the #[inline] or leave a FIXME comment explaining the performance regression if we use fold instead of try_fold? either way is fine for me.
I'll add a FIXME. Changing inlining on such central methods can have mixed impact on compile time even if runtime performance is better, so that should be done in a separate PR.
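For readers on stable Rust, the `try_fold` workaround discussed in this thread can be sketched roughly as follows. The PR itself uses `Result<_, !>`, which needs the unstable never type; this sketch substitutes `std::convert::Infallible`, and the helper name `fold_by_ref` is hypothetical. It shows only the shape of the technique and makes no performance claim.

```rust
use std::convert::Infallible;
use std::ops::Range;

// Hypothetical helper: fold over a borrowed range without moving it, by
// calling try_fold (which takes `&mut self`) with an error type that can
// never be constructed.
fn fold_by_ref<B>(
    range: &mut Range<usize>,
    init: B,
    mut f: impl FnMut(B, usize) -> B,
) -> B {
    let result: Result<B, Infallible> =
        range.try_fold(init, |acc, idx| Ok(f(acc, idx)));
    match result {
        Ok(acc) => acc,
        // Infallible has no values, so this arm can never run.
        Err(never) => match never {},
    }
}

fn main() {
    let mut alive = 0..4;
    let total = fold_by_ref(&mut alive, 0, |acc, idx| acc + idx);
    assert_eq!(total, 6); // 0 + 1 + 2 + 3
    // The range was consumed in place rather than moved.
    assert!(alive.is_empty());
}
```

Because the range is advanced through the borrow, a surrounding type with a `Drop` impl would see an empty range afterwards.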
@bors r+ rollup=iffy
📌 Commit 2276c5e has been approved by
☀️ Test successful - checks-actions
inline next() on &mut Iterator impl
In [rust-lang#87431](https://github.com/rust-lang/rust/pull/87431/files#diff-79a6b417b85ecf4f1a4ef2235135fedf540199caf6e9e1d154ac6a413b40a757R132-R136) I found that `(&mut range).fold` doesn't optimize well because the default impl for `fold` on `&mut Iterator` doesn't inline `next`. In that particular case it was worked around by using `try_fold`, which takes `&mut self` instead of `self`. Let's see if this can be fixed more broadly.
With #87168, flattening `array::IntoIter`s is now `TrustedLen`. The `FromIterator` implementation for `Vec` has a specialization for `TrustedLen` iterators which uses internal iteration. This implements one of the main internal iteration methods on `array::IntoIter` to optimize the combination of those two features.

This should address the main issue in #87411.
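For context, the kind of user code this targets looks roughly like the sketch below. The function names `expand` and `flatten_pairs` are hypothetical, and whether the specialized path is actually taken depends on the standard library version in use; the example only shows the shape of the pattern.

```rust
// Hypothetical example: each input element is expanded into a small
// fixed-size array, and the results are flattened into one Vec.
fn expand(input: Vec<u32>) -> Vec<u32> {
    input.into_iter().flat_map(|x| [x, x + 1]).collect()
}

// Hypothetical example: the same idea with flatten() over a Vec of arrays.
fn flatten_pairs(pairs: Vec<[u32; 2]>) -> Vec<u32> {
    pairs.into_iter().flatten().collect()
}

fn main() {
    assert_eq!(expand(vec![1, 10, 100]), vec![1, 2, 10, 11, 100, 101]);
    assert_eq!(flatten_pairs(vec![[1, 2], [3, 4]]), vec![1, 2, 3, 4]);
}
```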