implement fold() on array::IntoIter to improve flatten().collect() perf #87431

Merged
merged 2 commits into from
Jul 27, 2021
6 changes: 6 additions & 0 deletions library/alloc/benches/vec.rs
@@ -726,3 +726,9 @@ fn bench_dedup_old_100000(b: &mut Bencher) {
fn bench_dedup_new_100000(b: &mut Bencher) {
    bench_vec_dedup_new(b, 100000);
}

#[bench]
fn bench_flat_map_collect(b: &mut Bencher) {
    let v = vec![777u32; 500000];
    b.iter(|| v.iter().flat_map(|color| color.rotate_left(8).to_be_bytes()).collect::<Vec<_>>());
}
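For readers unfamiliar with the pattern being benchmarked, here is a small standalone sketch of the same `flat_map(...).collect()` shape on stable Rust. The helper name `colors_to_bytes` is invented for illustration and is not part of the PR.

```rust
/// Hypothetical helper (not part of the PR): rotates each u32 left by 8 bits
/// and expands it into its four big-endian bytes, mirroring the benchmark body.
fn colors_to_bytes(colors: &[u32]) -> Vec<u8> {
    colors
        .iter()
        // `to_be_bytes` returns `[u8; 4]`. `flat_map` iterates it by value
        // through `array::IntoIter<u8, 4>`, the type this PR adds `fold` to.
        .flat_map(|color| color.rotate_left(8).to_be_bytes())
        .collect()
}

fn main() {
    let bytes = colors_to_bytes(&[0x1122_3344, 777]);
    println!("{:?}", bytes);
}
```

Each input word contributes exactly four output bytes, so the collected `Vec<u8>` is four times as long as the input slice.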
21 changes: 21 additions & 0 deletions library/core/src/array/iter.rs
@@ -123,6 +123,27 @@ impl<T, const N: usize> Iterator for IntoIter<T, N> {
        (len, Some(len))
    }

    #[inline]
    fn fold<Acc, Fold>(mut self, init: Acc, mut fold: Fold) -> Acc
    where
        Fold: FnMut(Acc, Self::Item) -> Acc,
    {
        let data = &mut self.data;
        // FIXME: This uses try_fold(&mut iter) instead of fold(iter) because the latter
        // would go through the blanket `impl Iterator for &mut I` implementation
        // which lacks inline annotations on its methods and adding those would be a larger
        // perturbation than using try_fold here.
        // Whether it would be beneficial to add those annotations should be investigated separately.
        (&mut self.alive)
            .try_fold::<_, _, Result<_, !>>(init, |acc, idx| {
                // SAFETY: idx is obtained by folding over the `alive` range, which implies the
                // value is currently considered alive but as the range is being consumed each value
                // we read here will only be read once and then considered dead.
                Ok(fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() }))
            })
            .unwrap()
Comment on lines +137 to +144
Member

can we call fold here instead of try_fold?

Suggested change
-        (&mut self.alive)
-            .try_fold::<_, _, Result<_, !>>(init, |acc, idx| {
-                // SAFETY: idx is obtained by folding over the `alive` range, which implies the
-                // value is currently considered alive but as the range is being consumed each value
-                // we read here will only be read once and then considered dead.
-                Ok(fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() }))
-            })
-            .unwrap()
+        self.alive.fold(init, |acc, idx| {
+            // SAFETY: idx is obtained by folding over the `alive` range, which implies the
+            // value is currently considered alive but as the range is being consumed each value
+            // we read here will only be read once and then considered dead.
+            fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() })
+        })

Member Author

the8472, Jul 24, 2021

I tried that. array::IntoIter has a Drop impl, so alive can't be moved out of it, but moving it out would be required to call fold(self). That's why I used try_fold instead.
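A minimal sketch of the restriction described here, using a made-up `Tracker` type rather than the real `array::IntoIter`. It also substitutes `std::convert::Infallible` for the `!` type used in the PR, on the assumption that the reader wants something that compiles on stable Rust.

```rust
use std::convert::Infallible;
use std::ops::Range;

/// Hypothetical stand-in for an iterator that tracks live indices and
/// implements `Drop` (illustrative only, not the real `array::IntoIter`).
struct Tracker {
    alive: Range<usize>,
}

impl Drop for Tracker {
    fn drop(&mut self) {
        // A real iterator would drop its still-alive elements here.
    }
}

fn sum_alive(mut t: Tracker) -> usize {
    // Does not compile (error E0509): `fold` takes the range by value, which
    // would move `alive` out of a type that implements `Drop`.
    // let sum = t.alive.fold(0, |acc, idx| acc + idx);

    // `try_fold` takes `&mut self`, so the range is consumed in place.
    // `Infallible` stands in for the unstable `!` type used in the PR.
    let sum: Result<usize, Infallible> = t.alive.try_fold(0, |acc, idx| Ok(acc + idx));
    match sum {
        Ok(total) => total,
        // `Infallible` has no values, so this arm can never run.
        Err(never) => match never {},
    }
}

fn main() {
    println!("{}", sum_alive(Tracker { alive: 0..4 }));
}
```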

Member

@the8472 oops right.

(&mut self.alive).fold(init, ...) should work though.

Member Author

Yeah but that would go through impl Iterator for &mut I which is less optimized.

Member

kennytm, Jul 24, 2021

self.alive is a std::ops::Range<usize> and AFAIK there is no special-cased implementation of fold or try_fold for Range<usize> nor &mut Range<usize>.

Member

Even if we do the mem::take, it will just turn self.alive into 0..0 and then leak everything, which is safe 🙃 (compared with self.alive.clone().fold(...), which will cause a double-free).
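To make the difference concrete, the sketch below shows only what each approach leaves behind in the `alive` field, which is what a `Drop` impl would later see. The function names are invented for illustration, and no actual element storage is modelled.

```rust
use std::mem;
use std::ops::Range;

/// Returns (range handed to the fold, range left behind for `Drop` to see).
/// Hypothetical helper, not code from the PR.
fn take_alive(alive: &mut Range<usize>) -> (Range<usize>, Range<usize>) {
    // `mem::take` swaps in `Range::default()`, which is `0..0`.
    let taken = mem::take(alive);
    (taken, alive.clone())
}

/// Same shape, but cloning: the original range is left untouched.
fn clone_alive(alive: &mut Range<usize>) -> (Range<usize>, Range<usize>) {
    let cloned = alive.clone();
    (cloned, alive.clone())
}

fn main() {
    let mut a = 2..5;
    // After `take`, a later `Drop` sees an empty range and drops nothing,
    // so anything the fold did not consume would be leaked.
    println!("{:?}", take_alive(&mut a));

    let mut b = 2..5;
    // After `clone`, a later `Drop` still sees 2..5 and would drop elements
    // the fold has already read out.
    println!("{:?}", clone_alive(&mut b));
}
```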

Member Author

I have now tried (&mut self.alive).fold instead of try_fold; it undoes all performance gains. I guess the indirection through &mut somehow inhibits optimizations.

Member Author

Which is due to this not being #[inline]

impl<I: Iterator + ?Sized> Iterator for &mut I {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        (**self).next()
    }
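As a rough illustration of the alternative being discussed, the sketch below defines a made-up wrapper, `ByRefInline`, that forwards to a borrowed iterator the same way the blanket impl does, but with `#[inline]` hints on the forwarding methods. It is not the standard library's code, and whether the hints change generated code in any given case is not something this sketch demonstrates.

```rust
/// Hypothetical wrapper mimicking `impl Iterator for &mut I`, with `#[inline]`
/// hints added to the forwarding methods. Illustrative only.
struct ByRefInline<'a, I: ?Sized>(&'a mut I);

impl<I: Iterator + ?Sized> Iterator for ByRefInline<'_, I> {
    type Item = I::Item;

    #[inline]
    fn next(&mut self) -> Option<I::Item> {
        self.0.next()
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.0.size_hint()
    }
}

fn main() {
    let mut range = 0..5usize;
    // `fold` is not overridden here, so it uses the default implementation,
    // which repeatedly calls the forwarding `next`.
    let sum = ByRefInline(&mut range).fold(0, |acc, idx| acc + idx);
    println!("sum = {}, remaining = {:?}", sum, range);
}
```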

Member

heh.

wdyt should we just add the #[inline] or leave a FIXME comment explaining the performance regression if we use fold instead of try_fold? either way is fine for me.

Member Author

I'll add a FIXME. Changing inlining on such central methods can have a mixed impact on compile time even if runtime performance is better, so that should be done in a separate PR.

    }

    fn count(self) -> usize {
        self.len()
    }
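Putting the pieces of this change together, here is a simplified model of the technique that compiles on stable Rust. `MiniIntoIter` is a made-up type, not the libcore implementation, and it uses `std::convert::Infallible` where the PR uses `!`. Treat it as a sketch of the idea (fold over the `alive` index range and read each slot exactly once), under those assumptions.

```rust
use std::convert::Infallible;
use std::mem::MaybeUninit;
use std::ops::Range;

/// Simplified, hypothetical model of a by-value array iterator. Slots whose
/// indices are in `alive` are initialized; all other slots are logically dead.
struct MiniIntoIter<T, const N: usize> {
    data: [MaybeUninit<T>; N],
    alive: Range<usize>,
}

impl<T, const N: usize> MiniIntoIter<T, N> {
    fn new(array: [T; N]) -> Self {
        Self { data: array.map(MaybeUninit::new), alive: 0..N }
    }
}

impl<T, const N: usize> Iterator for MiniIntoIter<T, N> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        let idx = self.alive.next()?;
        // SAFETY: `idx` came out of `alive`, so the slot is initialized, and
        // it will not be read again because the range has moved past it.
        Some(unsafe { self.data.get_unchecked(idx).assume_init_read() })
    }

    #[inline]
    fn fold<Acc, F>(mut self, init: Acc, mut f: F) -> Acc
    where
        F: FnMut(Acc, T) -> Acc,
    {
        let data = &self.data;
        // `alive` cannot be moved out because `Self` implements `Drop`, so
        // fold through `try_fold`, which only needs `&mut` access.
        let result: Result<Acc, Infallible> = self.alive.try_fold(init, |acc, idx| {
            // SAFETY: same argument as in `next`.
            Ok(f(acc, unsafe { data.get_unchecked(idx).assume_init_read() }))
        });
        match result {
            Ok(acc) => acc,
            // `Infallible` has no values, so this arm can never run.
            Err(never) => match never {},
        }
    }
}

impl<T, const N: usize> Drop for MiniIntoIter<T, N> {
    fn drop(&mut self) {
        for idx in self.alive.clone() {
            // SAFETY: indices still in `alive` are initialized and unread.
            unsafe { self.data[idx].assume_init_drop() };
        }
    }
}

fn main() {
    let total = MiniIntoIter::new([1u32, 2, 3, 4]).fold(0, |acc, x| acc + x);
    println!("{}", total);
}
```

Because the index is taken out of `alive` before the closure runs, a panic inside the closure leaves `alive` covering only the unread slots, and the `Drop` impl then cleans up exactly those.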