implement fold() on array::IntoIter to improve flatten().collect() perf #87431

Merged: 2 commits merged into rust-lang:master on Jul 27, 2021

Member

@the8472 the8472 commented Jul 24, 2021

With #87168, flattening array::IntoIters is now TrustedLen. The FromIterator implementation for Vec has a specialization for TrustedLen iterators which uses internal iteration. This PR implements one of the main internal iteration methods, fold(), on array::IntoIter to optimize the combination of those two features.

This should address the main issue in #87411

# old
test vec::bench_flat_map_collect                         ... bench:   2,244,024 ns/iter (+/- 18,903)

# new
test vec::bench_flat_map_collect                         ... bench:     172,863 ns/iter (+/- 2,141)
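
For readers unfamiliar with the pattern being optimized, here is a minimal sketch of code that flattens arrays into a `Vec`. The function names `expand_triples` and `flatten_pairs` are hypothetical and are not taken from the benchmark itself; they only illustrate the `flat_map(...).collect()` and `flatten().collect()` shapes discussed above.

```rust
/// Hypothetical helper: expands each value into a fixed-size array and
/// collects the flattened items into a `Vec`.
fn expand_triples(input: &[u32]) -> Vec<u32> {
    input
        .iter()
        // Each closure call returns an array, which is turned into an
        // `array::IntoIter` and flattened by `flat_map`.
        .flat_map(|&x| [x, x + 1, x + 2])
        .collect()
}

/// Same idea using `flatten()` on an iterator whose items are arrays.
fn flatten_pairs(pairs: Vec<[u8; 2]>) -> Vec<u8> {
    pairs.into_iter().flatten().collect()
}
```

Calling `expand_triples(&[1, 10])` yields the six items `1, 2, 3, 10, 11, 12` in order. Whether a given build takes the fast path depends on the standard library version in use.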

@rust-highfive
Collaborator

r? @kennytm

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 24, 2021
@the8472 the8472 added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 24, 2021
@kennytm
Member

kennytm commented Jul 24, 2021

while this LGTM, shouldn't the original issue be addressed by implementing SpecExtend<T, std::array::IntoIter<T>> for Vec<T, A>?

@the8472
Member Author

the8472 commented Jul 24, 2021

The original issue involved Flatten which results in several adapters sitting between SpecExtend and the IntoIter.

Comment on lines +132 to +139
    (&mut self.alive)
        .try_fold::<_, _, Result<_, !>>(init, |acc, idx| {
            // SAFETY: idx is obtained by folding over the `alive` range, which implies the
            // value is currently considered alive but as the range is being consumed each value
            // we read here will only be read once and then considered dead.
            Ok(fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() }))
        })
        .unwrap()
Member

can we call fold here instead of try_fold?

Suggested change
-   (&mut self.alive)
-       .try_fold::<_, _, Result<_, !>>(init, |acc, idx| {
-           // SAFETY: idx is obtained by folding over the `alive` range, which implies the
-           // value is currently considered alive but as the range is being consumed each value
-           // we read here will only be read once and then considered dead.
-           Ok(fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() }))
-       })
-       .unwrap()
+   self.alive.fold(init, |acc, idx| {
+       // SAFETY: idx is obtained by folding over the `alive` range, which implies the
+       // value is currently considered alive but as the range is being consumed each value
+       // we read here will only be read once and then considered dead.
+       fold(acc, unsafe { data.get_unchecked(idx).assume_init_read() })
+   })

Member Author

@the8472 the8472 Jul 24, 2021

I tried that. array::IntoIter has a Drop impl, so alive can't be moved out, which would be required to call fold(self). That's why I used try_fold instead.
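
The constraint described here can be reproduced on stable Rust with a small stand-in type. This is only a sketch: `Indices` and `sum` are hypothetical names, and `Infallible` is used in place of the unstable `!` type that appears in the PR's code.

```rust
use std::convert::Infallible;
use std::ops::Range;

/// Hypothetical stand-in for an iterator that tracks live indices
/// and implements `Drop`, like `array::IntoIter` does.
struct Indices {
    alive: Range<usize>,
}

impl Drop for Indices {
    fn drop(&mut self) {
        // A real implementation would drop the still-alive elements here.
    }
}

impl Indices {
    /// Sums the live indices, consuming `self`.
    fn sum(mut self) -> usize {
        // `self.alive.fold(..)` takes the range by value, which would move
        // `alive` out of `self`; the compiler rejects that (E0509) because
        // `Indices` implements `Drop`. `try_fold` only needs `&mut self`.
        let result = self.alive.try_fold(
            0usize,
            |acc: usize, idx: usize| -> Result<usize, Infallible> { Ok(acc + idx) },
        );
        match result {
            Ok(total) => total,
            // `Infallible` has no values, so this arm can never run.
            Err(never) => match never {},
        }
    }
}
```

For example, `Indices { alive: 0..4 }.sum()` adds the indices 0 through 3.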

Member

@the8472 oops right.

(&mut self.alive).fold(init, ...) should work though.

Member Author

Yeah but that would go through impl Iterator for &mut I which is less optimized.

Member

@kennytm kennytm Jul 24, 2021

self.alive is a std::ops::Range<usize> and AFAIK there is no special-cased implementation of fold or try_fold for Range<usize> nor &mut Range<usize>.

Member

Even if we do the mem::take, it will just turn self.alive into 0..0 and then leak everything, which is safe 🙃 (compared with self.alive.clone().fold(...), which would cause a double free).
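
As a small illustration of the `mem::take` behaviour mentioned here, the sketch below uses a hypothetical helper `take_alive`. It only shows what happens to the range itself; the leak follows if a `Drop` impl walks the now-empty range and therefore finds nothing left to drop.

```rust
use std::mem;
use std::ops::Range;

/// Hypothetical helper: moves the range out and leaves `Range::default()`,
/// i.e. `0..0`, behind in its place.
fn take_alive(alive: &mut Range<usize>) -> Range<usize> {
    mem::take(alive)
}
```

After `take_alive(&mut r)` on `r = 2..5`, the returned value is `2..5` and `r` is `0..0`, so no index is visited twice.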

Member Author

I have now tried (&mut self.alive).fold instead of try_fold; it undoes all performance gains. I guess the indirection through &mut somehow inhibits optimizations.

Member Author

Which is due to this not being #[inline]

impl<I: Iterator + ?Sized> Iterator for &mut I {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        (**self).next()
    }
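
The two call shapes compared in this thread produce the same result and differ only in how they are dispatched. The sketch below uses a hypothetical function `compare_folds` to show both side by side. It makes no claim about generated code, since the performance difference reported above depends on the standard library version and inlining decisions.

```rust
/// Returns the sums computed via `(&mut range).fold` and `range.try_fold`,
/// plus whether both ranges were left empty afterwards.
fn compare_folds(start: usize, end: usize) -> (usize, usize, bool) {
    // `fold` takes `self` by value, so here `Self` is `&mut Range<usize>`
    // and the call goes through `impl Iterator for &mut I`.
    let mut a = start..end;
    let via_ref_fold = (&mut a).fold(0usize, |acc, idx| acc + idx);

    // `try_fold` takes `&mut self`, so it is called on `Range<usize>` directly.
    let mut b = start..end;
    let via_try_fold = b
        .try_fold(0usize, |acc: usize, idx: usize| Some(acc + idx))
        .unwrap_or(0);

    (via_ref_fold, via_try_fold, a.is_empty() && b.is_empty())
}
```

Both variants borrow the range mutably and leave it exhausted, which is why either one is compatible with a `Drop` impl on the owning type.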

Member

heh.

wdyt should we just add the #[inline] or leave a FIXME comment explaining the performance regression if we use fold instead of try_fold? either way is fine for me.

Member Author

I'll add a FIXME. Changing inlining on such central methods can have a mixed impact on compile time even if runtime performance is better, so that should be done in a separate PR.

@kennytm
Member

kennytm commented Jul 27, 2021

@bors r+ rollup=iffy

@bors
Contributor

bors commented Jul 27, 2021

📌 Commit 2276c5e has been approved by kennytm

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 27, 2021
@bors
Contributor

bors commented Jul 27, 2021

⌛ Testing commit 2276c5e with merge 99d6692...

@bors
Contributor

bors commented Jul 27, 2021

☀️ Test successful - checks-actions
Approved by: kennytm
Pushing 99d6692 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 27, 2021
@bors bors merged commit 99d6692 into rust-lang:master Jul 27, 2021
@rustbot rustbot added this to the 1.56.0 milestone Jul 27, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 12, 2021
inline next() on &mut Iterator impl

In [rust-lang#87431](https://github.com/rust-lang/rust/pull/87431/files#diff-79a6b417b85ecf4f1a4ef2235135fedf540199caf6e9e1d154ac6a413b40a757R132-R136) I found that `(&mut range).fold` doesn't optimize well because the default impl for `fold` on `&mut Iterator` doesn't inline `next`. In that particular case it was worked around by using `try_fold`, which takes `&mut self` instead of `self`.

Let's see if this can be fixed more broadly.
@lcnr lcnr added the A-const-generics Area: const generics (parameters and arguments) label Dec 11, 2021