Interesting benchmark results of min_max_helper
#1400
Interesting, I'm actually seeing the opposite effect, with ` |
More interesting! My desktop uses Intel i7 10700K processors.

    #[inline]
    #[stable(feature = "iterator_fold_self", since = "1.51.0")]
    fn reduce<F>(mut self, f: F) -> Option<Self::Item>
    where
        Self: Sized,
        F: FnMut(Self::Item, Self::Item) -> Self::Item,
    {
        let first = self.next()?;
        Some(self.fold(first, f))
    }
Good point, I tried some other variations and it seems to come down to the handling of references vs copying. The following two variations lead to different code:
The second one generates the same code that I saw for |
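As a rough illustration of the "references vs copying" distinction, here is a minimal sketch of two ways to write the same reduction over a slice. The function names `reduce_by_reference` and `reduce_by_copy` are hypothetical, and this is not necessarily the exact pair of variations compared in this comment. It only shows how the closure can end up working on `&T` in one case and on `T` in the other.

```rust
/// Hypothetical variant 1: the accumulator and the items are references
/// into the slice; the winning element is copied out once at the end.
fn reduce_by_reference<T: Copy, F: Fn(&T, &T) -> bool>(values: &[T], cmp: F) -> Option<T> {
    values
        .iter()
        .reduce(|acc, item| if cmp(acc, item) { item } else { acc })
        .copied()
}

/// Hypothetical variant 2: every element is copied before it reaches the
/// closure, so the accumulator is a plain value.
fn reduce_by_copy<T: Copy, F: Fn(&T, &T) -> bool>(values: &[T], cmp: F) -> Option<T> {
    values
        .iter()
        .copied()
        .reduce(|acc, item| if cmp(&acc, &item) { item } else { acc })
}
```

Both functions return the same value for the same input and `cmp`; with `cmp` defined as `|a, b| a > b` they compute a minimum. Any difference would be in the machine code the compiler generates for each, which is what the benchmarks in this thread are measuring.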
Strongly agree with you @jhorstmann.

    if null_count == 0 {
        // optimized path for arrays without null values
        m.iter()
            .copied()
            .reduce(|acc, item| if cmp(&acc, &item) { item } else { acc })

and I can get the benchmark result:

    min 512 time: [833.73 ns 834.34 ns 835.06 ns]
    change: [-0.2047% -0.0757% +0.0383%] (p = 0.24 > 0.05)
    No change in performance detected.
    Found 5 outliers among 100 measurements (5.00%)
    4 (4.00%) high mild
    1 (1.00%) high severe

, which is same as
But wait! Which one is faster in this context, copy or reference? On my desktop, I can get 50% performance improvement by using reference. However, you said |
Also, using

    min nulls 512 time: [866.42 ns 867.35 ns 868.33 ns]
    change: [-1.1100% -0.9494% -0.8086%] (p = 0.00 < 0.05)
    Change within noise threshold.
    Found 9 outliers among 100 measurements (9.00%)
    7 (7.00%) high mild
    2 (2.00%) high severe

    min nulls 512 time: [1.0525 us 1.0551 us 1.0576 us]
    change: [+20.470% +20.793% +21.115%] (p = 0.00 < 0.05)
    Performance has regressed.
    Found 2 outliers among 100 measurements (2.00%)
    2 (2.00%) high mild

Are you able to reproduce these weird results on your laptop? @jhorstmann
I can reproduce both results, and it seems AMD CPUs are better at handling one version of the code and Intel better at the other version. For the non-null benchmarks I get:
Intel 10510U
AMD 3700U (timings fluctuate a bit more on this laptop)
For
I'm now actually a bit worried about the correctness of the nullable version, I don't see |
There is at least one valid value in the array, because we have already tested

    if null_count == array.len() {
        return None;
    }
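To make that argument concrete, here is a minimal sketch of the same guard on a simplified data model. It assumes a slice of `Option<T>` as a stand-in for an array with a validity bitmap, and the function name `min_nullable` is hypothetical. This is not the arrow-rs implementation; it only shows why the reduction cannot come up empty once the all-null case has been ruled out.

```rust
/// Hypothetical sketch: minimum over a nullable column, modelled as a
/// slice of `Option<T>` instead of values plus a validity bitmap.
fn min_nullable<T: Copy + PartialOrd>(values: &[Option<T>]) -> Option<T> {
    let null_count = values.iter().filter(|v| v.is_none()).count();

    // Covers both the empty slice and the all-null slice.
    if null_count == values.len() {
        return None;
    }

    // From here on at least one `Some` exists, so `reduce` sees at least
    // one item and returns `Some`.
    values
        .iter()
        .flatten() // skip the nulls, yielding `&T` for each valid slot
        .copied()
        .reduce(|acc, item| if item < acc { item } else { acc })
}
```

For example, `min_nullable(&[None, Some(3), Some(1), None])` evaluates to `Some(1)`, while an all-null or empty input takes the early return.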
Describe your question
I found some interesting benchmark results when I tried to speed up the function min_max_helper. https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/kernels/aggregate.rs#L115-L130
The only thing that I rewrote is replacing iter().fold() by iter().reduce():
Then I ran
to find if there are any changes in performance.
And I got the result:
The results are a little unexpected. And I have 2 questions:
1. Why does min 512 have a 50% performance improvement? I don't think iter.reduce is faster than iter.fold.
2. Why does min nulls 512 become slower? The null_count > 0 code block is not changed.
Need your help!
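For readers who want to see the shape of the rewrite, here is a minimal sketch of a `fold`-based and a `reduce`-based minimum over a plain slice. The names `min_with_fold` and `min_with_reduce` are hypothetical and `i32` is an arbitrary element type. This is not the code from aggregate.rs, only an illustration of the kind of change described above.

```rust
/// Hypothetical fold-based version: seed the accumulator with the first
/// element, then fold over the remaining ones.
fn min_with_fold(values: &[i32]) -> Option<i32> {
    let (first, rest) = values.split_first()?;
    Some(rest.iter().fold(*first, |acc, item| if *item < acc { *item } else { acc }))
}

/// Hypothetical reduce-based version: `reduce` takes the first element as
/// the seed itself and returns `None` for an empty input.
fn min_with_reduce(values: &[i32]) -> Option<i32> {
    values
        .iter()
        .copied()
        .reduce(|acc, item| if item < acc { item } else { acc })
}
```

The two functions return identical results. Since `Iterator::reduce` is defined in the standard library as `next()` followed by `fold`, a difference in timings is more likely to come from how the closure receives its arguments than from `reduce` itself.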