Implement LinearGroupBy with almost only safe Rust #19
So I got intrigued by the fact that this version seems to show slightly lower performance than the unsafe version in cases where there is a single group, so I ran some compilation tests:
And contrary to what I expected from looking at the benchmark results, it looks like in the safe version, the compiler managed to optimize the loop inside […]. However, as it turned out, the call to […]. Out of curiosity, I also wrote a slight change that would encourage it to not use the […]. And as a side-note, looking at the assembly made me realize slices are actually not stored as […]
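For context, here is a minimal sketch of what a linear group-by over a slice can look like in safe Rust. It is not the code from this PR: the type name `SafeLinearGroupBy` is hypothetical, and it assumes that a group is a maximal run of consecutive elements for which a predicate over each adjacent pair returns true.

```rust
/// Hypothetical safe linear group-by: yields maximal runs of consecutive
/// elements where `predicate(prev, next)` holds for every adjacent pair.
pub struct SafeLinearGroupBy<'a, T, P> {
    slice: &'a [T],
    predicate: P,
}

impl<'a, T, P: FnMut(&T, &T) -> bool> SafeLinearGroupBy<'a, T, P> {
    pub fn new(slice: &'a [T], predicate: P) -> Self {
        SafeLinearGroupBy { slice, predicate }
    }
}

impl<'a, T, P: FnMut(&T, &T) -> bool> Iterator for SafeLinearGroupBy<'a, T, P> {
    type Item = &'a [T];

    fn next(&mut self) -> Option<Self::Item> {
        // Copy the `&'a [T]` out of `self` so the returned groups borrow
        // the original data, not `self`.
        let slice = self.slice;
        if slice.is_empty() {
            return None;
        }
        // The first element always belongs to the current group; extend it
        // while the predicate holds for each adjacent pair.
        let mut len = 1;
        for pair in slice.windows(2) {
            if (self.predicate)(&pair[0], &pair[1]) {
                len += 1;
            } else {
                break;
            }
        }
        // Safe split: no raw pointers, only bounds-checked slice operations.
        let (group, rest) = slice.split_at(len);
        self.slice = rest;
        Some(group)
    }
}
```

Every access here is bounds-checked in principle; whether the optimizer removes those checks and how it shapes the loop is the kind of question the assembly comparison above is probing.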
Or maybe that was never discussed. ^^ graydon/rust-prehistory@958d12e#diff-e0b3bbcc496ac9c4177ca1bb4e2a2f365e1e58261d853b11e07ecbd5c830fb64R306-R307
Wow! What an in-depth code review :)
You can easily try your changes by piping the benchmark results to a file and then comparing them with the benchcmp crate:

```sh
cargo +nightly bench --features nightly -- linear_group::bench > base.bench
# make your changes...
cargo +nightly bench --features nightly -- linear_group::bench > update.bench
cargo benchcmp base.bench update.bench
```
This may also be because it is easier to create an invalid slice by specifying an end pointer lower than the start pointer, whereas the length value cannot be invalid in that way, as it lives between
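The slice-layout side-note can be checked directly with standard-library APIs only. In this sketch (the helper names `slice_ref_size`, `parts`, and `end_ptr` are hypothetical), a `&[T]` is two words wide, a data pointer plus a length, and `as_ptr_range()` computes an end pointer on demand instead of reading a stored one.

```rust
use std::mem::size_of;

/// Size of a slice reference: two machine words (pointer + length) for sized `T`.
fn slice_ref_size<T>() -> usize {
    size_of::<&[T]>()
}

/// The two components a `&[T]` is built from: data pointer and element count.
fn parts<T>(s: &[T]) -> (*const T, usize) {
    (s.as_ptr(), s.len())
}

/// One-past-the-end pointer, derived from pointer + length rather than stored.
fn end_ptr<T>(s: &[T]) -> *const T {
    s.as_ptr_range().end
}
```

Building a slice from raw parts follows the same shape: `std::slice::from_raw_parts` takes a pointer and a `usize` length, so there is no way to express an end that precedes the start.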
So, regarding your analysis, the speed doesn't seem related to the fact that we create a […]
Yes indeed, that is completely optimized out by the compiler because the […]
Ok, so I copy/pasted the […]. Here you can find the test that doesn't give great results with the new version you proposed. It seems that the benchmarks are worse when there are many groups of variable (but short) length, but only for the rev version, not the classic one. What's also quite strange is the […]

I may have spotted a jump that is executed many times when the predicate returns false (line 18 of the safe version, top right). This could explain the slowness for short groups, where this instruction is always executed, maybe polluting the instruction cache (do jumps do that?).

From the fully unsafe version (the current released one) to the fully safe version:
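To make the rev case concrete, here is a minimal safe sketch of a reverse linear group-by that yields groups starting from the end of the slice. `SafeLinearGroupByRev` is a hypothetical name and this is not the PR's implementation. Each group ends at the first adjacent pair, scanning backwards, for which the predicate returns false; with many short groups that exit path is taken on almost every call, which is the kind of branch discussed above.

```rust
/// Hypothetical safe reverse linear group-by: yields maximal runs of
/// consecutive elements (where `predicate(prev, next)` holds), last run first.
pub struct SafeLinearGroupByRev<'a, T, P> {
    slice: &'a [T],
    predicate: P,
}

impl<'a, T, P: FnMut(&T, &T) -> bool> SafeLinearGroupByRev<'a, T, P> {
    pub fn new(slice: &'a [T], predicate: P) -> Self {
        SafeLinearGroupByRev { slice, predicate }
    }
}

impl<'a, T, P: FnMut(&T, &T) -> bool> Iterator for SafeLinearGroupByRev<'a, T, P> {
    type Item = &'a [T];

    fn next(&mut self) -> Option<Self::Item> {
        let slice = self.slice;
        if slice.is_empty() {
            return None;
        }
        // The last element always belongs to the current group; move the
        // group start left while the predicate holds for each adjacent pair.
        // There are len - 1 windows, so `start` cannot underflow.
        let mut start = slice.len() - 1;
        for pair in slice.windows(2).rev() {
            if (self.predicate)(&pair[0], &pair[1]) {
                start -= 1;
            } else {
                // Predicate false: the current group begins at `start`.
                break;
            }
        }
        let (rest, group) = slice.split_at(start);
        self.slice = rest;
        Some(group)
    }
}
```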
From the new fully safe (simple […]
But more importantly, from the fully unsafe version (the current released one) to the unsafe […]
Branch updated from 37b959e to 6012549.
Branch updated from 6012549 to 107a9a3.
Looks good. We should probably rename the PR to "Implement LinearGroupBy with almost only safe Rust" before merging it though :)
Hey @Ten0,
Related to #18.
Here are the first benchmark results I got on my computer by running:

```sh
cargo +nightly bench --features nightly -- linear_group::bench > 7b2fb19.bench
```