Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vec::dedup_by optimization #82191

Merged
merged 7 commits into from
Mar 18, 2021
Merged

Vec::dedup_by optimization #82191

merged 7 commits into from
Mar 18, 2021

Conversation

Soveu
Copy link
Contributor

@Soveu Soveu commented Feb 16, 2021

Now Vec::dedup_by drops items in-place as it goes through them.
From my benchmarks, it is around 10% faster when T is small, with no major regression when otherwise.

I used ptr::copy instead of conditional ptr::copy_nonoverlapping, because the latter had some weird performance issues on my ryzen laptop (it was 50% slower on it than on intel/sandybridge laptop)
It would be good if someone was able to reproduce these results.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 16, 2021
@SkiFire13
Copy link
Contributor

same_bucket could panic and your implementation would leave the Vec in an invalid and unsound state.

@Soveu
Copy link
Contributor Author

Soveu commented Feb 17, 2021

Would be a self.set_len(1) good enough or should it drop self[read..] and self.set_len(write) on panic?

@SkiFire13
Copy link
Contributor

SkiFire13 commented Feb 17, 2021

self.set_len(1) will result in a leak. It's not the end of the world and some stdlib methods actually leak on panic, but if you can you should avoid it.

Dropping self[read..] and setting self.set_len(write) solve the problem but this involves dropping self[read..]` which is still valid and could be unexpected.

IMO it would be better to remove (but not drop!) self[write..read]. Usually things like this are done with a guard struct that removes the range when it's dropped. At the end of the loop it is usually mem::forgeted, so that the drop happens only when the user provided closure panics.

@crlf0710 crlf0710 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 5, 2021
@Dylan-DPC-zz
Copy link

@sfackler this is ready for review

Copy link
Member

@nagisa nagisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this changeset needs a significant test coverage (which this function I believe lacks right now). In particular testing cases when the comparator panics is something we're lacking.

Also benchmarks or machine code comparisons would be helpful as well.

library/alloc/src/vec/mod.rs Show resolved Hide resolved
library/alloc/src/vec/mod.rs Show resolved Hide resolved
library/alloc/src/vec/mod.rs Show resolved Hide resolved
library/alloc/src/vec/mod.rs Show resolved Hide resolved
library/alloc/src/vec/mod.rs Outdated Show resolved Hide resolved
@Soveu
Copy link
Contributor Author

Soveu commented Mar 15, 2021

Also benchmarks or machine code comparisons would be helpful as well.

I have measured the performance to be around 10-20% better on average

@nagisa
Copy link
Member

nagisa commented Mar 15, 2021

Yeah, I mean demonstrating the improvement with some output from benchmark runs (together with links to or commits containing said benchmark code)

@@ -2141,4 +2267,4 @@ fn test_extend_from_within_panicing_clone() {
std::panic::catch_unwind(move || vec.extend_from_within(..)).unwrap_err();

assert_eq!(count.load(Ordering::SeqCst), 4);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have a trailing newline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think x.py fmt removes it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, that's surprising. Would be the first time I see it doing so.

@nagisa nagisa assigned nagisa and unassigned sfackler Mar 15, 2021
@Soveu
Copy link
Contributor Author

Soveu commented Mar 15, 2021

Yeah, I mean demonstrating the improvement with some output from benchmark runs (together with links to or commits containing said benchmark code)

Should I include the benchmark inside alloc/benches/vec.rs?

@nagisa
Copy link
Member

nagisa commented Mar 15, 2021

Yeah, that sounds like a good place for it.

@Soveu
Copy link
Contributor Author

Soveu commented Mar 16, 2021

Note 1: tests from built-in bench don't run as long as mine, so the difference depends on where clock boost went off
Note 2: AMD suffers from a issue, where weirdly-aligned functions take longer to call, polluting benchmarks with short functions

test vec::bench_vec_dedup_new_100        ... bench:          74 ns/iter (+/- 0) = 5405 MB/s
test vec::bench_vec_dedup_new_1000       ... bench:         821 ns/iter (+/- 12) = 4872 MB/s
test vec::bench_vec_dedup_new_10000      ... bench:      11,030 ns/iter (+/- 177) = 3626 MB/s
test vec::bench_vec_dedup_new_100000     ... bench:     377,518 ns/iter (+/- 4,338) = 1059 MB/s

test vec::bench_vec_dedup_old_100        ... bench:          94 ns/iter (+/- 0) = 4255 MB/s
test vec::bench_vec_dedup_old_1000       ... bench:         985 ns/iter (+/- 29) = 4060 MB/s
test vec::bench_vec_dedup_old_10000      ... bench:      14,521 ns/iter (+/- 56) = 2754 MB/s
test vec::bench_vec_dedup_old_100000     ... bench:     399,026 ns/iter (+/- 209,664) = 1002 MB/s

Functions that generate input are (roughly) the same

This is the graph from my tests, x axis is vec.len(), y axis is ratio of time to finish
(ratio = dedup from this PR / dedup from std, so lower = better)
finalplot
(each test ran for 20 million times or for 16 seconds)

@rust-log-analyzer
Copy link
Collaborator

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@nagisa
Copy link
Member

nagisa commented Mar 16, 2021

@bors r+

Thank you!

@bors
Copy link
Contributor

bors commented Mar 16, 2021

📌 Commit b0092bc has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 16, 2021
@Soveu
Copy link
Contributor Author

Soveu commented Mar 16, 2021

btw, the failing check is not my fault 😅

failed to get 200 response from `https://crates.io/api/v1/crates/crc32fast/1.2.1/download`, got 503

@Neutron3529
Copy link
Contributor

btw, the failing check is not my fault sweat_smile

failed to get 200 response from `https://crates.io/api/v1/crates/crc32fast/1.2.1/download`, got 503

try add some comment to trigger the check again?

@nagisa
Copy link
Member

nagisa commented Mar 16, 2021

no need to bother, that's just the pre-land check which is not super relevant at this point. bors will get to it eventually.

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Mar 17, 2021
Vec::dedup_by optimization

Now `Vec::dedup_by` drops items in-place as it goes through them.
From my benchmarks, it is around 10% faster when T is small, with no major regression when otherwise.

I used `ptr::copy` instead of conditional `ptr::copy_nonoverlapping`, because the latter had some weird performance issues on my ryzen laptop (it was 50% slower on it than on intel/sandybridge laptop)
It would be good if someone was able to reproduce these results.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Mar 17, 2021
Vec::dedup_by optimization

Now `Vec::dedup_by` drops items in-place as it goes through them.
From my benchmarks, it is around 10% faster when T is small, with no major regression when otherwise.

I used `ptr::copy` instead of conditional `ptr::copy_nonoverlapping`, because the latter had some weird performance issues on my ryzen laptop (it was 50% slower on it than on intel/sandybridge laptop)
It would be good if someone was able to reproduce these results.
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 17, 2021
Rollup of 11 pull requests

Successful merges:

 - rust-lang#82191 (Vec::dedup_by optimization)
 - rust-lang#82270 (Emit error when trying to use assembler syntax directives in `asm!`)
 - rust-lang#82434 (Add more links between hash and btree collections)
 - rust-lang#83080 (Make source-based code coverage compatible with MIR inlining)
 - rust-lang#83168 (Extend `proc_macro_back_compat` lint to `procedural-masquerade`)
 - rust-lang#83192 (ci/docker: Add SDK/NDK level 21 to android docker for 32bit platforms)
 - rust-lang#83204 (Simplify C compilation for Fortanix-SGX target)
 - rust-lang#83216 (Allow registering tool lints with `register_tool`)
 - rust-lang#83223 (Display error details when a `mmap` call fails)
 - rust-lang#83228 (Don't show HTML diff if tidy isn't installed for rustdoc tests)
 - rust-lang#83231 (Switch riscvgc-unknown-none-elf use lp64d ABI)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 90797ef into rust-lang:master Mar 18, 2021
@rustbot rustbot added this to the 1.52.0 milestone Mar 18, 2021
fn random_sorted_fill(mut seed: u32, buf: &mut [u32]) {
let mask = if buf.len() < 8192 {
0xFF
} else if buf.len() < 200_000 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 200000?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's mostly there to not have too much duplicates, idk

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.