-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of spsc_queue and stream. #44963
Conversation
This commit makes two main changes. 1. It switches the spsc_queue node caching strategy from keeping a shared counter of the number of nodes in the cache to keeping a consumer only counter of the number of node eligible to be cached. 2. It separate the consumer and producers fields of spsc_queue and stream into a producer cache line and consumer cache line.
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
@bors: r+ Looks great to me, thanks so much @JLockerman! |
📌 Commit 68341a9 has been approved by |
⌛ Testing commit 68341a9 with merge 4940175219654548bf6734b9e69f330307621ea4... |
💔 Test failed - status-travis |
src/libstd/sync/mpsc/spsc_queue.rs
Outdated
@@ -84,18 +95,60 @@ impl<T> Queue<T> { | |||
/// no bound. Otherwise, the cache will never grow larger than | |||
/// `bound` (although the queue itself could be much larger. | |||
pub unsafe fn new(bound: usize) -> Queue<T> { | |||
Self::with_additions(bound, (), ()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is never used on asm.js (Emscripten), causing an unused warning when testing stage2-libstd
.
[02:11:04] Testing libstd stage2 (x86_64-unknown-linux-gnu -> asmjs-unknown-emscripten)
[02:11:04] Compiling rand v0.0.0 (file:///checkout/src/librand)
[02:11:04] Compiling std_unicode v0.0.0 (file:///checkout/src/libstd_unicode)
[02:11:04] Compiling alloc v0.0.0 (file:///checkout/src/liballoc)
[02:11:04] Compiling core v0.0.0 (file:///checkout/src/libcore)
[02:11:12] Compiling collections v0.0.0 (file:///checkout/src/libcollections)
[02:11:21] Compiling std v0.0.0 (file:///checkout/src/libstd)
[02:11:50] error: method is never used: `new`
[02:11:50] --> /checkout/src/libstd/sync/mpsc/spsc_queue.rs:97:5
[02:11:50] |
[02:11:50] 97 | pub unsafe fn new(bound: usize) -> Queue<T> {
[02:11:50] | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[02:11:50] |
[02:11:50] note: lint level defined here
[02:11:50] --> /checkout/src/libstd/lib.rs:232:9
[02:11:50] |
[02:11:50] 232| #![deny(warnings)]
[02:11:50] | ^^^^^^^^
[02:11:50] = note: #[deny(dead_code)] implied by #[deny(warnings)]
[02:11:50]
[02:11:51] error: aborting due to previous error
[02:11:51]
[02:11:52] error: Could not compile `std`.
[02:11:52] warning: build failed, waiting for other jobs to finish...
[02:12:49] error: build failed
[02:12:49]
[02:12:49]
[02:12:49] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "--target" "asmjs-unknown-emscripten" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "-p" "std:0.0.0" "-p" "std_unicode:0.0.0" "-p" "alloc:0.0.0" "-p" "panic_abort:0.0.0" "-p" "rand:0.0.0" "-p" "compiler_builtins:0.0.0" "-p" "unwind:0.0.0" "-p" "core:0.0.0" "-p" "libc:0.0.0" "-p" "collections:0.0.0" "-p" "alloc_system:0.0.0" "--"
[02:12:49] expected success, got: exit code: 101
[02:12:49]
[02:12:49]
[02:12:49] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --target asmjs-unknown-emscripten
[02:12:49] Build completed unsuccessfully in 2:10:23
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've cfg'd it out for emscripten in 41320fa
Queue::new is only used is tests atm, which causes warnings on emscripten which does not run queue tests.
Ok, I removed |
@bors: r+ |
📌 Commit bb7945e has been approved by |
⌛ Testing commit bb7945e with merge 5d72c30e81f2bf999103d4cfd7a5ad2c06ce31e9... |
💔 Test failed - status-travis |
@bors retry
|
Improve performance of spsc_queue and stream. This PR makes two main changes: 1. It switches the `spsc_queue` node caching strategy from keeping a shared counter of the number of nodes in the cache to keeping a consumer only counter of the number of node eligible to be cached. 2. It separates the consumer and producers fields of `spsc_queue` and `stream` into a producer cache line and consumer cache line. Overall, it speeds up `mpsc` in `spsc` mode by 2-10x. Variance is higher than I'd like (that 2-10x speedup is on one benchmark), I believe this is due to the drop check in `send` (`fn stream::Queue::send:107`). I think this check can be combined with the sleep detection code into a version which only uses 1 shared variable, and only one atomic access per `send`, but I haven't looked through the select implementation enough to be sure. The code currently assumes a cache line size of 64 bytes. I added a CacheAligned newtype in `mpsc` which I expect to reuse for `shared`. It doesn't really belong there, it would probably be best put in `core::sync::atomic`, but putting it in `core` would involve making it public, which I thought would require an RFC. Benchmark runner is [here](https://github.com/JLockerman/queues/tree/3eca46279c53eb75833c5ecd416de2ac220bd022/shootout), benchmarks [here](https://github.com/JLockerman/queues/blob/3eca46279c53eb75833c5ecd416de2ac220bd022/queue_bench/src/lib.rs#L170-L293). Fixes #44512.
💔 Test failed - status-travis |
|
Improve performance of spsc_queue and stream. This PR makes two main changes: 1. It switches the `spsc_queue` node caching strategy from keeping a shared counter of the number of nodes in the cache to keeping a consumer only counter of the number of node eligible to be cached. 2. It separates the consumer and producers fields of `spsc_queue` and `stream` into a producer cache line and consumer cache line. Overall, it speeds up `mpsc` in `spsc` mode by 2-10x. Variance is higher than I'd like (that 2-10x speedup is on one benchmark), I believe this is due to the drop check in `send` (`fn stream::Queue::send:107`). I think this check can be combined with the sleep detection code into a version which only uses 1 shared variable, and only one atomic access per `send`, but I haven't looked through the select implementation enough to be sure. The code currently assumes a cache line size of 64 bytes. I added a CacheAligned newtype in `mpsc` which I expect to reuse for `shared`. It doesn't really belong there, it would probably be best put in `core::sync::atomic`, but putting it in `core` would involve making it public, which I thought would require an RFC. Benchmark runner is [here](https://github.com/JLockerman/queues/tree/3eca46279c53eb75833c5ecd416de2ac220bd022/shootout), benchmarks [here](https://github.com/JLockerman/queues/blob/3eca46279c53eb75833c5ecd416de2ac220bd022/queue_bench/src/lib.rs#L170-L293). Fixes #44512.
☀️ Test successful - status-appveyor, status-travis |
pub(super) struct Aligner; | ||
|
||
#[derive(Copy, Clone, Default, PartialEq, Eq, PartialOrd, Ord, Hash)] | ||
pub(super) struct CacheAligned<T>(pub T, pub Aligner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think this can be just
#[derive(Copy, Clone, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(align(64))]
pub(super) struct CacheAligned<T>(pub T);
The `Aligner` struct seems to be unnecessary. Previously noted by @arthurprs rust-lang#44963 (comment) Reddit discussion: https://www.reddit.com/r/rust/comments/pfvvz2/aligner_and_cachealigned/ Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=fa7ca554922755f9d1b62b017d785c6f
Remove redundant Aligner The `Aligner` struct seems to be unnecessary. Previously noted by `@arthurprs` rust-lang#44963 (comment) Reddit discussion: https://www.reddit.com/r/rust/comments/pfvvz2/aligner_and_cachealigned/ Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=fa7ca554922755f9d1b62b017d785c6f
This PR makes two main changes:
spsc_queue
node caching strategy from keeping a sharedcounter of the number of nodes in the cache to keeping a consumer only counter
of the number of node eligible to be cached.
spsc_queue
andstream
intoa producer cache line and consumer cache line.
Overall, it speeds up
mpsc
inspsc
mode by 2-10x.Variance is higher than I'd like (that 2-10x speedup is on one benchmark), I believe this is due to the drop check in
send
(fn stream::Queue::send:107
). I think this check can be combined with the sleep detection code into a version which only uses 1 shared variable, and only one atomic access persend
, but I haven't looked through the select implementation enough to be sure.The code currently assumes a cache line size of 64 bytes. I added a CacheAligned newtype in
mpsc
which I expect to reuse forshared
. It doesn't really belong there, it would probably be best put incore::sync::atomic
, but putting it incore
would involve making it public, which I thought would require an RFC.Benchmark runner is here, benchmarks here.
Fixes #44512.