Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of spsc_queue and stream. #44963

Merged
merged 3 commits into from
Oct 11, 2017

Conversation

JLockerman
Copy link
Contributor

This PR makes two main changes:

  1. It switches the spsc_queue node caching strategy from keeping a shared
    counter of the number of nodes in the cache to keeping a consumer only counter
    of the number of node eligible to be cached.
  2. It separates the consumer and producers fields of spsc_queue and stream into
    a producer cache line and consumer cache line.

Overall, it speeds up mpsc in spsc mode by 2-10x.
Variance is higher than I'd like (that 2-10x speedup is on one benchmark), I believe this is due to the drop check in send (fn stream::Queue::send:107). I think this check can be combined with the sleep detection code into a version which only uses 1 shared variable, and only one atomic access per send, but I haven't looked through the select implementation enough to be sure.

The code currently assumes a cache line size of 64 bytes. I added a CacheAligned newtype in mpsc which I expect to reuse for shared. It doesn't really belong there, it would probably be best put in core::sync::atomic, but putting it in core would involve making it public, which I thought would require an RFC.

Benchmark runner is here, benchmarks here.

Fixes #44512.

This commit makes two main changes.
1. It switches the spsc_queue node caching strategy from keeping a shared
counter of the number of nodes in the cache to keeping a consumer only counter
of the number of node eligible to be cached.
2. It separate the consumer and producers fields of spsc_queue and stream into
a producer cache line and consumer cache line.
@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@sfackler
Copy link
Member

sfackler commented Oct 1, 2017

r? @alexcrichton

@carols10cents carols10cents added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 2, 2017
@alexcrichton
Copy link
Member

@bors: r+

Looks great to me, thanks so much @JLockerman!

@bors
Copy link
Contributor

bors commented Oct 5, 2017

📌 Commit 68341a9 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Oct 6, 2017

⌛ Testing commit 68341a9 with merge 4940175219654548bf6734b9e69f330307621ea4...

@bors
Copy link
Contributor

bors commented Oct 6, 2017

💔 Test failed - status-travis

@@ -84,18 +95,60 @@ impl<T> Queue<T> {
/// no bound. Otherwise, the cache will never grow larger than
/// `bound` (although the queue itself could be much larger.
pub unsafe fn new(bound: usize) -> Queue<T> {
Self::with_additions(bound, (), ())
Copy link
Member

@kennytm kennytm Oct 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is never used on asm.js (Emscripten), causing an unused warning when testing stage2-libstd.

[02:11:04] Testing libstd stage2 (x86_64-unknown-linux-gnu -> asmjs-unknown-emscripten)
[02:11:04]    Compiling rand v0.0.0 (file:///checkout/src/librand)
[02:11:04]    Compiling std_unicode v0.0.0 (file:///checkout/src/libstd_unicode)
[02:11:04]    Compiling alloc v0.0.0 (file:///checkout/src/liballoc)
[02:11:04]    Compiling core v0.0.0 (file:///checkout/src/libcore)
[02:11:12]    Compiling collections v0.0.0 (file:///checkout/src/libcollections)
[02:11:21]    Compiling std v0.0.0 (file:///checkout/src/libstd)
[02:11:50] error: method is never used: `new`
[02:11:50]   --> /checkout/src/libstd/sync/mpsc/spsc_queue.rs:97:5
[02:11:50]    |
[02:11:50] 97 |     pub unsafe fn new(bound: usize) -> Queue<T> {
[02:11:50]    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[02:11:50]    |
[02:11:50] note: lint level defined here
[02:11:50]   --> /checkout/src/libstd/lib.rs:232:9
[02:11:50]    |
[02:11:50] 232| #![deny(warnings)]
[02:11:50]    |         ^^^^^^^^
[02:11:50]    = note: #[deny(dead_code)] implied by #[deny(warnings)]
[02:11:50] 
[02:11:51] error: aborting due to previous error
[02:11:51] 
[02:11:52] error: Could not compile `std`.
[02:11:52] warning: build failed, waiting for other jobs to finish...
[02:12:49] error: build failed
[02:12:49] 
[02:12:49] 
[02:12:49] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "--target" "asmjs-unknown-emscripten" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "-p" "std:0.0.0" "-p" "std_unicode:0.0.0" "-p" "alloc:0.0.0" "-p" "panic_abort:0.0.0" "-p" "rand:0.0.0" "-p" "compiler_builtins:0.0.0" "-p" "unwind:0.0.0" "-p" "core:0.0.0" "-p" "libc:0.0.0" "-p" "collections:0.0.0" "-p" "alloc_system:0.0.0" "--"
[02:12:49] expected success, got: exit code: 101
[02:12:49] 
[02:12:49] 
[02:12:49] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --target asmjs-unknown-emscripten
[02:12:49] Build completed unsuccessfully in 2:10:23

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've cfg'd it out for emscripten in 41320fa

Queue::new is only used is tests atm, which causes warnings on emscripten which does not run queue tests.
@sfackler
Copy link
Member

sfackler commented Oct 8, 2017

@JLockerman
Copy link
Contributor Author

Ok, I removed Queue::new and switched the tests to Queue::with_additions, it passes all tests and tidy locally. Sorry about the noise.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Oct 9, 2017

📌 Commit bb7945e has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Oct 10, 2017

⌛ Testing commit bb7945e with merge 5d72c30e81f2bf999103d4cfd7a5ad2c06ce31e9...

@bors
Copy link
Contributor

bors commented Oct 10, 2017

💔 Test failed - status-travis

@kennytm
Copy link
Member

kennytm commented Oct 11, 2017

@bors retry

@bors
Copy link
Contributor

bors commented Oct 11, 2017

⌛ Testing commit bb7945e with merge 4426e10...

bors added a commit that referenced this pull request Oct 11, 2017
Improve performance of spsc_queue and stream.

This PR makes two main changes:

1. It switches the `spsc_queue` node caching strategy from keeping a shared
counter of the number of nodes in the cache to keeping a consumer only counter
of the number of node eligible to be cached.
2. It separates the consumer and producers fields of `spsc_queue` and `stream` into
a producer cache line and consumer cache line.

Overall, it speeds up `mpsc` in `spsc` mode by 2-10x.
Variance is higher than I'd like (that 2-10x speedup is on one benchmark), I believe this is due to the drop check in `send` (`fn stream::Queue::send:107`). I think this check can be combined with the sleep detection code into a version which only uses 1 shared variable, and only one atomic access per `send`, but I haven't looked through the select implementation enough to be sure.

The code currently assumes a cache line size of 64 bytes. I added a CacheAligned newtype in `mpsc` which I expect to reuse for `shared`. It doesn't really belong there, it would probably be best put in `core::sync::atomic`, but putting it in `core` would involve making it public, which I thought would require an RFC.

Benchmark runner is [here](https://github.com/JLockerman/queues/tree/3eca46279c53eb75833c5ecd416de2ac220bd022/shootout), benchmarks [here](https://github.com/JLockerman/queues/blob/3eca46279c53eb75833c5ecd416de2ac220bd022/queue_bench/src/lib.rs#L170-L293).

Fixes #44512.
@bors
Copy link
Contributor

bors commented Oct 11, 2017

💔 Test failed - status-travis

@kennytm
Copy link
Member

kennytm commented Oct 11, 2017

@bors retry #43283

android timed out.

[01:31:43] test process::tests::test_process_output_fail_to_start has been running for over 60 seconds


No output has been received in the last 30m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

The build has been terminated

@bors
Copy link
Contributor

bors commented Oct 11, 2017

⌛ Testing commit bb7945e with merge a47c9f8...

bors added a commit that referenced this pull request Oct 11, 2017
Improve performance of spsc_queue and stream.

This PR makes two main changes:

1. It switches the `spsc_queue` node caching strategy from keeping a shared
counter of the number of nodes in the cache to keeping a consumer only counter
of the number of node eligible to be cached.
2. It separates the consumer and producers fields of `spsc_queue` and `stream` into
a producer cache line and consumer cache line.

Overall, it speeds up `mpsc` in `spsc` mode by 2-10x.
Variance is higher than I'd like (that 2-10x speedup is on one benchmark), I believe this is due to the drop check in `send` (`fn stream::Queue::send:107`). I think this check can be combined with the sleep detection code into a version which only uses 1 shared variable, and only one atomic access per `send`, but I haven't looked through the select implementation enough to be sure.

The code currently assumes a cache line size of 64 bytes. I added a CacheAligned newtype in `mpsc` which I expect to reuse for `shared`. It doesn't really belong there, it would probably be best put in `core::sync::atomic`, but putting it in `core` would involve making it public, which I thought would require an RFC.

Benchmark runner is [here](https://github.com/JLockerman/queues/tree/3eca46279c53eb75833c5ecd416de2ac220bd022/shootout), benchmarks [here](https://github.com/JLockerman/queues/blob/3eca46279c53eb75833c5ecd416de2ac220bd022/queue_bench/src/lib.rs#L170-L293).

Fixes #44512.
@bors
Copy link
Contributor

bors commented Oct 11, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing a47c9f8 to master...

@bors bors merged commit bb7945e into rust-lang:master Oct 11, 2017
pub(super) struct Aligner;

#[derive(Copy, Clone, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub(super) struct CacheAligned<T>(pub T, pub Aligner);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this can be just

#[derive(Copy, Clone, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(align(64))]
pub(super) struct CacheAligned<T>(pub T);

@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label Oct 26, 2017
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants