Improve performance of spsc_queue and stream. #44963

JLockerman · 2017-10-01T19:17:22Z

This PR makes two main changes:

1. It switches the spsc_queue node caching strategy from keeping a shared counter of the number of nodes in the cache to keeping a consumer-only counter of the number of nodes eligible to be cached.
2. It separates the consumer and producer fields of spsc_queue and stream into a producer cache line and a consumer cache line.
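The two changes above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the field names (`sent`, `cache_bound`, `cached_nodes`), the methods `note_send` and `try_mark_cached`, and this stand-in `CacheAligned` wrapper are all hypothetical, and the 64-byte line size is the same assumption the PR makes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper that aligns (and so pads) its contents to an
/// assumed 64-byte cache line.
#[repr(align(64))]
struct CacheAligned<T>(T);

/// Fields written only by the producer (illustrative, not the real field set).
struct Producer {
    sent: AtomicUsize,
}

/// Fields written only by the consumer, including the caching counter.
struct Consumer {
    cache_bound: usize,
    cached_nodes: AtomicUsize,
}

struct Queue {
    producer: CacheAligned<Producer>,
    consumer: CacheAligned<Consumer>,
}

impl Queue {
    fn new(cache_bound: usize) -> Queue {
        Queue {
            producer: CacheAligned(Producer { sent: AtomicUsize::new(0) }),
            consumer: CacheAligned(Consumer {
                cache_bound,
                cached_nodes: AtomicUsize::new(0),
            }),
        }
    }

    /// Producer side: touches only the producer cache line.
    fn note_send(&self) -> usize {
        self.producer.0.sent.fetch_add(1, Ordering::Relaxed) + 1
    }

    /// Consumer side: decide whether a node that was just popped may be kept
    /// for reuse. Only the consumer reads or writes `cached_nodes`.
    fn try_mark_cached(&self) -> bool {
        let c = &self.consumer.0;
        let n = c.cached_nodes.load(Ordering::Relaxed);
        if n < c.cache_bound {
            c.cached_nodes.store(n + 1, Ordering::Relaxed);
            true
        } else {
            false
        }
    }
}

fn main() {
    let q = Queue::new(2);
    let p = &q.producer as *const CacheAligned<Producer> as usize;
    let c = &q.consumer as *const CacheAligned<Consumer> as usize;
    // Both halves start on a 64-byte boundary and never share a line.
    assert_eq!(p % 64, 0);
    assert_eq!(c % 64, 0);
    assert_ne!(p / 64, c / 64);

    assert_eq!(q.note_send(), 1);
    // With a bound of 2, only the first two popped nodes are marked.
    assert!(q.try_mark_cached());
    assert!(q.try_mark_cached());
    assert!(!q.try_mark_cached());
}
```

Because the counter lives on the consumer's line and the producer never touches it, sends in this sketch do not contend with the caching bookkeeping.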

Overall, it speeds up mpsc in spsc mode by 2-10x.
Variance is higher than I'd like (that 2-10x speedup is on one benchmark); I believe this is due to the drop check in send (fn stream::Queue::send:107). I think this check can be combined with the sleep detection code into a version that uses only one shared variable and only one atomic access per send, but I haven't looked through the select implementation enough to be sure.

The code currently assumes a cache line size of 64 bytes. I added a CacheAligned newtype in mpsc which I expect to reuse for shared. It doesn't really belong there; it would probably be best put in core::sync::atomic, but putting it in core would involve making it public, which I thought would require an RFC.

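Benchmark runner is [here](https://github.com/JLockerman/queues/tree/3eca46279c53eb75833c5ecd416de2ac220bd022/shootout), benchmarks [here](https://github.com/JLockerman/queues/blob/3eca46279c53eb75833c5ecd416de2ac220bd022/queue_bench/src/lib.rs#L170-L293).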

Fixes #44512.

This commit makes two main changes.

1. It switches the spsc_queue node caching strategy from keeping a shared counter of the number of nodes in the cache to keeping a consumer-only counter of the number of nodes eligible to be cached.
2. It separates the consumer and producer fields of spsc_queue and stream into a producer cache line and a consumer cache line.

rust-highfive · 2017-10-01T19:17:37Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

sfackler · 2017-10-01T19:18:11Z

r? @alexcrichton

alexcrichton · 2017-10-05T15:34:27Z

@bors: r+

Looks great to me, thanks so much @JLockerman!

bors · 2017-10-05T15:34:28Z

📌 Commit 68341a9 has been approved by alexcrichton

bors · 2017-10-06T08:00:26Z

⌛ Testing commit 68341a9 with merge 4940175219654548bf6734b9e69f330307621ea4...

bors · 2017-10-06T10:20:07Z

💔 Test failed - status-travis

kennytm · 2017-10-07T06:54:39Z

src/libstd/sync/mpsc/spsc_queue.rs

    ///               no bound. Otherwise, the cache will never grow larger than
    ///               `bound` (although the queue itself could be much larger.
    pub unsafe fn new(bound: usize) -> Queue<T> {
+        Self::with_additions(bound, (), ())


This method is never used on asm.js (Emscripten), causing an unused warning when testing stage2-libstd.

    [02:11:04] Testing libstd stage2 (x86_64-unknown-linux-gnu -> asmjs-unknown-emscripten)
    [02:11:04] Compiling rand v0.0.0 (file:///checkout/src/librand)
    [02:11:04] Compiling std_unicode v0.0.0 (file:///checkout/src/libstd_unicode)
    [02:11:04] Compiling alloc v0.0.0 (file:///checkout/src/liballoc)
    [02:11:04] Compiling core v0.0.0 (file:///checkout/src/libcore)
    [02:11:12] Compiling collections v0.0.0 (file:///checkout/src/libcollections)
    [02:11:21] Compiling std v0.0.0 (file:///checkout/src/libstd)
    [02:11:50] error: method is never used: `new`
    [02:11:50]   --> /checkout/src/libstd/sync/mpsc/spsc_queue.rs:97:5
    [02:11:50]    |
    [02:11:50] 97 |     pub unsafe fn new(bound: usize) -> Queue<T> {
    [02:11:50]    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    [02:11:50]    |
    [02:11:50] note: lint level defined here
    [02:11:50]   --> /checkout/src/libstd/lib.rs:232:9
    [02:11:50]    |
    [02:11:50] 232|     #![deny(warnings)]
    [02:11:50]    |             ^^^^^^^^
    [02:11:50]    = note: #[deny(dead_code)] implied by #[deny(warnings)]
    [02:11:50]
    [02:11:51] error: aborting due to previous error
    [02:11:51]
    [02:11:52] error: Could not compile `std`.
    [02:11:52] warning: build failed, waiting for other jobs to finish...
    [02:12:49] error: build failed
    [02:12:49]
    [02:12:49]
    [02:12:49] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "--target" "asmjs-unknown-emscripten" "-j" "4" "--release" "--locked" "--color" "always" "--features" "panic-unwind jemalloc backtrace" "--manifest-path" "/checkout/src/libstd/Cargo.toml" "-p" "std:0.0.0" "-p" "std_unicode:0.0.0" "-p" "alloc:0.0.0" "-p" "panic_abort:0.0.0" "-p" "rand:0.0.0" "-p" "compiler_builtins:0.0.0" "-p" "unwind:0.0.0" "-p" "core:0.0.0" "-p" "libc:0.0.0" "-p" "collections:0.0.0" "-p" "alloc_system:0.0.0" "--"
    [02:12:49] expected success, got: exit code: 101
    [02:12:49]
    [02:12:49]
    [02:12:49] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test --target asmjs-unknown-emscripten
    [02:12:49] Build completed unsuccessfully in 2:10:23

JLockerman · 2017-10-08

I've cfg'd it out for emscripten in 41320fa:

Queue::new is only used in tests at the moment, which causes warnings on emscripten, which does not run the queue tests.
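For readers unfamiliar with this kind of fix, here is a minimal sketch of compiling a method out on one target so that it cannot trigger the dead-code lint there. The exact attribute used in 41320fa is not shown in this thread, so the `cfg` predicate below is an assumption, and this simplified `Queue` (with a single `bound` field and a one-argument `with_additions`) is hypothetical.

```rust
struct Queue {
    bound: usize,
}

impl Queue {
    /// Constructor that non-test code is assumed to call on every target.
    fn with_additions(bound: usize) -> Queue {
        Queue { bound }
    }

    /// Convenience constructor. The gate below is one possible predicate
    /// (an assumption): on Emscripten the method does not exist at all,
    /// so the compiler has nothing to flag as unused.
    #[cfg(not(target_os = "emscripten"))]
    fn new(bound: usize) -> Queue {
        Queue::with_additions(bound)
    }
}

fn main() {
    let q = Queue::with_additions(4);
    assert_eq!(q.bound, 4);

    // Callers of a gated item need the same gate, or they fail to compile
    // on the excluded target.
    #[cfg(not(target_os = "emscripten"))]
    {
        assert_eq!(Queue::new(8).bound, 8);
    }
}
```

The cost of this approach is that every caller needs a matching gate, which may be part of why removing the method outright can end up simpler.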

sfackler · 2017-10-08T22:57:20Z

Tidy error: https://travis-ci.org/rust-lang/rust/builds/285254063?utm_source=github_status&utm_medium=notification

JLockerman · 2017-10-09T00:22:58Z

Ok, I removed Queue::new and switched the tests to Queue::with_additions; it now passes all tests and tidy locally. Sorry about the noise.

alexcrichton · 2017-10-09T11:00:27Z

@bors: r+

bors · 2017-10-09T11:00:28Z

📌 Commit bb7945e has been approved by alexcrichton

bors · 2017-10-10T23:14:19Z

⌛ Testing commit bb7945e with merge 5d72c30e81f2bf999103d4cfd7a5ad2c06ce31e9...

bors · 2017-10-10T23:32:16Z

💔 Test failed - status-travis

kennytm · 2017-10-11T06:32:08Z

@bors retry

Android SDK HTTPS issue (fixed in ci: Fix installing the Android SDK #45193)

bors · 2017-10-11T11:10:19Z

⌛ Testing commit bb7945e with merge 4426e10...

Improve performance of spsc_queue and stream.

This PR makes two main changes:

1. It switches the `spsc_queue` node caching strategy from keeping a shared counter of the number of nodes in the cache to keeping a consumer-only counter of the number of nodes eligible to be cached.
2. It separates the consumer and producer fields of `spsc_queue` and `stream` into a producer cache line and a consumer cache line.

Overall, it speeds up `mpsc` in `spsc` mode by 2-10x. Variance is higher than I'd like (that 2-10x speedup is on one benchmark); I believe this is due to the drop check in `send` (`fn stream::Queue::send:107`). I think this check can be combined with the sleep detection code into a version that uses only one shared variable and only one atomic access per `send`, but I haven't looked through the select implementation enough to be sure.

The code currently assumes a cache line size of 64 bytes. I added a CacheAligned newtype in `mpsc` which I expect to reuse for `shared`. It doesn't really belong there; it would probably be best put in `core::sync::atomic`, but putting it in `core` would involve making it public, which I thought would require an RFC.

Benchmark runner is [here](https://github.com/JLockerman/queues/tree/3eca46279c53eb75833c5ecd416de2ac220bd022/shootout), benchmarks [here](https://github.com/JLockerman/queues/blob/3eca46279c53eb75833c5ecd416de2ac220bd022/queue_bench/src/lib.rs#L170-L293).

Fixes #44512.

bors · 2017-10-11T13:26:06Z

💔 Test failed - status-travis

kennytm · 2017-10-11T14:13:03Z

@bors retry #43283

android timed out.

[01:31:43] test process::tests::test_process_output_fail_to_start has been running for over 60 seconds


No output has been received in the last 30m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

The build has been terminated

bors · 2017-10-11T19:32:25Z

⌛ Testing commit bb7945e with merge a47c9f8...

bors · 2017-10-11T21:55:16Z

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing a47c9f8 to master...

arthurprs · 2017-10-18T08:01:45Z

src/libstd/sync/mpsc/cache_aligned.rs

+pub(super) struct Aligner;
+
+#[derive(Copy, Clone, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
+pub(super) struct CacheAligned<T>(pub T, pub Aligner);


Nit: I think this can be just

    #[derive(Copy, Clone, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
    #[repr(align(64))]
    pub(super) struct CacheAligned<T>(pub T);
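A quick way to compare the two forms is to check their layout directly. The sketch below assumes the `Aligner` marker carries the `#[repr(align(64))]` attribute (the excerpt above does not show it), and the type names `WithAligner` and `Direct` are made up for the comparison.

```rust
use std::mem::{align_of, size_of};

// Marker-field form: the alignment comes from a zero-sized field.
// Assumption: the marker itself is declared with align(64).
#[repr(align(64))]
struct Aligner;

struct WithAligner<T>(pub T, pub Aligner);

// Suggested form: the alignment attribute sits on the wrapper itself.
#[repr(align(64))]
struct Direct<T>(pub T);

fn main() {
    // Both wrappers end up with the same alignment...
    assert_eq!(align_of::<WithAligner<u8>>(), 64);
    assert_eq!(align_of::<Direct<u8>>(), 64);
    // ...and the same padded size.
    assert_eq!(size_of::<WithAligner<u8>>(), size_of::<Direct<u8>>());

    // The direct form needs no marker argument at construction sites.
    let w = WithAligner(7u8, Aligner);
    let d = Direct(7u8);
    assert_eq!(w.0, d.0);
    let _marker: Aligner = w.1;
}
```

Under that assumption the two definitions have identical layout, so the difference is only in how call sites construct the wrapper.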

Remove redundant Aligner

The `Aligner` struct seems to be unnecessary. Previously noted by `@arthurprs` in rust-lang#44963 (comment).

Reddit discussion: https://www.reddit.com/r/rust/comments/pfvvz2/aligner_and_cachealigned/
Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=fa7ca554922755f9d1b62b017d785c6f

rust-highfive assigned sfackler Oct 1, 2017

rust-highfive assigned alexcrichton and unassigned sfackler Oct 1, 2017

carols10cents added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties.) Oct 2, 2017

kennytm requested changes Oct 7, 2017

cfg out Queue::new for emscripten

41320fa

Queue::new is only used in tests at the moment, which causes warnings on emscripten, which does not run the queue tests.

Remove Queue::new.

bb7945e

bors merged commit bb7945e into rust-lang:master Oct 11, 2017

arthurprs reviewed Oct 18, 2017

bluss added the relnotes label (Marks issues that should be documented in the release notes of the next release.) Oct 26, 2017

tonyyzy mentioned this pull request Oct 25, 2021

Remove redundant Aligner #90284

Merged
