Optimize the oneshot case #199

Open
ghost opened this issue Sep 25, 2018 · 5 comments

Comments

ghost commented Sep 25, 2018

Here's a simple benchmark:

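// Requires nightly Rust: the built-in bench harness is unstable.
#![feature(test)]

extern crate crossbeam_channel as channel;
extern crate test;

use std::sync::mpsc;

// Baseline: create a std mpsc channel, send one message, receive it.
#[bench]
fn oneshot_mpsc(b: &mut test::Bencher) {
    b.iter(|| {
        let (s, r) = mpsc::channel();
        s.send(0).unwrap();
        r.recv().unwrap();
    });
}

// Same pattern with an unbounded crossbeam channel.
#[bench]
fn oneshot_crossbeam(b: &mut test::Bencher) {
    b.iter(|| {
        let (s, r) = channel::unbounded();
        // No unwrap here, as in the original. In current crossbeam-channel
        // releases `send` returns a Result, so `.unwrap()` would be needed
        // to avoid an unused-result warning.
        s.send(0);
        r.recv().unwrap();
    });
}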

Results:

test oneshot_crossbeam ... bench:         248 ns/iter (+/- 9)
test oneshot_mpsc      ... bench:          90 ns/iter (+/- 2)

While I doubt we'll ever beat mpsc in this benchmark, let's at least try to get as close as possible.
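
For anyone who wants to try the same pattern on stable Rust, here is a rough sketch that times it with `std::time::Instant` instead of the nightly bench harness. The helper names `oneshot_mpsc_once` and `time_per_iter` are made up for illustration, and the numbers it prints are not comparable to the `#[bench]` results above, since there is no warm-up or statistical analysis.

```rust
use std::sync::mpsc;
use std::time::{Duration, Instant};

/// Creates a channel, sends one value, receives it, and returns the value.
fn oneshot_mpsc_once() -> i32 {
    let (s, r) = mpsc::channel();
    s.send(0).unwrap();
    r.recv().unwrap()
}

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
/// Hypothetical helper, not part of any library. `iters` must be non-zero.
fn time_per_iter<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        std::hint::black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    let per_iter = time_per_iter(100_000, oneshot_mpsc_once);
    println!("oneshot_mpsc: {:?} per iteration", per_iter);
}
```

The crossbeam variant would be timed the same way by passing a closure that builds a `crossbeam_channel::unbounded()` channel.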

Contributor

tobz commented Oct 2, 2018

Do you have any notes on why this may be? I'd be interested in helping, but I'd need some mentorship here.

Author

ghost commented Oct 2, 2018

I've noticed several problems affecting performance:

  1. Too much allocation at construction. We could probably remove one allocation in flavors::list::Channel::new.

  2. Closing is slow. In flavors::list::Channel::close we have a mutex lock that can probably be avoided in most cases.

  3. Channels are large. Due to the use of CachePadded, it turns out flavors::list::Channel is a pretty large struct and that affects the cost of construction. Not sure what to do about it, but some creativity might help.

There might be more. :)
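
To make point 3 concrete, here is a small sketch of how cache padding inflates a struct. `Padded` is a hypothetical stand-in for a cache-padding wrapper, assuming 128-byte alignment; the actual alignment used by `CachePadded` depends on the target, and the `Compact`/`Spread` structs are not the real `flavors::list::Channel` layout.

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicUsize;

/// Hypothetical stand-in for a cache-padding wrapper (assumes 128-byte lines).
#[allow(dead_code)]
#[repr(align(128))]
struct Padded<T>(T);

/// Two counters stored next to each other.
#[allow(dead_code)]
struct Compact {
    head: AtomicUsize,
    tail: AtomicUsize,
}

/// The same counters, each forced onto its own 128-byte slot.
#[allow(dead_code)]
struct Spread {
    head: Padded<AtomicUsize>,
    tail: Padded<AtomicUsize>,
}

fn main() {
    println!("compact: {} bytes", size_of::<Compact>());
    // Two 128-byte slots, so 256 bytes regardless of pointer width.
    println!("spread:  {} bytes", size_of::<Spread>());
}
```

Padding avoids false sharing between threads, so the larger struct is a trade-off rather than pure waste; the cost shows up mostly when channels are created and dropped at a high rate, as in the oneshot case.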

Contributor

tobz commented Oct 2, 2018

Just responding as I discover things.

I tried naively removing the cache padding to see what effect it might have: about a 2-5% speedup (385ns seemed to be the lower bound for no padding, with the padded versions being 390-400ns).

Also, it doesn't look like close actually has a mutex? All I see is an atomic swap and then a synchronous draining of the receivers.

EDIT: I lied, I see the mutex in SyncWaker. :)
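
One general way a lock like this can be skipped in the common case is to keep an atomic flag that mirrors "no waiters registered" and check it before locking. The sketch below illustrates that idea only; `Waiters`, `register`, and `drain` are hypothetical names, and the thread does not confirm that `SyncWaker` works (or should work) this way.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical waiter list; a real waker would store richer entries than ids.
struct Waiters {
    list: Mutex<Vec<u32>>,
    /// Mirrors `list.is_empty()`; only written while the mutex is held.
    is_empty: AtomicBool,
}

impl Waiters {
    fn new() -> Self {
        Waiters {
            list: Mutex::new(Vec::new()),
            is_empty: AtomicBool::new(true),
        }
    }

    /// Registers a waiter and clears the "empty" flag under the lock.
    fn register(&self, id: u32) {
        let mut list = self.list.lock().unwrap();
        list.push(id);
        self.is_empty.store(false, Ordering::SeqCst);
    }

    /// Removes and returns all waiters. When the flag says nobody is
    /// registered, returns immediately without touching the mutex.
    fn drain(&self) -> Vec<u32> {
        if self.is_empty.load(Ordering::SeqCst) {
            return Vec::new(); // fast path: no lock taken
        }
        let mut list = self.list.lock().unwrap();
        let drained = std::mem::take(&mut *list);
        self.is_empty.store(true, Ordering::SeqCst);
        drained
    }
}

fn main() {
    let waiters = Waiters::new();
    println!("drained with no waiters: {:?}", waiters.drain());
    waiters.register(7);
    println!("drained after register:  {:?}", waiters.drain());
}
```

In a oneshot send-then-receive on a single thread, nobody is ever blocked, so a close path built like this would stay on the fast path.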

ghost transferred this issue from crossbeam-rs/crossbeam-channel Nov 5, 2018
Author

ghost commented Nov 21, 2018

Another thing that might be worth trying:

Instead of ReceiverFlavor and ChannelFlavor, have ReceiverFlavor and SenderFlavor. This might optimize the layout of structs a bit better.
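
A rough sketch of what a per-side split could look like. All channel types here are empty placeholders, and the `Timer` variant is an assumption added only to show a receive-only flavor; none of this is taken from the actual crossbeam source.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical placeholder channel types; real flavors carry actual state.
struct ArrayChannel;
struct ListChannel;
struct ZeroChannel;
#[allow(dead_code)]
struct TimerChannel;

/// Flavors a sender can point at.
#[allow(dead_code)]
enum SenderFlavor {
    Array(Arc<ArrayChannel>),
    List(Arc<ListChannel>),
    Zero(Arc<ZeroChannel>),
}

/// Flavors a receiver can point at, including a receive-only one (assumed).
#[allow(dead_code)]
enum ReceiverFlavor {
    Array(Arc<ArrayChannel>),
    List(Arc<ListChannel>),
    Zero(Arc<ZeroChannel>),
    Timer(Arc<TimerChannel>),
}

/// The sender only ever matches on the variants it can actually hold.
fn sender_kind(flavor: &SenderFlavor) -> &'static str {
    match flavor {
        SenderFlavor::Array(_) => "array",
        SenderFlavor::List(_) => "list",
        SenderFlavor::Zero(_) => "zero",
    }
}

/// Builds both sides of a hypothetical unbounded (list) channel.
fn unbounded() -> (SenderFlavor, ReceiverFlavor) {
    let chan = Arc::new(ListChannel);
    (SenderFlavor::List(chan.clone()), ReceiverFlavor::List(chan))
}

fn main() {
    let (s, _r) = unbounded();
    println!("sender flavor: {}", sender_kind(&s));
    println!(
        "sizes: sender {} bytes, receiver {} bytes",
        size_of::<SenderFlavor>(),
        size_of::<ReceiverFlavor>()
    );
}
```

Whether this actually shrinks the structs depends on what each variant stores, so the sizes printed here say nothing about the real types.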

Member

taiki-e commented Jan 23, 2022

FYI: I tried an implementation based on the single-element queue used in concurrent-queue (async-channel): 77e70ae

The performance of oneshot and SPSC has been improved, but there were regressions in the performance of MPMC, MPSC, and SPMC.

Before:

crossbeam-channel/benches/oneshot.rs

test oneshot_crossbeam_bounded   ... bench:         349 ns/iter (+/- 28)
test oneshot_crossbeam_unbounded ... bench:         349 ns/iter (+/- 43)
test oneshot_mpsc                ... bench:         105 ns/iter (+/- 8)

crossbeam-channel/benches/crossbeam.rs

test bounded_1::create    ... bench:         322 ns/iter (+/- 13)
test bounded_1::mpmc      ... bench:   3,404,353 ns/iter (+/- 143,002)
test bounded_1::mpsc      ... bench:   7,634,640 ns/iter (+/- 183,081)
test bounded_1::oneshot   ... bench:         365 ns/iter (+/- 14)
test bounded_1::spmc      ... bench:   7,925,501 ns/iter (+/- 179,122)
test bounded_1::spsc      ... bench:   7,444,061 ns/iter (+/- 1,204,285)

After:

crossbeam-channel/benches/oneshot.rs

test oneshot_crossbeam_bounded   ... bench:         162 ns/iter (+/- 9)
test oneshot_crossbeam_unbounded ... bench:         344 ns/iter (+/- 14)
test oneshot_mpsc                ... bench:         107 ns/iter (+/- 13)

crossbeam-channel/benches/crossbeam.rs

test bounded_1::create    ... bench:         127 ns/iter (+/- 10)
test bounded_1::mpmc      ... bench:   5,888,806 ns/iter (+/- 74,952)
test bounded_1::mpsc      ... bench:  10,586,691 ns/iter (+/- 270,984)
test bounded_1::oneshot   ... bench:         164 ns/iter (+/- 11)
test bounded_1::spmc      ... bench:  10,544,557 ns/iter (+/- 796,707)
test bounded_1::spsc      ... bench:   4,612,521 ns/iter (+/- 1,626,963)
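
For context on the single-element queue idea mentioned above, here is a simplified sketch of a one-slot, non-blocking queue. It is not the code from concurrent-queue or from the linked commit: the four-state encoding and the names `SingleSlot`, `push`, and `pop` are assumptions chosen for brevity, and there is no blocking, waking, or close handling.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};

const EMPTY: usize = 0; // slot holds no value
const WRITING: usize = 1; // a push is filling the slot
const FULL: usize = 2; // slot holds an initialized value
const READING: usize = 3; // a pop is moving the value out

/// Hypothetical one-slot queue: no allocation, one atomic state word.
struct SingleSlot<T> {
    state: AtomicUsize,
    slot: UnsafeCell<MaybeUninit<T>>,
}

// Safety: the slot is only touched by the thread that won the state CAS.
unsafe impl<T: Send> Send for SingleSlot<T> {}
unsafe impl<T: Send> Sync for SingleSlot<T> {}

impl<T> SingleSlot<T> {
    fn new() -> Self {
        SingleSlot {
            state: AtomicUsize::new(EMPTY),
            slot: UnsafeCell::new(MaybeUninit::uninit()),
        }
    }

    /// Stores `value` if the slot is empty; otherwise hands it back.
    fn push(&self, value: T) -> Result<(), T> {
        if self
            .state
            .compare_exchange(EMPTY, WRITING, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            return Err(value);
        }
        // We own the slot until we publish FULL.
        unsafe { (*self.slot.get()).write(value) };
        self.state.store(FULL, Ordering::Release);
        Ok(())
    }

    /// Takes the value if one is stored; returns None otherwise
    /// (including while a push is still in progress).
    fn pop(&self) -> Option<T> {
        if self
            .state
            .compare_exchange(FULL, READING, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            return None;
        }
        // We own the slot until we publish EMPTY.
        let value = unsafe { (*self.slot.get()).assume_init_read() };
        self.state.store(EMPTY, Ordering::Release);
        Some(value)
    }
}

impl<T> Drop for SingleSlot<T> {
    fn drop(&mut self) {
        // Drop a value that was pushed but never popped.
        if *self.state.get_mut() == FULL {
            unsafe { self.slot.get_mut().assume_init_drop() };
        }
    }
}

fn main() {
    let q = SingleSlot::new();
    q.push(String::from("hello")).unwrap();
    println!("popped: {:?}", q.pop());
}
```

A structure like this avoids the buffer allocation and index arithmetic of a general bounded queue, which fits the faster `create` and `oneshot` numbers above; it does not by itself explain the regressions under contention.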
