
Cleaning up the blocking module #112

Closed · marioortizmanero opened this issue Aug 15, 2020 · 7 comments · Fixed by #129
marioortizmanero (Collaborator) commented Aug 15, 2020

This is a continuation of #102, which discussed how to properly export the blocking module without copy-pasting the async functions and removing async and await. The current solution makes rspotify painful to maintain, and compilation times when using the blocking client are huge because you're basically compiling rspotify twice, which somewhat defeats the point of #108.

I first thought that using an attribute macro could be a good idea, because all that needs to be modified from the async implementations is removing the async and await keywords. See this comment for a demonstration. But this still doesn't fix the "compiling the codebase twice" issue, because the code would still be repeated, only now after the macro is run instead of by hand. The macro in question would make compilation times even longer, and could be messy to implement. The only way to actually compile the codebase once would be to have the blocking feature export only the blocking interface when enabled, but I'm not sure we can assume that users won't need both the async and blocking interfaces in the same program.
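For illustration, this is roughly the before/after such a macro would have to produce (hand-written here; the URL and function names are placeholders, and reqwest's blocking API stands in for whatever synchronous HTTP layer the blocking module would call):

// Async original, as written once in the library.
async fn fetch_track(id: &str) -> Result<String, reqwest::Error> {
    let url = format!("https://api.example.com/tracks/{}", id);
    reqwest::get(&url)
        .await?
        .text()
        .await
}

// What the macro expansion would have to emit for the blocking module: the
// same body with `async`/`.await` stripped, on top of a synchronous HTTP
// client. The code is still duplicated, just after expansion instead of by hand.
fn fetch_track_blocking(id: &str) -> Result<String, reqwest::Error> {
    let url = format!("https://api.example.com/tracks/{}", id);
    reqwest::blocking::get(&url)?
        .text()
}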

The second possible solution is to call the async implementations from the blocking functions using block_on. This might sound less efficient at first, but here's a comparison I made to prove my point:

use std::time::Instant;

async fn original() -> Result<String, reqwest::Error> {
    reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await
}

// The "copy-paste" approach: a separate implementation on top of reqwest's
// blocking client.
fn with_copypaste() -> Result<String, reqwest::Error> {
    reqwest::blocking::get("https://www.rust-lang.org")?
        .text()
}

// The block_on approach: drive the async implementation with a
// single-threaded tokio runtime, built on every call.
fn with_block_on() -> Result<String, reqwest::Error> {
    let mut rt = tokio::runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async move {
        original().await
    })
}

fn main() {
    let now = Instant::now();
    print!("benchmarking with copypaste ...");
    for _ in 1..100 {
        with_copypaste().unwrap();
    }
    println!("done in {}ms", now.elapsed().as_millis());

    let now = Instant::now();
    print!("benchmarking with block_on ...");
    for _ in 1..100 {
        with_block_on().unwrap();
    }
    println!("done in {}ms", now.elapsed().as_millis());
}
❯ cargo run --release
    Finished release [optimized] target(s) in 0.04s
     Running `/home/mario/.cache/cargo/release/macros`
benchmarking with copypaste ...done in 31988ms
benchmarking with block_on ...done in 30250ms

What? block_on was just as fast as the copy-paste? It turns out that this is more or less what reqwest does in its own blocking module, so the copy-paste and the runtime solutions are basically doing the same thing. This way we don't even need the blocking feature in reqwest, and rspotify will only be "compiled once".

This solution could use a smaller macro that automatically generates the boilerplate block_on functions to avoid some repetition.
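Something along the lines of this rough sketch, which assumes the same tokio 0.2 setup as above (the blocking_wrapper name and the free-function shape are made up for the example; in rspotify the wrappers would be methods on the client):

use tokio::runtime;

async fn original() -> Result<String, reqwest::Error> {
    reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await
}

// Generates a blocking wrapper that just drives the given async function with
// block_on, so the endpoint itself only has to be written once.
macro_rules! blocking_wrapper {
    ($name:ident, $async_fn:path, $ret:ty) => {
        fn $name() -> $ret {
            let mut rt = runtime::Builder::new()
                .basic_scheduler()
                .enable_all()
                .build()
                .unwrap();
            rt.block_on($async_fn())
        }
    };
}

blocking_wrapper!(original_blocking, original, Result<String, reqwest::Error>);

fn main() {
    println!("fetched {} bytes", original_blocking().unwrap().len());
}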

The best part is that the example isn't even properly optimized: it creates a new runtime every time the function is called. If the runtime were global, it would only have to be initialized once. Here's a more complex, and still improvable (since it doesn't actually seem to be faster, for some reason), version with lazy_static:

use std::time::Instant;
use std::sync::{Arc, Mutex};

use lazy_static::lazy_static;
use tokio::runtime;

async fn original() -> Result<String, reqwest::Error> {
    reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await
}

fn with_copypaste() -> Result<String, reqwest::Error> {
    reqwest::blocking::get("https://www.rust-lang.org")?
        .text()
}

lazy_static! {
    // Mutex to have mutable access and Arc so that it's thread-safe.
    static ref RT: Arc<Mutex<runtime::Runtime>> = Arc::new(Mutex::new(runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
        .unwrap()));
}

fn with_block_on() -> Result<String, reqwest::Error> {
    RT.lock().unwrap().block_on(async move {
        original().await
    })
}

fn main() {
    let now = Instant::now();
    print!("benchmarking with copypaste ...");
    for _ in 1..100 {
        with_copypaste().unwrap();
    }
    println!("done in {}ms", now.elapsed().as_millis());

    let now = Instant::now();
    print!("benchmarking with block_on ...");
    for _ in 1..100 {
        with_block_on().unwrap();
    }
    println!("done in {}ms", now.elapsed().as_millis());
}
❯ cargo run --release
   Compiling macros v0.1.0 (/tmp/macros)
    Finished release [optimized] target(s) in 2.42s
     Running `/home/mario/.cache/cargo/release/macros`
benchmarking with copypaste ...done in 30172ms
benchmarking with block_on ...done in 30606ms
@marioortizmanero marioortizmanero added enhancement New feature or request help wanted Extra attention is needed labels Aug 15, 2020
@marioortizmanero marioortizmanero self-assigned this Aug 16, 2020
ramsayleung (Owner) commented:
Totally agree with your point: the copy-paste blocking module makes rspotify painful to maintain, since you have to update two versions of the code whenever you add a new endpoint.

> This solution could use a smaller macro that automatically generates the boilerplate block_on functions to avoid some repetition.

This solution is good: we wouldn't have to generate the code by hand. I can't wait to see it.

And there is one more thing we should keep an eye on: will the state of the blocking calls become more complicated to maintain?

After the discussion in PR #102, I came up with an intuition almost the same as this:

fn with_block_on() -> Result<String, reqwest::Error> {
    let mut rt = tokio::runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async move {
        original().await
    })
}

And my question is whether there are more state machines and errors we have to take care of compared to the copy-pasted code, since we plan to introduce the tokio runtime to handle the execution of the reqwest calls.

> This way we don't even need the blocking feature in reqwest

Yeah, but I think it's a bit like the saying:

> Andy gives, Bill takes away.

reqwest gives, tokio takes away :)

marioortizmanero (Collaborator, Author) commented Aug 16, 2020

> And my question is whether there are more state machines and errors we have to take care of compared to the copy-pasted code, since we plan to introduce the tokio runtime to handle the execution of the reqwest calls.

I don't understand what you mean by that. The way I see it, performance- and complexity-wise it's basically the same as the copy-paste version. reqwest also introduces a blocking tokio runtime under the hood, with an even more complicated method to run the blocking code than just using block_on. We have less code to take care of than with the copy-paste version because we only have to call runtime.block_on(async move { ... }): the implementation only needs to be written once.

I'll start working on this once #95, #110 and #113 are finished, but I still need to figure out how to create a single global runtime and how to make it more performant.

ramsayleung (Owner) commented:
> reqwest also introduces a blocking tokio runtime under the hood, with an even more complicated method to run the blocking code than just using block_on

Yep, I get your point.

> but I still need to figure out how to create a single global runtime and how to make it more performant

How about the Singleton design pattern? Would it be helpful?

marioortizmanero (Collaborator, Author) commented:
> How about the Singleton design pattern? Would it be helpful?

I'm not a big fan of singletons, but I guess it's the only way to do it in this case. Isn't what I suggested with lazy_static a singleton anyway? What I don't understand is why the singleton performs worse than using multiple runtimes; perhaps the Arc<Mutex<T>> overhead is greater than the cost of instantiating a new runtime? :/

Darksonn commented Aug 29, 2020

I recommend using a shared runtime, but with a small change compared to what you have now:

  1. Use a threaded runtime with one thread.
  2. Do not put it in an Arc<Mutex<...>>
  3. Use Handle::block_on to block on many things concurrently.

Alternatively you could spawn the operation on the shared runtime and use Handle::block_on on the JoinHandle returned by tokio::spawn.

I'm not sure which of the two is better.

As a slight variation on the above, you can use a basic scheduler, but spawn a thread that calls block_on, waiting on e.g. a channel of jobs. This is what reqwest's blocking module does, afaik.
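
For reference, a rough sketch of that last variation, using the same tokio 0.2 and reqwest 0.10 versions as the benchmarks in this thread (the spawn_driver/Job names are made up, it's not how reqwest actually structures its code, and a real implementation would need a proper shutdown story):

use std::future::Future;
use std::pin::Pin;
use std::sync::mpsc;
use std::thread;

use tokio::runtime;

// A job is a boxed future plus a channel on which to send its result back.
type Response = Result<String, reqwest::Error>;
type Job = (Pin<Box<dyn Future<Output = Response> + Send>>, mpsc::Sender<Response>);

// Spawns a dedicated thread that owns a basic (single-threaded) scheduler and
// drives incoming jobs with block_on, one at a time.
fn spawn_driver() -> mpsc::Sender<Job> {
    let (tx, rx) = mpsc::channel::<Job>();
    thread::spawn(move || {
        let mut rt = runtime::Builder::new()
            .basic_scheduler()
            .enable_all()
            .build()
            .unwrap();
        for (future, reply) in rx {
            let _ = reply.send(rt.block_on(future));
        }
    });
    tx
}

async fn original() -> Response {
    reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await
}

fn main() {
    let jobs = spawn_driver();
    let (reply_tx, reply_rx) = mpsc::channel();
    let job: Job = (Box::pin(original()), reply_tx);
    jobs.send(job).unwrap();
    println!("fetched {} bytes", reply_rx.recv().unwrap().unwrap().len());
}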

marioortizmanero (Collaborator, Author) commented Aug 29, 2020

Thanks for the recommendations, @Darksonn!

I'm extending the initial benchmark with all the possibilities. The benchmark obviously isn't reliable enough to base the choice on, but I think it gives an idea of how the performance is affected:

main.rs
use std::time::Instant;
use std::sync::{Arc, Mutex};
use std::thread;

use lazy_static::lazy_static;
use tokio::runtime;

async fn original() -> Result<String, reqwest::Error> {
    reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await
}

fn with_copypaste() -> Result<String, reqwest::Error> {
    reqwest::blocking::get("https://www.rust-lang.org")?
        .text()
}

lazy_static! {
    // Mutex to have mutable access and Arc so that it's thread-safe.
    static ref RT_THREADING: Arc<Mutex<runtime::Runtime>> = Arc::new(Mutex::new(runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
        .unwrap()));

    // Without mutexes, it should be used with `handle`.
    static ref RT: runtime::Runtime = runtime::Runtime::new().unwrap();
}

fn with_block_on_local_runtime() -> Result<String, reqwest::Error> {
    let mut rt = tokio::runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async move {
        original().await
    })
}

fn with_block_on_threaded_runtime() -> Result<String, reqwest::Error> {
    RT_THREADING.lock().unwrap().block_on(async move {
        original().await
    })
}

fn with_block_on_handle() -> Result<String, reqwest::Error> {
    RT.handle().block_on(async move {
        original().await
    })
}

fn with_block_on_spawn_handle() -> Result<String, reqwest::Error> {
    RT.handle().block_on(async {
        RT.spawn(async move {
            original().await
        }).await.unwrap()
    })
}

fn main() {
    macro_rules! benchmark {
        ($name:ident) => {
            let mut total = 0;
            let now = Instant::now();
            print!("benchmarking {} single-threaded ...", stringify!($name));

            for _ in 1..50 {
                $name().unwrap();
            }

            total += now.elapsed().as_millis();
            println!(" done in {}ms", now.elapsed().as_millis());

            let now = Instant::now();
            print!("benchmarking {} multi-threaded ...", stringify!($name));

            // multi threaded
            let mut handles = Vec::new();
            for _ in 1..50 {
                handles.push(thread::spawn(|| {
                    $name().unwrap();
                }))
            }
            for handle in handles {
                handle.join().unwrap();
            }

            total += now.elapsed().as_millis();
            println!(" done in {}ms", now.elapsed().as_millis());

            println!(">> total time taken: {}ms", total);
        }
    }

    benchmark!(with_copypaste);
    benchmark!(with_block_on_local_runtime);
    benchmark!(with_block_on_threaded_runtime);
    benchmark!(with_block_on_handle);
    benchmark!(with_block_on_spawn_handle);
}
Cargo.toml
[package]
name = "benchmarks"
version = "0.1.0"
authors = ["Mario Ortiz Manero <marioortizmanero@gmail.com>"]
edition = "2018"

[dependencies]
reqwest = { version = "0.10.8", features = ["blocking"] }
tokio = { version = "0.2.22", features = ["full"] }
lazy_static = "1.4.0"

Results on an Intel Pentium G4560 (4) @ 3.500GHz (which, to be fair, is a terrible CPU, especially when multithreading). It'd be a good idea to run the benchmark on more systems:

benchmarking with_copypaste single-threaded ... done in 17664ms
benchmarking with_copypaste multi-threaded ... done in 6231ms
>> total time taken: 23895ms
benchmarking with_block_on_local_runtime single-threaded ... done in 17636ms
benchmarking with_block_on_local_runtime multi-threaded ... done in 10506ms
>> total time taken: 28142ms
benchmarking with_block_on_threaded_runtime single-threaded ... done in 21804ms
benchmarking with_block_on_threaded_runtime multi-threaded ... done in 17705ms
>> total time taken: 39509ms
benchmarking with_block_on_handle single-threaded ... done in 16935ms
benchmarking with_block_on_handle multi-threaded ... done in 6104ms
>> total time taken: 23039ms
benchmarking with_block_on_spawn_handle single-threaded ... done in 16600ms
benchmarking with_block_on_spawn_handle multi-threaded ... done in 5886ms
>> total time taken: 22486ms

The reqwest-style solution you suggested is basically the same as with_copypaste, since that's what reqwest uses under the hood.

ramsayleung (Owner) commented:
> It'd be a good idea to run the benchmark on more systems.

Perhaps we could do that with GitHub Actions, using the runs-on label?
