
Add benchmarks #44

Open · LucioFranco opened this issue Oct 4, 2019 · 6 comments

@LucioFranco (Member)

It would be handy to add benchmarks mostly around encoding and decoding.
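For the encoding/decoding side, the measurement shape could look something like the sketch below. This is a std-only toy: the `encode` function is a made-up length-prefix codec standing in for prost's generated code, and a real harness would presumably use criterion (as in the logs further down this thread) rather than hand-rolled timing.

```rust
use std::time::Instant;

// Toy stand-in for protobuf encoding: length-prefix the payload.
// (A real benchmark would call prost's generated encode/decode.)
fn encode(payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(payload.len() + 4);
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

fn main() {
    // Vary the payload size log-scaled.
    for size in [100usize, 1_000, 10_000].iter().copied() {
        let payload = vec![0xAB_u8; size];
        let iters = 10_000u32;
        let start = Instant::now();
        let mut sink = 0usize;
        for _ in 0..iters {
            // Accumulate into `sink` so the work isn't optimized away.
            sink = sink.wrapping_add(encode(&payload).len());
        }
        let secs = start.elapsed().as_secs_f64();
        // Bytes processed per second, like criterion's `thrpt:` line.
        let gib_per_s = (size as f64 * iters as f64) / secs / (1u64 << 30) as f64;
        println!("{} bytes: {:.3} GiB/s (sink={})", size, gib_per_s, sink);
    }
}
```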

@blittable (Contributor) commented Oct 9, 2019

@LucioFranco

A few notes while chipping away at this... some of it might be of interest and inspire additional tests. IMHO, most of the request/response building is unimpressive (in a good way).

Planning to look at concurrency/load next; then, hopefully, we can shape a pull request that provides solid regression tests.

  1. Request/Response construction comparison with 1 field and field size varying by byte size (log scaled):

[chart: request/response construction time vs. field size in bytes, log scale]

  2. The next one compares the above with a different .proto containing two structs with a total of 54 fields (444 bytes), varying one String field by size as above. The bigger .proto has a nested sub-message, so we should be able to calculate the overhead of the additional struct fields. There's a metric there if it's worth knowing.

[chart: same comparison against the larger 54-field .proto]

  3. The raw logs. The throughput measurement is not based on concurrent requests, simply bytes processed. It spikes a bit in the middle as the number of bytes increases; that is reproducible.
Request_Response/request/100                                                                             
                        time:   [29.072 ns 29.336 ns 29.644 ns]
                        thrpt:  [3.1417 GiB/s 3.1747 GiB/s 3.2035 GiB/s]
                 change:
                        time:   [-14.790% -13.916% -12.931%] (p = 0.00 < 0.05)
                        thrpt:  [+14.851% +16.166% +17.358%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
Request_Response/response/100                                                                             
                        time:   [30.985 ns 31.229 ns 31.500 ns]
                        thrpt:  [2.9566 GiB/s 2.9822 GiB/s 3.0057 GiB/s]
                 change:
                        time:   [-7.4938% -6.9291% -6.3729%] (p = 0.00 < 0.05)
                        thrpt:  [+6.8067% +7.4450% +8.1009%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
Request_Response/request/1000                                                                             
                        time:   [42.263 ns 42.528 ns 42.845 ns]
                        thrpt:  [21.737 GiB/s 21.899 GiB/s 22.036 GiB/s]
                 change:
                        time:   [-6.8061% -6.1563% -5.4255%] (p = 0.00 < 0.05)
                        thrpt:  [+5.7368% +6.5602% +7.3031%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe
Request_Response/response/1000                                                                             
                        time:   [41.965 ns 42.069 ns 42.202 ns]
                        thrpt:  [22.068 GiB/s 22.138 GiB/s 22.193 GiB/s]
                 change:
                        time:   [-10.390% -9.8268% -9.3006%] (p = 0.00 < 0.05)
                        thrpt:  [+10.254% +10.898% +11.595%]
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe
Request_Response/request/10000                                                                            
                        time:   [245.95 ns 247.33 ns 249.14 ns]
                        thrpt:  [37.381 GiB/s 37.655 GiB/s 37.866 GiB/s]
                 change:
                        time:   [-7.0783% -6.6178% -6.1523%] (p = 0.00 < 0.05)
                        thrpt:  [+6.5557% +7.0868% +7.6175%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe
Request_Response/response/10000                                                                            
                        time:   [184.85 ns 185.43 ns 186.29 ns]
                        thrpt:  [49.994 GiB/s 50.224 GiB/s 50.384 GiB/s]
                 change:
                        time:   [-19.907% -19.642% -19.355%] (p = 0.00 < 0.05)
                        thrpt:  [+24.000% +24.443% +24.855%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
Request_Response/request/100000                                                                             
                        time:   [2.7391 us 2.7491 us 2.7614 us]
                        thrpt:  [33.726 GiB/s 33.877 GiB/s 34.001 GiB/s]
                 change:
                        time:   [+2.0085% +2.7858% +3.5567%] (p = 0.00 < 0.05)
                        thrpt:  [-3.4346% -2.7103% -1.9690%]
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) high mild
  12 (12.00%) high severe
Request_Response/response/100000                                                                             
                        time:   [2.8018 us 2.8147 us 2.8310 us]
                        thrpt:  [32.898 GiB/s 33.087 GiB/s 33.240 GiB/s]
                 change:
                        time:   [-0.1598% +1.2106% +2.6221%] (p = 0.09 > 0.05)
                        thrpt:  [-2.5551% -1.1961% +0.1601%]
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe
Request_Response/request/1000000                                                                             
                        time:   [35.638 us 35.708 us 35.792 us]
                        thrpt:  [26.021 GiB/s 26.082 GiB/s 26.133 GiB/s]
                 change:
                        time:   [-0.2461% +0.2848% +0.8140%] (p = 0.31 > 0.05)
                        thrpt:  [-0.8074% -0.2840% +0.2467%]
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
Request_Response/response/1000000                                                                             
                        time:   [36.883 us 36.924 us 36.971 us]
                        thrpt:  [25.191 GiB/s 25.223 GiB/s 25.251 GiB/s]
                 change:
                        time:   [+3.8371% +4.2068% +4.6298%] (p = 0.00 < 0.05)
                        thrpt:  [-4.4249% -4.0370% -3.6953%]
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe
Request_Response/request/10000000                                                                            
                        time:   [708.38 us 726.35 us 745.29 us]
                        thrpt:  [12.496 GiB/s 12.822 GiB/s 13.147 GiB/s]
                 change:
                        time:   [+6.7526% +10.265% +13.908%] (p = 0.00 < 0.05)
                        thrpt:  [-12.210% -9.3096% -6.3255%]
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
Request_Response/response/10000000                                                                            
                        time:   [723.25 us 739.96 us 758.05 us]
                        thrpt:  [12.286 GiB/s 12.586 GiB/s 12.877 GiB/s]
                 change:
                        time:   [+11.334% +15.100% +19.064%] (p = 0.00 < 0.05)
                        thrpt:  [-16.011% -13.119% -10.180%]
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

     Running target/release/deps/request_response_diverse_fields-90067feaf8f4f410
Request_Response_Diverse_Types/request/10                                                                            
                        time:   [218.86 ns 219.16 ns 219.50 ns]
                        thrpt:  [43.447 MiB/s 43.516 MiB/s 43.574 MiB/s]
                 change:
                        time:   [-98.284% -98.126% -97.958%] (p = 0.00 < 0.05)
                        thrpt:  [+4796.5% +5236.1% +5727.9%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Request_Response_Diverse_Types/request/100                                                                            
                        time:   [350.18 ns 350.53 ns 350.93 ns]
                        thrpt:  [271.75 MiB/s 272.07 MiB/s 272.34 MiB/s]
                 change:
                        time:   [-97.658% -97.346% -96.985%] (p = 0.00 < 0.05)
                        thrpt:  [+3216.9% +3667.6% +4169.3%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
Request_Response_Diverse_Types/request/1000                                                                            
                        time:   [267.28 ns 268.31 ns 269.61 ns]
                        thrpt:  [3.4544 GiB/s 3.4711 GiB/s 3.4845 GiB/s]
                 change:
                        time:   [-97.965% -97.800% -97.614%] (p = 0.00 < 0.05)
                        thrpt:  [+4091.1% +4444.5% +4814.7%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Request_Response_Diverse_Types/request/100000                                                                             
                        time:   [3.2873 us 3.3733 us 3.4717 us]
                        thrpt:  [26.826 GiB/s 27.608 GiB/s 28.331 GiB/s]
                 change:
                        time:   [-80.166% -78.908% -77.585%] (p = 0.00 < 0.05)
                        thrpt:  [+346.13% +374.12% +404.18%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
Request_Response_Diverse_Types/request/1000000                                                                             
                        time:   [36.100 us 36.295 us 36.538 us]
                        thrpt:  [25.489 GiB/s 25.660 GiB/s 25.798 GiB/s]
                 change:
                        time:   [-38.165% -35.252% -32.403%] (p = 0.00 < 0.05)
                        thrpt:  [+47.935% +54.444% +61.722%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
Request_Response_Diverse_Types/request/10000000                                                                            
                        time:   [699.38 us 714.28 us 729.45 us]
                        thrpt:  [12.768 GiB/s 13.039 GiB/s 13.316 GiB/s]
                 change:
                        time:   [-48.848% -47.040% -44.908%] (p = 0.00 < 0.05)
                        thrpt:  [+81.514% +88.821% +95.496%]
                        Performance has improved.
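A quick sanity check on these logs: criterion's `thrpt:` figure is just bytes processed divided by the per-iteration time, so the 100-byte request case can be reproduced from its `time:` line directly:

```rust
// Reproduce criterion's `thrpt:` unit conversion from a (bytes,
// nanoseconds-per-iteration) pair.
fn throughput_gib_per_s(bytes: u64, ns_per_iter: f64) -> f64 {
    let bytes_per_sec = bytes as f64 / (ns_per_iter * 1e-9);
    bytes_per_sec / (1u64 << 30) as f64
}

fn main() {
    // Request_Response/request/100: 100 bytes at the 29.336 ns mid estimate.
    let g = throughput_gib_per_s(100, 29.336);
    println!("{:.4} GiB/s", g); // ~3.17 GiB/s, matching the thrpt line above
}
```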

@LucioFranco (Member, Author)

@blittable This looks great!

@blittable (Contributor)

@LucioFranco

Started a few end-to-end performance tests over here: https://github.com/blittable/tonic-perf

One challenge: running a large number of request/response iterations exposes an issue with the client. If the responses are awaited and handled (e.g. with a println!), the client takes ~32 seconds for 1000 iterations.

A Node.js client clocks in at around 170 milliseconds for the same calls against the same tonic server with the same protobuf files.

If the same tonic client code is handled by spawning the client's invocations (and message creation), it's around 74 microseconds for 1000 iterations.

So, the question: what's wrong with this?

   // tokio::spawn(async move {  <- Crazy fast
       for _ in 0..1000_i32 {
           let request = tonic::Request::new(HelloRequest {
               name: "world".into(),
               iteration: 1,
           });

           let resp = match client.say_hello(request).await {
               Ok(resp) => println!("{:?}", resp.into_inner().message),
               Err(e) => {
                   println!("Errant response; err = {:?}", e);
               }
           };
       }
   //});

1 - Test server: https://github.com/blittable/tonic-perf/blob/cd8463ffe110ff3502b698b2d69dda5c1ca918fe/hello-tonic/src/hellotonic/server.rs#L38

2 - Test client: https://github.com/blittable/tonic-perf/blob/master/hello-tonic/src/hellotonic/client.rs

3 - Comparable node test: https://github.com/blittable/tonic-perf/tree/master/comparables/node/dynamic_codegen ( node greeter_client.js / tonic server)

Side note... Rustwide makes a nice reproducible environment (easy to flip nightly/beta/stable, Windows/Linux, etc.) for building and testing: https://github.com/blittable/tonic-perf/tree/master/rustwide-build - but it's not so agreeable with distributed configurations.

@JohnDoneth (Contributor) commented Oct 17, 2019

@blittable

I believe the reason the tokio::spawn code is so fast is that the closure is put onto the tokio runtime but never actually waited on. The work of making those client requests is dropped when run_requests returns, and main finishes when the runtime is dropped.

Also, you probably want to create a future per request without awaiting each one, and use something like select! over a collection of futures. Without testing it myself, I believe the reason the test is so slow is that each request is still awaited in a loop, so you have serial execution. If one request takes ~30ms, you're going to get 1000 * ~30ms, which is about 30 seconds.
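That arithmetic can be demonstrated without tonic at all. A minimal sketch using plain std threads as a stand-in for async requests (the 20 ms latency and the `fake_request` name are made up for illustration; the scheduling point is the same):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Stand-in for one RPC round trip with a hypothetical 20 ms latency.
fn fake_request() {
    thread::sleep(Duration::from_millis(20));
}

// Waiting for each call before issuing the next costs n * latency.
fn run_serial(n: usize) -> Duration {
    let start = Instant::now();
    for _ in 0..n {
        fake_request();
    }
    start.elapsed()
}

// Issuing all calls first and then waiting costs roughly one latency.
fn run_concurrent(n: usize) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..n).map(|_| thread::spawn(fake_request)).collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    let (s, c) = (run_serial(8), run_concurrent(8));
    println!("serial: {:?}, concurrent: {:?}", s, c);
}
```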

Edit: Here is an example of waiting for multiple requests concurrently.

I hope this was helpful!

@alce (Collaborator) commented Oct 18, 2019

@blittable you can try something like:

use futures::stream::{FuturesUnordered, StreamExt};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = GreeterClient::connect("http://[::1]:50051")?;

    let mut futures = FuturesUnordered::new();
    for _ in 0..1000_i32 {
        let request = tonic::Request::new(HelloRequest {
            name: "world".into(),
        });

        let mut client = client.clone();
        futures.push(async move { client.say_hello(request).await });
    }

    while let Some(res) = futures.next().await {
        match res {
            Ok(resp) => println!("{:?}", resp.into_inner().message),
            Err(e) => {
                println!("Errant response; err = {:?}", e);
            }
        }
    }

    Ok(())
}

@blittable (Contributor)

@alce Thanks for that and the route guide! The client is now about twice as fast as the equivalent Node.js version.
