Experiment faster stateful implement #493

Closed (wants to merge 20 commits)
Conversation

naoh87 (Contributor) commented Jan 31, 2022

Motivation

https://github.com/LesnyRumcajs/grpc_bench/wiki/2022-01-11-bench-results
I simply want to improve performance.
This PR reduces the number of unsafeRun calls to one and eliminates blocking.
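The claim can be illustrated with a toy sketch (plain Scala, not the PR's actual code): each unsafeRun is modeled as a counted boundary crossing between the gRPC callback world and the effect runtime, comparing one dispatch per message against a single dispatch per call.

```scala
// Toy illustration of why fewer unsafeRun calls matter. The names
// (unsafeRun, process) are illustrative stand-ins, not fs2-grpc internals.
object UnsafeRunDemo {
  private var crossings = 0

  // Stand-in for a dispatcher's unsafeRun: crossing the runtime boundary.
  private def unsafeRun(body: => Unit): Unit = { crossings += 1; body }

  private def process(msg: String): Unit = () // per-message handler

  // Old style: dispatch each incoming message separately.
  def perMessage(messages: List[String]): Int = {
    crossings = 0
    messages.foreach(m => unsafeRun(process(m)))
    crossings
  }

  // New style: start one effect for the whole call and feed it all messages.
  def perCall(messages: List[String]): Int = {
    crossings = 0
    unsafeRun(messages.foreach(process))
    crossings
  }

  def main(args: Array[String]): Unit = {
    val msgs = List.fill(100)("msg")
    println(s"per-message crossings: ${perMessage(msgs)}") // 100
    println(s"per-call crossings:    ${perCall(msgs)}")    // 1
  }
}
```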

Benchmark

scenario: https://github.com/LesnyRumcajs/grpc_bench
ghz --cpus=3 -n 100000 ... after a 30s warm-up

This PR

Summary:
  Count:	100000
  Total:	9.60 s
  Slowest:	9.75 ms
  Fastest:	0.17 ms
  Average:	2.54 ms
  Requests/sec:	10416.95

Response time histogram:
  0.174 [1]     |
  1.131 [16409] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  2.088 [30140] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  3.045 [19611] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.003 [15349] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.960 [12088] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.917 [5319]  |∎∎∎∎∎∎∎
  6.874 [879]   |∎
  7.831 [137]   |
  8.789 [38]    |
  9.746 [29]    |

Latency distribution:
  10 % in 0.92 ms
  25 % in 1.39 ms
  50 % in 2.23 ms
  75 % in 3.59 ms
  90 % in 4.63 ms
  95 % in 5.12 ms
  99 % in 5.96 ms

Status code distribution:
  [OK]   100000 responses

fs2-grpc: 2.4.4

Summary:
  Count:	100000
  Total:	11.53 s
  Slowest:	30.64 ms
  Fastest:	0.26 ms
  Average:	3.09 ms
  Requests/sec:	8671.53

Response time histogram:
  0.259  [1]     |
  3.297  [60865] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  6.335  [36025] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  9.372  [2840]  |∎∎
  12.410 [82]    |
  15.448 [87]    |
  18.486 [49]    |
  21.524 [24]    |
  24.562 [3]     |
  27.600 [11]    |
  30.637 [13]    |

Latency distribution:
  10 % in 1.20 ms
  25 % in 1.80 ms
  50 % in 2.75 ms
  75 % in 4.19 ms
  90 % in 5.38 ms
  95 % in 5.96 ms
  99 % in 7.22 ms

Status code distribution:
  [OK]   100000 responses

ScalaPB

Summary:
  Count:	100000
  Total:	9.54 s
  Slowest:	11.45 ms
  Fastest:	0.14 ms
  Average:	2.48 ms
  Requests/sec:	10479.99

Response time histogram:
  0.139  [1]     |
  1.270  [22921] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  2.400  [32755] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  3.531  [20002] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.662  [15353] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.793  [7000]  |∎∎∎∎∎∎∎∎∎
  6.923  [1665]  |∎∎
  8.054  [256]   |
  9.185  [37]    |
  10.316 [6]     |
  11.446 [4]     |

Latency distribution:
  10 % in 0.87 ms
  25 % in 1.33 ms
  50 % in 2.16 ms
  75 % in 3.48 ms
  90 % in 4.57 ms
  95 % in 5.13 ms
  99 % in 6.20 ms

Status code distribution:
  [OK]   100000 responses

private var cancel: Cancel = Noop

override def onCancel(): Unit =
  cancel()
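The pattern in the diff above can be sketched self-containedly (Cancel and Noop follow the snippet's names; the Listener wrapper and setCancel are illustrative, not the PR's actual code): the listener holds a mutable cancel handle, initially a no-op, which gRPC's onCancel callback fires.

```scala
object CancelPatternDemo {
  // Cancel is a plain thunk; Noop is the initial, harmless handle.
  type Cancel = () => Unit
  val Noop: Cancel = () => ()

  final class Listener {
    private var cancel: Cancel = Noop
    var cancelled: Boolean = false

    // Installed once the call's effect has been started.
    def setCancel(c: Cancel): Unit = cancel = c

    // gRPC invokes onCancel when the client goes away; fire the stored handle.
    def onCancel(): Unit = cancel()
  }

  def main(args: Array[String]): Unit = {
    val l = new Listener
    l.setCancel(() => l.cancelled = true)
    l.onCancel()
    println(s"cancelled=${l.cancelled}") // prints cancelled=true
  }
}
```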
naoh87 (Contributor, Author) commented:
Should this be blocked until cancellation is completed?

ahjohannessen (Collaborator):

@naoh87 What kind of numbers do you get if you bench this? https://github.com/typelevel/fs2-grpc/pull/486/files

naoh87 (Contributor, Author) commented Feb 2, 2022

I changed the warm-up time to 60s and ran for 30s to stabilize the results, because the performance of #486 and #493 is so close.

This benchmark was run on a MacBook Pro 2019 (4-core 2.8 GHz Core i7).

// PS: these results are from a mistaken measurement; see the correction below.

This PR

Summary:
  Count:	357669
  Total:	30.00 s
  Slowest:	26.08 ms
  Fastest:	0.14 ms
  Average:	2.15 ms
  Requests/sec:	11921.77

Response time histogram:
  0.142  [1]      |
  2.736  [281690] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.330  [63114]  |∎∎∎∎∎∎∎∎∎
  7.924  [12309]  |∎∎
  10.518 [467]    |
  13.112 [41]     |
  15.706 [6]      |
  18.300 [10]     |
  20.894 [0]      |
  23.488 [0]      |
  26.083 [1]      |

Latency distribution:
  10 % in 0.92 ms
  25 % in 1.32 ms
  50 % in 1.84 ms
  75 % in 2.54 ms
  90 % in 3.98 ms
  95 % in 4.94 ms
  99 % in 6.49 ms

#486 08f556f

Summary:
  Count:	332880
  Total:	30.00 s
  Slowest:	19.52 ms
  Fastest:	0.16 ms
  Average:	2.30 ms
  Requests/sec:	11095.20

Response time histogram:
  0.158  [1]      |
  2.095  [184726] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.031  [110812] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.968  [27372]  |∎∎∎∎∎∎
  7.905  [9036]   |∎∎
  9.841  [828]    |
  11.778 [35]     |
  13.715 [13]     |
  15.651 [4]      |
  17.588 [1]      |
  19.525 [2]      |

Latency distribution:
  10 % in 0.97 ms
  25 % in 1.39 ms
  50 % in 1.96 ms
  75 % in 2.72 ms
  90 % in 4.24 ms
  95 % in 5.32 ms
  99 % in 6.93 ms

ahjohannessen (Collaborator):

I see, not much of a difference between those two.

How do you run this: ghz --cpus=3 -n 100000? Using Docker?

naoh87 (Contributor, Author) commented Feb 2, 2022

Nope, I just use sbt run, because I am not interested in the Docker Desktop VM overhead.

ahjohannessen (Collaborator):

Perhaps using constructs from https://github.com/davenverse/condemned/tree/main/core/src/main/scala/io/chrisdavenport/condemned could make my take on it a bit faster.

ahjohannessen (Collaborator):

> Nope. I just sbt run because I am not interested in Docker Desktop VM overhead.

I mean, is ghz something you installed with brew?

naoh87 (Contributor, Author) commented Feb 2, 2022

Yes, I installed it with brew.

❯ ghz --version
0.106.0

ahjohannessen (Collaborator):

Would you mind sharing the ghz command, so I could try locally here? :)

naoh87 (Contributor, Author) commented Feb 2, 2022

❯ ghz --cpus=3 --proto ../scenarios/complex_proto/helloworld.proto -D ../scenarios/complex_proto/payload --insecure localhost:50051 --call helloworld.Greeter.SayHello -z 30s
(run from grpc_bench/scala_fs2_bench)
I edited scala_fs2_bench/plugins.sbt to use fs2-grpc 2.4.4 and switched the runtime with dependencyOverrides.
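The version switch described above might look roughly like this (a sketch, not the benchmark's actual build files; the sbt-fs2-grpc and fs2-grpc-runtime coordinates are assumed):

```scala
// project/plugins.sbt (sketch): the fs2-grpc codegen plugin
addSbtPlugin("org.typelevel" % "sbt-fs2-grpc" % "2.4.4")

// build.sbt (sketch): pin the runtime the benchmark actually links against,
// overriding whatever version the plugin would otherwise pull in
dependencyOverrides += "org.typelevel" %% "fs2-grpc-runtime" % "2.4.4"
```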

ahjohannessen (Collaborator):

Thanks 👍

naoh87 (Contributor, Author) commented Feb 2, 2022

I am really sorry: my benchmark above is a completely mistaken measurement.
I forgot to switch to the ScalaPB implementation.

naoh87 (Contributor, Author) commented Feb 2, 2022

I reran the benchmark, but both implementations show almost the same performance.
This PR

Summary:
  Count:	660755
  Total:	60.00 s
  Slowest:	25.08 ms
  Fastest:	0.17 ms
  Average:	2.32 ms
  Requests/sec:	11011.78

Latency distribution:
  10 % in 0.99 ms
  25 % in 1.42 ms
  50 % in 1.98 ms
  75 % in 2.75 ms
  90 % in 4.27 ms
  95 % in 5.31 ms
  99 % in 6.92 ms

#486

Summary:
  Count:	640781
  Total:	60.01 s
  Slowest:	19.69 ms
  Fastest:	0.19 ms
  Average:	2.44 ms
  Requests/sec:	10678.68

Latency distribution:
  10 % in 1.07 ms
  25 % in 1.53 ms
  50 % in 2.12 ms
  75 % in 2.90 ms
  90 % in 4.39 ms
  95 % in 5.46 ms
  99 % in 7.11 ms

naoh87 (Contributor, Author) commented Feb 2, 2022

I also tried starting run(message) in onMessage and giving up on handling "too many requests", but it did not improve performance much.

ahjohannessen (Collaborator):

@naoh87 What is the Status code distribution for #486?

naoh87 (Contributor, Author) commented Feb 2, 2022

This result is from another -z 60s run, but it should be good enough to use.

Status code distribution:
  [OK]            640219 responses
  [Canceled]      12 responses
  [Unavailable]   36 responses

I think this result is reasonable, because the default ghz request concurrency is 50.

naoh87 (Contributor, Author) commented Feb 3, 2022

I benchmarked with -XX:+UnlockDiagnosticVMOptions and -XX:+DebugNonSafepoints and profiled with async-profiler.
This shows that #493 has less runtime overhead.

#493

Summary:
  Count:	675933
  Total:	60.01 s
  Slowest:	24.32 ms
  Fastest:	0.17 ms
  Average:	2.29 ms
  Requests/sec:	11264.55

Latency distribution:
  10 % in 0.98 ms
  25 % in 1.41 ms
  50 % in 1.96 ms
  75 % in 2.70 ms
  90 % in 4.22 ms
  95 % in 5.24 ms
  99 % in 6.79 ms

#486

Summary:
  Count:	639397
  Total:	60.00 s
  Slowest:	103.10 ms
  Fastest:	0.19 ms
  Average:	2.44 ms
  Requests/sec:	10655.82

Latency distribution:
  10 % in 1.06 ms
  25 % in 1.52 ms
  50 % in 2.11 ms
  75 % in 2.88 ms
  90 % in 4.41 ms
  95 % in 5.49 ms
  99 % in 7.10 ms

https://github.com/naoh87/fs2-grpc-profile/tree/master/docs/2022-02-03

naoh87 (Contributor, Author) commented Feb 3, 2022

./profiler.sh -e cpu -d 30 ... while running ghz --cpus=3 ...

[flamegraphs: 493, 486]

private val Closed: Cancel = () => ()

private def mkListener[F[_]: Async, Request, Response](
    run: Stream[F, Request] => Cancel,
naoh87 (Contributor, Author) commented Feb 4, 2022

Implement the streaming-request Listener in the same way as the unary one.
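The shape of such a listener can be sketched without fs2 or cats-effect (a loose, dependency-free illustration; the event types and drain method are hypothetical, not the PR's actual code): gRPC callbacks only enqueue events, and a single consumer drains them, so no effect has to be dispatched per message.

```scala
import scala.collection.mutable

object StreamListenerDemo {
  sealed trait Event
  final case class Message(value: String) extends Event
  case object HalfClose extends Event

  // The gRPC thread calls onMessage/onHalfClose; events are buffered
  // until the single stream consumer drains them.
  final class Listener {
    private val queue = mutable.Queue.empty[Event]

    def onMessage(msg: String): Unit = queue.enqueue(Message(msg))
    def onHalfClose(): Unit = queue.enqueue(HalfClose)

    // Drain everything received so far, stopping at HalfClose.
    def drain(): List[String] = {
      val events = queue.toList
      queue.clear()
      events.takeWhile(_ != HalfClose).collect { case Message(v) => v }
    }
  }

  def main(args: Array[String]): Unit = {
    val l = new Listener
    l.onMessage("a"); l.onMessage("b"); l.onHalfClose()
    println(l.drain().mkString(",")) // prints a,b
  }
}
```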
