Experiment faster stateful implement #493

Closed (wants to merge 20 commits)
Conversation

naoh87 (Contributor) commented Jan 31, 2022

Motivation

https://github.com/LesnyRumcajs/grpc_bench/wiki/2022-01-11-bench-results
I simply want to improve performance.
This PR reduces the number of unsafeRun calls to one and eliminates blocking.
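The claim can be illustrated with a toy sketch (plain Scala, not the PR's actual code): each unsafeRun is modeled as a counted boundary crossing between the gRPC callback world and the effect runtime, comparing one dispatch per message against a single dispatch per call.

```scala
// Toy illustration of why fewer unsafeRun calls matter. The names
// (unsafeRun, process) are illustrative stand-ins, not fs2-grpc internals.
object UnsafeRunDemo {
  private var crossings = 0

  // Stand-in for a dispatcher's unsafeRun: crossing the runtime boundary.
  private def unsafeRun(body: => Unit): Unit = { crossings += 1; body }

  private def process(msg: String): Unit = () // per-message handler

  // Old style: dispatch each incoming message separately.
  def perMessage(messages: List[String]): Int = {
    crossings = 0
    messages.foreach(m => unsafeRun(process(m)))
    crossings
  }

  // New style: start one effect for the whole call and feed it all messages.
  def perCall(messages: List[String]): Int = {
    crossings = 0
    unsafeRun(messages.foreach(process))
    crossings
  }

  def main(args: Array[String]): Unit = {
    val msgs = List.fill(100)("msg")
    println(s"per-message crossings: ${perMessage(msgs)}") // 100
    println(s"per-call crossings:    ${perCall(msgs)}")    // 1
  }
}
```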

Benchmark

scenario: https://github.com/LesnyRumcajs/grpc_bench
ghz --cpus=3 -n 100000 ... after a 30s warm-up

This PR

Summary:
  Count:	100000
  Total:	9.60 s
  Slowest:	9.75 ms
  Fastest:	0.17 ms
  Average:	2.54 ms
  Requests/sec:	10416.95

Response time histogram:
  0.174 [1]     |
  1.131 [16409] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  2.088 [30140] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  3.045 [19611] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.003 [15349] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.960 [12088] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.917 [5319]  |∎∎∎∎∎∎∎
  6.874 [879]   |∎
  7.831 [137]   |
  8.789 [38]    |
  9.746 [29]    |

Latency distribution:
  10 % in 0.92 ms
  25 % in 1.39 ms
  50 % in 2.23 ms
  75 % in 3.59 ms
  90 % in 4.63 ms
  95 % in 5.12 ms
  99 % in 5.96 ms

Status code distribution:
  [OK]   100000 responses

fs2-grpc: 2.4.4

Summary:
  Count:	100000
  Total:	11.53 s
  Slowest:	30.64 ms
  Fastest:	0.26 ms
  Average:	3.09 ms
  Requests/sec:	8671.53

Response time histogram:
  0.259  [1]     |
  3.297  [60865] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  6.335  [36025] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  9.372  [2840]  |∎∎
  12.410 [82]    |
  15.448 [87]    |
  18.486 [49]    |
  21.524 [24]    |
  24.562 [3]     |
  27.600 [11]    |
  30.637 [13]    |

Latency distribution:
  10 % in 1.20 ms
  25 % in 1.80 ms
  50 % in 2.75 ms
  75 % in 4.19 ms
  90 % in 5.38 ms
  95 % in 5.96 ms
  99 % in 7.22 ms

Status code distribution:
  [OK]   100000 responses

ScalaPB

Summary:
  Count:	100000
  Total:	9.54 s
  Slowest:	11.45 ms
  Fastest:	0.14 ms
  Average:	2.48 ms
  Requests/sec:	10479.99

Response time histogram:
  0.139  [1]     |
  1.270  [22921] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  2.400  [32755] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  3.531  [20002] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.662  [15353] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.793  [7000]  |∎∎∎∎∎∎∎∎∎
  6.923  [1665]  |∎∎
  8.054  [256]   |
  9.185  [37]    |
  10.316 [6]     |
  11.446 [4]     |

Latency distribution:
  10 % in 0.87 ms
  25 % in 1.33 ms
  50 % in 2.16 ms
  75 % in 3.48 ms
  90 % in 4.57 ms
  95 % in 5.13 ms
  99 % in 6.20 ms

Status code distribution:
  [OK]   100000 responses

private var cancel: Cancel = Noop

override def onCancel(): Unit =
  cancel()
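The pattern in the diff above can be sketched self-containedly (Cancel and Noop follow the snippet's names; the Listener wrapper and setCancel are illustrative, not the PR's actual code): the listener holds a mutable cancel handle, initially a no-op, which gRPC's onCancel callback fires.

```scala
object CancelPatternDemo {
  // Cancel is a plain thunk; Noop is the initial, harmless handle.
  type Cancel = () => Unit
  val Noop: Cancel = () => ()

  final class Listener {
    private var cancel: Cancel = Noop
    var cancelled: Boolean = false

    // Installed once the call's effect has been started.
    def setCancel(c: Cancel): Unit = cancel = c

    // gRPC invokes onCancel when the client goes away; fire the stored handle.
    def onCancel(): Unit = cancel()
  }

  def main(args: Array[String]): Unit = {
    val l = new Listener
    l.setCancel(() => l.cancelled = true)
    l.onCancel()
    println(s"cancelled=${l.cancelled}") // prints cancelled=true
  }
}
```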
naoh87 (Contributor, Author) commented:
Should this be blocked until cancellation is completed?

ahjohannessen (Collaborator):

@naoh87 What kind of numbers do you get if you bench this? https://github.com/typelevel/fs2-grpc/pull/486/files

naoh87 (Contributor, Author) commented Feb 2, 2022

I changed the warm-up time to 60s and ran for 30s to stabilize the results, because the performance of #486 and #493 is so close.

This benchmark was run on a MacBook Pro 2019 (4-core 2.8 GHz Core i7).

// PS: these results are from a mistaken measurement; see the correction below.

This PR

Summary:
  Count:	357669
  Total:	30.00 s
  Slowest:	26.08 ms
  Fastest:	0.14 ms
  Average:	2.15 ms
  Requests/sec:	11921.77

Response time histogram:
  0.142  [1]      |
  2.736  [281690] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.330  [63114]  |∎∎∎∎∎∎∎∎∎
  7.924  [12309]  |∎∎
  10.518 [467]    |
  13.112 [41]     |
  15.706 [6]      |
  18.300 [10]     |
  20.894 [0]      |
  23.488 [0]      |
  26.083 [1]      |

Latency distribution:
  10 % in 0.92 ms
  25 % in 1.32 ms
  50 % in 1.84 ms
  75 % in 2.54 ms
  90 % in 3.98 ms
  95 % in 4.94 ms
  99 % in 6.49 ms

#486 08f556f

Summary:
  Count:	332880
  Total:	30.00 s
  Slowest:	19.52 ms
  Fastest:	0.16 ms
  Average:	2.30 ms
  Requests/sec:	11095.20

Response time histogram:
  0.158  [1]      |
  2.095  [184726] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  4.031  [110812] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  5.968  [27372]  |∎∎∎∎∎∎
  7.905  [9036]   |∎∎
  9.841  [828]    |
  11.778 [35]     |
  13.715 [13]     |
  15.651 [4]      |
  17.588 [1]      |
  19.525 [2]      |

Latency distribution:
  10 % in 0.97 ms
  25 % in 1.39 ms
  50 % in 1.96 ms
  75 % in 2.72 ms
  90 % in 4.24 ms
  95 % in 5.32 ms
  99 % in 6.93 ms

ahjohannessen (Collaborator):

I see, not much of a difference between those two.

How do you run this: ghz --cpus=3 -n 100000? Using Docker?

naoh87 (Contributor, Author) commented Feb 2, 2022

Nope, I just use sbt run, because I am not interested in the Docker Desktop VM overhead.

ahjohannessen (Collaborator):

Perhaps using constructs from https://github.com/davenverse/condemned/tree/main/core/src/main/scala/io/chrisdavenport/condemned could make my take on it a bit faster.

ahjohannessen (Collaborator):

> Nope. I just sbt run because I am not interested in Docker Desktop VM overhead.

I mean, is ghz something you installed with brew?

naoh87 (Contributor, Author) commented Feb 2, 2022

Yes, I installed it with brew.

❯ ghz --version
0.106.0

ahjohannessen (Collaborator):

Would you mind sharing the ghz command, so I could try locally here? :)

naoh87 (Contributor, Author) commented Feb 2, 2022

❯ ghz --cpus=3 --proto ../scenarios/complex_proto/helloworld.proto -D ../scenarios/complex_proto/payload --insecure localhost:50051 --call helloworld.Greeter.SayHello -z 30s
(run from grpc_bench/scala_fs2_bench)
I edited scala_fs2_bench/plugins.sbt to use fs2-grpc 2.4.4 and switched the runtime with dependencyOverrides.
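The version switch described above might look roughly like this (a sketch, not the benchmark's actual build files; the sbt-fs2-grpc and fs2-grpc-runtime coordinates are assumed):

```scala
// project/plugins.sbt (sketch): the fs2-grpc codegen plugin
addSbtPlugin("org.typelevel" % "sbt-fs2-grpc" % "2.4.4")

// build.sbt (sketch): pin the runtime the benchmark actually links against,
// overriding whatever version the plugin would otherwise pull in
dependencyOverrides += "org.typelevel" %% "fs2-grpc-runtime" % "2.4.4"
```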

ahjohannessen (Collaborator):

Thanks 👍

naoh87 (Contributor, Author) commented Feb 2, 2022

I am really sorry: my benchmark above is a completely mistaken measurement.
I forgot to switch to the ScalaPB implementation.

naoh87 (Contributor, Author) commented Feb 2, 2022

I reran the benchmark, but both implementations show almost the same performance.
This PR

Summary:
  Count:	660755
  Total:	60.00 s
  Slowest:	25.08 ms
  Fastest:	0.17 ms
  Average:	2.32 ms
  Requests/sec:	11011.78

Latency distribution:
  10 % in 0.99 ms
  25 % in 1.42 ms
  50 % in 1.98 ms
  75 % in 2.75 ms
  90 % in 4.27 ms
  95 % in 5.31 ms
  99 % in 6.92 ms

#486

Summary:
  Count:	640781
  Total:	60.01 s
  Slowest:	19.69 ms
  Fastest:	0.19 ms
  Average:	2.44 ms
  Requests/sec:	10678.68

Latency distribution:
  10 % in 1.07 ms
  25 % in 1.53 ms
  50 % in 2.12 ms
  75 % in 2.90 ms
  90 % in 4.39 ms
  95 % in 5.46 ms
  99 % in 7.11 ms

naoh87 (Contributor, Author) commented Feb 2, 2022

I also tried starting run(message) in onMessage and giving up on handling "too many requests", but it did not improve performance much.

ahjohannessen (Collaborator):

@naoh87 What is the Status code distribution for #486?

naoh87 (Contributor, Author) commented Feb 2, 2022

This result is from another -z 60s run, but it should be good enough to use.

Status code distribution:
  [OK]            640219 responses
  [Canceled]      12 responses
  [Unavailable]   36 responses

I think this result is reasonable, because the default ghz request concurrency is 50.

naoh87 (Contributor, Author) commented Feb 3, 2022

I benchmarked with -XX:+UnlockDiagnosticVMOptions and -XX:+DebugNonSafepoints and profiled with async-profiler.
This shows that #493 has less runtime overhead.

#493

Summary:
  Count:	675933
  Total:	60.01 s
  Slowest:	24.32 ms
  Fastest:	0.17 ms
  Average:	2.29 ms
  Requests/sec:	11264.55

Latency distribution:
  10 % in 0.98 ms
  25 % in 1.41 ms
  50 % in 1.96 ms
  75 % in 2.70 ms
  90 % in 4.22 ms
  95 % in 5.24 ms
  99 % in 6.79 ms

#486

Summary:
  Count:	639397
  Total:	60.00 s
  Slowest:	103.10 ms
  Fastest:	0.19 ms
  Average:	2.44 ms
  Requests/sec:	10655.82

Latency distribution:
  10 % in 1.06 ms
  25 % in 1.52 ms
  50 % in 2.11 ms
  75 % in 2.88 ms
  90 % in 4.41 ms
  95 % in 5.49 ms
  99 % in 7.10 ms

https://github.com/naoh87/fs2-grpc-profile/tree/master/docs/2022-02-03

naoh87 (Contributor, Author) commented Feb 3, 2022

./profiler.sh -e cpu -d 30 ... while running ghz --cpus=3 ...

[flamegraphs: 493, 486]

private val Closed: Cancel = () => ()

private def mkListener[F[_]: Async, Request, Response](
    run: Stream[F, Request] => Cancel,
naoh87 (Contributor, Author) commented Feb 4, 2022

Implement the streaming-request Listener in the same way as the unary one.
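The shape of such a listener can be sketched without fs2 or cats-effect (a loose, dependency-free illustration; the event types and drain method are hypothetical, not the PR's actual code): gRPC callbacks only enqueue events, and a single consumer drains them, so no effect has to be dispatched per message.

```scala
import scala.collection.mutable

object StreamListenerDemo {
  sealed trait Event
  final case class Message(value: String) extends Event
  case object HalfClose extends Event

  // The gRPC thread calls onMessage/onHalfClose; events are buffered
  // until the single stream consumer drains them.
  final class Listener {
    private val queue = mutable.Queue.empty[Event]

    def onMessage(msg: String): Unit = queue.enqueue(Message(msg))
    def onHalfClose(): Unit = queue.enqueue(HalfClose)

    // Drain everything received so far, stopping at HalfClose.
    def drain(): List[String] = {
      val events = queue.toList
      queue.clear()
      events.takeWhile(_ != HalfClose).collect { case Message(v) => v }
    }
  }

  def main(args: Array[String]): Unit = {
    val l = new Listener
    l.onMessage("a"); l.onMessage("b"); l.onHalfClose()
    println(l.drain().mkString(",")) // prints a,b
  }
}
```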
