Reimplement light server runtime #496

Merged
merged 1 commit into from
Feb 13, 2022

naoh87
Contributor

@naoh87 naoh87 commented Feb 5, 2022

This runtime makes only one unsafeRun* call per RPC.

This PR comes from #493.

Background: #386

* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

package fs2.grpc.internal
Contributor Author

@naoh87 naoh87 Feb 5, 2022

I placed the implementation in an internal package to control binary compatibility.
I think the Fs2ServerCallHandler interface is what we really need to care about.
Is this right?

Collaborator

Yes, that is exposed to codegen - everything else should be package private.

@naoh87 naoh87 force-pushed the fast_server_runtime branch from 33f708e to c9356cd Compare February 5, 2022 18:14
@naoh87 naoh87 changed the title implement light server runtime Reimplement light server runtime Feb 5, 2022
@naoh87
Contributor Author

naoh87 commented Feb 5, 2022

Benchmark

I benchmarked with ghz --cpus=3 -z 60s ..., and
the benchmark scenario is from https://github.com/LesnyRumcajs/grpc_bench

This PR:        11075 rps
fs2-grpc 2.4.4:  9129 rps
Akka-gRPC:      11772 rps

This PR closes roughly 75% of the throughput gap to Akka-gRPC.
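As a rough check of that figure, the fraction of the gap closed can be computed from the Requests/sec values reported in this comment. This is only a sketch; the object and function names are made up for illustration.

```scala
object GapClosedSketch {
  // Requests/sec copied from the three ghz summaries in this comment.
  val thisPr: Double = 11075.27
  val v244: Double   = 9129.40
  val akka: Double   = 11772.13

  // Fraction of the original gap to the reference that the change closes.
  def gapClosed(before: Double, after: Double, reference: Double): Double =
    (after - before) / (reference - before)

  val closed: Double = gapClosed(v244, thisPr, akka)

  def main(args: Array[String]): Unit =
    println(f"gap closed: ${closed * 100}%.1f%%")
}
```

With these inputs the result is a little under 74%, which is consistent with the rounded figure above.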

This PR

Summary:
  Count:	664576
  Total:	60.01 s
  Slowest:	41.06 ms
  Fastest:	0.17 ms
  Average:	2.33 ms
  Requests/sec:	11075.27

Latency distribution:
  10 % in 1.00 ms
  25 % in 1.43 ms
  50 % in 1.99 ms
  75 % in 2.75 ms
  90 % in 4.26 ms
  95 % in 5.30 ms
  99 % in 6.89 ms

fs2-grpc 2.4.4

Summary:
  Count:	547798
  Total:	60.00 s
  Slowest:	40.29 ms
  Fastest:	0.26 ms
  Average:	2.84 ms
  Requests/sec:	9129.40

Latency distribution:
  10 % in 1.29 ms
  25 % in 1.84 ms
  50 % in 2.53 ms
  75 % in 3.38 ms
  90 % in 4.89 ms
  95 % in 6.03 ms
  99 % in 7.88 ms

Akka-gRPC

Summary:
  Count:	706347
  Total:	60.00 s
  Slowest:	33.03 ms
  Fastest:	0.13 ms
  Average:	2.14 ms
  Requests/sec:	11772.13

Latency distribution:
  10 % in 0.89 ms
  25 % in 1.28 ms
  50 % in 1.80 ms
  75 % in 2.51 ms
  90 % in 4.04 ms
  95 % in 5.02 ms
  99 % in 6.57 ms

@ahjohannessen
Collaborator

I think the usage of unsafe code is a bit too much. The general mantra for fs2-grpc is to have code that is easy to reason about. I think you can come very far using constructs like Ref and SyncIO and eliminate usage of var. This would make it easier to maintain in the longer run. I know that some parts need change, like sync usage of Dispatcher as it causes context shift. That seems doable with SyncIO and using non-blocking methods on Dispatcher. It would be interesting to see if you could adjust your code to something similar to what is in #486.
I do think that performance is important, but not at all costs when we can get numbers that are close to grpc-java using building blocks from cats-effect and fs2.

@ahjohannessen
Collaborator

This library provides some unsafe constructs: https://github.com/davenverse/condemned/blob/main/core/src/main/scala/io/chrisdavenport/condemned/UnsafeDeferred.scala - But I think most of SyncIO + non-blocking methods on dispatcher should get us a long way, the cats-effect constructs are also under more eyes and have higher test coverage.

@ahjohannessen
Collaborator

@naoh87 Some ideas in condemned seem to have led to this

private def mkListener[Request, Response](
run: Request => SyncIO[Cancel],
call: Fs2StatefulServerCall[Request, Response],
state: Ref[SyncIO, State[Request]]
Contributor Author

I replaced var with Ref, following your advice.

@naoh87 naoh87 force-pushed the fast_server_runtime branch from f3be05f to 393eca6 Compare February 6, 2022 10:23
state.get.flatMap(_.cancel.getOrElse(SyncIO.unit)).unsafeRunSync()

override def onMessage(message: Request): Unit =
state.get
Collaborator

access instead of get and use modify from it

Contributor Author

I don't think it is necessary to use access,
because all ServerCall.Listener methods are called simultaneously.

Collaborator

You mean synchronously?

Contributor Author

@naoh87 naoh87 Feb 6, 2022

Yes. I am sorry, I chose the wrong English word.

The caller is free to call an instance from multiple threads, but only one call simultaneously

https://github.com/grpc/grpc-java/blob/v1.44.0/api/src/main/java/io/grpc/ServerCall.java#L42-L56
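The argument here is that serialized callbacks make a plain read followed by a write safe, so a compare-and-set style update is not strictly required. A minimal sketch of that idea, using a hypothetical listener and an AtomicReference in place of the PR's Ref, and assuming nothing else writes to the state concurrently:

```scala
import java.util.concurrent.atomic.AtomicReference

object SerializedListenerSketch {
  // Hypothetical listener that stores at most one request message.
  final class Listener[A] {
    private val state = new AtomicReference[Option[A]](None)

    // Assumption: callbacks never overlap, so no other call can update the
    // state between the read and the write below.
    def onMessage(message: A): Either[String, Unit] =
      state.get() match {
        case None =>
          state.set(Some(message))
          Right(())
        case Some(_) =>
          Left("too many requests")
      }

    def received: Option[A] = state.get()
  }
}
```

If another party (for example a cancellation path) could write to the same state at the same time, this reasoning would no longer hold and an atomic update would be needed.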

import UnsafeChannel.State
import scala.collection.immutable.Queue

final class UnsafeChannel[A] extends AtomicReference[State[A]](State.Consumed) {
Collaborator

Can you use Ref to implement the state machine?

@naoh87 naoh87 force-pushed the fast_server_runtime branch from d494558 to 0d0ca45 Compare February 6, 2022 13:50
import scala.collection.immutable.Queue

@nowarn
final class UnsafeChannel[A](val ref: Ref[SyncIO, State[A]]) extends AnyVal {
Collaborator

This is also interesting: https://github.com/davenverse/process/blob/main/core/shared/src/main/scala/io/chrisdavenport/process/structures/UnsafeByteQueue.scala

Seems like some building blocks for unsafe interop are missing in cats-effect

Contributor Author

@naoh87 naoh87 Feb 6, 2022

I know that.
The method names come from the fs2 one.
I made this part to remove unnecessary context shifts.

Collaborator

Ok :)

@ahjohannessen
Collaborator

It would probably be a good idea to do a PR that only focuses on unaryTo, as the other scenarios have more moving parts and are more involved.

@naoh87
Contributor Author

naoh87 commented Feb 8, 2022

@ahjohannessen I was wondering if you have had a chance to check the change.

@ahjohannessen
Collaborator

@naoh87 I have not had time yet, will look soon

}
}

final class Fs2StatefulServerCall[Request, Response](
Collaborator

I think it is a bit confusing to have this and Fs2ServerCall

@ahjohannessen
Collaborator

I need a computer in order to do a review :) on my phone now.

@ahjohannessen
Collaborator

Starting to look good :) I have not yet had time to study the code. I wonder what kind of benchmarks and flamegraphs this has compared to main and grpc-java.


import Fs2ServerCall.Cancel

def stream[F[_]](response: Stream[F, Response], dispatcher: Dispatcher[F])(implicit F: Async[F]): SyncIO[Cancel] =
Collaborator

Suggested change
- def stream[F[_]](response: Stream[F, Response], dispatcher: Dispatcher[F])(implicit F: Async[F]): SyncIO[Cancel] =
+ def stream[F[_]](response: Stream[F, Response], dispatcher: Dispatcher[F])(implicit F: Sync[F]): SyncIO[Cancel] =

dispatcher
)

def unary[F[_]](response: F[Response], dispatcher: Dispatcher[F])(implicit F: Async[F]): SyncIO[Cancel] =
Collaborator

Suggested change
- def unary[F[_]](response: F[Response], dispatcher: Dispatcher[F])(implicit F: Async[F]): SyncIO[Cancel] =
+ def unary[F[_]](response: F[Response], dispatcher: Dispatcher[F])(implicit F: Sync[F]): SyncIO[Cancel] =

}

private def closeStreamF[F[_]](status: Status, metadata: Metadata)(implicit F: Sync[F]): F[Unit] =
F.delay(call.close(status, metadata))
Collaborator

Suggested change
- F.delay(call.close(status, metadata))
+ close(status, metadata).to[F]

def close(status: Status, metadata: Metadata): SyncIO[Unit] =
SyncIO(call.close(status, metadata))

private def run[F[_]](completed: F[Unit], dispatcher: Dispatcher[F])(implicit F: Sync[F]): SyncIO[Cancel] = {
Collaborator

Perhaps define a local helper inside run to improve readability:

    def handleError(t: Throwable): F[Unit] = t match {
      case ex: StatusException => closeStreamF(ex.getStatus, Option(ex.getTrailers).getOrElse(new Metadata()))
      case ex: StatusRuntimeException => closeStreamF(ex.getStatus, Option(ex.getTrailers).getOrElse(new Metadata()))
      case ex => closeStreamF(Status.INTERNAL.withDescription(ex.getMessage).withCause(ex), new Metadata())
    }

Then:

case Outcome.Errored(e) => handleError(e)

})
SyncIO {
cancel()
()
Collaborator

Alternative: SyncIO(cancel()).void


}

def unary[F[_]: Async, Request, Response](
Collaborator

Suggested change
- def unary[F[_]: Async, Request, Response](
+ def unary[F[_]: Sync, Request, Response](

startCallSync(call, opt)(call => req => call.unary(impl(req, headers), dispatcher)).unsafeRunSync()
}

def stream[F[_]: Async, Request, Response](
Collaborator

Suggested change
- def stream[F[_]: Async, Request, Response](
+ def stream[F[_]: Sync, Request, Response](


package fs2.grpc.internal.server

import cats.effect.Async
Collaborator

Suggested change
- import cats.effect.Async
+ import cats.effect.Sync

)(f: Fs2ServerCall[Request, Response] => Request => SyncIO[Cancel]): SyncIO[ServerCall.Listener[Request]] = {
for {
call <- Fs2ServerCall.setup(options, call)
_ <- call.request(2)
Collaborator

Perhaps a comment to why two requests are done.

Collaborator

I know this is what grpc-java does:

// We expect only 1 request, but we ask for 2 requests here so that if a misbehaving client
// sends more than 1 requests, ServerCall will catch it...

from ServerCalls - A future maintainer would be happy to learn why this is.
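A toy model may help show why asking for two messages matters. The names below are hypothetical and nothing here is grpc-java API; it only assumes that the transport delivers at most as many queued messages as the server has requested.

```scala
object RequestTwoSketch {
  // Hypothetical stand-in for flow control: only `requested` messages are
  // handed to the listener, even if the client has sent more.
  def deliver[A](queued: List[A], requested: Int): List[A] =
    queued.take(requested)

  // A unary handler accepts exactly one message and rejects anything else.
  def checkUnary[A](delivered: List[A]): Either[String, A] =
    delivered match {
      case single :: Nil => Right(single)
      case Nil           => Left("no request received")
      case _             => Left("too many requests")
    }

  def main(args: Array[String]): Unit = {
    val misbehaving = List("first", "second")
    // With request(1) the extra message is never seen, so it goes unnoticed.
    println(checkUnary(deliver(misbehaving, 1)))
    // With request(2) the extra message is delivered and can be rejected.
    println(checkUnary(deliver(misbehaving, 2)))
  }
}
```

In this model a well-behaved client that sends a single message is accepted either way; only the misbehaving case differs.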

import io.grpc._
import fs2._

object Fs2ServerCall {
Collaborator

Suggested change
- object Fs2ServerCall {
+ private[server] object Fs2ServerCall {

* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

package fs2.grpc.internal.server
Collaborator

Move internal in under server.

}
}

final class Fs2ServerCall[Request, Response](
Collaborator

Suggested change
- final class Fs2ServerCall[Request, Response](
+ private[server] final class Fs2ServerCall[Request, Response](

import fs2.grpc.server.ServerOptions
import io.grpc._

object Fs2UnaryServerCallHandler {
Collaborator

Suggested change
- object Fs2UnaryServerCallHandler {
+ private[server] object Fs2UnaryServerCallHandler {

run(
response.pull.peek1
.flatMap {
case Some((_, tail)) =>
Collaborator

@ahjohannessen ahjohannessen Feb 10, 2022

Calling this tail implies that the head of response is not in there, but it is (tl.cons(hd)), i.e.:

uncons.flatMap {
  case None           => Pull.pure(None)
  case Some((hd, tl)) => Pull.pure(Some((hd(0), tl.cons(hd))))
}

Suggestion: rename tail to stream or similar.
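The same point can be seen with an analogy on the standard library's LazyList. This is not fs2 code; `peek1` below is a hypothetical helper that mirrors the behaviour described above, returning the first element together with a sequence that still contains it.

```scala
object Peek1Sketch {
  // Analogue of peek1 on a LazyList: returns the first element together with
  // the whole sequence, head included (nothing is consumed).
  def peek1[A](s: LazyList[A]): Option[(A, LazyList[A])] =
    s.headOption.map(head => (head, s))

  def main(args: Array[String]): Unit =
    peek1(LazyList(1, 2, 3)).foreach { case (head, stream) =>
      println(s"head=$head, stream=${stream.toList}")
    }
}
```

Because the second component still starts with the peeked element, a name such as stream describes it more accurately than tail.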

Contributor Author

This looks good.

@naoh87
Contributor Author

naoh87 commented Feb 11, 2022

async-profiler results

[Flame graph images not preserved: 244, 496, 0scalapb]

@naoh87 naoh87 force-pushed the fast_server_runtime branch from 6dd15b8 to fe1fe39 Compare February 11, 2022 05:15
@naoh87
Contributor Author

naoh87 commented Feb 13, 2022

Unary Benchmark

Server benchmark with CPU time restricted to 1 core, on Linux (kernel 5.4.0).
I ran ghz from an MBP with ghz --cpu=4 --connections=5 ...

#496

actual cpu used: 0.90 core
Summary:
  Count:	647800
  Total:	60.00 s
  Slowest:	37.50 ms
  Fastest:	0.24 ms
  Average:	2.12 ms
  Requests/sec:	10796.08

v2.4.4

actual cpu used: 1.0 core
Summary:
  Count:	315765
  Total:	60.00 s
  Slowest:	73.58 ms
  Fastest:	0.30 ms
  Average:	7.65 ms
  Requests/sec:	5262.45

ScalaPB

actual cpu used: 0.75 core
Summary:
  Count:	644442
  Total:	60.00 s
  Slowest:	30.59 ms
  Fastest:	0.23 ms
  Average:	2.12 ms
  Requests/sec:	10740.29

ServerStreaming Benchmark

Benchmarked under the same conditions as the unary benchmark.
fs2 implementation

Stream.emit(HelloReply(request.request)).repeatN(100)

ScalaPB implementation

(0 until 100).foreach(_ => responseObserver.onNext(HelloReply(request.request)))
responseObserver.onCompleted()

#496

actual cpu used: 1.0 core
Summary:
  Count:	79699
  Total:	60.00 s
  Slowest:	104.91 ms
  Fastest:	1.54 ms
  Average:	33.53 ms
  Requests/sec:	1328.28

v2.4.4

actual cpu used: 1.0 core
Summary:
  Count:	72239
  Total:	60.00 s
  Slowest:	120.49 ms
  Fastest:	1.92 ms
  Average:	37.31 ms
  Requests/sec:	1203.94

ScalaPB

actual cpu used: 0.85 core
Summary:
  Count:	131386
  Total:	60.01 s
  Slowest:	133.41 ms
  Fastest:	1.71 ms
  Average:	16.36 ms
  Requests/sec:	2189.54

import fs2.grpc.server.ServerCallOptions
import io.grpc._

private[server] object Fs2ServerCall {
Collaborator

Perhaps place this inside the handler as it is only a helper.

Contributor Author

I will use this for the streaming handler after this PR.

Collaborator

Ok

def init[A](cb: A => SyncIO[Cancel]): SyncIO[Ref[SyncIO, Context[A]]] =
Ref[SyncIO].of[Context[A]](BeforeCall(cb, None))
}
case class BeforeCall[A](cb: A => SyncIO[Cancel], request: Option[A]) extends Context[A] {
Collaborator

@ahjohannessen ahjohannessen Feb 13, 2022

This name is a bit hard to understand when looking at the fields.

Contributor Author

@naoh87 naoh87 Feb 13, 2022

How about BeforeCall(callback: ..., received: ...) ?

Collaborator

Would it make sense to split that into two cases?

Contributor Author

Sorry, I don't speak English well.
I could not work out what the two cases refer to.
Could you explain it in a simpler way?

Contributor Author

If the two cases mean BeforeCall and Called, then having two cases expresses that A => SyncIO[Cancel] will not be called after Called.

Collaborator

@ahjohannessen ahjohannessen Feb 13, 2022

Not sure if it makes sense to split it up more, but the cases are at least

  • cancelled
  • request message pending
  • request message received and half-closed pending
  • request message received and half-closed received
  • call-completed
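To make the list above concrete, here is one way those cases could be modelled as an explicit state machine. This is a sketch only: every name is hypothetical, the transitions are an assumption about how a unary call progresses, and it is written as a pure function rather than with the Ref used in the PR.

```scala
object CallerStateSketch {
  // One case per bullet above (names are illustrative, not the PR's).
  sealed trait CallerState[+A]
  case object PendingMessage extends CallerState[Nothing]                   // request message pending
  final case class PendingHalfClose[+A](received: A) extends CallerState[A] // received, half-close pending
  final case class Called[+A](request: A) extends CallerState[A]            // received, half-close received
  case object Completed extends CallerState[Nothing]                        // call completed
  case object Cancelled extends CallerState[Nothing]                        // cancelled

  // Events a listener could observe.
  sealed trait Event[+A]
  final case class Message[+A](value: A) extends Event[A]
  case object HalfClose extends Event[Nothing]
  case object Finished extends Event[Nothing]
  case object Cancel extends Event[Nothing]

  // Pure transition function: Left describes a protocol violation.
  def step[A](state: CallerState[A], event: Event[A]): Either[String, CallerState[A]] =
    (state, event) match {
      case (Completed, _) | (Cancelled, _)   => Right(state) // terminal states ignore events
      case (_, Cancel)                       => Right(Cancelled)
      case (PendingMessage, Message(a))      => Right(PendingHalfClose(a))
      case (PendingHalfClose(a), HalfClose)  => Right(Called(a))
      case (Called(_), Finished)             => Right(Completed)
      case (PendingHalfClose(_), Message(_)) => Left("too many requests")
      case (PendingMessage, HalfClose)       => Left("half-closed without a request")
      case _                                 => Left("unexpected event")
    }
}
```

Spelling out each case makes invalid transitions visible in one place, at the cost of a few more types than a two-case encoding.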

Contributor Author

I think there is a few state to hold over Listener methods.

Collaborator

> Is that mean it is more better split like BeforeReceive, Received, Called?
> It looks good.

Yes

Collaborator

> I think there is a few state to hold over Listener methods.

Not sure what you mean by that

Contributor Author

The Ref is used to keep state across ServerCall.Listener method calls.
I don't think this introduces a lot of state.


import Fs2ServerCall.Cancel

sealed trait Context[A]
Collaborator

Is this not more like a call status? I wonder if more explicit cases would make the code easier to reason about

Contributor Author

What do you think of CallerState?

Collaborator

That is more descriptive 👍

@ahjohannessen
Collaborator

ahjohannessen commented Feb 13, 2022

I think we are good, if you rebase and squash I’ll merge. Do the extra states have a measurable impact on req/s ?

@naoh87 naoh87 force-pushed the fast_server_runtime branch from 72cf19c to 78462a1 Compare February 13, 2022 20:27
@naoh87
Contributor Author

naoh87 commented Feb 13, 2022

I rebased to one commit, and there was no measurable performance impact.

// We expect only 1 request, but we ask for 2 requests here so that if a misbehaving client
// sends more than 1 requests, ServerCall will catch it.
_ <- call.request(2)
ctx <- CallerState.init(f(call))
Collaborator

state

Contributor Author

fixed

state.set(Cancelled()) >> call.close(status, new Metadata())
}

def unary[F[_]: Async, Request, Response](
Collaborator

Async needed?

startCallSync(call, opt)(call => req => call.unary(impl(req, headers), dispatcher)).unsafeRunSync()
}

def stream[F[_]: Async, Request, Response](
Collaborator

Async needed?

Contributor Author

Sync is enough.

@naoh87 naoh87 force-pushed the fast_server_runtime branch 2 times, most recently from 920a3a3 to fecc2ae Compare February 13, 2022 21:46
@naoh87 naoh87 force-pushed the fast_server_runtime branch from fecc2ae to c6b4894 Compare February 13, 2022 21:47
@ahjohannessen ahjohannessen merged commit 867782d into typelevel:main Feb 13, 2022
@ahjohannessen
Collaborator

@naoh87 I have cut a release with your changes. Btw, fs2.grpc.client.ClientSuite.single message to unaryToStreaming is unstable, I think a 50.millis sleep makes it more reliable.

@ahjohannessen
Collaborator

@naoh87 Might be interesting to PR an update to https://github.com/LesnyRumcajs/grpc_bench with latest release :)

@naoh87
Contributor Author

naoh87 commented Feb 18, 2022

I will do it later.
Also, fs2-grpc will gain more performance by being able to use directExecutor safely once all APIs become non-blocking.
