
Very low throughput - Spring Webflux Netty vs Gatling #650

Closed
sercasti opened this issue Mar 18, 2019 · 13 comments
Labels
for/stackoverflow Questions are best asked on SO or Gitter

Comments

@sercasti

sercasti commented Mar 18, 2019

Expected behavior

A vanilla "Hello World" example of Spring WebFlux on Netty should easily outperform Spring WebFlux on Tomcat.

Actual behavior

After roughly 900 active Gatling users, Netty starts throwing ConnectTimeoutException:

21:41:56.162 [WARN ] i.g.h.e.GatlingHttpListener - Request 'reactiveRequest' failed for user 1231
io.netty.channel.ConnectTimeoutException: connection timed out: localhost/127.0.0.1:8080
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267)
	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)


The same test against the same controller, but using WebFlux over Tomcat, has a 100% success rate. Maybe I'm missing some configuration to make Netty scale, but I couldn't find any.

Steps to reproduce

Use this project or start a vanilla Reactor Netty service: https://github.com/sercasti/demoReactive
All it does is expose a REST endpoint with this code:

@GetMapping("/reactive")
public Mono<String> getMonoSha512Hex(@RequestParam String id) {
    String pathToFile = "/Users/sergio/Downloads/fotos/FOTO" + id + ".JPG";
    return Mono.defer(() -> Mono.fromCallable(() -> {
        // Blocking file read plus a CPU-heavy hash, executed on the calling thread
        try (FileInputStream fileInputStream = new FileInputStream(new File(pathToFile))) {
            return DigestUtils.sha512Hex(fileInputStream);
        }
    }));
}

Then execute this Gatling test:
https://github.com/sercasti/gatlingStressTest/blob/master/src/test/scala/baeldung/RecordedSimulation.scala

import scala.concurrent.duration._
import scala.util.Random

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class RecordedSimulation extends Simulation {
  var name = "reactive"
  var url = "http://localhost:8080"

  // Feed each virtual user a random photo id
  val idFeeder = Iterator.continually(
    Map("randomId" -> Random.nextInt(5000))
  )

  val httpProtocol = http
    .baseUrl(url)

  val scn = scenario(name)
    .feed(idFeeder)
    .exec(http(name + "Request").get("/" + name + "?id=${randomId}")
    .check(status.is(200)))

  setUp(
    scn.inject(rampUsersPerSec(10) to 300 during (60 seconds))
  ).protocols(httpProtocol)
}

class NettySimulation extends ServiceSimulation {
  name = "reactive"
  url = "http://localhost:8080"
}

You will see that after the first ~1,000 requests are served successfully, it starts to blow up.

Simulation results folder: https://github.com/sercasti/gatlingStressTest/tree/master/target/gatling/recordedsimulation-20190318004124114

Reactor Netty version

0.8.5

JVM version (e.g. java -version)

1.8.0_181

OS version (e.g. uname -a)

Mojave Darwin Kernel Version 18.2.0

@bsideup
Contributor

bsideup commented Mar 18, 2019

Hi @sercasti,

  1. Mono.defer can be removed; Mono.fromCallable already guarantees laziness.
  2. You're doing a blocking operation (file I/O) on a non-blocking Netty thread.
  3. sha512Hex is a very CPU-heavy operation; it is better to run it on a separate thread pool (see the subscribeOn operator, and the sketch below) to avoid starving Netty's threads.
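
For illustration, a minimal sketch of points 2 and 3 applied to the endpoint above, assuming the Schedulers.elastic() scheduler from the Reactor 3.2 line that matches the reported version (newer Reactor versions would use Schedulers.boundedElastic() for blocking work):

import java.io.File;
import java.io.FileInputStream;

import org.apache.commons.codec.digest.DigestUtils;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

@RestController
public class ReactiveController {

    @GetMapping("/reactive")
    public Mono<String> getMonoSha512Hex(@RequestParam String id) {
        String pathToFile = "/Users/sergio/Downloads/fotos/FOTO" + id + ".JPG";
        return Mono
                .fromCallable(() -> {
                    // blocking file read + CPU-heavy hashing
                    try (FileInputStream in = new FileInputStream(new File(pathToFile))) {
                        return DigestUtils.sha512Hex(in);
                    }
                })
                // run the callable on a dedicated scheduler instead of the Netty event loop
                .subscribeOn(Schedulers.elastic());
    }
}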

@sercasti
Author

Thanks for your quick reply!
I'll fix 1 right away.
As for 2 and 3, what would you recommend as a dummy operation to generate non-blocking load? I'm trying to build a scenario that shows the benefits of reactive vs. blocking.

@rstoyanchev
Contributor

If you return the file as an org.springframework.core.io.Resource, you will benefit from zero-copy file transfer. Of course, the file contents would have to have been encoded already.

To demonstrate the benefits, you need to introduce some latency due to I/O (e.g. a remote call) in the handling of the request.
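
A rough sketch of the Resource approach; the FileController class and /file mapping are hypothetical, not part of the original project:

import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class FileController {

    @GetMapping("/file")
    public Mono<Resource> getFile(@RequestParam String id) {
        // WebFlux can write a Resource with zero-copy file transfer where the transport supports it
        return Mono.just(new FileSystemResource("/Users/sergio/Downloads/fotos/FOTO" + id + ".JPG"));
    }
}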

@gihad

gihad commented Mar 18, 2019

@sercasti Can you report back with your findings? Back when I was evaluating this, the performance was really low compared to Vert.x, but I'm still hoping it will improve.

@sercasti
Author

@gihad I keep trying, but I'm getting the same results.
Following the advice from this conversation, I switched to a non-blocking, expensive computation (5,000 digits of Pi), and WebFlux+Tomcat is still outperforming WebFlux+Netty; additionally, Netty is throwing i.n.c.ConnectTimeoutException under high load.

https://github.com/sercasti/demoReactive/blob/PI/src/main/java/com/example/demo/controller/Controller.java

@bsideup
Contributor

bsideup commented Mar 19, 2019

@sercasti but you didn't move the computation to a separate thread pool with subscribeOn. CPU-heavy computations are as bad as blocking I/O.
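
A rough sketch of what that could look like for the Pi endpoint; PiController, the /pi mapping, and computePi are stand-ins for the code in the linked controller, not copied from it:

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

@RestController
public class PiController {

    // fixed-size pool sized to the available cores, reserved for CPU-bound work
    private static final Scheduler CPU_POOL =
            Schedulers.newParallel("cpu", Runtime.getRuntime().availableProcessors());

    @GetMapping("/pi")
    public Mono<String> pi(@RequestParam int digits) {
        return Mono.fromCallable(() -> computePi(digits)) // CPU-heavy work
                   .subscribeOn(CPU_POOL);                // keep it off the Netty event loop
    }

    // stand-in for the actual Pi computation in the linked controller
    private String computePi(int digits) {
        return "3.14159";
    }
}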

@violetagg
Member

@sercasti Let me explain in detail what you are observing:
You are using Spring WebFlux with the Netty runtime and with the Tomcat runtime.
When you use Spring WebFlux, it is guaranteed that Tomcat will be used with the non-blocking functionality provided by Servlet 3.1. However, by default Tomcat comes with a thread pool of 200 threads, while Netty will be configured to use a number of threads equal to the number of cores that you have.
As your scenario involves CPU-heavy computations, the number of threads matters, as @bsideup already told you.

Now can you execute the following scenarios:

  • Tell us the CPU/memory/thread consumption when you are using WebFlux with Netty
  • Tell us the CPU/memory/thread consumption when you are using WebFlux with Tomcat with the default 200 threads
  • Tell us the CPU/memory/thread consumption when you are using WebFlux with Tomcat with the number of threads equal to the number of cores that you have (see the configuration sketch below this comment)

I hope that when you measure, you use two different machines/VMs for the client and the server, in order to get realistic results.

Regards, Violeta
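
For the third scenario, one way to size Tomcat's worker pool to the core count in a Spring Boot application is a properties entry along these lines (the exact property name depends on the Boot version; this 2.1-era name is an assumption, not something from the original project):

# application.properties — cap Tomcat's worker pool at the core count (8 here; adjust to your machine)
# later Boot versions rename this to server.tomcat.threads.max
server.tomcat.max-threads=8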

@dave-fl

dave-fl commented Mar 19, 2019

Your limit will always be how fast your thread pool can complete the work, plus the time it takes the system to context-switch back when the work is ready. You should be publishing on a worker thread pool and not tying up the event loop.

I would add some logic to allow some results to be cached, or perform some genuinely non-blocking work. You can use a fixed delay of, say, 50 ms (Mono.delay) and return something instantly while hitting your server hard (see the sketch below). Also, don't run Gatling and the server on the same machine.
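
A minimal sketch of that idea; the /simulated endpoint is hypothetical:

import java.time.Duration;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class SimulatedLatencyController {

    @GetMapping("/simulated")
    public Mono<String> simulated() {
        // a 50 ms non-blocking delay standing in for a remote call; no thread is parked while waiting
        return Mono.delay(Duration.ofMillis(50))
                   .map(tick -> "done");
    }
}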

@smaldini added the for/stackoverflow (Questions are best asked on SO or Gitter) label on Mar 21, 2019
@smaldini
Contributor

Closing for now, unless there are some additional details you want to share, @sercasti.

@gihad

gihad commented Apr 12, 2019

@smaldini I believe the underlying performance issue is still not addressed. See #392, where WebFlux performance is 10x slower than Vert.x.

@violetagg
Member

@gihad This issue is closed based on the scenario that @sercasti was trying to measure.
#650 (comment)

@smaldini
Contributor

smaldini commented Apr 13, 2019

@gihad part of the problem is already addressed by WebFlux for 5.2 (content negotiation optimizations, encoders/decoders, etc.), and we have a few more improvements coming, some in 0.8 and the rest in 0.9.

One area where we are particularly exposed in benchmarks (like #654) is single-body responses. We are doing multiple flushes for what could be a single network flush (chunked encoding: flush headers, then flush body, then flush the last HTTP chunk). I noticed locally that by detecting those single-body responses and shipping them with one flush plus a Content-Length header, we are pretty similar in latency and req/s. We'll probably include that change in our next versions while continuing with various other optimizations.

Vert.x

Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.86ms  833.59us  21.52ms   72.73%
    Req/Sec     1.22k     2.96k    8.52k    85.47%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    1.75ms
 75.000%    2.30ms
 90.000%    2.91ms
 99.000%    4.05ms
 99.900%    8.04ms
 99.990%   12.22ms
 99.999%   19.26ms
100.000%   21.53ms
...
Requests/sec:  58372.04
Transfer/sec:    572.38MB

Reactor Netty with the sendObject optimization (one flush), off the 0.8 branch

Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.96ms    1.33ms  35.20ms   92.02%
    Req/Sec   418.33      1.83k    8.73k    95.02%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    1.75ms
 75.000%    2.31ms
 90.000%    2.95ms
 99.000%    6.88ms
 99.900%   18.32ms
 99.990%   21.42ms
 99.999%   31.50ms
100.000%   35.23ms
...
Requests/sec:  58377.35
Transfer/sec:    573.88MB

@gihad

gihad commented Apr 13, 2019

Thanks @smaldini, this is very encouraging.
Since you are looking into performance optimization, it seems like a matter of time until the performance is on par.
