
Very low throughput - Spring Webflux Netty vs Gatling #650

Closed
sercasti opened this issue Mar 18, 2019 · 13 comments
Labels
for/stackoverflow Questions are best asked on SO or Gitter

Comments

@sercasti

sercasti commented Mar 18, 2019

Expected behavior

A vanilla "Hello World" example of Spring WebFlux on Netty should easily outperform Spring WebFlux on Tomcat.

Actual behavior

After roughly 900 active Gatling users, Netty starts throwing ConnectTimeoutException:

21:41:56.162 [WARN ] i.g.h.e.GatlingHttpListener - Request 'reactiveRequest' failed for user 1231
io.netty.channel.ConnectTimeoutException: connection timed out: localhost/127.0.0.1:8080
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:267)
	at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127)


The same test against the same controller, but using WebFlux over Tomcat, has a 100% success rate. Maybe I'm missing some configuration to make Netty scale, but I couldn't find any.

Steps to reproduce

Use this project or start a vanilla Reactor Netty service: https://github.com/sercasti/demoReactive
All it does is expose a REST endpoint with this code:

@GetMapping("/reactive")
public Mono<String> getMonoSha512Hex(@RequestParam String id) {
    String pathToFile = "/Users/sergio/Downloads/fotos/FOTO" + id + ".JPG";
    return Mono.defer(() -> Mono.fromCallable(() -> {
        // Blocking file read plus a CPU-heavy hash, executed on the calling thread
        try (FileInputStream fileInputStream = new FileInputStream(new File(pathToFile))) {
            return DigestUtils.sha512Hex(fileInputStream);
        }
    }));
}

Then execute this Gatling test:
https://github.com/sercasti/gatlingStressTest/blob/master/src/test/scala/baeldung/RecordedSimulation.scala

import scala.concurrent.duration._
import scala.util.Random

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class RecordedSimulation extends Simulation {
  var name = "reactive"
  var url = "http://localhost:8080"

  // Feed each virtual user a random photo id
  val idFeeder = Iterator.continually(
    Map("randomId" -> Random.nextInt(5000))
  )

  val httpProtocol = http
    .baseUrl(url)

  val scn = scenario(name)
    .feed(idFeeder)
    .exec(http(name + "Request").get("/" + name + "?id=${randomId}")
    .check(status.is(200)))

  setUp(
    scn.inject(rampUsersPerSec(10) to 300 during (60 seconds))
  ).protocols(httpProtocol)
}

class NettySimulation extends ServiceSimulation {
  name = "reactive"
  url = "http://localhost:8080"
}

You will see that after the first ~1,000 requests are served successfully, it starts to blow up.

Simulation results folder: https://github.com/sercasti/gatlingStressTest/tree/master/target/gatling/recordedsimulation-20190318004124114

Reactor Netty version

0.8.5

JVM version (e.g. java -version)

1.8.0_181

OS version (e.g. uname -a)

Mojave Darwin Kernel Version 18.2.0

@bsideup
Contributor

bsideup commented Mar 18, 2019

Hi @sercasti,

  1. Mono.defer can be removed; Mono.fromCallable already guarantees laziness.
  2. You're doing a blocking operation (file I/O) on a non-blocking Netty thread.
  3. sha512Hex is a very CPU-heavy operation; it is better to run it on a separate thread pool (see the subscribeOn operator, and the sketch below) to avoid starving Netty's threads.
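
For illustration, a minimal sketch of points 2 and 3 applied to the endpoint above, assuming the Schedulers.elastic() scheduler from the Reactor 3.2 line that matches the reported version (newer Reactor versions would use Schedulers.boundedElastic() for blocking work):

import java.io.File;
import java.io.FileInputStream;

import org.apache.commons.codec.digest.DigestUtils;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

@RestController
public class ReactiveController {

    @GetMapping("/reactive")
    public Mono<String> getMonoSha512Hex(@RequestParam String id) {
        String pathToFile = "/Users/sergio/Downloads/fotos/FOTO" + id + ".JPG";
        return Mono
                .fromCallable(() -> {
                    // blocking file read + CPU-heavy hashing
                    try (FileInputStream in = new FileInputStream(new File(pathToFile))) {
                        return DigestUtils.sha512Hex(in);
                    }
                })
                // run the callable on a dedicated scheduler instead of the Netty event loop
                .subscribeOn(Schedulers.elastic());
    }
}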

@sercasti
Author

Thanks for your quick reply!
I'll fix 1 right away.
As for 2 and 3, what would you recommend as a dummy operation to generate non-blocking load? I'm trying to build a scenario that shows the benefits of reactive vs. blocking.

@rstoyanchev
Contributor

If you return the file as an org.springframework.core.io.Resource, you will benefit from zero-copy file transfer. Of course, the file contents would have to have been encoded already.

To demonstrate the benefits, you need to introduce some latency due to I/O (e.g. a remote call) in the handling of the request.
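
A rough sketch of the Resource approach; the FileController class and /file mapping are hypothetical, not part of the original project:

import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class FileController {

    @GetMapping("/file")
    public Mono<Resource> getFile(@RequestParam String id) {
        // WebFlux can write a Resource with zero-copy file transfer where the transport supports it
        return Mono.just(new FileSystemResource("/Users/sergio/Downloads/fotos/FOTO" + id + ".JPG"));
    }
}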

@gihad

gihad commented Mar 18, 2019

@sercasti Can you report back with your findings? Back when I was evaluating this, the performance was really low compared to Vert.x, but I'm still hoping it will improve.

@sercasti
Author

@gihad I keep trying, but I'm getting the same results.
Following the advice from this conversation, I switched to a non-blocking, expensive computation (5,000 digits of Pi), and WebFlux+Tomcat is still outperforming WebFlux+Netty; additionally, Netty is throwing i.n.c.ConnectTimeoutException under high load.

https://github.com/sercasti/demoReactive/blob/PI/src/main/java/com/example/demo/controller/Controller.java

@bsideup
Contributor

bsideup commented Mar 19, 2019

@sercasti but you didn't move the computation to a separate thread pool with subscribeOn. CPU-heavy computations are as bad as blocking I/O.
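
A rough sketch of what that could look like for the Pi endpoint; PiController, the /pi mapping, and computePi are stand-ins for the code in the linked controller, not copied from it:

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

@RestController
public class PiController {

    // fixed-size pool sized to the available cores, reserved for CPU-bound work
    private static final Scheduler CPU_POOL =
            Schedulers.newParallel("cpu", Runtime.getRuntime().availableProcessors());

    @GetMapping("/pi")
    public Mono<String> pi(@RequestParam int digits) {
        return Mono.fromCallable(() -> computePi(digits)) // CPU-heavy work
                   .subscribeOn(CPU_POOL);                // keep it off the Netty event loop
    }

    // stand-in for the actual Pi computation in the linked controller
    private String computePi(int digits) {
        return "3.14159";
    }
}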

@violetagg
Member

@sercasti Let me explain in detail what you are observing:
You are using Spring WebFlux with the Netty runtime and with the Tomcat runtime.
When you use Spring WebFlux, it is guaranteed that Tomcat will be used with the non-blocking functionality provided by Servlet 3.1. However, by default Tomcat comes with a thread pool of 200 threads, while Netty will be configured to use a number of threads equal to the number of cores that you have.
As your scenario involves CPU-heavy computations, the number of threads matters, as @bsideup already told you.

Now can you execute the following scenarios:

  • Tell us the CPU/memory/thread consumption when you are using WebFlux with Netty
  • Tell us the CPU/memory/thread consumption when you are using WebFlux with Tomcat with the default 200 threads
  • Tell us the CPU/memory/thread consumption when you are using WebFlux with Tomcat with the number of threads equal to the number of cores that you have (see the configuration sketch below this comment)

I hope that when you measure, you use two different machines/VMs for the client and the server, in order to get realistic results.

Regards, Violeta
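
For the third scenario, one way to size Tomcat's worker pool to the core count in a Spring Boot application is a properties entry along these lines (the exact property name depends on the Boot version; this 2.1-era name is an assumption, not something from the original project):

# application.properties — cap Tomcat's worker pool at the core count (8 here; adjust to your machine)
# later Boot versions rename this to server.tomcat.threads.max
server.tomcat.max-threads=8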

@dave-fl

dave-fl commented Mar 19, 2019

Your limit will always be how fast your thread pool can complete the work, plus the time it takes the system to context-switch back when the work is ready. You should be publishing on a worker thread pool and not tying up the event loop.

I would add some logic to allow some results to be cached, or perform some genuinely non-blocking work. You can use a fixed delay of, say, 50 ms (Mono.delay) and return something instantly while hitting your server hard (see the sketch below). Also, don't run Gatling and the server on the same machine.
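
A minimal sketch of that idea; the /simulated endpoint is hypothetical:

import java.time.Duration;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class SimulatedLatencyController {

    @GetMapping("/simulated")
    public Mono<String> simulated() {
        // a 50 ms non-blocking delay standing in for a remote call; no thread is parked while waiting
        return Mono.delay(Duration.ofMillis(50))
                   .map(tick -> "done");
    }
}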

@smaldini added the for/stackoverflow (Questions are best asked on SO or Gitter) label on Mar 21, 2019
@smaldini
Contributor

Closing for now, unless there are some additional details you want to share, @sercasti.

@gihad

gihad commented Apr 12, 2019

@smaldini I believe the underlying performance issue is still not addressed. See #392, where WebFlux performance is 10x slower than Vert.x.

@violetagg
Member

@gihad This issue is closed based on the scenario that @sercasti was trying to measure.
#650 (comment)

@smaldini
Contributor

smaldini commented Apr 13, 2019

@gihad part of the problem is already addressed by WebFlux for 5.2 (content negotiation optimizations, encoders/decoders, etc.), and we have a few more improvements coming, some in 0.8 and the rest in 0.9.

One area where we are particularly exposed in benchmarks (like #654) is single-body responses. We are doing multiple flushes for what could be a single network flush (chunked encoding: flush headers, then flush body, then flush the last HTTP chunk). I noticed locally that by detecting those single-body responses and shipping them with one flush plus a Content-Length header, we are pretty similar in latency and req/s. We'll probably include that change in our next versions while continuing with various other optimizations.

Vert.x

Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.86ms  833.59us  21.52ms   72.73%
    Req/Sec     1.22k     2.96k    8.52k    85.47%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    1.75ms
 75.000%    2.30ms
 90.000%    2.91ms
 99.000%    4.05ms
 99.900%    8.04ms
 99.990%   12.22ms
 99.999%   19.26ms
100.000%   21.53ms
...
Requests/sec:  58372.04
Transfer/sec:    572.38MB

Reactor Netty with the sendObject optimization (one flush), off the 0.8 branch

Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.96ms    1.33ms  35.20ms   92.02%
    Req/Sec   418.33      1.83k    8.73k    95.02%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    1.75ms
 75.000%    2.31ms
 90.000%    2.95ms
 99.000%    6.88ms
 99.900%   18.32ms
 99.990%   21.42ms
 99.999%   31.50ms
100.000%   35.23ms
...
Requests/sec:  58377.35
Transfer/sec:    573.88MB

@gihad

gihad commented Apr 13, 2019

Thanks @smaldini, this is very encouraging.
Since you are looking into performance optimization, it seems like a matter of time until the performance is on par.
