Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node v16 vs v20/v21 regression on http perf #127

Open
mcollina opened this issue Oct 19, 2023 · 7 comments
Open

Node v16 vs v20/v21 regression on http perf #127

mcollina opened this issue Oct 19, 2023 · 7 comments

Comments

@mcollina
Copy link
Member

This server:

'use strict'

const server = require('http').createServer(function (req, res) {
  res.setHeader('content-type', 'application/json; charset=utf-8')
  res.end(JSON.stringify({ hello: 'world' }))
})

server.listen(3000)

gives me around 55k req/s on v21, v20, v18, while it gives me 65k req/s in v16 and v14.

I use the following to test but you should be able to get similar results with wrk too:

autocannon -c 100 -d 5 -p 10 -W [ -c 10 -d 2 -p 2 ] localhost:3000

This might be due to #79.

@targos
Copy link
Member

targos commented Oct 19, 2023

Have you tried cpu profiling or other diagnostics?

@mcollina
Copy link
Member Author

mcollina commented Oct 19, 2023

Not really, a bisect would be interesting anyway, but I suspect it would be a compound effect.

@H4ad
Copy link
Member

H4ad commented Oct 20, 2023

A simple --cpu-prof:

node --cpu-prof test-server.js & autocannon -c 100 -d 5 -p 10 -W [ -c 10 -d 2 -p 2 ] localhost:3000
test-server.js
'use strict'

const data = JSON.stringify({ hello: 'world' });
const server = require('http').createServer(function (req, res) {
  res.setHeader('content-type', 'application/json; charset=utf-8')
  res.end(data)
})

server.listen(3000)

setTimeout(() => {
  server.close();
}, 8000);

16.20.1

image

507k requests in 5.01s, 94.6 MB read

20.8.1

image

400k requests in 5.01s, 74.5 MB read

server-on-16-20-1.cpuprofile
server-on-20-8-1.cpuprofile

@H4ad
Copy link
Member

H4ad commented Oct 20, 2023

I did a little exploration on writev_generic and I found some interesting things:

The method is called with the following data:

console.log(data)
[
  {
    chunk: 'HTTP/1.1 200 OK\r\n' +
      'content-type: application/json; charset=utf-8\r\n' +
      'Date: Fri, 20 Oct 2023 01:04:24 GMT\r\n' +
      'Connection: keep-alive\r\n' +
      'Keep-Alive: timeout=5\r\n' +
      'Content-Length: 17\r\n' +
      '\r\n' +
      '{"hello":"world"}',
    encoding: 'utf8',
    callback: [Function: nop]
  },
  {
    chunk: '',
    encoding: 'latin1',
    callback: [Function: bound onFinish]
  },
  allBuffers: false
]

Later, callback property is not discarded before calling C++

I also tested a little app in Fastify, the result was literally the same.

In Express, the result was different:

console.log(data)
[
  {
    chunk: 'HTTP/1.1 200 OK\r\n' +
      'X-Powered-By: Express\r\n' +
      'Content-Type: application/json; charset=utf-8\r\n' +
      'Content-Length: 17\r\n' +
      'ETag: W/"11-IkjuL6CqqtmReFMfkkvwC0sKj04"\r\n' +
      'Date: Fri, 20 Oct 2023 01:56:39 GMT\r\n' +
      'Connection: keep-alive\r\n' +
      'Keep-Alive: timeout=5\r\n' +
      '\r\n',
    encoding: 'latin1',
    callback: [Function: nop]
  },
  {
    chunk: <Buffer 7b 22 68 65 6c 6c 6f 22 3a 22 77 6f 72 6c 64 22 7d>,
    encoding: 'buffer',
    callback: [Function: nop]
  },
  {
    chunk: '',
    encoding: 'latin1',
    callback: [Function: bound onFinish]
  },
  allBuffers: false
---

Based on these payloads, I tried the following patch:

fast-path-utf8.patch
commit 1cbbb5ce2cdfa53caa8ff2771150598fa8fd0099
Author: Vinícius Lourenço <contact@viniciusl.com.br>
Date:   Thu Oct 19 23:12:19 2023 -0300

    lib: fast path for single utf8 chunk

diff --git a/lib/internal/stream_base_commons.js b/lib/internal/stream_base_commons.js
index 1555956aff..1bbf45313b 100644
--- a/lib/internal/stream_base_commons.js
+++ b/lib/internal/stream_base_commons.js
@@ -120,8 +120,21 @@ function createWriteWrap(handle, callback) {
 }
 
 function writevGeneric(self, data, cb) {
+  if (data.length === 2 && data[0].encoding === 'utf8' && data[1].encoding === 'latin1' && data[1].chunk === '') {
+    return writevGenericFastSingleUtf8(self, data, cb);
+  }
   const req = createWriteWrap(self[kHandle], cb);
-  const allBuffers = data.allBuffers;
+  const chunks = buildChunks(data, data.allBuffers);
+  const err = req.handle.writev(req, chunks, data.allBuffers);
+
+  // Retain chunks
+  if (err === 0) req._chunks = chunks;
+
+  afterWriteDispatched(req, err, cb);
+  return req;
+}
+
+function buildChunks(data, allBuffers) {
   let chunks;
   if (allBuffers) {
     chunks = data;
@@ -135,10 +148,16 @@ function writevGeneric(self, data, cb) {
       chunks[i * 2 + 1] = entry.encoding;
     }
   }
-  const err = req.handle.writev(req, chunks, allBuffers);
+  return chunks;
+}
+
+function writevGenericFastSingleUtf8(self, data, cb) {
+  const req = createWriteWrap(self[kHandle], cb);
+
+  const err = req.handle.writeUtf8String(req, data[0].chunk);
 
   // Retain chunks
-  if (err === 0) req._chunks = chunks;
+  if (err === 0) req._chunks = buildChunks(data, data.allBuffers);
 
   afterWriteDispatched(req, err, cb);
   return req;

And testing again using test-server.js, I was able to get up to 420k~450k req/s (from 400k req/s).


Although the benchmark works with the patch, the tests didn't pass, so I even tried to open a PR, I just want to raise attention to these functions because maybe we can explore some fast-path or find other optimizations by having people look at it.

Side note: @anonrig Maybe FastAPI can be used for WriteString but I'm not sure (I'm not a C++ guy), so take a look and see if you find something interesting.

@mcollina
Copy link
Member Author

mcollina commented Oct 20, 2023

@H4ad can you paste your full benchmarks? From what you said, it seems there is no regression between Node.js v16 and v20, which is surprising. What OS and CPU architecture are you on?

@H4ad
Copy link
Member

H4ad commented Oct 20, 2023

@mcollina Sorry about that, I inverted the req/ops between the versions, I updated my comment.

@debadree25
Copy link
Member

FastAPI can be used for WriteString but I'm not sure (I'm not a C++ guy), so take a look and see if you find something interesting.

In this function at lines https://github.com/nodejs/node/blob/eed33c9deaa336e76c5385a077490107c6110f08/src/stream_base.cc#L394 and https://github.com/nodejs/node/blob/eed33c9deaa336e76c5385a077490107c6110f08/src/stream_base.cc#L416 it includes CHECKs so if we are to implement fast api I guess we would see side effects (of the data being written twice ig) in case of an error maybe there could be a way to have this checks at the beginning of the method 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants