Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding issue #1119

Closed
srenault opened this issue Dec 2, 2021 · 6 comments · Fixed by #1120
Closed

Encoding issue #1119

srenault opened this issue Dec 2, 2021 · 6 comments · Fixed by #1120
Labels
bug Something isn't working

Comments

@srenault
Copy link

srenault commented Dec 2, 2021

Bug Description

For a single request, which as always the same response, I'm having an encoding issue that appears only from time to time.
In the response, I'm having this character � instead of special characters like é or à etc...
The encoding issue doesn't appear for all special characters in the response.
Over the time, it's not always the same characters that are impacted by the encoding issue.

Reproducible By

I used this simple gist to reproduce the issue:
https://gist.github.com/srenault/654c0179277a29b1c798461b20d5ec44

Logs & Screenshots

Screenshot 2021-12-02 at 20 43 58

Environment

Node v14.16.0

Additional context

If I reset the globalDispatcher with a new Agent() before each call of request, I don't have the issue.

@srenault srenault added the bug Something isn't working label Dec 2, 2021
@srenault srenault changed the title Encoding issue with the Default Encoding issue Dec 2, 2021
@koh110
Copy link
Contributor

koh110 commented Dec 3, 2021

We have same issue.

I think this problem is due to the lack of consideration for multibyte characters.

This is my reproduction code.

import { createServer } from "http";
import { Client } from "undici";
import { once } from "events";
import assert from "assert";

const data = "あ".repeat(10000000);

const server = createServer((request, response) => {
  response.end(JSON.stringify({ data }));
}).listen();

await once(server, "listening");

const client = new Client(`http://localhost:${server.address().port}`);

try {
  const { body, headers, statusCode } = await client.request({
    path: "/",
    method: "GET",
  });
  console.log(`response received ${statusCode}`);
  console.log("headers", headers);
  // body.setEncoding('utf8')
  const json = await body.json();
  assert.strictEqual(json.data, data, "multi byte");
} catch (error) {
  console.error(Object.keys(error), error.code);
} finally {
  client.close();
  server.close();
}

It looks like the bug will be fixed.

diff --git a/lib/api/readable.js b/lib/api/readable.js
index 0580f42..3bb454e 100644
--- a/lib/api/readable.js
+++ b/lib/api/readable.js
@@ -219,9 +219,9 @@ function consumeEnd (consume) {

   try {
     if (type === 'text') {
-      resolve(body.join(''))
+      resolve(Buffer.concat(body))
     } else if (type === 'json') {
-      resolve(JSON.parse(body.join('')))
+      resolve(JSON.parse(Buffer.concat(body)))
     } else if (type === 'arrayBuffer') {
       const dst = new Uint8Array(length)

test code

diff --git a/test/client-request.js b/test/client-request.js
index ebb105e..3f8c128 100644
--- a/test/client-request.js
+++ b/test/client-request.js
@@ -210,6 +210,27 @@ test('request json', (t) => {
   })
 })

+test('request long multi byte json', (t) => {
+  t.plan(1)
+
+  const obj = { asd: 'あ'.repeat(100000) }
+  const server = createServer((req, res) => {
+    res.end(JSON.stringify(obj))
+  })
+  t.teardown(server.close.bind(server))
+
+  server.listen(0, async () => {
+    const client = new Client(`http://localhost:${server.address().port}`)
+    t.teardown(client.destroy.bind(client))
+
+    const { body } = await client.request({
+      path: '/',
+      method: 'GET'
+    })
+    t.strictSame(obj, await body.json())
+  })
+})

Can I send PR? @ronag @mcollina (you look like an active maintainer. Please let me know if there is an appropriate destination.)

@mcollina
Copy link
Member

mcollina commented Dec 3, 2021

Thanks for producing this example.

I think this is a bug in Node streams and we got a reproduction.

The proposed fix is correct but significantly slower than the string-based alternative :(.

@mcollina
Copy link
Member

mcollina commented Dec 3, 2021

Anyway, a PR will be highly welcomed.

@mcollina
Copy link
Member

mcollina commented Dec 3, 2021

Does uncommenting body.setEncoding('utf8') fix the problem?

@koh110
Copy link
Contributor

koh110 commented Dec 3, 2021

I thought it was caused by body.setEncoding('utf8') too. But unfortunately that didn't fix the problem 😢

@koh110
Copy link
Contributor

koh110 commented Dec 3, 2021

I will try to create a PR for the time being. I would be grateful if you could review it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants