Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add tests for invalid UTF-8 #40351

Conversation

git-srinivas
Copy link
Contributor

@git-srinivas git-srinivas commented Oct 6, 2021

Fixes: #39804

  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • make lint passes

@nodejs-github-bot nodejs-github-bot added the needs-ci PRs that need a full CI run. label Oct 6, 2021
@Mesteery Mesteery added the stream Issues and PRs related to the stream subsystem. label Oct 6, 2021
@git-srinivas git-srinivas force-pushed the body-mixin-text-to-return-usvstring branch from 0995716 to 18827d1 Compare October 7, 2021 02:57
Copy link
Member

@lpinca lpinca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Run with --expose-internals flag

const assert = require('assert');
const { toUSVString } = require('internal/util');

const decoder = new TextDecoder();
const chunk = Buffer.from([0x66, 0x6f, 0x6f, 0xed, 0xa0, 0x80]); // foo + U+D800
const str = decoder.decode(chunk);

assert.strictEqual(toUSVString(str), 'foo\ufffd');

TextDecoder() already replaces each byte of the surrogate code point with U+FFFD.

@lpinca
Copy link
Member

lpinca commented Oct 7, 2021

Hmm both Chrome and Safari work like this PR:

new Blob([new Uint8Array([0x66, 0x6f, 0x6f, 0xed, 0xa0, 0x80])]).text()
  .then((str) => str === 'foo\ufffd\ufffd\ufffd')
  .then(console.log);

but this is not consistent with

const { toUSVString } = require('internal/util');

toUSVString('foo\ud800'); // returns 'foo\ufffd';

@git-srinivas
Copy link
Contributor Author

// Run with --expose-internals flag

const assert = require('assert');
const { toUSVString } = require('internal/util');

const decoder = new TextDecoder();
const chunk = Buffer.from([0x66, 0x6f, 0x6f, 0xed, 0xa0, 0x80]); // foo + U+D800
const str = decoder.decode(chunk);

assert.strictEqual(toUSVString(str), 'foo\ufffd');

TextDecoder() already replaces each byte of the surrogate code point with U+FFFD.

@lpinca Does it mean the issue 39804 is not valid?

@ronag
Copy link
Member

ronag commented Oct 8, 2021

```js
// Run with --expose-internals flag

const assert = require('assert');
const { toUSVString } = require('internal/util');

const decoder = new TextDecoder();
const chunk = Buffer.from([0x66, 0x6f, 0x6f, 0xed, 0xa0, 0x80]); // foo + U+D800
const str = decoder.decode(chunk);

assert.strictEqual(toUSVString(str), 'foo\ufffd');

TextDecoder() already replaces each byte of the surrogate code point with U+FFFD.

@lpinca Does it mean the issue 39804 is not valid?

Does your tests pass without the fix?

@git-srinivas
Copy link
Contributor Author

yes. I ran test cases with --expose-internals flag. All of my test cases are passing without the fix.

@ronag
Copy link
Member

ronag commented Oct 11, 2021

Then I don't think this needs fixing. I would propose you remove the changes but keep the tests. We should still merge the tests.

@git-srinivas
Copy link
Contributor Author

// Run with --expose-internals flag

const assert = require('assert');
const { toUSVString } = require('internal/util');

const decoder = new TextDecoder();
const chunk = Buffer.from([0x66, 0x6f, 0x6f, 0xed, 0xa0, 0x80]); // foo + U+D800
const str = decoder.decode(chunk);

assert.strictEqual(toUSVString(str), 'foo\ufffd');

TextDecoder() already replaces each byte of the surrogate code point with U+FFFD.

@lpinca @ronag
I am trying to understand this example to write some good testcases.
May i know how surrogate U+D800 is encoded as [0xed, 0xa0, 0x80]?

I tried with Buffer.from('\ud800','utf8') and what i get is <Buffer ef bf bd> .

@lpinca
Copy link
Member

lpinca commented Oct 12, 2021

All surrogates are 3 bytes and in this range:

byte 1 = ED
byte 2 = [A0 - BF]
byte 3 = [80 - BF]

I tried with Buffer.from('\ud800','utf8') and what i get is <Buffer ef bf bd> .

That's because Buffer.from() replaces invalid UTF-8 byte sequences with U+FFFD.

@git-srinivas git-srinivas force-pushed the body-mixin-text-to-return-usvstring branch from 18827d1 to 5fa492a Compare October 13, 2021 15:08
git-srinivas added a commit to git-srinivas/node that referenced this pull request Oct 13, 2021
Since the readable web streams  already returns
USVString
The changes made to blob.js and consumers.js  are reverted
Test cases are added to check  surrogate to USVString conversion for
ReadablewebStreams Textdecoder and Blob

PR-URL: nodejs#40351
Fixes: nodejs#39804
git-srinivas added a commit to git-srinivas/node that referenced this pull request Oct 13, 2021
Comment on lines 76 to 88
{
const passthrough = new PassThrough();

text(passthrough).then(common.mustCall(async (str) => {
assert.strictEqual(str.length, 11);
assert.deepStrictEqual(str, 'hellothere\ufffd');
}));

passthrough.write('hello');
setTimeout(() => passthrough.end('there\ud801'), 10);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not make much sense because passthrough.write() and passthrough.end() will call Buffer.from() when the chunk is written. If anything, chunks should be Buffers or Uint8Arrays.

Comment on lines 203 to 201
{
const decoder = new TextDecoder();
const chunk = Buffer.from('\ud807');
const str = decoder.decode(chunk);
assert.strictEqual(str, '\ufffd');
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the one above this does not make much sense to me. When decoder.decode(chunk) is called, chunk is already U+FFFD (valid UTF-8).

@git-srinivas git-srinivas force-pushed the body-mixin-text-to-return-usvstring branch from 4fb8c8a to e5857ba Compare October 21, 2021 19:45
@lpinca
Copy link
Member

lpinca commented Oct 22, 2021

@git-srinivas if you can please

  1. Update PR title to update the actual scope of the PR
  2. Squash commits and use a proper commit title and body as per point 1.

@git-srinivas
Copy link
Contributor Author

Sure @lpinca I'll do the changes

@git-srinivas git-srinivas force-pushed the body-mixin-text-to-return-usvstring branch 2 times, most recently from 22de91f to 50b943b Compare October 22, 2021 12:16
@git-srinivas git-srinivas changed the title lib: body mixin text to return usvstring test: ensure incomplete utf-8 byte sequence is converted to usv string Oct 22, 2021
@lpinca
Copy link
Member

lpinca commented Oct 22, 2021

@git-srinivas can I suggest something like this?

test: add tests for invalid UTF-8

Verify that `Blob.prototype.text()`, `streamConsumers.text()` and
`TextDecoder.prototype.decode()` work as expected with invalid UTF-8.

Fixes: https://github.com/nodejs/node/issues/39804

I find the current commit message a bit misleading becausetoUSVString() works differently as per #40351 (comment).

Thank you.

Verify that `Blob.prototype.text()`, `streamConsumers.text()` and
`TextDecoder.prototype.decode()` work as expected with invalid UTF-8.

Fixes: nodejs#39804
@git-srinivas git-srinivas force-pushed the body-mixin-text-to-return-usvstring branch from 50b943b to 75c69af Compare October 23, 2021 02:44
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

lpinca pushed a commit that referenced this pull request Nov 15, 2021
Verify that `Blob.prototype.text()`, `streamConsumers.text()` and
`TextDecoder.prototype.decode()` work as expected with invalid UTF-8.

PR-URL: #40351
Fixes: #39804
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
@lpinca
Copy link
Member

lpinca commented Nov 15, 2021

Landed in dc35aef.

@lpinca lpinca closed this Nov 15, 2021
lpinca pushed a commit that referenced this pull request Nov 15, 2021
Verify that `Blob.prototype.text()`, `streamConsumers.text()` and
`TextDecoder.prototype.decode()` work as expected with invalid UTF-8.

PR-URL: #40351
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
mhdawson pushed a commit to mhdawson/io.js that referenced this pull request Nov 15, 2021
Verify that `Blob.prototype.text()`, `streamConsumers.text()` and
`TextDecoder.prototype.decode()` work as expected with invalid UTF-8.

PR-URL: nodejs#40351
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
targos pushed a commit that referenced this pull request Nov 21, 2021
Verify that `Blob.prototype.text()`, `streamConsumers.text()` and
`TextDecoder.prototype.decode()` work as expected with invalid UTF-8.

PR-URL: #40351
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
danielleadams pushed a commit that referenced this pull request Jan 30, 2022
Verify that `Blob.prototype.text()`, `streamConsumers.text()` and
`TextDecoder.prototype.decode()` work as expected with invalid UTF-8.

PR-URL: #40351
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
danielleadams pushed a commit that referenced this pull request Feb 1, 2022
Verify that `Blob.prototype.text()`, `streamConsumers.text()` and
`TextDecoder.prototype.decode()` work as expected with invalid UTF-8.

PR-URL: #40351
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
@danielleadams danielleadams mentioned this pull request Feb 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. buffer Issues and PRs related to the buffer subsystem. encoding Issues and PRs related to the TextEncoder and TextDecoder APIs. stream Issues and PRs related to the stream subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Body Mixin text() is supposed to return USVString
7 participants