TextDecoder().decode() pollyfill performance issue #316

samuelbrian · 2022-10-19T05:06:09Z

Since changing from the text-encoding pollyfill to text-encoding-utf-8 (in #311), the performance of text decoding has dropped significantly.

In my testing, it was taking over 3 seconds to read a 175kb file and decode it into a JS string. It does depends on the size of the chunk you're giving to the text decoder.

Part of the problem is that the decoding routine in text-encoding-utf-8 examines each byte in the input buffer, it shifts the byte from the start (see the Stream class here) .
The original text-encoding does this better (see the Stream class here) by reversing the input buffer and operating on the end of the buffer rather than the start.

I've applied the reversing changes to text-encoding-utf-8 (in samuelbrian@99b895c) and it's seems on-par with the original version. But I'm not sure the best way to manage patching the node_modules dependencies, short of forking and maintaining a copy of text-encoding-utf-8.

Even better is using text encoding and decoding from QuickJS which I have done here samuelbrian@f308a6d, and that makes a file read that was taking >3 seconds into a <1 millisecond operation.
I've exposed these as tjs functions rather than implement them as TextEncoder/TextDecoder and remove the pollyfill, mainly because I didn't want to have to deal with the streaming option. But it might be viable to do.

The text was updated successfully, but these errors were encountered:

saghul · 2022-10-19T06:42:39Z

Good analysis, thanks!

How about this: implement our own TextDecoder polyfill using the C functions you created under the hood.

Did the original one support streaming? I didn't think it did...

samuelbrian · 2022-10-19T22:59:39Z

The original and UTF-8-only pollyfills do support streaming. I'm relying on it right now to decode while reading single bytes from raw mode stdin.

I think to correctly handle streaming, ignoreBOM, and fatal, it'd have to go deeper and use QuickJS's unicode_to_utf8() and unicode_from_utf8().

saghul · 2022-10-20T05:50:12Z

Oh I see.

So another approach is to go the patch-package route. We can patch a Node module at install time (see the patches directory) and either fix the current dep or revert back and slash other encodings?

samuelbrian · 2022-10-26T01:17:15Z

I never noticed the patches directory. That's perfect.

I did start working on implementing the TextDecoder API in C, only using JS_NewStringLen, and testing against test cases I found in the Firefox source tree.
Streaming and BOM were fine, but handling decoding errors did not work out well.
Fatal was a hack because I was searching for the � in the output string and throwing if it was found, but that would also be triggered if the � was in the input data.
And there were problems handling invalid UTF-8 when it is chunked across streamed decodes.

To do it right, I think I'd need to expose unicode_from_utf8() and also add something like a JS_NewStringUnicode that'd let me create a string that is already decoded.

The UTF-8-only encoder/decoder was performing poorly due to inefficient array manipulation. Applied changes from the original text-encoding package that reduce the amount of copying. Relates to saghul#316

The UTF-8-only encoder/decoder was performing poorly due to inefficient array manipulation. Applied changes from the original text-encoding package that reduce the amount of copying. Relates to #316

saghul · 2022-12-08T08:23:54Z

Fixed in #321

saghul closed this as completed Feb 2, 2023

saghul mentioned this issue Dec 21, 2023

TextDecoding pretty slow #447

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TextDecoder().decode() pollyfill performance issue #316

TextDecoder().decode() pollyfill performance issue #316

samuelbrian commented Oct 19, 2022 •

edited

Loading

saghul commented Oct 19, 2022

samuelbrian commented Oct 19, 2022

saghul commented Oct 20, 2022

samuelbrian commented Oct 26, 2022

saghul commented Dec 8, 2022

TextDecoder().decode() pollyfill performance issue #316

TextDecoder().decode() pollyfill performance issue #316

Comments

samuelbrian commented Oct 19, 2022 • edited Loading

saghul commented Oct 19, 2022

samuelbrian commented Oct 19, 2022

saghul commented Oct 20, 2022

samuelbrian commented Oct 26, 2022

saghul commented Dec 8, 2022

samuelbrian commented Oct 19, 2022 •

edited

Loading