
Text decoding performance abysmally slow. #3163

Closed
qwerasd205 opened this issue Oct 20, 2019 · 19 comments · Fixed by #3204
Labels
perf performance related

Comments

@qwerasd205
Contributor

qwerasd205 commented Oct 20, 2019

This issue was originally focused on file reading performance but it was brought to my attention that the issue is in fact with the text decoding, and that the file read operation does not take very long. This issue does still directly impact reading large files as strings, though.

The following paragraph is from the original issue, and still holds true to an extent. I have updated the rest of the issue to talk about the text decoding, though.

The file reading performance of Deno is, well, abysmal. On my machine it takes 9.25 seconds to read a 25 MB file using deno_std's fs readFileStrSync function, and additionally, while it's reading the file it pins CPU usage at 100%+. For comparison, perl5 reads the same file in 0.025 seconds.

I have updated the Repro code below to be a test of the text decoder speed. When you run the code, note the 100%+ CPU usage.

Repro code:
Deno:

'use strict';

const decoder = new TextDecoder('utf-8');
const data = new Uint8Array(new Array(25e6).fill(65)); // 25 MB of the ASCII capital letter 'A'
console.log('Starting decode...');
const start = performance.now();
decoder.decode(data);
console.log('Decode finished in ' + ((performance.now() - start)/1000).toFixed(3) + 's');

Example output:
Decode finished in 6.134s

There must be something wrong somewhere with the text decoding for it to take so long (and to pin the CPU usage at 100%+ while it's at it). Until this is fixed I can't really continue working on my project in Deno, which involves saving and loading significant amounts of data, because as it is, with ~30 MB of data between two files it's taking 10+ seconds to load.

% deno -v
deno: 0.21.0
v8: 7.9.304
typescript: 3.6.3

macOS Catalina 10.15 Beta

@bartlomieju
Member

Just a quick note: the reading operation is actually very fast (0.047s on my machine). It's the decoding step that takes so long.

@qwerasd205
Contributor Author

Just a quick note: the reading operation is actually very fast (0.047s on my machine). It's the decoding step that takes so long.

This is a very good note. I did not think to check that and just assumed it was the read operation, but I suppose that doesn't make much sense.

@qwerasd205
Contributor Author

It remains that reading a file to a string is horribly inefficient and CPU heavy, due to the decoding step. Should I update the issue to be about the decoding?

@bartlomieju
Member

Yes please

@qwerasd205 qwerasd205 changed the title File reading performance abysmally slow. Text decoding performance abysmally slow. Oct 20, 2019
@qwerasd205
Contributor Author

qwerasd205 commented Oct 20, 2019

I think perhaps the TextDecoder code should be moved to Rust, as I have a feeling it's the fact that it's in JS that's slowing it down. It's an operation that needs to be high performance (for purposes of ingesting large files, and likely more). An alternative (though moving it to Rust is still preferable) would be to provide a readFileStr[Sync] function in the Deno namespace, to bypass having to decode from within the JS.

I'm not sure there's any high performance way to implement the TextDecoder in JS.

@kitsonk
Contributor

kitsonk commented Oct 20, 2019

Hmmm... it is going to be tricky to do this fast given our current op model. There is the zero copy buffer, but Rust doesn't generate JavaScript strings, which is what we need to end up with. The way this is done in the likes of Chromium is injecting the decode function into the isolate, which then utilises native C++ to perform it.

The other option is to author it in Rust, transpile it to WASM and utilise that in the runtime.

It would be good to add this to the benchmarks, because obviously string encoding and decoding is pretty important.

Also, we should do more research into other JavaScript implementations that might be more performance tuned. We may be doing something silly.

@bartlomieju
Member

Also, we should do more research into other JavaScript implementations that might be more performance tuned. We may be doing something silly.

I believe that is the case, we've got GC all over the place:

deno -A --v8-flags=--trace-gc,--trace-gc-verbose a.ts
... 

[19928:0x7fa5ed808000]     9248 ms: Scavenge 737.7 (755.0) -> 735.7 (769.0) MB, 35.9 / 0.0 ms  (average mu = 0.728, current mu = 0.728) allocation failure
[19928:0x7fa5ed808000] Memory allocator,       used: 787424 KB, available: 695328 KB
[19928:0x7fa5ed808000] Read-only space,        used:     28 KB, available:      0 KB, committed:    256 KB
[19928:0x7fa5ed808000] New space,              used:     31 KB, available:  16334 KB, committed:  32768 KB
[19928:0x7fa5ed808000] New large object space, used:      0 KB, available:  16366 KB, committed:      0 KB
[19928:0x7fa5ed808000] Old space,              used: 491425 KB, available:      9 KB, committed: 491976 KB
[19928:0x7fa5ed808000] Code space,             used:    148 KB, available:      0 KB, committed:    412 KB
[19928:0x7fa5ed808000] Map space,              used:     56 KB, available:      0 KB, committed:    260 KB
[19928:0x7fa5ed808000] Large object space,     used: 261701 KB, available:      0 KB, committed: 261704 KB
[19928:0x7fa5ed808000] Code large object space,     used:      2 KB, available:      0 KB, committed:     48 KB
[19928:0x7fa5ed808000] All spaces,             used: 753394 KB, available: 728037 KB, committed: 787424 KB
[19928:0x7fa5ed808000] Unmapper buffering 0 chunks of committed:      0 KB
[19928:0x7fa5ed808000] External memory reported:  24415 KB
[19928:0x7fa5ed808000] Backing store memory:  24415 KB
[19928:0x7fa5ed808000] External memory global 0 KB
[19928:0x7fa5ed808000] Total time spent in GC  : 3161.7 ms

....

[19928:0x7fa5ed808000] Fast promotion mode: false survival rate: 87%
[19928:0x7fa5ed808000]    10131 ms: Scavenge 929.4 (947.0) -> 927.5 (961.0) MB, 49.3 / 0.0 ms  (average mu = 0.728, current mu = 0.728) allocation failure
[19928:0x7fa5ed808000] Memory allocator,       used: 984032 KB, available: 498720 KB
[19928:0x7fa5ed808000] Read-only space,        used:     28 KB, available:      0 KB, committed:    256 KB
[19928:0x7fa5ed808000] New space,              used:     31 KB, available:  16334 KB, committed:  32768 KB
[19928:0x7fa5ed808000] New large object space, used:      0 KB, available:  16366 KB, committed:      0 KB
[19928:0x7fa5ed808000] Old space,              used: 687817 KB, available:      9 KB, committed: 688584 KB
[19928:0x7fa5ed808000] Code space,             used:    148 KB, available:      0 KB, committed:    412 KB
[19928:0x7fa5ed808000] Map space,              used:     56 KB, available:      0 KB, committed:    260 KB
[19928:0x7fa5ed808000] Large object space,     used: 261701 KB, available:      0 KB, committed: 261704 KB
[19928:0x7fa5ed808000] Code large object space,     used:      2 KB, available:      0 KB, committed:     48 KB
[19928:0x7fa5ed808000] All spaces,             used: 949786 KB, available: 531429 KB, committed: 984032 KB
[19928:0x7fa5ed808000] Unmapper buffering 0 chunks of committed:      0 KB
[19928:0x7fa5ed808000] External memory reported:  24415 KB
[19928:0x7fa5ed808000] Backing store memory:  24415 KB
[19928:0x7fa5ed808000] External memory global 0 KB
[19928:0x7fa5ed808000] Total time spent in GC  : 3973.4 ms
[19928:0x7fa5ed808000] Fast promotion mode: false survival rate: 100%
[19928:0x7fa5ed808000]    10172 ms: Scavenge 943.4 (961.0) -> 945.4 (963.0) MB, 36.2 / 0.0 ms  (average mu = 0.728, current mu = 0.728) allocation failure
[19928:0x7fa5ed808000] Memory allocator,       used: 986080 KB, available: 496672 KB
[19928:0x7fa5ed808000] Read-only space,        used:     28 KB, available:      0 KB, committed:    256 KB
[19928:0x7fa5ed808000] New space,              used:  16334 KB, available:     31 KB, committed:  32768 KB
[19928:0x7fa5ed808000] New large object space, used:      0 KB, available:  16366 KB, committed:      0 KB
[19928:0x7fa5ed808000] Old space,              used: 689847 KB, available:     24 KB, committed: 690632 KB
[19928:0x7fa5ed808000] Code space,             used:    148 KB, available:      0 KB, committed:    412 KB
[19928:0x7fa5ed808000] Map space,              used:     56 KB, available:      0 KB, committed:    260 KB
[19928:0x7fa5ed808000] Large object space,     used: 261701 KB, available:      0 KB, committed: 261704 KB
[19928:0x7fa5ed808000] Code large object space,     used:      2 KB, available:      0 KB, committed:     48 KB
[19928:0x7fa5ed808000] All spaces,             used: 968119 KB, available: 513094 KB, committed: 986080 KB
[19928:0x7fa5ed808000] Unmapper buffering 0 chunks of committed:      0 KB
[19928:0x7fa5ed808000] External memory reported:  24415 KB
[19928:0x7fa5ed808000] Backing store memory:  24415 KB
[19928:0x7fa5ed808000] External memory global 0 KB
[19928:0x7fa5ed808000] Total time spent in GC  : 4009.6 ms
[19928:0x7fa5ed808000] Fast promotion mode: false survival rate: 87%
[19928:0x7fa5ed808000]    10213 ms: Scavenge 945.4 (963.0) -> 943.5 (977.0) MB, 40.4 / 0.0 ms  (average mu = 0.728, current mu = 0.728) allocation failure
[19928:0x7fa5ed808000] Memory allocator,       used: 1000416 KB, available: 482336 KB
[19928:0x7fa5ed808000] Read-only space,        used:     28 KB, available:      0 KB, committed:    256 KB
[19928:0x7fa5ed808000] New space,              used:     31 KB, available:  16334 KB, committed:  32768 KB
[19928:0x7fa5ed808000] New large object space, used:      0 KB, available:  16366 KB, committed:      0 KB
[19928:0x7fa5ed808000] Old space,              used: 704183 KB, available:      9 KB, committed: 704968 KB
[19928:0x7fa5ed808000] Code space,             used:    148 KB, available:      0 KB, committed:    412 KB
[19928:0x7fa5ed808000] Map space,              used:     56 KB, available:      0 KB, committed:    260 KB
[19928:0x7fa5ed808000] Large object space,     used: 261701 KB, available:      0 KB, committed: 261704 KB
[19928:0x7fa5ed808000] Code large object space,     used:      2 KB, available:      0 KB, committed:     48 KB
[19928:0x7fa5ed808000] All spaces,             used: 966152 KB, available: 515045 KB, committed: 1000416 KB
[19928:0x7fa5ed808000] Unmapper buffering 0 chunks of committed:      0 KB
[19928:0x7fa5ed808000] External memory reported:  24415 KB
[19928:0x7fa5ed808000] Backing store memory:  24415 KB
[19928:0x7fa5ed808000] External memory global 0 KB

700MB of memory to decode 25MB suggests there's a problem in our decoder implementation.
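A GC profile like this often points at per-byte string building. The following is a hypothetical illustration (not the actual Deno source) of the pattern and a common mitigation: appending one character at a time churns short-lived intermediate strings, while converting many code units per call amortizes the allocations.

```javascript
// Hypothetical illustration: per-byte concatenation allocates a new
// intermediate string on every iteration, all of which become garbage.
function decodeAsciiNaive(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]); // new string each time
  }
  return out;
}

// A chunked approach passes thousands of code units per call to
// String.fromCharCode and joins a small number of large parts.
function decodeAsciiChunked(bytes, chunkSize = 8192) {
  const parts = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    parts.push(String.fromCharCode(...bytes.subarray(i, i + chunkSize)));
  }
  return parts.join('');
}

const data = new Uint8Array(1e5).fill(65); // 100 KB of 'A'
console.log(decodeAsciiChunked(data) === decodeAsciiNaive(data)); // true
```

Both produce the same string; the difference is in how many temporary allocations the GC has to clean up along the way.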

@ry
Member

ry commented Oct 21, 2019

If we could get a benchmark on the benchmarks page, I will happily invest the time to make it faster / use less memory.

It's not at all obvious that doing it in rust would be faster. Boundary crossing is a significant cost and V8 optimizes code very well.

700MB of memory to decode 25MB suggests there's a problem in our decoder implementation.

Indeed. Maybe it's just a small fix to the JS.

@ry ry added the perf performance related label Oct 21, 2019
@qwerasd205
Contributor Author

It's not at all obvious that doing it in rust would be faster. Boundary crossing is a significant cost and V8 optimizes code very well.

Yes, on second thought that's probably right.

I did some research on high performance UTF-8 decoding and looked at some algorithm options and will be working on creating a high performance UTF-8 decoder in JavaScript soon. I will make a comment on this issue with a link to it when I'm satisfied with it, where hopefully someone can implement it in Deno. (I would do it myself and make a PR, but I really can't stand using git more than I have to.)
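For reference, this is the general shape such a decoder typically takes: a simplified sketch (not the linked implementation, and assuming well-formed input; a real decoder must also validate continuation bytes and reject overlong encodings) that writes UTF-16 code units into a typed array and then materializes the string in large chunks.

```javascript
// Simplified sketch of a chunked UTF-8 decoder. Assumes well-formed input.
function utf8Decode(bytes) {
  // Worst case: one UTF-16 code unit per input byte.
  const units = new Uint16Array(bytes.length);
  let n = 0;
  for (let i = 0; i < bytes.length; ) {
    const b = bytes[i];
    let cp;
    if (b < 0x80) { cp = b; i += 1; }                                  // 1-byte (ASCII)
    else if (b < 0xe0) {                                               // 2-byte
      cp = ((b & 0x1f) << 6) | (bytes[i + 1] & 0x3f); i += 2;
    } else if (b < 0xf0) {                                             // 3-byte
      cp = ((b & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) | (bytes[i + 2] & 0x3f);
      i += 3;
    } else {                                                           // 4-byte
      cp = ((b & 0x07) << 18) | ((bytes[i + 1] & 0x3f) << 12) |
           ((bytes[i + 2] & 0x3f) << 6) | (bytes[i + 3] & 0x3f);
      i += 4;
    }
    if (cp > 0xffff) { // outside the BMP: encode as a surrogate pair
      cp -= 0x10000;
      units[n++] = 0xd800 | (cp >> 10);
      units[n++] = 0xdc00 | (cp & 0x3ff);
    } else {
      units[n++] = cp;
    }
  }
  // Materialize the string in large chunks to limit per-call overhead.
  const CHUNK = 8192;
  const parts = [];
  for (let i = 0; i < n; i += CHUNK) {
    parts.push(String.fromCharCode(...units.subarray(i, Math.min(i + CHUNK, n))));
  }
  return parts.join('');
}
```

The key design choice is avoiding per-character string allocation entirely until the final chunked conversion.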

@kitsonk
Contributor

kitsonk commented Oct 21, 2019

@qwerasd205 feel free to contribute however you wish, would be glad to get it to fit into Deno.

@qwerasd205
Contributor Author

I've created a UTF-8 decode function in JavaScript that does 80-130 MB/s on my machine, which I think would be more than sufficient. The function can be found at this gist. Can't wait to see it in Deno!

@ry
Member

ry commented Oct 22, 2019

@qwerasd205 Please supply a benchmark program and I will do the work of getting it in for you.

@qwerasd205
Contributor Author

@ry A program that benchmarks the TextDecoder decode function? What should the format of the output be?

@ry
Member

ry commented Oct 22, 2019

No output - I just need it to do something that's currently expensive.

@qwerasd205
Contributor Author

Okay, can do.

@kitsonk
Contributor

kitsonk commented Oct 22, 2019

@ry feels like we should also invert it, to keep track of TextEncoder performance as well.

@qwerasd205
Contributor Author

@ry here's a script which randomly generates 10MB of utf-8 data and feeds it to the text decoder: https://gist.github.com/qwerasd205/a7616010915180b0fde08a52454e4fe4

@qwerasd205
Contributor Author

It takes ~2.8 seconds for TextDecoder, and ~0.16 seconds for my decode function.

@qwerasd205
Contributor Author

qwerasd205 commented Oct 22, 2019

Actually, I just noticed that my random UTF-8 generation function is flawed; I copied it from somewhere without thinking, but it gets the point across.
Honestly, replacing that function with a read of a large-ish file would be better.
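An alternative sketch along those lines (hypothetical, not the linked gist): instead of a flawed random generator or an external file, build deterministic mixed-width UTF-8 with TextEncoder and time the decode of roughly 10 MB.

```javascript
// Hypothetical benchmark sketch: deterministic mixed-width UTF-8 input
// (1-, 2-, 3-, and 4-byte sequences) built by round-tripping a sample
// string through TextEncoder, then timed through TextDecoder.
const sample = 'ASCII text, accented é, CJK 漢字, emoji 😀. ';
const repeats = Math.ceil(10e6 / sample.length);
const bytes = new TextEncoder().encode(sample.repeat(repeats));

const decoder = new TextDecoder('utf-8');
const start = performance.now();
const text = decoder.decode(bytes);
const elapsed = (performance.now() - start) / 1000;

console.log(`Decoded ${bytes.length} bytes in ${elapsed.toFixed(3)}s`);
```

Being deterministic, the same byte stream is decoded on every run, which makes before/after timings directly comparable.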

kitsonk added a commit to kitsonk/deno that referenced this issue Oct 26, 2019
Fixes denoland#3163

Co-authored-by: Kitson Kelly <me@kitsonkelly.com>
Co-authored-by: Qwerasd <qwerasd205@users.noreply.github.com>
@ry ry closed this as completed in #3204 Oct 26, 2019
ry pushed a commit that referenced this issue Oct 26, 2019
Fixes #3163

Co-authored-by: Kitson Kelly <me@kitsonkelly.com>
Co-authored-by: Qwerasd <qwerasd205@users.noreply.github.com>