
Text decoding performance abysmally slow. #3163

Closed
qwerasd205 opened this issue Oct 20, 2019 · 19 comments · Fixed by #3204
Labels
perf performance related

Comments

@qwerasd205
Contributor

qwerasd205 commented Oct 20, 2019

This issue was originally focused on file reading performance but it was brought to my attention that the issue is in fact with the text decoding, and that the file read operation does not take very long. This issue does still directly impact reading large files as strings, though.

The following paragraph is from the original issue, and still holds true to an extent. I have updated the rest of the issue to talk about the text decoding, though.

The file reading performance of Deno is, well, abysmal. On my machine it takes 9.25 seconds to read a 25 MB file using deno_std's fs readFileStrSync function, and additionally, while it's reading the file it pins CPU usage at 100%+. For comparison, perl5 reads the same file in 0.025 seconds.

I have updated the Repro code below to be a test of the text decoder speed. When you run the code, note the 100%+ CPU usage.

Repro code:
Deno:

'use strict';

const decoder = new TextDecoder('utf-8');
const data = new Uint8Array(new Array(25e6).fill(65)); // 25 MB of the ASCII capital letter 'A'
console.log('Starting decode...');
const start = performance.now();
decoder.decode(data);
console.log('Decode finished in ' + ((performance.now() - start)/1000).toFixed(3) + 's');

Example output:
Decode finished in 6.134s

There must be something wrong somewhere with the text decoding for it to take so long (and to pin the CPU usage at 100%+ while it's at it). Until this is fixed I can't really continue working on my project in Deno, which involves saving and loading significant amounts of data, because as it is, with ~30 MB of data between two files it's taking 10+ seconds to load.

% deno -v
deno: 0.21.0
v8: 7.9.304
typescript: 3.6.3

macOS Catalina 10.15 Beta

@bartlomieju
Member

Just a quick note: the reading operation is actually very fast (0.047s on my machine). It's the decoding step that takes so long.

@qwerasd205
Contributor Author

Just a quick note: the reading operation is actually very fast (0.047s on my machine). It's the decoding step that takes so long.

This is a very good note. I did not think to check that and just assumed it was the read operation, but I suppose that doesn't make much sense.

@qwerasd205
Contributor Author

It remains that reading a file to a string is horribly inefficient and CPU heavy, due to the decoding step. Should I update the issue to be about the decoding?

@bartlomieju
Member

Yes please

@qwerasd205 qwerasd205 changed the title File reading performance abysmally slow. Text decoding performance abysmally slow. Oct 20, 2019
@qwerasd205
Contributor Author

qwerasd205 commented Oct 20, 2019

I think perhaps the TextDecoder code should be moved to Rust, as I have a feeling it's the fact that it's in JS that's slowing it down. It's an operation that needs to be high performance (for purposes of ingesting large files, and likely more). An alternative (though moving it to Rust is still preferable) would be to provide a readFileStr[Sync] function in the Deno namespace, to bypass having to decode from within the JS.

I'm not sure there's any high performance way to implement the TextDecoder in JS.

@kitsonk
Contributor

kitsonk commented Oct 20, 2019

Hmmm... it is going to be tricky to do this fast given our current op model. There is the zero copy buffer, but Rust doesn't generate JavaScript strings, which is what we need to end up with. The way this is done in the likes of Chromium is injecting the decode function into the isolate, which then utilises native C++ to perform it.

The other option is to author it in Rust, transpile it to WASM and utilise that in the runtime.

It would be good to add this to the benchmarks, because obviously string encoding and decoding is pretty important.

Also, we should do more research into other JavaScript implementations that might be more performance tuned. We may be doing something silly.

@bartlomieju
Member

Also, we should do more research into other JavaScript implementations that might be more performance tuned. We may be doing something silly.

I believe that is the case, we've got GC all over the place:

deno -A --v8-flags=--trace-gc,--trace-gc-verbose a.ts
... 

[19928:0x7fa5ed808000]     9248 ms: Scavenge 737.7 (755.0) -> 735.7 (769.0) MB, 35.9 / 0.0 ms  (average mu = 0.728, current mu = 0.728) allocation failure
[19928:0x7fa5ed808000] Memory allocator,       used: 787424 KB, available: 695328 KB
[19928:0x7fa5ed808000] Read-only space,        used:     28 KB, available:      0 KB, committed:    256 KB
[19928:0x7fa5ed808000] New space,              used:     31 KB, available:  16334 KB, committed:  32768 KB
[19928:0x7fa5ed808000] New large object space, used:      0 KB, available:  16366 KB, committed:      0 KB
[19928:0x7fa5ed808000] Old space,              used: 491425 KB, available:      9 KB, committed: 491976 KB
[19928:0x7fa5ed808000] Code space,             used:    148 KB, available:      0 KB, committed:    412 KB
[19928:0x7fa5ed808000] Map space,              used:     56 KB, available:      0 KB, committed:    260 KB
[19928:0x7fa5ed808000] Large object space,     used: 261701 KB, available:      0 KB, committed: 261704 KB
[19928:0x7fa5ed808000] Code large object space,     used:      2 KB, available:      0 KB, committed:     48 KB
[19928:0x7fa5ed808000] All spaces,             used: 753394 KB, available: 728037 KB, committed: 787424 KB
[19928:0x7fa5ed808000] Unmapper buffering 0 chunks of committed:      0 KB
[19928:0x7fa5ed808000] External memory reported:  24415 KB
[19928:0x7fa5ed808000] Backing store memory:  24415 KB
[19928:0x7fa5ed808000] External memory global 0 KB
[19928:0x7fa5ed808000] Total time spent in GC  : 3161.7 ms

....

[19928:0x7fa5ed808000] Fast promotion mode: false survival rate: 87%
[19928:0x7fa5ed808000]    10131 ms: Scavenge 929.4 (947.0) -> 927.5 (961.0) MB, 49.3 / 0.0 ms  (average mu = 0.728, current mu = 0.728) allocation failure
[19928:0x7fa5ed808000] Memory allocator,       used: 984032 KB, available: 498720 KB
[19928:0x7fa5ed808000] Read-only space,        used:     28 KB, available:      0 KB, committed:    256 KB
[19928:0x7fa5ed808000] New space,              used:     31 KB, available:  16334 KB, committed:  32768 KB
[19928:0x7fa5ed808000] New large object space, used:      0 KB, available:  16366 KB, committed:      0 KB
[19928:0x7fa5ed808000] Old space,              used: 687817 KB, available:      9 KB, committed: 688584 KB
[19928:0x7fa5ed808000] Code space,             used:    148 KB, available:      0 KB, committed:    412 KB
[19928:0x7fa5ed808000] Map space,              used:     56 KB, available:      0 KB, committed:    260 KB
[19928:0x7fa5ed808000] Large object space,     used: 261701 KB, available:      0 KB, committed: 261704 KB
[19928:0x7fa5ed808000] Code large object space,     used:      2 KB, available:      0 KB, committed:     48 KB
[19928:0x7fa5ed808000] All spaces,             used: 949786 KB, available: 531429 KB, committed: 984032 KB
[19928:0x7fa5ed808000] Unmapper buffering 0 chunks of committed:      0 KB
[19928:0x7fa5ed808000] External memory reported:  24415 KB
[19928:0x7fa5ed808000] Backing store memory:  24415 KB
[19928:0x7fa5ed808000] External memory global 0 KB
[19928:0x7fa5ed808000] Total time spent in GC  : 3973.4 ms
[19928:0x7fa5ed808000] Fast promotion mode: false survival rate: 100%
[19928:0x7fa5ed808000]    10172 ms: Scavenge 943.4 (961.0) -> 945.4 (963.0) MB, 36.2 / 0.0 ms  (average mu = 0.728, current mu = 0.728) allocation failure
[19928:0x7fa5ed808000] Memory allocator,       used: 986080 KB, available: 496672 KB
[19928:0x7fa5ed808000] Read-only space,        used:     28 KB, available:      0 KB, committed:    256 KB
[19928:0x7fa5ed808000] New space,              used:  16334 KB, available:     31 KB, committed:  32768 KB
[19928:0x7fa5ed808000] New large object space, used:      0 KB, available:  16366 KB, committed:      0 KB
[19928:0x7fa5ed808000] Old space,              used: 689847 KB, available:     24 KB, committed: 690632 KB
[19928:0x7fa5ed808000] Code space,             used:    148 KB, available:      0 KB, committed:    412 KB
[19928:0x7fa5ed808000] Map space,              used:     56 KB, available:      0 KB, committed:    260 KB
[19928:0x7fa5ed808000] Large object space,     used: 261701 KB, available:      0 KB, committed: 261704 KB
[19928:0x7fa5ed808000] Code large object space,     used:      2 KB, available:      0 KB, committed:     48 KB
[19928:0x7fa5ed808000] All spaces,             used: 968119 KB, available: 513094 KB, committed: 986080 KB
[19928:0x7fa5ed808000] Unmapper buffering 0 chunks of committed:      0 KB
[19928:0x7fa5ed808000] External memory reported:  24415 KB
[19928:0x7fa5ed808000] Backing store memory:  24415 KB
[19928:0x7fa5ed808000] External memory global 0 KB
[19928:0x7fa5ed808000] Total time spent in GC  : 4009.6 ms
[19928:0x7fa5ed808000] Fast promotion mode: false survival rate: 87%
[19928:0x7fa5ed808000]    10213 ms: Scavenge 945.4 (963.0) -> 943.5 (977.0) MB, 40.4 / 0.0 ms  (average mu = 0.728, current mu = 0.728) allocation failure
[19928:0x7fa5ed808000] Memory allocator,       used: 1000416 KB, available: 482336 KB
[19928:0x7fa5ed808000] Read-only space,        used:     28 KB, available:      0 KB, committed:    256 KB
[19928:0x7fa5ed808000] New space,              used:     31 KB, available:  16334 KB, committed:  32768 KB
[19928:0x7fa5ed808000] New large object space, used:      0 KB, available:  16366 KB, committed:      0 KB
[19928:0x7fa5ed808000] Old space,              used: 704183 KB, available:      9 KB, committed: 704968 KB
[19928:0x7fa5ed808000] Code space,             used:    148 KB, available:      0 KB, committed:    412 KB
[19928:0x7fa5ed808000] Map space,              used:     56 KB, available:      0 KB, committed:    260 KB
[19928:0x7fa5ed808000] Large object space,     used: 261701 KB, available:      0 KB, committed: 261704 KB
[19928:0x7fa5ed808000] Code large object space,     used:      2 KB, available:      0 KB, committed:     48 KB
[19928:0x7fa5ed808000] All spaces,             used: 966152 KB, available: 515045 KB, committed: 1000416 KB
[19928:0x7fa5ed808000] Unmapper buffering 0 chunks of committed:      0 KB
[19928:0x7fa5ed808000] External memory reported:  24415 KB
[19928:0x7fa5ed808000] Backing store memory:  24415 KB
[19928:0x7fa5ed808000] External memory global 0 KB

700MB of memory to decode 25MB suggests there's a problem in our decoder implementation.
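A GC profile like this often points at per-byte string building. The following is a hypothetical illustration (not the actual Deno source) of the pattern and a common mitigation: appending one character at a time churns short-lived intermediate strings, while converting many code units per call amortizes the allocations.

```javascript
// Hypothetical illustration: per-byte concatenation allocates a new
// intermediate string on every iteration, all of which become garbage.
function decodeAsciiNaive(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]); // new string each time
  }
  return out;
}

// A chunked approach passes thousands of code units per call to
// String.fromCharCode and joins a small number of large parts.
function decodeAsciiChunked(bytes, chunkSize = 8192) {
  const parts = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    parts.push(String.fromCharCode(...bytes.subarray(i, i + chunkSize)));
  }
  return parts.join('');
}

const data = new Uint8Array(1e5).fill(65); // 100 KB of 'A'
console.log(decodeAsciiChunked(data) === decodeAsciiNaive(data)); // true
```

Both produce the same string; the difference is in how many temporary allocations the GC has to clean up along the way.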

@ry
Member

ry commented Oct 21, 2019

If we could get a benchmark on the benchmarks page, I will happily invest the time to make it faster / use less memory.

It's not at all obvious that doing it in rust would be faster. Boundary crossing is a significant cost and V8 optimizes code very well.

700MB of memory to decode 25MB suggests there's a problem in our decoder implementation.

Indeed. Maybe it's just a small fix to the JS.

@ry ry added the perf performance related label Oct 21, 2019
@qwerasd205
Contributor Author

It's not at all obvious that doing it in rust would be faster. Boundary crossing is a significant cost and V8 optimizes code very well.

Yes, on second thought that's probably right.

I did some research on high performance UTF-8 decoding and looked at some algorithm options and will be working on creating a high performance UTF-8 decoder in JavaScript soon. I will make a comment on this issue with a link to it when I'm satisfied with it, where hopefully someone can implement it in Deno. (I would do it myself and make a PR, but I really can't stand using git more than I have to.)
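For reference, this is the general shape such a decoder typically takes: a simplified sketch (not the linked implementation, and assuming well-formed input; a real decoder must also validate continuation bytes and reject overlong encodings) that writes UTF-16 code units into a typed array and then materializes the string in large chunks.

```javascript
// Simplified sketch of a chunked UTF-8 decoder. Assumes well-formed input.
function utf8Decode(bytes) {
  // Worst case: one UTF-16 code unit per input byte.
  const units = new Uint16Array(bytes.length);
  let n = 0;
  for (let i = 0; i < bytes.length; ) {
    const b = bytes[i];
    let cp;
    if (b < 0x80) { cp = b; i += 1; }                                  // 1-byte (ASCII)
    else if (b < 0xe0) {                                               // 2-byte
      cp = ((b & 0x1f) << 6) | (bytes[i + 1] & 0x3f); i += 2;
    } else if (b < 0xf0) {                                             // 3-byte
      cp = ((b & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) | (bytes[i + 2] & 0x3f);
      i += 3;
    } else {                                                           // 4-byte
      cp = ((b & 0x07) << 18) | ((bytes[i + 1] & 0x3f) << 12) |
           ((bytes[i + 2] & 0x3f) << 6) | (bytes[i + 3] & 0x3f);
      i += 4;
    }
    if (cp > 0xffff) { // outside the BMP: encode as a surrogate pair
      cp -= 0x10000;
      units[n++] = 0xd800 | (cp >> 10);
      units[n++] = 0xdc00 | (cp & 0x3ff);
    } else {
      units[n++] = cp;
    }
  }
  // Materialize the string in large chunks to limit per-call overhead.
  const CHUNK = 8192;
  const parts = [];
  for (let i = 0; i < n; i += CHUNK) {
    parts.push(String.fromCharCode(...units.subarray(i, Math.min(i + CHUNK, n))));
  }
  return parts.join('');
}
```

The key design choice is avoiding per-character string allocation entirely until the final chunked conversion.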

@kitsonk
Contributor

kitsonk commented Oct 21, 2019

@qwerasd205 feel free to contribute however you wish, would be glad to get it to fit into Deno.

@qwerasd205
Contributor Author

I've created a UTF-8 decode function in JavaScript that does 80-130 MB/s on my machine, which I think would be more than sufficient. The function can be found at this gist. Can't wait to see it in Deno!

@ry
Member

ry commented Oct 22, 2019

@qwerasd205 Please supply a benchmark program and I will do the work of getting it in for you.

@qwerasd205
Contributor Author

@ry A program that benchmarks the TextDecoder decode function? What should the format of the output be?

@ry
Member

ry commented Oct 22, 2019

No output - I just need it to do something that's currently expensive.

@qwerasd205
Contributor Author

Okay, can do.

@kitsonk
Contributor

kitsonk commented Oct 22, 2019

@ry feels like we should also invert it, to keep track of TextEncoder performance as well.

@qwerasd205
Contributor Author

@ry here's a script which randomly generates 10MB of utf-8 data and feeds it to the text decoder: https://gist.github.com/qwerasd205/a7616010915180b0fde08a52454e4fe4

@qwerasd205
Contributor Author

It takes ~2.8 seconds for TextDecoder, and ~0.16 seconds for my decode function.

@qwerasd205
Contributor Author

qwerasd205 commented Oct 22, 2019

Actually, I just noticed that my random UTF-8 generation function is flawed; I copied it from somewhere without thinking, but it gets the point across.
Honestly, replacing that function with a read of a large-ish file would be better.
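An alternative sketch along those lines (hypothetical, not the linked gist): instead of a flawed random generator or an external file, build deterministic mixed-width UTF-8 with TextEncoder and time the decode of roughly 10 MB.

```javascript
// Hypothetical benchmark sketch: deterministic mixed-width UTF-8 input
// (1-, 2-, 3-, and 4-byte sequences) built by round-tripping a sample
// string through TextEncoder, then timed through TextDecoder.
const sample = 'ASCII text, accented é, CJK 漢字, emoji 😀. ';
const repeats = Math.ceil(10e6 / sample.length);
const bytes = new TextEncoder().encode(sample.repeat(repeats));

const decoder = new TextDecoder('utf-8');
const start = performance.now();
const text = decoder.decode(bytes);
const elapsed = (performance.now() - start) / 1000;

console.log(`Decoded ${bytes.length} bytes in ${elapsed.toFixed(3)}s`);
```

Being deterministic, the same byte stream is decoded on every run, which makes before/after timings directly comparable.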

kitsonk added a commit to kitsonk/deno that referenced this issue Oct 26, 2019
Fixes denoland#3163

Co-authored-by: Kitson Kelly <me@kitsonkelly.com>
Co-authored-by: Qwerasd <qwerasd205@users.noreply.github.com>
@ry ry closed this as completed in #3204 Oct 26, 2019
ry pushed a commit that referenced this issue Oct 26, 2019
Fixes #3163

Co-authored-by: Kitson Kelly <me@kitsonkelly.com>
Co-authored-by: Qwerasd <qwerasd205@users.noreply.github.com>