Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Profiling deno_tcp.ts #2758

Closed
bartlomieju opened this issue Aug 9, 2019 · 1 comment
Closed

perf: Profiling deno_tcp.ts #2758

bartlomieju opened this issue Aug 9, 2019 · 1 comment

Comments

@bartlomieju
Copy link
Member

bartlomieju commented Aug 9, 2019

For the last two weeks I spent quite some time on profiling deno_tcp.ts. I can't really figure it out on my own and hit a lot of problems along the way so I'm sharing my findings to get more eyes on it.

First I generated some flamegraphs to see what's going on. I used flamegraph which is installable via cargo.

Wrk command used for benchmarking:

third_party/wrk/mac/wrk -d 10s --latency http://127.0.0.1:4555

deno core:

sudo flamegraph -o deno_tcp.svg target/debug/deno tools/deno_core_http_bench

deno_core.svg

deno_core

deno_tcp:

sudo flamegraph -o deno_tcp.svg target/debug/deno tools/deno_tcp.ts --allow-net 127.0.0.1:4544

deno_tcp.svg

deno_tcp

It seems one of the main bottlenecks is garbage collection by V8 in the bottom right hand side. It's "Scavenger" process that reclaims memory and it constitutes roughly 6% of whole graph. deno_core doesn't suffer from same problem.

Checking out GC logs from V8 confirms this suspicion.

deno core:

$ target/debug/deno_core_http_bench ---trace-gc --trace-gc-verbose
[21344:0x10bb6f000]     6829 ms: Scavenge 1.8 (7.1) -> 0.8 (7.1) MB, 0.3 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure
[21344:0x10bb6f000] Memory allocator,       used:   7312 KB, available: 1475440 KB
[21344:0x10bb6f000] Read-only space,        used:   3072 KB, available:    251 KB, committed:   3328 KB
[21344:0x10bb6f000] New space,              used:      7 KB, available:   1015 KB, committed:   2048 KB
[21344:0x10bb6f000] New large object space, used:      0 KB, available:   1022 KB, committed:      0 KB
[21344:0x10bb6f000] Old space,              used:    367 KB, available:    480 KB, committed:    956 KB
[21344:0x10bb6f000] Code space,             used:    155 KB, available:      0 KB, committed:    412 KB
[21344:0x10bb6f000] Map space,              used:     30 KB, available:      0 KB, committed:    260 KB
[21344:0x10bb6f000] Large object space,     used:    256 KB, available:      0 KB, committed:    260 KB
[21344:0x10bb6f000] Code large object space,     used:      3 KB, available:      0 KB, committed:     48 KB
[21344:0x10bb6f000] All spaces,             used:   3892 KB, available: 1478209 KB, committed:   7312 KB
[21344:0x10bb6f000] Unmapper buffering 0 chunks of committed:      0 KB
[21344:0x10bb6f000] External memory reported:     65 KB
[21344:0x10bb6f000] Backing store memory:     65 KB
[21344:0x10bb6f000] External memory global 0 KB
[21344:0x10bb6f000] Total time spent in GC  : 13.6 ms
[21344:0x10bb6f000] Fast promotion mode: false survival rate: 0%
[21344:0x10bb6f000]     6855 ms: Scavenge 1.8 (7.1) -> 0.8 (7.1) MB, 0.3 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure
[21344:0x10bb6f000] Memory allocator,       used:   7312 KB, available: 1475440 KB
[21344:0x10bb6f000] Read-only space,        used:   3072 KB, available:    251 KB, committed:   3328 KB
[21344:0x10bb6f000] New space,              used:      7 KB, available:   1015 KB, committed:   2048 KB
[21344:0x10bb6f000] New large object space, used:      0 KB, available:   1022 KB, committed:      0 KB
[21344:0x10bb6f000] Old space,              used:    367 KB, available:    480 KB, committed:    956 KB
[21344:0x10bb6f000] Code space,             used:    155 KB, available:      0 KB, committed:    412 KB
[21344:0x10bb6f000] Map space,              used:     30 KB, available:      0 KB, committed:    260 KB
[21344:0x10bb6f000] Large object space,     used:    256 KB, available:      0 KB, committed:    260 KB
[21344:0x10bb6f000] Code large object space,     used:      3 KB, available:      0 KB, committed:     48 KB
[21344:0x10bb6f000] All spaces,             used:   3892 KB, available: 1478209 KB, committed:   7312 KB
[21344:0x10bb6f000] Unmapper buffering 0 chunks of committed:      0 KB
[21344:0x10bb6f000] External memory reported:     65 KB
[21344:0x10bb6f000] Backing store memory:     65 KB
[21344:0x10bb6f000] External memory global 0 KB
[21344:0x10bb6f000] Total time spent in GC  : 13.9 ms
[21344:0x10bb6f000] Fast promotion mode: false survival rate: 0%
[21344:0x10bb6f000]     6882 ms: Scavenge 1.8 (7.1) -> 0.8 (7.1) MB, 0.3 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure
[21344:0x10bb6f000] Memory allocator,       used:   7312 KB, available: 1475440 KB
[21344:0x10bb6f000] Read-only space,        used:   3072 KB, available:    251 KB, committed:   3328 KB
[21344:0x10bb6f000] New space,              used:      7 KB, available:   1015 KB, committed:   2048 KB
[21344:0x10bb6f000] New large object space, used:      0 KB, available:   1022 KB, committed:      0 KB
[21344:0x10bb6f000] Old space,              used:    367 KB, available:    480 KB, committed:    956 KB
[21344:0x10bb6f000] Code space,             used:    155 KB, available:      0 KB, committed:    412 KB
[21344:0x10bb6f000] Map space,              used:     30 KB, available:      0 KB, committed:    260 KB
[21344:0x10bb6f000] Large object space,     used:    256 KB, available:      0 KB, committed:    260 KB
[21344:0x10bb6f000] Code large object space,     used:      3 KB, available:      0 KB, committed:     48 KB
[21344:0x10bb6f000] All spaces,             used:   3892 KB, available: 1478209 KB, committed:   7312 KB
[21344:0x10bb6f000] Unmapper buffering 0 chunks of committed:      0 KB
[21344:0x10bb6f000] External memory reported:     65 KB
[21344:0x10bb6f000] Backing store memory:     65 KB
[21344:0x10bb6f000] External memory global 0 KB
[21344:0x10bb6f000] Total time spent in GC  : 14.2 ms
[21344:0x10bb6f000] Fast promotion mode: false survival rate: 0%
[21344:0x10bb6f000]     6908 ms: Scavenge 1.8 (7.1) -> 0.8 (7.1) MB, 0.3 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure
[21344:0x10bb6f000] Memory allocator,       used:   7312 KB, available: 1475440 KB
[21344:0x10bb6f000] Read-only space,        used:   3072 KB, available:    251 KB, committed:   3328 KB
[21344:0x10bb6f000] New space,              used:      7 KB, available:   1015 KB, committed:   2048 KB
[21344:0x10bb6f000] New large object space, used:      0 KB, available:   1022 KB, committed:      0 KB
[21344:0x10bb6f000] Old space,              used:    367 KB, available:    480 KB, committed:    956 KB
[21344:0x10bb6f000] Code space,             used:    155 KB, available:      0 KB, committed:    412 KB
[21344:0x10bb6f000] Map space,              used:     30 KB, available:      0 KB, committed:    260 KB
[21344:0x10bb6f000] Large object space,     used:    256 KB, available:      0 KB, committed:    260 KB
[21344:0x10bb6f000] Code large object space,     used:      3 KB, available:      0 KB, committed:     48 KB
[21344:0x10bb6f000] All spaces,             used:   3892 KB, available: 1478209 KB, committed:   7312 KB
[21344:0x10bb6f000] Unmapper buffering 0 chunks of committed:      0 KB
[21344:0x10bb6f000] External memory reported:     65 KB
[21344:0x10bb6f000] Backing store memory:     65 KB
[21344:0x10bb6f000] External memory global 0 KB
[21344:0x10bb6f000] Total time spent in GC  : 14.5 ms
[21344:0x10bb6f000] Fast promotion mode: false survival rate: 0%

deno_tcp:

$ deno --v8-flags=--trace-gc,--trace-gc-verbose --allow-net tools/deno_tcp.ts 127.0.0.1:4544

[21467:0x11664a000] Fast promotion mode: false survival rate: 16%
[21467:0x11664a000]     3443 ms: Scavenge 14.3 (26.6) -> 9.3 (44.9) MB, 13.9 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure
[21467:0x11664a000] Memory allocator,       used:  45952 KB, available: 1436800 KB
[21467:0x11664a000] Read-only space,        used:   3072 KB, available:    251 KB, committed:   3328 KB
[21467:0x11664a000] New space,              used:   1670 KB, available:  14693 KB, committed:  32768 KB
[21467:0x11664a000] New large object space, used:      0 KB, available:  16364 KB, committed:      0 KB
[21467:0x11664a000] Old space,              used:   6325 KB, available:   1461 KB, committed:   7900 KB
[21467:0x11664a000] Code space,             used:    173 KB, available:    214 KB, committed:    412 KB
[21467:0x11664a000] Map space,              used:     73 KB, available:    185 KB, committed:    260 KB
[21467:0x11664a000] Large object space,     used:   1228 KB, available:      0 KB, committed:   1236 KB
[21467:0x11664a000] Code large object space,     used:      3 KB, available:      0 KB, committed:     48 KB
[21467:0x11664a000] All spaces,             used:  12547 KB, available: 1469971 KB, committed:  45952 KB
[21467:0x11664a000] Unmapper buffering 0 chunks of committed:      0 KB
[21467:0x11664a000] External memory reported:     12 KB
[21467:0x11664a000] Backing store memory:     12 KB
[21467:0x11664a000] External memory global 0 KB
[21467:0x11664a000] Total time spent in GC  : 52.9 ms
[21467:0x11664a000] Fast promotion mode: false survival rate: 28%
[21467:0x11664a000]     3841 ms: Scavenge 25.1 (45.1) -> 14.2 (47.4) MB, 5.7 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure
[21467:0x11664a000] Memory allocator,       used:  48512 KB, available: 1434240 KB
[21467:0x11664a000] Read-only space,        used:   3072 KB, available:    251 KB, committed:   3328 KB
[21467:0x11664a000] New space,              used:   3713 KB, available:  12651 KB, committed:  32768 KB
[21467:0x11664a000] New large object space, used:      0 KB, available:  16364 KB, committed:      0 KB
[21467:0x11664a000] Old space,              used:   9323 KB, available:    901 KB, committed:  10460 KB
[21467:0x11664a000] Code space,             used:    173 KB, available:    214 KB, committed:    412 KB
[21467:0x11664a000] Map space,              used:     73 KB, available:    185 KB, committed:    260 KB
[21467:0x11664a000] Large object space,     used:   1228 KB, available:      0 KB, committed:   1236 KB
[21467:0x11664a000] Code large object space,     used:      3 KB, available:      0 KB, committed:     48 KB
[21467:0x11664a000] All spaces,             used:  17588 KB, available: 1464809 KB, committed:  48512 KB
[21467:0x11664a000] Unmapper buffering 0 chunks of committed:      0 KB
[21467:0x11664a000] External memory reported:     12 KB
[21467:0x11664a000] Backing store memory:     12 KB
[21467:0x11664a000] External memory global 0 KB
[21467:0x11664a000] Total time spent in GC  : 58.6 ms
[21467:0x11664a000] Fast promotion mode: false survival rate: 36%
[21467:0x11664a000]     4192 ms: Scavenge 27.9 (47.6) -> 18.1 (51.9) MB, 7.4 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure
[21467:0x11664a000] Memory allocator,       used:  53120 KB, available: 1429632 KB
[21467:0x11664a000] Read-only space,        used:   3072 KB, available:    251 KB, committed:   3328 KB
[21467:0x11664a000] New space,              used:   3199 KB, available:  13164 KB, committed:  32768 KB
[21467:0x11664a000] New large object space, used:      0 KB, available:  16364 KB, committed:      0 KB
[21467:0x11664a000] Old space,              used:  13855 KB, available:   1192 KB, committed:  15068 KB
[21467:0x11664a000] Code space,             used:    173 KB, available:    214 KB, committed:    412 KB
[21467:0x11664a000] Map space,              used:     73 KB, available:    185 KB, committed:    260 KB
[21467:0x11664a000] Large object space,     used:   1228 KB, available:      0 KB, committed:   1236 KB
[21467:0x11664a000] Code large object space,     used:      3 KB, available:      0 KB, committed:     48 KB
[21467:0x11664a000] All spaces,             used:  21607 KB, available: 1461005 KB, committed:  53120 KB
[21467:0x11664a000] Unmapper buffering 0 chunks of committed:      0 KB
[21467:0x11664a000] External memory reported:     12 KB
[21467:0x11664a000] Backing store memory:     12 KB
[21467:0x11664a000] External memory global 0 KB
[21467:0x11664a000] Total time spent in GC  : 66.0 ms
[21467:0x11664a000] Fast promotion mode: false survival rate: 34%
[21467:0x11664a000]     4556 ms: Scavenge 32.3 (52.1) -> 22.3 (56.4) MB, 6.9 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure
[21467:0x11664a000] Memory allocator,       used:  57728 KB, available: 1425024 KB
[21467:0x11664a000] Read-only space,        used:   3072 KB, available:    251 KB, committed:   3328 KB
[21467:0x11664a000] New space,              used:   3390 KB, available:  12973 KB, committed:  32768 KB
[21467:0x11664a000] New large object space, used:      0 KB, available:  16364 KB, committed:      0 KB
[21467:0x11664a000] Old space,              used:  18002 KB, available:   1633 KB, committed:  19676 KB
[21467:0x11664a000] Code space,             used:    173 KB, available:    214 KB, committed:    412 KB
[21467:0x11664a000] Map space,              used:     73 KB, available:    185 KB, committed:    260 KB
[21467:0x11664a000] Large object space,     used:   1228 KB, available:      0 KB, committed:   1236 KB
[21467:0x11664a000] Code large object space,     used:      3 KB, available:      0 KB, committed:     48 KB
[21467:0x11664a000] All spaces,             used:  25945 KB, available: 1456647 KB, committed:  57728 KB

Profiling data

I generated some profiling data using following commands:

$ target/debug/deno_core_http_bench --prof --log-internal-timer-events -log-timer-events --log-source-code

$ target/debug/deno --v8-flags=--prof,--log-internal-timer-events,-log-timer-events,--log-source-code --allow-net --current-thread tools/deno_tcp.ts 127.0.0.1:4544

Then generate profiling file from V8 log:

// `linux-tick-processor` also available
$ third_party/v8/tools/mac-tick-processor --preprocess isolate-[**WILDCARD**].log > v8_prof.json

Open profview in your browser, it's located in V8 tools directory

third_party/v8/tools/profview/index.html

Processed files:

deno_core_http_bench

v8_prof_deno_core

deno_tcp.ts

v8_prof_deno_tcp

Here we can clearly see that running deno_tcp.ts V8 spents ~12% of time in __kernelrpc_vm_remap. In case of deno_core_http_bench it is only 0.45% of time.

Looking for problem

By using --trace-gc-object-stats we can see what objects are being collected:

$ target/debug/deno --allow-net --v8-flags=--trace-gc-object-stats tools/deno_tcp.ts 127.0.0.1:4544 > v8_gc_stats.json

Open it in: https://mlippautz.github.io/v8-heap-stats/

You can see this:

heap_stats

I found some presentation that explains V8's GC logs:
https://www.slideshare.net/NodejsFoundation/are-your-v8-garbage-collection-logs-speaking-to-youjoyee-cheung-alibaba-cloudalibaba-group

It suggests that we've got a lot of closures and heap snapshot confirms that - there's a lot of "contexts" and "function types" in growing heap.

That's how far I've got, I want to get more eyes on this as I hit the wall for now. I tried using llnode but unfortunately it's lagging a lot in supported V8 version (we're using 7.7.200, llnode doesn't support 7.*.*).

All files are available in this gist: https://gist.github.com/bartlomieju/a341a36e9ef4ba04bf0357641be736b2

CC @ry @kevinkassimo @piscisaureus @afinch7 @kitsonk

@bartlomieju
Copy link
Member Author

Stale, lots of optimizations were introduces since flamegraphs were created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant