-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spawn() is not asynchronous, blocks event loop for 2-3 seconds #14917
Comments
Some sleuthing with perf(1) should tell you where the time is spent. I have a few hunches but why guess when you can know for sure, right? |
Thanks Ben. Here's a script to reproduce. Would you try your perf magic on it to see what you see? It grows the heap every 2 seconds by 256 mb and then spawns
I get the following:
I think this is the explanation but perhaps there's a better:
|
We removed the offending The problem is down to http://docs.libuv.org/en/v1.x/process.html#c.uv_spawn being run in the main event loop thread, and not in the thread pool. Forking a new process causes the RSS of the parent process to be copied. As RSS increases, spawn time increases. Anyone familiar with In the meantime, I'm willing to open a PR to add a warning to the
|
That depends. Linux preforks the working set (to reduce page faults afterwards) but platforms that do simple copy-on-write profit from synchronicity because they only copy a few pages of stack. In fact, such platforms would be penalized when thread A forks just when thread B is collecting garbage and touching all pages. |
Blocking the event loop for upwards of 2-3 seconds whenever There must be plenty of people affected by this issue who are yet to realize where all their throughput is disappearing to.
It's good to know the problem is limited to Linux but Linux is a major platform for Node. If anyone is running Node servers with large RSS they are probably doing that on Linux. Surely calling
I'm not sure about that. GC is free to touch all pages in any event, regardless of whether fork is called from the main event loop thread or from a background thread. |
You are the first to bring it up, to my knowledge, so it can't be that prevalent. What's more, I don't think your particular use case would benefit from threading so much as it would from marking everything but a few pages That said, have you verified with perf that the fork() or clone() system call is the actual cost center?
I think you misunderstand. The fact that uv_spawn() does not return until the fork calls execve() means there is never a concurrent compacting GC (except concurrent marking, but that's limited in scope and frequency.) No GC is good because otherwise the parent incurs a page fault for every page it mutates until the execve() call. That's easily tens or even hundreds of thousands of page faults. It's not a hypothetical. uv_spawn() initially didn't wait for the fork to complete. When I changed that, it had the pleasant side effect of improving a number of benchmarks, sometimes dramatically so. |
It took a few months to track down. My point was actually that most people wouldn't think that Node's async I'm not in fact the first to bring it up. @davepacheco ran into this issue at Joyent in production (https://github.com/davepacheco/node-spawn-async). Notwithstanding, whether it's prevalent is irrelevant. It should be fixed without regard to whether people happen to know it's broken or not.
Sure, that would also work, but we just stopped using Node's
See the script to reproduce here #14917 (comment). You are welcome to verify with
Thanks for expanding. The current design is very tightly coupled to the GC. At present, if I understand correctly, the story is that Why not just keep the event loop running, run the fork in another thread and use another method to keep the compacting GC from running while a fork is in progress? Just pause the compacting GC? |
My point is that the number of bug reports is proportional to the number of affected users. We haven't received many bug reports so there likely aren't many affected users.
Issues are prioritized based on their impact. This is not a clear "always wrong" bug, it's suboptimal behavior under specific conditions. In fact, it's not even clear if there is a way of addressing this without regressing the common case.
The reason I ask you to is that there may be something particular to your setup (kernel, configuration, something else.) perf should be able to capture that.
No, hence "side effect" (although one I half expected when I made the change.)
You can't. It's like saying "why don't you stop breathing while you're under water?" |
@jorangreef I'm still interested to see the output of |
I don't understand the details of process spawning as well as I'd like, but I thought it might be helpful to report that the synchronous nature of As a solution, we moved all Git spawning to a separate Electron process. Since that eats way more memory than we'd like, I'm tinkering with a native spawn server that will serve a similar role to the server in Before proceeding though, I thought I'd check my understanding.
This makes sense to me, but I'm not clear on the consequences of excessive page faults. If we built a native module that performed the spawn off the main thread would we be just treading into the same issues you fixed by moving it there to begin with? Would this still lead to poor responsiveness? Basically, is direct spawning from a Node process always a bad idea? That's my intuition based on this thread, but I wanted to check. How do other garbage-collected languages deal with spawning? Do they all suffer from this same pitfall? Are there platform-specific tools for spawning that could avoid forking? Apologies if these are somewhat naive questions. The answer may be that I just need to go spend more time researching, but I thought other people might have similar questions. |
@nathansobo No problem, happy to answer. The thread and later the fork still share memory with the parent until execve so yes, you'd have the same issue. A Linux-only solution might be to call clone(2) with the right set of flags to create a thread or fork with unshared memory that spawns the new process, although it's probably not exactly trivial to combine with the other setup that uv_spawn() does. If you have time to test something: see this comment in libuv's sources; I'd be interested to know if disabling the pipe impacts Electron positively or negatively. |
Disagreed here. We have had unexplainable event loop blocks for years. The reason why we didn't file a bug report was that: a) The problem was not big enough for us to worry about (RSS sizes not high enough). Point is: For most other problems I'd agree with your thesis. But this one is different. I was really happy to find this issue. 👍 |
I had some time to look into With a ~600 MB heap, the test completes 10x faster when memory is marked with Patch: diff --git a/deps/v8/src/base/platform/platform-posix.cc b/deps/v8/src/base/platform/platform-posix.cc
index b873197d3b..6d5fa1b6f6 100644
--- a/deps/v8/src/base/platform/platform-posix.cc
+++ b/deps/v8/src/base/platform/platform-posix.cc
@@ -136,6 +136,9 @@ void* Allocate(void* address, size_t size, OS::MemoryPermission access) {
void* result =
mmap(address, actual_size, prot, flags, kMmapFd, kMmapFdOffset);
if (result == MAP_FAILED) return nullptr;
+#if defined(V8_OS_LINUX)
+ madvise(address, actual_size, MADV_DONTFORK);
+#endif
return result;
}
I filed https://bugs.chromium.org/p/v8/issues/detail?id=7381 to discuss. Test: 'use strict';
const { spawnSync } = require('child_process');
for (global.a = []; a.length < 1e7; a.push({ a }));
const start = process.hrtime();
for (let i = 0; i < 2500; ++i) spawnSync('./a.out', {stdio:'inherit'});
const [sec, nsec] = process.hrtime(start);
console.log(`${sec}.${nsec}`); Binary: // cc -static -nostdlib -s
#include <sys/syscall.h>
void __attribute__((noreturn)) _start(void) {
for (;;) asm volatile("syscall" :: "a" (SYS_exit_group), "D" (0));
} And for a 'told you so' moment: I also checked whether removing the wait-for-fork logic made a difference. The answer is 'no', it's in fact 5-10% slower and highly variable for the reasons I've mentioned earlier. diff --git a/deps/uv/src/unix/process.c b/deps/uv/src/unix/process.c
index 9842710d0e..775e2f7188 100644
--- a/deps/uv/src/unix/process.c
+++ b/deps/uv/src/unix/process.c
@@ -418,16 +418,12 @@ int uv_spawn(uv_loop_t* loop,
/* fork is marked __WATCHOS_PROHIBITED __TVOS_PROHIBITED. */
return -ENOSYS;
#else
- int signal_pipe[2] = { -1, -1 };
int pipes_storage[8][2];
int (*pipes)[2];
int stdio_count;
- ssize_t r;
pid_t pid;
int err;
- int exec_errorno;
int i;
- int status;
assert(options->file != NULL);
assert(!(options->flags & ~(UV_PROCESS_DETACHED |
@@ -462,30 +458,6 @@ int uv_spawn(uv_loop_t* loop,
goto error;
}
- /* This pipe is used by the parent to wait until
- * the child has called `execve()`. We need this
- * to avoid the following race condition:
- *
- * if ((pid = fork()) > 0) {
- * kill(pid, SIGTERM);
- * }
- * else if (pid == 0) {
- * execve("/bin/cat", argp, envp);
- * }
- *
- * The parent sends a signal immediately after forking.
- * Since the child may not have called `execve()` yet,
- * there is no telling what process receives the signal,
- * our fork or /bin/cat.
- *
- * To avoid ambiguity, we create a pipe with both ends
- * marked close-on-exec. Then, after the call to `fork()`,
- * the parent polls the read end until it EOFs or errors with EPIPE.
- */
- err = uv__make_pipe(signal_pipe, 0);
- if (err)
- goto error;
-
uv_signal_start(&loop->child_watcher, uv__chld, SIGCHLD);
/* Acquire write lock to prevent opening new fds in worker threads */
@@ -495,42 +467,17 @@ int uv_spawn(uv_loop_t* loop,
if (pid == -1) {
err = -errno;
uv_rwlock_wrunlock(&loop->cloexec_lock);
- uv__close(signal_pipe[0]);
- uv__close(signal_pipe[1]);
goto error;
}
if (pid == 0) {
- uv__process_child_init(options, stdio_count, pipes, signal_pipe[1]);
+ uv__process_child_init(options, stdio_count, pipes, -1);
abort();
}
/* Release lock in parent process */
uv_rwlock_wrunlock(&loop->cloexec_lock);
- uv__close(signal_pipe[1]);
-
process->status = 0;
- exec_errorno = 0;
- do
- r = read(signal_pipe[0], &exec_errorno, sizeof(exec_errorno));
- while (r == -1 && errno == EINTR);
-
- if (r == 0)
- ; /* okay, EOF */
- else if (r == sizeof(exec_errorno)) {
- do
- err = waitpid(pid, &status, 0); /* okay, read errorno */
- while (err == -1 && errno == EINTR);
- assert(err == pid);
- } else if (r == -1 && errno == EPIPE) {
- do
- err = waitpid(pid, &status, 0); /* okay, got EPIPE */
- while (err == -1 && errno == EINTR);
- assert(err == pid);
- } else
- abort();
-
- uv__close_nocheckstdio(signal_pipe[0]);
for (i = 0; i < options->stdio_count; i++) {
err = uv__process_open_stream(options->stdio + i, pipes[i], i == 0);
@@ -543,11 +490,8 @@ int uv_spawn(uv_loop_t* loop,
goto error;
}
- /* Only activate this handle if exec() happened successfully */
- if (exec_errorno == 0) {
- QUEUE_INSERT_TAIL(&loop->process_handles, &process->queue);
- uv__handle_start(process);
- }
+ QUEUE_INSERT_TAIL(&loop->process_handles, &process->queue);
+ uv__handle_start(process);
process->pid = pid;
process->exit_cb = options->exit_cb;
@@ -555,7 +499,7 @@ int uv_spawn(uv_loop_t* loop,
if (pipes != pipes_storage)
uv__free(pipes);
- return exec_errorno;
+ return 0;
error:
if (pipes != NULL) { |
Thanks Ben. With your patch, how long is the event loop blocked when the RSS is 16 GB, using the test I provided? Fork may still block significantly just iterating across so many pages, even if they are marked with
Thanks also for filing the V8 issue. I think the real issue is really that fork is still being called from the main thread. Fork should be called asynchronously from the libuv threadpool (and any GC issues which arise can be resolved differently). No one wants to go around making sure every buffer is marked It's punting the issue back to userland, when fork could just be moved off the main thread into the threadpool. |
I also noticed that your test case is actually benchmarking Just to confirm that we are on the same page, that this issue is about |
Perhaps I didn't explain it clearly but the point is that fork() is inexpensive when it doesn't have to copy huge amounts of memory. Overhead inside node.js starts to dominate once that's eliminated. Moving things into the threadpool makes no sense, that just adds extra latency and variance.
Yes, but it doesn't matter. They're pretty similar, implementation-wise.
You'll need to explain to me how you reached that conclusion... |
Your
Sure, we both know it makes sense to move a 20ms, 200ms or 2000ms Granted, Node could still do much better on the threadpool front (increasing the size of the threadpool to avoid head of line blocking, separating IO tasks from CPU tasks to optimize throughput and latency independently) but those are separate issues and no reason not to move Then again, moving
It matters because Node advertises No one using Node should have to be writing things like node-spawn-async and spawn-server. Node is supposed to be a proponent of non-blocking IO.
Sure, the script I provided specifically benchmarks RSS and not just the V8 heap as yours does since most people with RSS of 16 GB are probably already keeping everything off the heap to minimize GC. Are you thinking that Node should mark all Buffers as Moving |
That's the "the bigger the heap, the bigger the difference" I mentioned. Whether it's 600 MB or 16 GB doesn't matter because it isn't copied. Think O(1) vs. O(n).
Yes, and no, that's not a breaking change. Yes, add-ons would need to do that too if they allocate a lot of memory but that's a good change in and of itself. Forking from a thread won't help because of page faults; even if it did, it'd still waste a lot of CPU cycles, only less visible. If you still disagree, the onus is on you to prove it with code and numbers. |
I am probably wrong, but I would speculate that To be clear, Node and V8 doing |
That's what the kernel does; the madvise applies to the vma ( |
Hi peeps, very interesting discussion indeed! I've been racking my brains and searching through dump files for a potential memory leak but I think this is most likely what's slowing my scraper down so much over time. I'm relying on Full disclosure: I'm certainly not a node ninja just yet and am self taught, so apologies if the answer to this question seems obvious. If I call main.js > spawn os.numpCPU() workers using custer.fork() > workers scrape in parallel using spawn() to hit the local cli > get needed information, send back to master > master delegates next block to scrape, if any, and handles writing to the exported csv file
When I read stuff like that I get a bit nervous - would be great to get a better understanding of how Keep up the good work! |
@GrayedFox - the problem context does not involve any cluster. Instead, a child process forking site (the child process itself can be a cluster member though). The issue is: there is a series of activities the parent has to perform before it can let the child on its own and it can let the program get onto the next set of instructions. This series of actions has to be synchronous, and causes block in the uv loop, with the latency as a function of the |
@gireeshpunathil thanks for the response, however I'm not sure I get what you mean by
I get that when a parent forks a child process this takes time (and is either asynchronous or not) and that the latency of this fork function is related to size of the parent process. What I want to know is if, say, I have a single master process with 4 workers, and each of these workers individually spawn child processes too (many of them, in fact) - will the spawned children 2 levels deep cause blocking in my master proc? Here's a visualisation: MASTER |
no, activities in (fully born) child processes do not bother the parent (or master) process in any manner. As you already guessed it right, this concerns only if a process directly uses (invokes) spawn* family of functions:
So in this simple case, we are talking about the |
I guess this discussion has run its course with a conclusion:
Closing, thanks everyone for the great insights. |
Speed up child_process.spawn by enabling the new V8 build flag which makes fork/exec faster. Here are the results of running the existing benchmark. Note that this optimization helps more for applications with larger heaps, so this is somewhat of an underestimate of the real world performance benefits. ```console $ ./node benchmark/compare.js --runs 15 \ --new ./node \ --old ~/node-v20/out/Release/node \ --filter params child_process > cpr $ node-benchmark-compare cpr confidence improvement (***) methodName='exec' n=1000 *** 60.84 % ±5.43% methodName='execFile' n=1000 *** 53.72 % ±3.33% methodName='execFileSync' n=1000 *** 9.10 % ±0.84% methodName='execSync' n=1000 *** 10.44 % ±0.97% methodName='spawn' n=1000 *** 53.10 % ±2.90% methodName='spawnSync' n=1000 *** 8.64 % ±1.22% 0.01 false positives, when considering a 0.1% risk acceptance (***) ``` Fixes: #25382 Fixes: #14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 PR-URL: #48523 Refs: v8/v8@1a782f6 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: #48523 Fixes: #25382 Fixes: #14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Speed up child_process.spawn by enabling the new V8 build flag which makes fork/exec faster. Here are the results of running the existing benchmark. Note that this optimization helps more for applications with larger heaps, so this is somewhat of an underestimate of the real world performance benefits. ```console $ ./node benchmark/compare.js --runs 15 \ --new ./node \ --old ~/node-v20/out/Release/node \ --filter params child_process > cpr $ node-benchmark-compare cpr confidence improvement (***) methodName='exec' n=1000 *** 60.84 % ±5.43% methodName='execFile' n=1000 *** 53.72 % ±3.33% methodName='execFileSync' n=1000 *** 9.10 % ±0.84% methodName='execSync' n=1000 *** 10.44 % ±0.97% methodName='spawn' n=1000 *** 53.10 % ±2.90% methodName='spawnSync' n=1000 *** 8.64 % ±1.22% 0.01 false positives, when considering a 0.1% risk acceptance (***) ``` Fixes: #25382 Fixes: #14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 PR-URL: #48523 Refs: v8/v8@1a782f6 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: nodejs#48523 Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: nodejs#48523 Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: nodejs#48523 Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: nodejs#48523 Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: nodejs#48523 Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: nodejs#48523 Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Speed up child_process.spawn by enabling the new V8 build flag which makes fork/exec faster. Here are the results of running the existing benchmark. Note that this optimization helps more for applications with larger heaps, so this is somewhat of an underestimate of the real world performance benefits. ```console $ ./node benchmark/compare.js --runs 15 \ --new ./node \ --old ~/node-v20/out/Release/node \ --filter params child_process > cpr $ node-benchmark-compare cpr confidence improvement (***) methodName='exec' n=1000 *** 60.84 % ±5.43% methodName='execFile' n=1000 *** 53.72 % ±3.33% methodName='execFileSync' n=1000 *** 9.10 % ±0.84% methodName='execSync' n=1000 *** 10.44 % ±0.97% methodName='spawn' n=1000 *** 53.10 % ±2.90% methodName='spawnSync' n=1000 *** 8.64 % ±1.22% 0.01 false positives, when considering a 0.1% risk acceptance (***) ``` Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 PR-URL: nodejs#48523 Refs: v8/v8@1a782f6 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: nodejs#48523 Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Speed up child_process.spawn by enabling the new V8 build flag which makes fork/exec faster. Here are the results of running the existing benchmark. Note that this optimization helps more for applications with larger heaps, so this is somewhat of an underestimate of the real world performance benefits. ```console $ ./node benchmark/compare.js --runs 15 \ --new ./node \ --old ~/node-v20/out/Release/node \ --filter params child_process > cpr $ node-benchmark-compare cpr confidence improvement (***) methodName='exec' n=1000 *** 60.84 % ±5.43% methodName='execFile' n=1000 *** 53.72 % ±3.33% methodName='execFileSync' n=1000 *** 9.10 % ±0.84% methodName='execSync' n=1000 *** 10.44 % ±0.97% methodName='spawn' n=1000 *** 53.10 % ±2.90% methodName='spawnSync' n=1000 *** 8.64 % ±1.22% 0.01 false positives, when considering a 0.1% risk acceptance (***) ``` Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 PR-URL: nodejs#48523 Refs: v8/v8@1a782f6 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: nodejs#48523 Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Speed up child_process.spawn by enabling the new V8 build flag which makes fork/exec faster. Here are the results of running the existing benchmark. Note that this optimization helps more for applications with larger heaps, so this is somewhat of an underestimate of the real world performance benefits. ```console $ ./node benchmark/compare.js --runs 15 \ --new ./node \ --old ~/node-v20/out/Release/node \ --filter params child_process > cpr $ node-benchmark-compare cpr confidence improvement (***) methodName='exec' n=1000 *** 60.84 % ±5.43% methodName='execFile' n=1000 *** 53.72 % ±3.33% methodName='execFileSync' n=1000 *** 9.10 % ±0.84% methodName='execSync' n=1000 *** 10.44 % ±0.97% methodName='spawn' n=1000 *** 53.10 % ±2.90% methodName='spawnSync' n=1000 *** 8.64 % ±1.22% 0.01 false positives, when considering a 0.1% risk acceptance (***) ``` Fixes: nodejs#25382 Fixes: nodejs#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 PR-URL: nodejs#48523 Refs: v8/v8@1a782f6 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: #48523 Backport-PR-URL: #50098 Fixes: #25382 Fixes: #14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Speed up child_process.spawn by enabling the new V8 build flag which makes fork/exec faster. Here are the results of running the existing benchmark. Note that this optimization helps more for applications with larger heaps, so this is somewhat of an underestimate of the real world performance benefits. ```console $ ./node benchmark/compare.js --runs 15 \ --new ./node \ --old ~/node-v20/out/Release/node \ --filter params child_process > cpr $ node-benchmark-compare cpr confidence improvement (***) methodName='exec' n=1000 *** 60.84 % ±5.43% methodName='execFile' n=1000 *** 53.72 % ±3.33% methodName='execFileSync' n=1000 *** 9.10 % ±0.84% methodName='execSync' n=1000 *** 10.44 % ±0.97% methodName='spawn' n=1000 *** 53.10 % ±2.90% methodName='spawnSync' n=1000 *** 8.64 % ±1.22% 0.01 false positives, when considering a 0.1% risk acceptance (***) ``` Fixes: #25382 Fixes: #14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 PR-URL: #48523 Backport-PR-URL: #50098 Refs: v8/v8@1a782f6 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: nodejs/node#48523 Backport-PR-URL: nodejs/node#50098 Fixes: nodejs/node#25382 Fixes: nodejs/node#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Speed up child_process.spawn by enabling the new V8 build flag which makes fork/exec faster. Here are the results of running the existing benchmark. Note that this optimization helps more for applications with larger heaps, so this is somewhat of an underestimate of the real world performance benefits. ```console $ ./node benchmark/compare.js --runs 15 \ --new ./node \ --old ~/node-v20/out/Release/node \ --filter params child_process > cpr $ node-benchmark-compare cpr confidence improvement (***) methodName='exec' n=1000 *** 60.84 % ±5.43% methodName='execFile' n=1000 *** 53.72 % ±3.33% methodName='execFileSync' n=1000 *** 9.10 % ±0.84% methodName='execSync' n=1000 *** 10.44 % ±0.97% methodName='spawn' n=1000 *** 53.10 % ±2.90% methodName='spawnSync' n=1000 *** 8.64 % ±1.22% 0.01 false positives, when considering a 0.1% risk acceptance (***) ``` Fixes: nodejs/node#25382 Fixes: nodejs/node#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 PR-URL: nodejs/node#48523 Backport-PR-URL: nodejs/node#50098 Refs: v8/v8@1a782f6 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Original commit message: [base] add build flag to use MADV_DONTFORK Embedders like Node.js and Electron expose fork(2)/execve(2) to their users. Unfortunately when the V8 heap is very large, these APIs become rather slow on Linux, due to the kernel needing to do all the bookkeeping for the forked process (in clone's dup_mmap and execve's exec_mmap). Of course, this is useless because the forked child thread will never actually need to access the V8 heap. Add a new build flag v8_enable_private_mapping_fork_optimization which marks all pages allocated by OS::Allocate as MADV_DONTFORK. This improves the performance of Node.js's fork/execve combination by 10x on a 600 MB heap. Fixed: v8:7381 Change-Id: Ib649f774d4a932b41886313ce89acc369923699d Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4602858 Commit-Queue: Michael Lippautz <mlippautz@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/main@{#88447} Refs: v8/v8@1a782f6 PR-URL: nodejs/node#48523 Backport-PR-URL: nodejs/node#50098 Fixes: nodejs/node#25382 Fixes: nodejs/node#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
Speed up child_process.spawn by enabling the new V8 build flag which makes fork/exec faster. Here are the results of running the existing benchmark. Note that this optimization helps more for applications with larger heaps, so this is somewhat of an underestimate of the real world performance benefits. ```console $ ./node benchmark/compare.js --runs 15 \ --new ./node \ --old ~/node-v20/out/Release/node \ --filter params child_process > cpr $ node-benchmark-compare cpr confidence improvement (***) methodName='exec' n=1000 *** 60.84 % ±5.43% methodName='execFile' n=1000 *** 53.72 % ±3.33% methodName='execFileSync' n=1000 *** 9.10 % ±0.84% methodName='execSync' n=1000 *** 10.44 % ±0.97% methodName='spawn' n=1000 *** 53.10 % ±2.90% methodName='spawnSync' n=1000 *** 8.64 % ±1.22% 0.01 false positives, when considering a 0.1% risk acceptance (***) ``` Fixes: nodejs/node#25382 Fixes: nodejs/node#14917 Refs: nodejs/performance#93 Refs: nodejs/performance#89 PR-URL: nodejs/node#48523 Backport-PR-URL: nodejs/node#50098 Refs: v8/v8@1a782f6 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Juan José Arboleda <soyjuanarbol@gmail.com> Reviewed-By: Debadree Chatterjee <debadree333@gmail.com>
This comment was marked as abuse.
This comment was marked as abuse.
FWIW the issue you linked, while I still see as invalid, would be a duplicate of this one in any case. |
This comment was marked as abuse.
This comment was marked as abuse.
Yeah, that attitude is really going to take you far. @redyetidev asked you to post a reproducer without 3rd-party deps. Do that first, then we'll talk. |
spawn()
is not asynchronous while launching a child process.I think this is a known issue but it took a few months to track down.
Our event loop was blocking frequently:
The system has many components and the RSS is around 14 GB. We had had issues with GC pause times so we moved everything off-heap and reduced the number of pointers. We may still have issues with GC pause times, but I thought it might also be something else.
In addition, over time, the majority of components were rewritten to run in the threadpool.
I thought it was finally down to deduplication chunking and hashing, but when this was recently moved to the threadpool, I took another look at a spawn routine which was calling out to
clamdscan
(moving to a socket protocol was the long term plan but spawningclamdscan
was the first draft):I came across nodejs/node-v0.x-archive#9250 today and https://github.com/davepacheco/node-spawn-async seems to explain it in terms of the large heap or RSS size:
Wrapping the spawn code above in a simple timer shows:
All correlated with the event loop blocks reported above.
I understand that spawn is not meant to be called every few seconds, but I never expected average latency of 300ms?
Is there any way to improve spawn along the lines of what Dave Pacheco has done?
Or at least to document that
spawn()
will block the event loop for around 300ms depending on heap size?The text was updated successfully, but these errors were encountered: