-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test hangs on OSX 10.9.5 test-domain-no-error-handler-abort-on-uncaught.js #9979
Comments
Thanks for reporting the issue. I am unable to replicate this on Darwin Kernel Version 16.1.0 and I have not seen it on the test machines. I'm not sure what could be causing it. Hopefully someone else will have more insight. Please do comment here if you manage to get it working. |
I've run into this a few times today too, and I can reproduce this one when running I got "Illegal instruction: 4" when trying master, v4.4.7 and v6.9.1 to run individual test cases(passing |
Turns out I can finish the test, it just takes a long time(long enough to fail the test because of timeout when running with the python script).
another one
and another one
And I notice that there are two |
Looks like this test launches 13 child processes at once. Can one or both of you see what happens if you only have the first six items in the array and remove the last seven? And then again but with the last seven in the array and the first six removed? If both of those still exhibit problems, how about if there's just one item in the array? |
Just for clarity: The array I'm talking about is |
I have tried some combinations, it looks like whenever I hit the else branch(the one launching child processes from the array), the test will run for a long time. I have tried all the individual cases with BTW aren't there 10 functions in that array? That's what I get from if (process.argv[2] === 'child') {
const testIndex = +process.argv[3];
+ console.log('running test ' + testIndex);
tests[testIndex]();
} else {
-
tests.forEach(function(test, testIndex) {
var testCmd = '';
if (!common.isWindows) {
// Do not create core files, as it can take a lot of disk space on
@@ -156,6 +175,7 @@ if (process.argv[2] === 'child') {
testCmd += ' ' + 'child';
testCmd += ' ' + testIndex;
+ console.log('spawn child ' + testIndex);
var child = child_process.exec(testCmd);
child.on('exit', function onExit(exitCode, signal) {
@@ -164,5 +184,9 @@ if (process.argv[2] === 'child') {
' and signal ' + signal;
assert(common.nodeProcessAborted(exitCode, signal), errMsg);
});
+
+ child.stdout.on('data', function(data) {
+ console.log(data);
+ });
});
} I've tried adding some logs in the tests and see what they would do. All test cases can run to the point before they throw an error out. |
Update: I tried to log in the onexit callback. Most of the times half of the test cases hit that line and log immediately, and after dozens of seconds, another half hit there and log and then quit. It's pretty random which ones will be in the first batch, which ones the second, and it's not ordered. Is there a race or anything? @@ -156,13 +175,19 @@ if (process.argv[2] === 'child') {
testCmd += ' ' + 'child';
testCmd += ' ' + testIndex;
var child = child_process.exec(testCmd);
child.on('exit', function onExit(exitCode, signal) {
+ console.log(exitCode, signal, testIndex);
const errMsg = 'Test at index ' + testIndex + ' should have aborted ' +
'but instead exited with exit code ' + exitCode +
' and signal ' + signal;
assert(common.nodeProcessAborted(exitCode, signal), errMsg);
});
+
+ child.stdout.on('data', function(data) {
+ console.log(data);
+ });
});
} |
When I do |
Given how many processes this spawns, and given how we've seen timers be less reliable on heavily-loaded machines, and how some of these tests cases rely on timers, I wonder if the solution might be to move those tests to their own files. In a way, this is a test that is bypassing the way Other possible solution might be to rewrite this to run the tests one at a time and have each child process kick off the next one. But it will probably be easier code to understand to just have the tests in separate files. @nodejs/testing |
There could be use cases like this test(spawning multiple processes to max CPU utilization, using If there is no objection I'll be happy to pick up the splitting task.
I get 10 when I |
@joyeecheung I figured it out. You are correctly looking at |
@joyeecheung Yes, please feel free to split this test up into multiple test files to reduce concurrency! |
@Trott Ok, working on it! I think I still need to spawn one child process per test file, because they need |
Sounds about right to me, if I'm understanding correctly. |
Hmm..turns out there is something off with the 'use strict';
const domain = require('domain');
function test() {
const d = domain.create();
d.run(function() {
throw new Error('boom!');
});
}
test(); and run it with
But, if I run this test without I think the problem is not about child processes. It's about
The abortion is triggered with
and that is why I get "Illegal instruction: 4" when the process aborted.
But if I simply do I am not very familiar with the domains and how |
The original test lauches 10 child processes at once and bypass `test.py`'s process regulation. This PR reduces the unmanaged parallelism and is a temporary workaround for nodejs#9979 (not a real fix).
After splitting the test, the hanging time doesn't change significantly, but at least now they are under more control of test.py. The hanging time on my mac is still high compared to what I get from a ubuntu docker container. Lauching this:
In a ubuntu container(16.04):
On the host machine(Darwin Kernel Version 15.6.0)
Without further investigation all I observe from the lldb debugger is that it keeps hitting this line at
Making a PR for the test splitting anyways... |
Uh, this may be a long shot, but is there a signal handler attached that does not terminate the process (e.g. one that was attached by libuv/from JS land)? That would lead to a “hanging” loop in that place, since every time the signal handler returns the process attempts to execute the crashing instruction again? |
I am not sure will there be any signal handlers attached if I run these tests with #if defined(__clang__)
#if defined(__GNUC__) // Clang in gcc mode.
# define V8_CC_GNU 1
#endif
#elif defined(__GNUC__)
# define V8_CC_GNU 1
#endif
#if V8_CC_GNU
#define V8_IMMEDIATE_CRASH() __builtin_trap()
#else
#define V8_IMMEDIATE_CRASH() ((void(*)())0)()
#endif
// the lines above are copied from the V8 source
int main(int argc, char *argv[]) {
V8_IMMEDIATE_CRASH();
return 0;
} crashes immediately on my machine. Maybe there are some compile flags that I should try? |
The original test lauches 10 child processes at once and bypass `test.py`'s process regulation. This PR reduces the unmanaged parallelism and is a temporary workaround for nodejs#9979 (not a real fix). PR-URL: nodejs#10329 Reviewed-By: Anna Henningsen <anna@addaleax.net>
The original test lauches 10 child processes at once and bypass `test.py`'s process regulation. This PR reduces the unmanaged parallelism and is a temporary workaround for nodejs#9979 (not a real fix). PR-URL: nodejs#10329 Reviewed-By: Anna Henningsen <anna@addaleax.net>
The original test lauches 10 child processes at once and bypass `test.py`'s process regulation. This PR reduces the unmanaged parallelism and is a temporary workaround for nodejs#9979 (not a real fix). PR-URL: nodejs#10329 Reviewed-By: Anna Henningsen <anna@addaleax.net>
The original test lauches 10 child processes at once and bypass `test.py`'s process regulation. This PR reduces the unmanaged parallelism and is a temporary workaround for nodejs#9979 (not a real fix). PR-URL: nodejs#10329 Reviewed-By: Anna Henningsen <anna@addaleax.net>
The original test lauches 10 child processes at once and bypass `test.py`'s process regulation. This PR reduces the unmanaged parallelism and is a temporary workaround for nodejs#9979 (not a real fix). PR-URL: nodejs#10329 Reviewed-By: Anna Henningsen <anna@addaleax.net>
The original test lauches 10 child processes at once and bypass `test.py`'s process regulation. This PR reduces the unmanaged parallelism and is a temporary workaround for nodejs#9979 (not a real fix). PR-URL: nodejs#10329 Reviewed-By: Anna Henningsen <anna@addaleax.net>
The original test lauches 10 child processes at once and bypass `test.py`'s process regulation. This PR reduces the unmanaged parallelism and is a temporary workaround for nodejs#9979 (not a real fix). PR-URL: nodejs#10329 Reviewed-By: Anna Henningsen <anna@addaleax.net>
@joyeecheung Does this still hang for you? If so, what is the output of |
@Trott Can not reproduce anymore, although I have updated my OS since then. I think we can reopen this issue when it shows up again. |
This test hangs on:
=== release test-domain-no-error-handler-abort-on-uncaught ===
Path: parallel/test-domain-no-error-handler-abort-on-uncaught
Command: out/Release/node /Users/youngj/JsProjects/node/test/parallel/test-domain-no-error-handler-abort-on-uncaught.js
The text was updated successfully, but these errors were encountered: