Investigate flaky test/async-hooks/test-callback-error #13527
I think we should make the test more robust by only running the
Happened again today: https://ci.nodejs.org/job/node-test-commit-osx/10360/nodes=osx1010/console
not ok 167 async-hooks/test-callback-error
  ---
  duration_ms: 60.68
  severity: fail
  stack: |-
    timeout
/cc @nodejs/async_hooks
It might be good to split the three test cases in that file into three separate tests/test files so that we can see which one is timing out. (I'm pretty sure it's
could add
So finally I have data; it seems like it is
/cc @DavidCai1993
We don't have a good reason to expect
-1 on skipping this case on macOS at this time, too. It seems that the problem is somehow related to
I added a timeout on the spawnSync call:
const c3 = spawnSync(process.execPath, a, { timeout: 30000 });
console.log((new Date()).toISOString() + ' finished test_callback_abort');
assert.ifError(c3.error);
and, before the assert.ifError() check, we could skip on a macOS timeout:
if (common.isOSX && c3.error && c3.error.code === 'ETIMEDOUT')
  return common.skip('spawnSync timed out on macOS');
Or simply
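The timeout-then-skip idea above can be sketched as a standalone helper. This is a minimal sketch, not the test's actual code: classifyChildResult() and runWithTimeout() are hypothetical names, process.platform === 'darwin' stands in for common.isOSX, and 'ok' only means the child finished before the timeout (a real test would still assert on the child's exit signal and stderr).

```javascript
'use strict';
// Hypothetical helpers (not part of Node's test/common): run a child Node.js
// process with a hard timeout, then classify the outcome.
const { spawnSync } = require('child_process');

// Returns 'skip', 'fail', or 'ok' for a spawnSync() result object.
// 'ok' only means the child finished before the timeout; a real test would
// still need to check how the child exited.
function classifyChildResult(result, platform) {
  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  if (timedOut) {
    // Stand-in for common.isOSX: only tolerate the hang on macOS.
    return platform === 'darwin' ? 'skip' : 'fail';
  }
  return result.error ? 'fail' : 'ok';
}

function runWithTimeout(args, timeoutMs) {
  const result = spawnSync(process.execPath, args, { timeout: timeoutMs });
  // Timestamp each run so a stalled case is easy to spot in CI logs.
  console.log(`${new Date().toISOString()} finished ${args.join(' ')}`);
  return classifyChildResult(result, process.platform);
}
```

Passing the same argument array used for c3 with timeoutMs set to 30000 would mirror the change described above. Keeping the classification in a pure function makes the skip rule easy to test without actually spawning a hanging child.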
I ran this test 1000 times on my Mac, and all runs finished within 200-400 ms. Thanks for explaining the does
@Trott why does the
I think the idea was to avoid possibly creating a bunch of core files.
I see; then maybe we should move the test to that directory.
Yes, that's what it means, and yes it's confusing, but
I'd agree with that. Curious if we ever run the
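Regarding the concern about core files: as an alternative or complement to moving the test, a child that is expected to abort can be run with core dumps disabled for that child only. This is a minimal POSIX-only sketch assuming /bin/sh and its ulimit builtin are available; runWithoutCoreDumps() is a hypothetical helper, not Node's test/common API.

```javascript
'use strict';
// Hypothetical helper (POSIX only): run Node.js with
// --abort-on-uncaught-exception while core dumps are disabled for that child
// via the shell builtin `ulimit -c 0`, so repeated aborts leave no core files.
const { spawnSync } = require('child_process');

function runWithoutCoreDumps(nodeArgs, timeoutMs) {
  // In `sh -c CMD ARG0 ARGS...`, "$0" is ARG0 (the node binary) and "$@" is
  // ARGS; `exec` replaces the shell so the child's exit signal is reported.
  const cmd = 'ulimit -c 0 && exec "$0" --abort-on-uncaught-exception "$@"';
  return spawnSync('/bin/sh', ['-c', cmd, process.execPath, ...nodeArgs], {
    timeout: timeoutMs,
    encoding: 'utf8',
  });
}
```

For example, runWithoutCoreDumps(['-e', "throw new Error('foo')"], 60000) should come back with result.signal set and result.status null; which signal it is may vary by platform (this thread shows signal 4 on one macOS setup and signal 6 on another).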
Oh, by the way, another one on CI today: https://ci.nodejs.org/job/node-test-commit-osx/10391/nodes=osx1010/console
not ok 175 async-hooks/test-callback-error
  ---
  duration_ms: 60.25
  severity: fail
  stack: |-
    timeout
Or at least moving the abort-on-uncaught-exception thing into its own test? I don't know what I think anymore.
Looks like what we have here may be a bug in the Node.js master branch on macOS and not a problem with the test. With this one-line test file called test.js:
throw new Error('foo');
If I run it with current master:
$ ./node --abort-on-uncaught-exception test.js
Uncaught Error: foo
FROM
Object.<anonymous> (/Users/trott/io.js/test.js:1:1)
Module._compile (module.js:1:1)
Object.Module._extensions..js (module.js:1:1)
Module.load (module.js:1:1)
tryModuleLoad (module.js:1:1)
Function.Module._load (module.js:1:1)
Function.Module.runMain (module.js:1:1)
startup (bootstrap_node.js:1:1)
bootstrap_node.js:1:1
Received signal 4 <unknown> 000100c84632
==== C stack trace ===============================
[0x000100c83442]
[0x7fff8200652a]
[0x000000000000]
[0x000100954b4b]
[0x0db5628843fd]
[0x0db562987c7a]
[0x0db56293be7c]
[end of stack trace]
Illegal instruction: 4
$
There is a looooong pause between printing
If I use Node.js 8.1.0, I get similar output, but there's no long pause. Can someone try to replicate that just to confirm it's not some oddity in my own setup?
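To put a number on the pause when replicating, the child's wall-clock time can be measured from a parent process. This is a minimal sketch: timeChild() is a hypothetical helper, and using node -e in place of the one-line test.js file is an assumption that both take the same abort path.

```javascript
'use strict';
// Hypothetical helper: measure how long a child Node.js process takes to exit
// and report how it exited. Useful for spotting a long pause after an abort.
const { spawnSync } = require('child_process');

function timeChild(args, timeoutMs) {
  const start = process.hrtime();
  const result = spawnSync(process.execPath, args, {
    timeout: timeoutMs,
    encoding: 'utf8',
  });
  const [sec, nsec] = process.hrtime(start);
  return {
    elapsedMs: sec * 1e3 + nsec / 1e6,
    status: result.status, // exit code, or null if the child died from a signal
    signal: result.signal, // name of the terminating signal, or null
    timedOut: Boolean(result.error && result.error.code === 'ETIMEDOUT'),
  };
}
```

To replicate the report, call timeChild(['--abort-on-uncaught-exception', '-e', "throw new Error('foo')"], 180000) and compare elapsedMs across machines, users, or Node.js versions. The run may leave a core file behind, and on an affected setup it may take minutes, hence the generous timeout.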
@Trott I cannot reproduce it on macOS with either the master branch or 8.1.0. It just aborts immediately.
@aqrln Do you get the
I can't reproduce on master; this is what I'm getting:
Interesting that the stack traces are so very different...
I did a
Now the stack traces match at least, but the time difference is certainly troubling.
When run as root, there is no delay and no "Illegal instruction: 4".
I think the issue is in V8 (or, more precisely, in an interaction between macOS and V8 code; I'm not sure whether the bug is in one place, the other, or really neither). If I don't pass
Yes, although I now have the confounding additional information that the issue manifests only for some users on my laptop. If I switch to other users, the issue goes away. It's quite possible the problem is entirely my setup, but I suspect it's not quite that simple because we're seeing this in CI as well.
OK, it's almost definitely user setup at this point. While the program is hanging, a process named ReportCrash climbs to 99% CPU. The main thing at this point is probably to:
Although... if I run as root, things work just fine: no delay, etc. So maybe the real issue is to figure that out rather than disabling crash reporting... ¯\_(ツ)_/¯ Again, I wouldn't be going on and on about problems with my own personal setup if it weren't for the fact that we're seeing it in CI too, and we need to fix it there...
I wonder what the node::Chdir is about in the stack trace...
Logging on to the CI machine, I don't get the long delay with simple cases like that described in
test-nodesource-osx1010-x64-1:osx1010 iojs$ time ./node --abort-on-uncaught-exception test/async-hooks/test-callback-error.js test_callback_abort
Error: test_callback_abort
at ActivityCollector.initHooks.oninit.common.mustCall (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/test-callback-error.js:30:45)
at ActivityCollector.oninit (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/common/index.js:517:15)
at ActivityCollector._init (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/init-hooks.js:170:10)
at init (async_hooks.js:467:43)
at Object.emitInitS [as emitInit] (async_hooks.js:337:3)
at Object.<anonymous> (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/test-callback-error.js:33:17)
at Module._compile (module.js:569:30)
at Object.Module._extensions..js (module.js:580:10)
at Module.load (module.js:503:32)
at tryModuleLoad (module.js:466:12)
1: node::Abort() [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/./node]
2: node::Chdir(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/./node]
3: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/./node]
4: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/./node]
5: v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/./node]
6: 0x3a7608e043fd
7: 0x3a7608ef7c49
Received signal 6
==== C stack trace ===============================
[0x000100c4402b]
[0x7fff8d42bf1a]
[0x000000000002]
[0x7fff8a9d79a3]
[0x000100ac4e0d]
[0x000100ac8a2e]
[0x000100216c92]
[0x0001002925d0]
[0x000100291b59]
[0x3a7608e043fd]
[0x3a7608ef7c49]
[end of stack trace]
Abort trap: 6
real 2m2.042s
user 0m0.072s
sys 0m0.051s
test-nodesource-osx1010-x64-1:osx1010 iojs$
And if I run
I don't have root or sudo access on the machine, so I can't see what happens if I run as root, and I can't experiment with using
Assuming you're using @rvagg could you put the test key in the
...
real 2m2.042s
... 😵
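A quick reading of the time output from the CI run: the node process used very little CPU compared with the wall-clock time, which suggests it spent nearly all of those two minutes blocked or waiting rather than running. This is only an inference from the three reported numbers, though it is consistent with the earlier observation that ReportCrash was busy while node hung.

```latex
\begin{aligned}
t_{\text{cpu}} &= t_{\text{user}} + t_{\text{sys}} = 0.072\,\text{s} + 0.051\,\text{s} = 0.123\,\text{s} \\
t_{\text{real}} &= 2\,\text{min}\ 2.042\,\text{s} = 122.042\,\text{s} \\
\frac{t_{\text{cpu}}}{t_{\text{real}}} &= \frac{0.123}{122.042} \approx 0.001 \quad (\approx 0.1\%)
\end{aligned}
```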
If the plan is to move to MacStadium and not use requireio for Mac testing, then this may all be a moot point, at least in terms of CI. How soon can we move to MacStadium?
Good news! @rvagg reports that disabling ReportCrash with
This has been fixed since @rvagg ran these two commands on the host:
launchctl unload -w /System/Library/LaunchAgents/com.apple.ReportCrash.plist
sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.ReportCrash.Root.plist
Note that they need to be run from a GUI login; otherwise they error out. There's an open issue to add this to the provisioning for OS X machines. For now, though, the problem is resolved, so I'm going to close this.
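For provisioning, one way to check that the ReportCrash agent stayed unloaded is to ask launchd about the job by label. This is a minimal sketch under assumptions not stated in the thread: that the job label matches the plist filename (com.apple.ReportCrash) and that launchctl list <label> exits non-zero when the job is not loaded. isJobLoaded() and interpretListResult() are hypothetical helpers. Checking the root daemon (com.apple.ReportCrash.Root) would presumably need the same command run under sudo.

```javascript
'use strict';
// Hypothetical helpers: report whether a launchd job is loaded by running
// `launchctl list <label>` and reading its exit status.
// Assumption: exit status 0 means loaded, non-zero means not loaded.
const { spawnSync } = require('child_process');

// Returns true/false, or null when launchctl is unavailable (e.g. not macOS).
function interpretListResult(result) {
  if (result.error && result.error.code === 'ENOENT') return null;
  if (result.error) throw result.error;
  return result.status === 0;
}

function isJobLoaded(label) {
  const result = spawnSync('launchctl', ['list', label], { encoding: 'utf8' });
  return interpretListResult(result);
}
```

After the unload commands above, isJobLoaded('com.apple.ReportCrash') would be expected to return false on the CI host; a provisioning script could fail loudly if it returns true.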
https://ci.nodejs.org/job/node-test-commit-osx/10336/nodes=osx1010/console