Investigate flaky test test-tick-processor #2471
Comments
I opened an issue (#2431) and it was thought not to be an issue, so I guess we'll see? |
I just saw this on armv7-wheezy:
|
It's still flaking out on armv7-wheezy: https://jenkins-iojs.nodesource.com/job/node-test-commit-arm/267/nodes=armv7-wheezy/console Something's not happy there at all. It's fine on Raspberry Pi 2 though, which is armv7 and wheezy-based, though not exactly wheezy. |
I got another error with the same test:
|
I ran this test repeatedly on Linux locally (Ubuntu 14.04). It failed 3 times out of 1188 runs (0.25%). It seems that this test is flaky at least on all Linux platforms (likely with a much higher failure rate in armv7). |
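For anyone wanting to measure a failure rate like this, here is a minimal sketch of a repeat runner. `countFailures` is a hypothetical helper, not something in the node repository, and it treats any non-zero exit (or death by signal) as a failure.

```javascript
'use strict';
const { spawnSync } = require('child_process');

// Hypothetical helper: run `command args...` `runs` times and count the
// runs that did not exit with status 0. A process killed by a signal has
// a null status and is counted as a failure as well.
function countFailures(command, args, runs) {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    const result = spawnSync(command, args, { stdio: 'ignore' });
    if (result.status !== 0) failures++;
  }
  return { runs, failures, rate: failures / runs };
}
```

It could be called as `countFailures(process.execPath, [pathToTest], 1000)`, where `pathToTest` is wherever test-tick-processor lives in your checkout.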
I was able to reproduce this on Ubuntu locally by putting my machine under heavy load, and gathered some data. The raw ticks are too large for pastebin but here's the processed output: A single JS tick was collected and it did not include my test function. I need to investigate more to determine why it's missing. |
I was able to get this one to blow up after 58 runs on OS X with this:
Here's the error output:
|
Not sure if it would address the indeterminacy (but I suspect it might), and I hope it's not a foolish question, but (ignoring the more-than-80-chars-in-a-line issue) is there any reason this part of the test: cp.execFileSync(process.execPath, ['-prof', '-pe',
'function foo(n) {' +
'require(\'vm\').runInDebugContext(\'Debug\');' +
'return n < 2 ? n : setImmediate(function() { foo(n-1) + foo(n-2);}); };' +
'setTimeout(function() { process.exit(0); }, 2000);' +
'foo(40);']); ...couldn't be something more like this?: cp.execFileSync(process.execPath, ['-prof', '-pe',
'function foo(n) {' +
'require(\'vm\').runInDebugContext(\'Debug\');' +
'return n < 2 ? process.exit(0) : setImmediate(function() { foo(n-1) + foo(n-2);}); };' +
'foo(4);']); ...or this?: cp.execFileSync(process.execPath, ['-prof', '-pe',
'function foo(n) {' +
'require(\'vm\').runInDebugContext(\'Debug\');' +
'return n < 2 ? setImmediate(function () {process.exit(0);}) : setImmediate(function() { foo(n-1) + foo(n-2);}); };' +
'foo(4);']); /cc @matthewloring |
The hope was to have the code run for a fixed time as opposed to a fixed number of cycles. We didn't want it to slow down too much on the slow machines or finish before sufficient samples had been taken on faster ones. We could try switching to something like this but it may be hard to find a sweet spot. |
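To illustrate the fixed-time versus fixed-count distinction, here is a rough sketch. `runFor` is a made-up name, and it uses a synchronous busy loop rather than the `setImmediate`/`setTimeout` combination in the test, so treat it only as a picture of the idea.

```javascript
'use strict';

// Hypothetical helper: call `work` repeatedly until `ms` milliseconds have
// elapsed, and report how many calls fit in that window. The wall-clock
// time is roughly constant across machines; the iteration count is not.
function runFor(ms, work) {
  const deadline = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < deadline) {
    work();
    iterations++;
  }
  return iterations;
}
```

A slow machine simply completes fewer iterations in the same window, which is what keeps the sampling period comparable between a fast desktop and an ARM board.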
Inside of exports.platformTimeout = function(ms) {
if (process.arch !== 'arm')
return ms;
if (process.config.variables.arm_version === '6')
return 7 * ms; // ARMv6
return 2 * ms; // ARMv7 and up.
}; Maybe something similar to it could be created and used to tweak the number of iterations so ARM would only run (say) 20 times through the loop before |
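A sketch of what such a helper might look like, assuming it simply divides the iteration count by the same factors `platformTimeout` multiplies by. `platformIterations` does not exist in the test helpers; the name and the choice of factors are assumptions made for illustration.

```javascript
'use strict';

// Hypothetical counterpart to platformTimeout: shrink the iteration count
// on ARM instead of stretching a timeout. `arch` and `armVersion` are passed
// in explicitly (e.g. process.arch and process.config.variables.arm_version)
// so the function is easy to exercise on any machine.
function platformIterations(n, arch, armVersion) {
  if (arch !== 'arm')
    return n;
  const factor = armVersion === '6' ? 7 : 2;  // ARMv6 : ARMv7 and up
  return Math.max(1, Math.floor(n / factor));
}
```

With these factors, `platformIterations(40, 'arm', '7')` gives 20, matching the "(say) 20 times" figure above, while non-ARM platforms keep the full 40.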
I'll look into that, but I'm worried something more tricky is going on. I tried both of the proposed alternatives above. The first flaked after 23 iterations and the second flaked after 166 iterations. Let me try to figure out why the C++/JavaScript tick ratio is so high in the failing runs. |
I looked into the logs posted by @matthewloring. The test wants to make sure it finds ticks in the javascript function Looking at the ticks recorded by V8, The anonymous inner function at column 99 does show up higher up. The fix would be to either increase the callGraph size printed by the tick-processor or to grep for "LazyCompile.*[eval]:1". This should pattern match both the foo method and the inner anonymous function. |
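One detail worth spelling out if the pattern is used as a JavaScript regular expression: the brackets need escaping, otherwise `[eval]` is read as a character class matching a single `e`, `v`, `a` or `l`. A minimal sketch follows; `hasEvalTicks` is a hypothetical name and the sample lines in the usage note are invented, not real tick-processor output.

```javascript
'use strict';

// '[' and ']' are escaped so the pattern matches the literal text "[eval]".
const evalSymbol = /LazyCompile.*\[eval\]:1/;

// Hypothetical check: true if any line of the processed output attributes
// ticks to code compiled from the -pe script, whether that is foo itself
// or the anonymous inner function.
function hasEvalTicks(processedOutput) {
  return evalSymbol.test(processedOutput);
}
```

For instance, `hasEvalTicks('LazyCompile: *foo [eval]:1')` is true, while the made-up line `'LazyCompile: *foo e:1'` is rejected. With the brackets left unescaped, that second line would match too.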
This test also fails on PPC. On the V8 side, the functionality used by test-tick-processor only includes symbols for functions that are in the executable range. Since PPC uses function descriptors as entry points, and these are not in the executable range, the symbols for most of the functions are discarded. I was going to open a new issue but thought I'd mention it here first. |
Per the discussion on #2471, the JS symbols checked for by this test were occasionally too deep in the stack and were being ignored by the tick processor. I have addressed this by increasing the stack depth inspected by the tick processor and looking for the eval symbol, which is more likely to be present. Additional flakiness was caused by occasional misses of the code creation event for the JS function being executed. I now have separate code snippets to test for JS and C++ symbols, and if the code creation event is missed for the JS symbol test then I check for a percentage of UNKNOWN symbols in the processed output. This is considered a success as the processing scripts in the node repository are still correctly processing the ticks received from the V8 scripts. Further investigation is needed into the V8 profiling scripts to determine why code creation events are being missed. PR-URL: #2694 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
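The two-way JS check described in that commit message can be pictured roughly as below. This is a sketch under assumptions, not the code from #2694: the function name is made up, and the exact shape of the "percentage of UNKNOWN" line in the processed output is guessed.

```javascript
'use strict';

// Hypothetical version of the JS-symbol check: pass if ticks are attributed
// to the eval'd script, or, when the code creation event was missed, if the
// output at least reports a share of UNKNOWN ticks.
function jsSymbolCheckPasses(processedOutput) {
  const sawEvalSymbol = /LazyCompile.*\[eval\]:1/.test(processedOutput);
  // Assumed line shape: "<ticks>  <percent>%  UNKNOWN".
  const sawUnknownShare = /\d+(\.\d+)?%\s+UNKNOWN/.test(processedOutput);
  return sawEvalSymbol || sawUnknownShare;
}
```

Accepting the UNKNOWN case keeps the test focused on whether the processing scripts handled the ticks they were given, which is the rationale stated above.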
@dnakamura could you add a raw tick log? @matthewloring we have debugged the problem to the point where we believe the issue is due to PPC 64 BE using function descriptors (as do AIX 64 and 32, where we see the same problem). The issue seems to be on the V8 side, where it only adds symbols in address ranges marked executable, which is not the case for function descriptors. If we remove that check the test passes. We are still looking at how best to address the issue. At this point I think we should likely create a new issue to track the PPC issue as opposed to using this one, unless you think otherwise. |
@mhdawson I think tracking it separately at this point makes sense, especially if the fixes will be going in on the V8 side. |
Since the AIX problem isn't related to flakiness, I think it should be a separate issue. Can this issue be closed now? |
My fix is in and to my knowledge the flakiness is gone. Was hoping someone who has watched the CI since #2694 went in could confirm. |
Created new issue for PPC failures here: #2957 |
At this point if nobody is complaining I think we should close this. Please re-open if you disagree |
Examples of failures:
armv7-wheezy