Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: make test-tick-processor.js non-flaky #8542

Closed
wants to merge 6 commits into from

Conversation

indutny
Copy link
Member

@indutny indutny commented Sep 14, 2016

Checklist
  • make -j4 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

test

Description of change

Wait for a sought-for symbol to appear instead of just hard-killing
subprocesses at 2s timeout.

Fix: #4427

Wait for a sought-for symbol to appear instead of just hard-killing
subprocesses at 2s timeout.

Fix: nodejs#4427
@nodejs-github-bot nodejs-github-bot added the test Issues and PRs related to the tests. label Sep 14, 2016
@indutny
Copy link
Member Author

indutny commented Sep 14, 2016

cc @Trott

@indutny
Copy link
Member Author

indutny commented Sep 14, 2016

cc @nodejs/testing

@indutny
Copy link
Member Author

indutny commented Sep 14, 2016

@indutny
Copy link
Member Author

indutny commented Sep 14, 2016

If CI is green - I'd suggest stress-run for that particular test in this PR.

@mscdex mscdex added the process Issues and PRs related to the process subsystem. label Sep 14, 2016
if (common.isWindows ||
common.isSunOS ||
common.isAix ||
common.isLinuxPPCBE ||
common.isFreeBSD) {
common.skip('C++ symbols are not mapped for this os.');
return;
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to include the rest of the tests inside this else block? Either that or the return keyword has to be restored, I would think...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes! Fixed now.

@Trott
Copy link
Member

Trott commented Sep 14, 2016

Also, if this PR ends up fixing the test's flakiness across all platforms, I'm totally naming my next goldfish Fedor in your honor. Well, probably not because that would be creepy. But I'm thinking about it anyway.

@indutny
Copy link
Member Author

indutny commented Sep 14, 2016

/me googled "creepy goldfish":

image

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@Trott
Copy link
Member

Trott commented Sep 15, 2016

A test failure now will be a timeout as the test retries and retries and retries and never finds what it's looking for? (I am more than OK with that. Just making sure I understand the test and the intention.)

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@Trott your analysis is correct.

@Trott
Copy link
Member

Trott commented Sep 15, 2016

Still some timeouts/failures. I wonder if splitting it into three test files would help so that:

  • each test can get the full 60 seconds (or whatever, depending on the platform) to succeed
  • when a test case fails, we know which one is failing

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

It won't help, looks like you can't open a locked file on Windows. I wonder if we can use pipes for this.

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

The newest version splits files into three and collects stdout output of --prof child, let's run CI one more time.

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

I think it should be generally ok, if we will increase allowed timeout values for new tick-processor tests. @Trott what do you think?

@Trott
Copy link
Member

Trott commented Sep 15, 2016

I think it should be generally ok, if we will increase allowed timeout values for new tick-processor tests. @Trott what do you think?

I'm not sure what you mean exactly. You mean the retry timeout? And for the Raspberry Pi devices only (because they look like the only ones failing right now, thankfully)? If so, we can probably set it to common.platformTimeout(750) instead of 750.

Would need run some stress tests but even as is, if this means the test are only flaky on the Raspberry Pi devices and not anywhere else, that's still a huge improvement!

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@Trott I meant the test timeout, it looks like the test is killed too early on pi's.

@Trott
Copy link
Member

Trott commented Sep 15, 2016

@Trott I meant the test timeout, it looks like the test is killed too early on pi's.

Ah, yeah, that would be up to the @nodejs/build group I suppose. I think the timeout for Pi devices is already double or more that of most everything else, though.

An alternative might be to skip some or all (depending on whether some or all are failing) of these three tests on Pi devices by checking common.enoughTestMem and doing the common.skip() + return thing if it is false. (Does memory affect this test? Or is it more about CPU speed or something like that? If it's mostly a memory, then at least checking common.enoughTestMem makes sense and we're not using it as a proxy for "is Raspberry Pi".)

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@Trott this seems to be CPU bound to me.

@Trott
Copy link
Member

Trott commented Sep 15, 2016

@Trott this seems to be CPU bound to me.

In that case, I guess long term, I'd like to see if @nodejs/build and @nodejs/testing would be OK with bumping the test timeout on the Raspberry Pi devices up. That might allow us to re-enable a handful of the other tests that we turned off for those devices.

In the meantime, just to get this fixed until something like that can happen, would it be too much of a code smell (and would it even work) to check the speed property of the objects in the array returned by os.cpus() and decide whether or not to skip based on that?

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@Trott I think it should be safer to just mark these tests as FLAKY on ARM.

@Trott
Copy link
Member

Trott commented Sep 15, 2016

@Trott I think it should be safer to just mark these tests as FLAKY on ARM.

Sure, that works. The downside there is that it will also mark it flaky on the four ARM test platforms that aren't Raspberry Pi. But that seems acceptable to me.

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@Trott we don't have ARM-specific code for these features, so it should be safe.

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@Trott pushed an update that should supposedly mark these tests as FLAKY on arm only.

@Trott
Copy link
Member

Trott commented Sep 15, 2016

LGTM

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@Trott I decided to re-run it to test FLAKY specifiers for the arch (we don't seem to use them anywhere else)

@Trott
Copy link
Member

Trott commented Sep 15, 2016

@Trott I decided to re-run it to test FLAKY specifiers for the arch (we don't seem to use them anywhere else)

Oh, yeah, the FLAKY test designation was ignored on Raspberry Pi for a while, IIRC. Might still be for all I know... Guess we're about to find out...

@indutny
Copy link
Member Author

indutny commented Sep 15, 2016

@Trott it looks like it didn't quite work. Should we land this as it is and defer the rest to the @nodejs/testing ?

@Trott
Copy link
Member

Trott commented Sep 16, 2016

@Trott it looks like it didn't quite work. Should we land this as it is and defer the rest to the @nodejs/testing ?

I think that would be OK.

@Trott
Copy link
Member

Trott commented Sep 16, 2016

If you want to be bold and bump up the Raspberry Pi devices from 3 minute timeout to 5 minute timeout, I think you just need to change the 3 on line 819 of test.py to 5. If that's all it takes, I would support that. :-D

And if you really don't want to spend more time tweaking things just to accommodate Raspberry Pi, I can also understand that. :-|

@indutny
Copy link
Member Author

indutny commented Sep 16, 2016

Landed in 5a17139, thank you!

@indutny indutny closed this Sep 16, 2016
@indutny indutny deleted the fix/gh-4427 branch September 16, 2016 12:18
indutny added a commit that referenced this pull request Sep 16, 2016
Wait for a sought-for symbol to appear instead of just hard-killing
subprocesses at 2s timeout.

Fix: #4427
PR-URL: #8542
Reviewed-By: Rich Trott <rtrott@gmail.com>
Fishrock123 pushed a commit that referenced this pull request Oct 11, 2016
Wait for a sought-for symbol to appear instead of just hard-killing
subprocesses at 2s timeout.

Fix: #4427
PR-URL: #8542
Reviewed-By: Rich Trott <rtrott@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
process Issues and PRs related to the process subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate flaky test-tick-processor
5 participants