Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: move --cpu-prof tests to sequential #28210

Closed
wants to merge 1 commit into from

Conversation

joyeecheung
Copy link
Member

@joyeecheung joyeecheung commented Jun 13, 2019

The tests still fail after being split into multiple files,
(2 out of 30 runs in roughly 48 hours) and the causes are missing
target frames in the samples. This patch moves them to sequential
to observe if the flakiness can be fixed when the tests are
run on a system with less load.

If the flake ever shows up again even after the tests are moved
to sequential, we should consider make the tests more
lenient - that is, we would only assert that there are some frames
in the generated CPU profile but do not look for the target
function there.

Refs: #27611

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

The tests still fail after being split into multiple files,
(2 out of 30 runs in roughly 48 hours) and the causes are missing
target frames in the samples. This patch moves them to sequential
to observe if the flakiness can be fixed when the tests are
run on a system with less load.

If the flake ever shows up again even after the tests are moved
to sequential, we should consider make the test conditions more
lenient - that is, we would only assert that there are *some* frames
in the generated CPU profile but do not look for the target
function there.
@nodejs-github-bot
Copy link
Collaborator

Sadly, an error occurred when I tried to trigger a build. :(

@nodejs-github-bot nodejs-github-bot added the test Issues and PRs related to the tests. label Jun 13, 2019
@joyeecheung
Copy link
Member Author

joyeecheung commented Jun 13, 2019

hmm, the stats above may be wrong because out of those 2 failed runs, one of them is https://ci.nodejs.org/job/node-test-pull-request/23810/ which was

ReferenceError: common is not defined

But it is still flaky as it affects https://ci.nodejs.org/job/node-test-pull-request/23832/

Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I'll kick off a pair of node-stress-single-test runs to compare parallel vs. sequential to gather some data.

@Trott
Copy link
Member

Trott commented Jun 13, 2019

@Trott
Copy link
Member

Trott commented Jun 13, 2019

hmm, the stats above may be wrong because out of those 2 failed runs, one of them is https://ci.nodejs.org/job/node-test-pull-request/23810/ which was

ReferenceError: common is not defined

But it is still flaky as it affects https://ci.nodejs.org/job/node-test-pull-request/23832/

FWIW, here are some failures from node-test-pull-request. It's 5 in roughly the last 24 hours:

https://ci.nodejs.org/job/node-test-commit-freebsd/26873/ (test-cpu-prof-dir-worker)
https://ci.nodejs.org/job/node-test-commit-freebsd/26850/ (test-cpu-prof-dir-and-name)
https://ci.nodejs.org/job/node-test-commit-freebsd/26845/ (test-cpu-prof-drained)
https://ci.nodejs.org/job/node-test-commit-freebsd/26842/ (test-cpu-prof-dir-absolute)
https://ci.nodejs.org/job/node-test-commit-freebsd/26840/ (test-cpu-prof-dir-and-name)

It's all on FreeBSD, but that's not FeeBSD per se but rather our specific config in CI which has a lot of cores so runs a lot of things in parallel

@Trott
Copy link
Member

Trott commented Jun 13, 2019

Hmmm...FreeBSD 11 build failed for both stress tests. Will run again on FreeBSD 10 and FreeBSD Latest if the other platforms don't show anything interesting, since this definitely seems to affect FreeBSD a lot.

Copy link
Member

@mhdawson mhdawson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@joyeecheung
Copy link
Member Author

@Trott Thanks for starting the stress tests. It looks like on macOS which manage to build and run the tests, there was a difference though (17 v.s. 0 out of 1000 runs).

I am wondering whether running that on FreeBSD 10 alone is enough - the flake has only showed up in 11 in the CI ever since the tests were split.

@Trott
Copy link
Member

Trott commented Jun 14, 2019

Unfortunately, the FreeBSD hosts are not successfully building in the stress tests for either master or this branch. I'll take a look to see if I can figure anything out, or maybe someone else on @nodejs/build will have an idea.

But I think the OS X results are enough to show that it's unreliable in parallel (19 failures in 1000 runs) and seems reliable in sequential (0 failures in 1000 runs). And I strongly suspect the FreeBSD hosts would show a significantly higher rate of failure (because they usually do in these types of cases).

@Trott
Copy link
Member

Trott commented Jun 16, 2019

Looks like this landed this in 7e5e1c2 but wasn't closed? Closing, but of course, re-open if I'm wrong somehow. /ping @joyeecheung

@Trott Trott closed this Jun 16, 2019
BridgeAR pushed a commit that referenced this pull request Jun 17, 2019
The tests still fail after being split into multiple files,
(2 out of 30 runs in roughly 48 hours) and the causes are missing
target frames in the samples. This patch moves them to sequential
to observe if the flakiness can be fixed when the tests are
run on a system with less load.

If the flake ever shows up again even after the tests are moved
to sequential, we should consider make the test conditions more
lenient - that is, we would only assert that there are *some* frames
in the generated CPU profile but do not look for the target
function there.

PR-URL: #28210
Refs: #27611
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@BridgeAR BridgeAR mentioned this pull request Jun 17, 2019
targos pushed a commit that referenced this pull request Jun 18, 2019
The tests still fail after being split into multiple files,
(2 out of 30 runs in roughly 48 hours) and the causes are missing
target frames in the samples. This patch moves them to sequential
to observe if the flakiness can be fixed when the tests are
run on a system with less load.

If the flake ever shows up again even after the tests are moved
to sequential, we should consider make the test conditions more
lenient - that is, we would only assert that there are *some* frames
in the generated CPU profile but do not look for the target
function there.

PR-URL: #28210
Refs: #27611
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants