Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix flaky doctool and test #29979

Merged
merged 0 commits into from
Oct 15, 2019
Merged

Conversation

Trott
Copy link
Member

@Trott Trott commented Oct 15, 2019

Doctool tests have been failing a lot in CI on Win2008 R2. It appears
async functions and callback-based functions are being used in
combination such that the callback-based function cannot guarantee that
it will invoke its callback. Convert the callback-based functions to
async functions so we have one paradigm and reliable results.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added doc Issues and PRs related to the documentations. tools Issues and PRs related to the tools directory. labels Oct 15, 2019
@nodejs-github-bot
Copy link
Collaborator

@Trott
Copy link
Member Author

Trott commented Oct 15, 2019

Relevant Windows CI is green. Taking this out of Draft mode. Collaborators please 👍 here to fast-track to unbreak CI.

@Trott Trott added the fast-track PRs that do not need to wait for 48 hours to land. label Oct 15, 2019
@Trott Trott marked this pull request as ready for review October 15, 2019 06:35
@nodejs-github-bot
Copy link
Collaborator

@Trott
Copy link
Member Author

Trott commented Oct 15, 2019

Stress test won't work for this because doctool isn't built in that job, but in addition to the green Windows run in the regular CI, here's four more Windows CI runs with this code to confirm it fixes the doctool test issue there:

✅ = Windows job was fully green
✔️ = Windows job was yellow or red, but the doctool tests on win2008r2 passed and that's what matters here

https://ci.nodejs.org/job/node-test-binary-windows-2/3531/
https://ci.nodejs.org/job/node-test-binary-windows-2/3533/ ✔️
https://ci.nodejs.org/job/node-test-binary-windows-2/3534/ ✔️
https://ci.nodejs.org/job/node-test-binary-windows-2/3535/ ✔️
https://ci.nodejs.org/job/node-test-binary-windows-2/3536/ ✔️

@Trott
Copy link
Member Author

Trott commented Oct 15, 2019

@richardlau
Copy link
Member

And to show it failing on master, here are five from master:

❌= doctool test failed on win2008r2

https://ci.nodejs.org/job/node-test-binary-windows-2/3532/
https://ci.nodejs.org/job/node-test-binary-windows-2/3537/
https://ci.nodejs.org/job/node-test-binary-windows-2/3538/
https://ci.nodejs.org/job/node-test-binary-windows-2/3539/
https://ci.nodejs.org/job/node-test-binary-windows-2/3540/

Given the failure is so prominent on master are we sure it's not a real recent regression? The test has only failed in the two most recent node-daily-master runs (out of the twelve we have history for):
https://ci.nodejs.org/job/node-test-binary-windows-2/3495/
https://ci.nodejs.org/job/node-test-binary-windows-2/3532/

The change LGTM btw but I just want to make sure we're not papering over something that has recently changed to suddenly make this test fail consistently.

@Trott
Copy link
Member Author

Trott commented Oct 15, 2019

And to show it failing on master, here are five from master:
❌= doctool test failed on win2008r2
https://ci.nodejs.org/job/node-test-binary-windows-2/3532/
https://ci.nodejs.org/job/node-test-binary-windows-2/3537/
https://ci.nodejs.org/job/node-test-binary-windows-2/3538/
https://ci.nodejs.org/job/node-test-binary-windows-2/3539/
https://ci.nodejs.org/job/node-test-binary-windows-2/3540/

Given the failure is so prominent on master are we sure it's not a real recent regression? The test has only failed in the two most recent node-daily-master runs (out of the twelve we have history for):
https://ci.nodejs.org/job/node-test-binary-windows-2/3495/
https://ci.nodejs.org/job/node-test-binary-windows-2/3532/

The change LGTM btw but I just want to make sure we're not papering over something that has recently changed to suddenly make this test fail consistently.

Pretty sure it's not a regression. I bisected (by running tests on CI) and the supposed problematic change was 5f80df8 which seems rather unlikely to me given that there's no http in this test.

@Trott
Copy link
Member Author

Trott commented Oct 15, 2019

(One more 👍 to approve fast-tracking over at #29979 (comment) please?)

@Trott
Copy link
Member Author

Trott commented Oct 15, 2019

Pretty sure it's not a regression. I bisected (by running tests on CI) and the supposed problematic change was 5f80df8 which seems rather unlikely to me given that there's no http in this test.

The likely scenarios to me seem to be that either that change (or some other nearby change) caused a timing change on win2008r2 such that it triggered this issue a lot more, or that something changed on the Windows host/configuration itself to make this problem occur far more often. (But yes, I'm speculating.)

@richardlau
Copy link
Member

And to show it failing on master, here are five from master:
❌= doctool test failed on win2008r2
https://ci.nodejs.org/job/node-test-binary-windows-2/3532/
https://ci.nodejs.org/job/node-test-binary-windows-2/3537/
https://ci.nodejs.org/job/node-test-binary-windows-2/3538/
https://ci.nodejs.org/job/node-test-binary-windows-2/3539/
https://ci.nodejs.org/job/node-test-binary-windows-2/3540/

Given the failure is so prominent on master are we sure it's not a real recent regression? The test has only failed in the two most recent node-daily-master runs (out of the twelve we have history for):
https://ci.nodejs.org/job/node-test-binary-windows-2/3495/
https://ci.nodejs.org/job/node-test-binary-windows-2/3532/
The change LGTM btw but I just want to make sure we're not papering over something that has recently changed to suddenly make this test fail consistently.

Pretty sure it's not a regression. I bisected (by running tests on CI) and the supposed problematic change was 5f80df8 which seems rather unlikely to me given that there's no http in this test.

Not in the tests themselves but the doctool itself does a https request (hence #29918).

@Trott Trott added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Oct 15, 2019
@Trott
Copy link
Member Author

Trott commented Oct 15, 2019

I'm going to bed because it's far too late here, but if this gets another fast track approval and someone wants to land it, please do! Bonus points for going to the CI jobs associated with #29976, #29969, and #15735 and using "Rebuild" to start them over again with this change.

@Trott
Copy link
Member Author

Trott commented Oct 15, 2019

Pretty sure it's not a regression. I bisected (by running tests on CI) and the supposed problematic change was 5f80df8 which seems rather unlikely to me given that there's no http in this test.

The likely scenarios to me seem to be that either that change (or some other nearby change) caused a timing change on win2008r2 such that it triggered this issue a lot more, or that something changed on the Windows host/configuration itself to make this problem occur far more often. (But yes, I'm speculating.)

Old PRs are still passing when run on CI, so that suggests that the second suggestion by me isn't correct. The first one seems likely: A code change caused a timing change on win2008r2 such that it triggered this issue a lot more.

const target = path.join(outputDir, `${basename}.json`);
fs.writeFileSync(target, JSON.stringify(content.json, null, 2));
const jsonTarget = path.join(outputDir, `${basename}.json`);
fs.writeFileSync(jsonTarget, JSON.stringify(content.json, null, 2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could now be fsPromises.writeFile(...), although I can do that in a follow up if you prefer.

@Trott Trott closed this Oct 15, 2019
@Trott Trott force-pushed the async-all-the-way-down branch from b545682 to f4f856b Compare October 15, 2019 20:23
@Trott Trott merged commit f4f856b into nodejs:master Oct 15, 2019
@Trott
Copy link
Member Author

Trott commented Oct 15, 2019

Landed in f4f856b

targos pushed a commit that referenced this pull request Nov 8, 2019
Doctool tests have been failing a lot in CI on Win2008 R2. It appears
async functions and callback-based functions are being used in
combination such that the callback-based function cannot guarantee that
it will invoke its callback. Convert the callback-based functions to
async functions so we have one paradigm and reliable results.

PR-URL: #29979
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
targos pushed a commit that referenced this pull request Nov 10, 2019
Doctool tests have been failing a lot in CI on Win2008 R2. It appears
async functions and callback-based functions are being used in
combination such that the callback-based function cannot guarantee that
it will invoke its callback. Convert the callback-based functions to
async functions so we have one paradigm and reliable results.

PR-URL: #29979
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
richardlau pushed a commit to richardlau/node-1 that referenced this pull request Apr 6, 2020
Doctool tests have been failing a lot in CI on Win2008 R2. It appears
async functions and callback-based functions are being used in
combination such that the callback-based function cannot guarantee that
it will invoke its callback. Convert the callback-based functions to
async functions so we have one paradigm and reliable results.

Backport-PR-URL: nodejs#32642
PR-URL: nodejs#29979
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
BethGriggs pushed a commit that referenced this pull request Apr 6, 2020
Doctool tests have been failing a lot in CI on Win2008 R2. It appears
async functions and callback-based functions are being used in
combination such that the callback-based function cannot guarantee that
it will invoke its callback. Convert the callback-based functions to
async functions so we have one paradigm and reliable results.

Backport-PR-URL: #32642
PR-URL: #29979
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
@BethGriggs BethGriggs mentioned this pull request Apr 7, 2020
@Trott Trott deleted the async-all-the-way-down branch January 13, 2022 22:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. doc Issues and PRs related to the documentations. fast-track PRs that do not need to wait for 48 hours to land. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants