build: fail on CI if leftover processes #11269

Trott · 2017-02-09T18:50:35Z

If any tests leave processes running after testing results are complete, fail the test run.

Dependent on #11246

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
commit message follows commit guidelines

Affected core subsystem(s)

build test

jasnell

rubber-stamp LGTM... tho someone with better Makefile kung-fu should also review

mhdawson · 2017-02-09T22:52:21Z

Makefile

+endif
+clear-stalled:
+	ps awwx | grep Release/node | grep -v grep | cat
+	ps awwx | grep Release/node | grep -v grep | awk '{print $$1}' | $(XARGS) kill


If there are no node processes, won't this result in calling kill with no arguments, which will then print its usage message?

@mhdawson If it did, the CI run for this would have failed hard!

On AIX and OS X (which have BSD-based xargs), empty stdin is a no-op so kill is not called. On all others in CI (which have GNU-based xargs), we add -r (a few lines above) which causes the same behavior.

(Whoops, CI hasn't run for this. I was thinking of #11246 which contains these identical lines.)

OK, thanks for the clarification. I think I ran the command on Linux, but without the -r.
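The xargs behavior described in this thread can be checked directly. What follows is a minimal sketch, not the Makefile's actual logic: it probes whether the local xargs accepts -r and falls back to plain xargs otherwise. The variable names (XARGS, empty_result, nonempty_result) and the use of echo as a stand-in for kill are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: choose an xargs invocation that never runs the command on empty input.
# GNU xargs needs -r for that; BSD xargs (per the comment above) already skips it.
if printf '' | xargs -r true 2>/dev/null; then
  XARGS="xargs -r"
else
  XARGS="xargs"
fi

# With no PIDs on stdin, the command (echo stands in for kill) should not run,
# so nothing is printed.
empty_result=$(printf '' | $XARGS echo kill)

# With PIDs on stdin, they are appended to the command as arguments.
nonempty_result=$(printf '123\n456\n' | $XARGS echo kill)

printf 'empty input    -> [%s]\n' "$empty_result"
printf 'non-empty input -> [%s]\n' "$nonempty_result"
```

Running the same pipeline with plain GNU xargs (no -r) would invoke the command once with no arguments, which is the situation the question above was worried about.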

mhdawson · 2017-02-09T22:56:29Z

Makefile

 	$(PYTHON) tools/test.py $(PARALLEL_ARGS) -p tap --logfile test.tap \
 		--mode=release --flaky-tests=$(FLAKY_TESTS) \
 		$(TEST_CI_ARGS) $(CI_JS_SUITES)
+	! ( ps awwx | grep Release/node | grep -v grep )


Would it be better to print out the info on what's still running here and then clean up? This will leave them running until the next CI run, right?

@mhdawson Yes, that's probably a better approach. I'll try to learn enough about make to figure out how to do that. :-D
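For readers less familiar with the added recipe line, here is a rough sketch of how the `! ( ... | grep ... | grep -v grep )` idiom behaves. A pipeline's exit status is that of its last command, grep exits 0 only when it prints at least one line, and the leading `!` inverts the result, so the line succeeds only when nothing matched. The function name no_leftovers, the sample listings, and the test file path are hypothetical; the listing is passed in as text so the sketch does not depend on what is running locally.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the text to look for, $2 is a ps-style listing.
# Returns 0 when NO line matches, non-zero when at least one line matches.
# Matching lines are printed, which doubles as a report of what was left over.
no_leftovers() {
  ! ( printf '%s\n' "$2" | grep "$1" | grep -v grep )
}

# Made-up sample listings standing in for `ps awwx` output.
clean_ps='  101 ?  Ss  0:00 /usr/sbin/sshd'
dirty_ps='  202 ?  Sl  0:03 out/Release/node test/parallel/test-foo.js'

if no_leftovers Release/node "$clean_ps"; then clean_status=pass; else clean_status=fail; fi
if no_leftovers Release/node "$dirty_ps"; then dirty_status=pass; else dirty_status=fail; fi

echo "clean listing: $clean_status"
echo "dirty listing: $dirty_status"
```

In a Makefile recipe each line runs in its own shell and make stops on a non-zero exit status, which is why a single line of this form is enough to fail the target.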

gibfahn · 2017-02-09T23:28:18Z

Took me a while to work out that this "Depends on #11246" because it's a one line change on top of it (of the two commits, the first one is #11246).

Is there a reason to do this as a separate PR?

I still think that if this was done in tools/test.py it'd be more useful, not least for users manually running tests, or to get it included in the tap/junit output.

Trott · 2017-02-10T00:35:17Z

Is there a reason to do this as a separate PR?

Mostly so the other PR containing minimum required functionality can land as soon as possible and prevent spurious CI failures at the earliest possible time.

Trying to put this post-run stuff into that other PR would mean the stuff in that other PR (that is IMO ready-to-go, at least as a stopgap measure) would have to wait until we resolve whether/how this post-run stuff should terminate processes it finds, the feasibility of putting it in test.py instead, and whatever else comes up.

Trott · 2017-02-10T21:29:12Z

OK, the other PR landed, so this one now has a cleaner and shorter diff and I am ready to discuss colors for the bike shed!!!! :-D

Trott · 2017-02-10T23:06:49Z

On the suggestion that in addition to failing if there are leftover processes, terminate those processes... seems appealing, but as I look at doing it, I'm starting to feel pretty YAGNI about it:

As currently set up, I don't think this buys us anything, since we clear stalled jobs at the start of CI runs now.
But it has a downside which is that it means the processes can't be inspected after the fact to try to figure out what happened.
Every way I'm finding to do it makes the Makefile considerably more complicated or requires an additional external script. Not the end of the world, but just sayin'.

I'm inclined to leave this reporting/failing bit as is and save the terminate-the-processes for a future enhancement if we find we really need it. (Like I said, we lose the ability to inspect the processes if we go that route, so maybe we don't want to do that after all?)

mhdawson

I'm OK with this change going in and termination being handled in a follow-on if we agree we want to do it.

gibfahn · 2017-02-14T01:49:16Z

@Trott

As currently set up, I don't think this buys us anything, since we clear stalled jobs at the start of CI runs now.

We run a bunch of other tests on the machines (e.g. citgm tests, node-report tests, manually running individual tests). Clearing up at the start is a good way to make sure our test runs pass properly, but clearing up at the end is IMHO the right thing to do as users of a shared CI.

Every way I'm finding to do it makes the Makefile considerably more complicated or requires an additional external script. Not the end of the world, but just sayin'.

Did you try doing it in tools/test.py? That way we'd be able to be more specific about it. You could also then error out when users were running make test, as I assume most users would rather get an error than have something that passed (but left processes lying around) and then failed on CI.

But it has a downside which is that it means the processes can't be inspected after the fact to try to figure out what happened.

I've never done anything with a process other than look at it with ps -ef, so I'm not clear about what else people actually do.

However, if you implemented this in tools/test.py, I assume you'd have it on by default, and provide a --ps-check=ignore|failkill (or equivalent) to turn off the process killing. Then you could run with it turned off to get processes.

thefourtheye · 2017-02-14T02:18:18Z

Makefile

@@ -223,13 +223,15 @@ test-ci-js: | clear-stalled
 	$(PYTHON) tools/test.py $(PARALLEL_ARGS) -p tap --logfile test.tap \
 		--mode=release --flaky-tests=$(FLAKY_TESTS) \
 		$(TEST_CI_ARGS) $(CI_JS_SUITES)
+	! ( ps awwx | grep Release/node | grep -v grep )


Just a question. Are all these flags portable?

I'm pretty sure they're all used in #11246, so if they aren't we should find out pretty quickly 😅

Yes, they're all portable, or at least sufficiently portable that they work on all the non-Windows CI machines.

Trott · 2017-02-21T23:22:28Z

This will now both fail if there are leftover processes and try to clean them up. PTAL

(If anyone wants to move it into test.py, be my guest! I'd be happy with that as a subsequent PR, though.)
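As an illustration of the "fail and clean up" behavior described above, here is a minimal sketch in plain shell. It is not the code that landed: the function name fail_and_clean, the stdin-based interface, and the throwaway sleep process used for the demo are all assumptions. It reads a ps-style listing, reports matching lines, sends SIGTERM to their PIDs (first column), and returns non-zero if anything was found.

```shell
#!/bin/sh
# Hypothetical helper: reads a `ps awwx`-style listing on stdin.
# $1 is the text to match. Returns 0 if nothing matched; otherwise reports the
# matches on stderr, signals their PIDs, and returns 1.
fail_and_clean() {
  matches=$(grep "$1" | grep -v grep || true)
  if [ -z "$matches" ]; then
    return 0
  fi
  printf 'leftover processes:\n%s\n' "$matches" >&2
  # The PID is the first column; $pids is left unquoted on purpose so that
  # each PID becomes a separate argument to kill.
  pids=$(printf '%s\n' "$matches" | awk '{print $1}')
  kill $pids 2>/dev/null
  return 1
}

# Demo: a background sleep stands in for a stalled test process, and a
# made-up listing line carries its real PID.
sleep 300 &
demo_pid=$!
listing="$demo_pid ?  S  0:00 out/Release/node stalled-test.js"

if printf '%s\n' "$listing" | fail_and_clean Release/node; then
  demo_status=pass
else
  demo_status=fail
fi

# Reap the terminated process so it does not linger as a zombie.
wait "$demo_pid" 2>/dev/null || true
echo "run status: $demo_status"
```

Against a live system the call would look like `ps awwx | fail_and_clean Release/node`. If the same logic is written inline in a Makefile recipe, every shell `$` has to be doubled (as in the `$$1` in the diff earlier in this thread), and multi-line shell logic needs backslash continuations so that it runs in a single shell.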

Trott · 2017-02-22T14:59:20Z

CI: https://ci.nodejs.org/job/node-test-pull-request/6546/

Trott · 2017-02-22T18:31:30Z

@nodejs/build

Trott · 2017-02-23T19:47:39Z

Landed in 189b49a

If any tests leave processes running after testing results are complete, fail the test run.

PR-URL: nodejs#11269
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: Sakthipriyan Vairamani <thechargingvolcano@gmail.com>
Reviewed-By: Johan Bergström <bugs@bergstroem.nu>

Trott added the build and test labels Feb 9, 2017

nodejs-github-bot added the build label Feb 9, 2017

jasnell approved these changes Feb 9, 2017

mhdawson reviewed Feb 9, 2017

Trott force-pushed the fail-if-stalled branch from 48693fc to c5051e4 Compare February 10, 2017 21:28

Trott force-pushed the fail-if-stalled branch from c5051e4 to 5589bd8 Compare February 10, 2017 22:19

santigimeno approved these changes Feb 13, 2017

mhdawson approved these changes Feb 13, 2017

thefourtheye reviewed Feb 14, 2017

thefourtheye approved these changes Feb 14, 2017

build: fail on CI if leftover processes

db5c314

If any tests leave processes running after testing results are complete, fail the test run.

Trott force-pushed the fail-if-stalled branch from 5589bd8 to db5c314 Compare February 21, 2017 23:21

jbergstroem approved these changes Feb 22, 2017

Trott closed this Feb 23, 2017

italoacasas mentioned this pull request Feb 25, 2017

7.7.0 Proposal #11553

Merged

MylesBorins mentioned this pull request Mar 9, 2017

v6.10.1 proposal #11759

Merged

MylesBorins mentioned this pull request Mar 9, 2017

v4.8.1 proposal #11760

Merged

Trott deleted the fail-if-stalled branch January 13, 2022 22:34
