Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

child process workers now attempt to terminate themselves via message #197

Merged
merged 2 commits into from
Oct 28, 2020

Conversation

boneskull
Copy link
Contributor

This causes workers run via child_process.fork() to respond to a special message by exiting.

On Windows especially, a subprocess killed via SIGINT cannot "clean up" after itself, and must exit immediately. This has many implications, but importantly, it can cause nyc to fail to report code coverage on worker scripts. A workaround is to listen for a "terminate" message from the parent, perform cleanup (if necessary), and call process.exit() itself. nyc will then be successful in determining the coverage.

Since a child process worker may be very busy (e.g., in an infinite loop) and unable to handle a message from its parent, workerpool now has a hardcoded timeout of 1000 ms; if the child does not shut down cleanly within this window, workerpool will ask the OS to signal a shutdown via process.kill() (which was the old behavior).

Short of adding nyc and asserting it calculates coverage, the current test suite does a reasonable job of avoiding regressions here.


notes:

I've found that the undesirable behavior (failure to collect coverage from a forked process) only sometimes occurs on macOS (probably system-state-dependent), but it almost always occurs on Windows. Spawned processes do not seem to exhibit this behavior, but that is not helpful for workerpool.

Ref: mochajs/mocha#4372, istanbuljs/nyc#762

@josdejong
Copy link
Owner

wow, that is more complicated than I had expected. Thanks for working out this fix, looks good to me! I'm not sure how I could verify whether the fix works in practice but I trust you in that. I'll merge right after you've fixed the accidental .only in the unit test.

This causes workers run via `child_process.fork()` to respond to a special message by exiting.

On Windows especially, a subprocess killed via `SIGINT` cannot "clean up" after itself, and must exit immediately.  This has many implications, but importantly, it can cause `nyc` to fail to report code coverage on worker scripts.  A workaround is to listen for a "terminate" message from the parent, perform cleanup (if necessary), and call `process.exit()` itself.  `nyc` will then be successful in determining the coverage.

Since a child process worker may be very busy (e.g., in an infinite loop) and unable to _handle_ a message from its parent, `workerpool` now has a hardcoded timeout of `1000` ms; if the child does not shut down cleanly within this window, `workerpool` will ask the OS to signal a shutdown via `process.kill()` (which was the old behavior).

Short of adding `nyc` and asserting it calculates coverage, the current test suite does a reasonable job of avoiding regressions here.
@boneskull
Copy link
Contributor Author

I've removed the .only and added a .mocharc.js which will enable the forbid-only flag when running in CI (when the environment variable CI is present)

@boneskull
Copy link
Contributor Author

Let me run another test to double-check; I'll paste some screenshots

@boneskull
Copy link
Contributor Author

boneskull commented Oct 21, 2020

I have a minimal reproduction repo which I'm using to test the interaction between nyc and various subprocess start/stop strategies. Since I learned of this issue via mochajs/mocha#4372, I found that the root cause was how workerpool terminates its child processes. To be fair, this really should work properly with signals, but alas...

In my test repo, I have this test:

// test/foo-workerpool.spec.js
const workerpool = require("workerpool");
const assert = require("assert");

(async function () {
  const pool = workerpool.pool(require.resolve("../src/foo-workerpool"), {
    workerType: "process",
  });

  const result = await pool.exec("foo");

  assert.strictEqual(result, "foo");

  console.error("done");

  return pool.terminate();
})();

and the source of the worker script:

// src/foo-workerpool.js
const workerpool = require("workerpool");

workerpool.worker({
  foo() {
    console.error('returning "foo"')
    return "foo";
  },
});

When running the test against workerpool@6.0.2, we see the problem:

image

Now, trying again with my changes installed from my working copy:

image

Anyway, I think this is ready for your review again.

@boneskull
Copy link
Contributor Author

I think it may be helpful to add something that notifies the process has been forcibly shut down. a consumer may or may not care, but in my case, it could cause flaky coverage reporting... and/or a configurable timeout

@josdejong
Copy link
Owner

Thanks Christopher, I'm really glad you where able to track down this tricky issue 👍

I think it may be helpful to add something that notifies the process has been forcibly shut down. a consumer may or may not care, but in my case, it could cause flaky coverage reporting... and/or a configurable timeout

That is a good idea! Maybe we can do that behind a debug flag or something like that.

I'll now merge and publish your fix. And sorry for the delay.

@josdejong
Copy link
Owner

I created an issue in order not to forget about your idea: #201

@josdejong
Copy link
Owner

Fix is published now in v6.0.3. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants