child process workers now attempt to terminate themselves via message #197

boneskull · 2020-10-21T00:45:57Z

This causes workers run via child_process.fork() to respond to a special message by exiting.

On Windows especially, a subprocess killed via SIGINT cannot "clean up" after itself, and must exit immediately. This has many implications, but importantly, it can cause nyc to fail to report code coverage on worker scripts. A workaround is to listen for a "terminate" message from the parent, perform cleanup (if necessary), and call process.exit() itself. nyc will then be successful in determining the coverage.

Since a child process worker may be very busy (e.g., in an infinite loop) and unable to handle a message from its parent, workerpool now has a hardcoded timeout of 1000 ms; if the child does not shut down cleanly within this window, workerpool will ask the OS to signal a shutdown via process.kill() (which was the old behavior).

Short of adding nyc and asserting it calculates coverage, the current test suite does a reasonable job of avoiding regressions here.

notes:

I've found that the undesirable behavior (failure to collect coverage from a forked process) only sometimes occurs on macOS (probably system-state-dependent), but it almost always occurs on Windows. Spawned processes do not seem to exhibit this behavior, but that is not helpful for workerpool.

Ref: mochajs/mocha#4372, istanbuljs/nyc#762

test/WorkerHandler.test.js

josdejong · 2020-10-21T10:05:07Z

wow, that is more complicated than I had expected. Thanks for working out this fix, looks good to me! I'm not sure how I could verify whether the fix works in practice but I trust you in that. I'll merge right after you've fixed the accidental .only in the unit test.

This causes workers run via `child_process.fork()` to respond to a special message by exiting. On Windows especially, a subprocess killed via `SIGINT` cannot "clean up" after itself, and must exit immediately. This has many implications, but importantly, it can cause `nyc` to fail to report code coverage on worker scripts. A workaround is to listen for a "terminate" message from the parent, perform cleanup (if necessary), and call `process.exit()` itself. `nyc` will then be successful in determining the coverage. Since a child process worker may be very busy (e.g., in an infinite loop) and unable to _handle_ a message from its parent, `workerpool` now has a hardcoded timeout of `1000` ms; if the child does not shut down cleanly within this window, `workerpool` will ask the OS to signal a shutdown via `process.kill()` (which was the old behavior). Short of adding `nyc` and asserting it calculates coverage, the current test suite does a reasonable job of avoiding regressions here.

boneskull · 2020-10-21T20:51:37Z

I've removed the .only and added a .mocharc.js which will enable the forbid-only flag when running in CI (when the environment variable CI is present)

boneskull · 2020-10-21T20:52:24Z

Let me run another test to double-check; I'll paste some screenshots

boneskull · 2020-10-21T21:13:38Z

I have a minimal reproduction repo which I'm using to test the interaction between nyc and various subprocess start/stop strategies. Since I learned of this issue via mochajs/mocha#4372, I found that the root cause was how workerpool terminates its child processes. To be fair, this really should work properly with signals, but alas...

In my test repo, I have this test:

// test/foo-workerpool.spec.js
const workerpool = require("workerpool");
const assert = require("assert");

(async function () {
  const pool = workerpool.pool(require.resolve("../src/foo-workerpool"), {
    workerType: "process",
  });

  const result = await pool.exec("foo");

  assert.strictEqual(result, "foo");

  console.error("done");

  return pool.terminate();
})();

and the source of the worker script:

// src/foo-workerpool.js
const workerpool = require("workerpool");

workerpool.worker({
  foo() {
    console.error('returning "foo"')
    return "foo";
  },
});

When running the test against workerpool@6.0.2, we see the problem:

Now, trying again with my changes installed from my working copy:

Anyway, I think this is ready for your review again.

boneskull · 2020-10-23T19:10:31Z

I think it may be helpful to add something that notifies the process has been forcibly shut down. a consumer may or may not care, but in my case, it could cause flaky coverage reporting... and/or a configurable timeout

josdejong · 2020-10-28T10:19:58Z

Thanks Christopher, I'm really glad you where able to track down this tricky issue 👍

I think it may be helpful to add something that notifies the process has been forcibly shut down. a consumer may or may not care, but in my case, it could cause flaky coverage reporting... and/or a configurable timeout

That is a good idea! Maybe we can do that behind a debug flag or something like that.

I'll now merge and publish your fix. And sorry for the delay.

josdejong · 2020-10-28T10:22:22Z

I created an issue in order not to forget about your idea: #201

josdejong · 2020-10-28T10:45:32Z

Fix is published now in v6.0.3. Thanks again!

josdejong reviewed Oct 21, 2020

View reviewed changes

test/WorkerHandler.test.js Outdated Show resolved Hide resolved

boneskull added 2 commits October 21, 2020 13:50

add .mocharc.js to forbid usage of ".only" in CI

c76b530

boneskull force-pushed the terminate-for-coverage branch from 42d488d to c76b530 Compare October 21, 2020 20:50

josdejong merged commit 1239ed0 into josdejong:master Oct 28, 2020

josdejong mentioned this pull request Oct 28, 2020

Implement a debug flag to give more insight in crashing workers #201

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

child process workers now attempt to terminate themselves via message #197

child process workers now attempt to terminate themselves via message #197

boneskull commented Oct 21, 2020

josdejong commented Oct 21, 2020

boneskull commented Oct 21, 2020

boneskull commented Oct 21, 2020

boneskull commented Oct 21, 2020 •

edited

Loading

boneskull commented Oct 23, 2020

josdejong commented Oct 28, 2020

josdejong commented Oct 28, 2020

josdejong commented Oct 28, 2020

child process workers now attempt to terminate themselves via message #197

child process workers now attempt to terminate themselves via message #197

Conversation

boneskull commented Oct 21, 2020

josdejong commented Oct 21, 2020

boneskull commented Oct 21, 2020

boneskull commented Oct 21, 2020

boneskull commented Oct 21, 2020 • edited Loading

boneskull commented Oct 23, 2020

josdejong commented Oct 28, 2020

josdejong commented Oct 28, 2020

josdejong commented Oct 28, 2020

boneskull commented Oct 21, 2020 •

edited

Loading