Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf_hooks: add event loop delay sampler #25378

Closed
wants to merge 2 commits into from

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Jan 7, 2019

Inspired by (and code borrowed from) https://github.com/mafintosh/event-loop-delay

/cc @mafintosh @mcollina

Adds a simple event loop delay sampler to perf_hooks.

See included test for example use.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. perf_hooks Issues and PRs related to the implementation of the Performance Timing API. labels Jan 7, 2019
@jasnell
Copy link
Member Author

jasnell commented Jan 7, 2019

doc/api/perf_hooks.md Outdated Show resolved Hide resolved
doc/api/perf_hooks.md Outdated Show resolved Hide resolved
@Trott Trott added semver-minor PRs that contain new features and should be released in the next minor version. and removed semver-minor PRs that contain new features and should be released in the next minor version. labels Jan 7, 2019
lib/perf_hooks.js Outdated Show resolved Hide resolved
lib/perf_hooks.js Outdated Show resolved Hide resolved
src/node_perf.cc Outdated Show resolved Hide resolved
src/node_perf.cc Outdated Show resolved Hide resolved
src/node_perf.cc Outdated Show resolved Hide resolved
src/node_perf.cc Outdated Show resolved Hide resolved
src/node_perf.cc Outdated Show resolved Hide resolved
src/node_perf.h Outdated Show resolved Hide resolved
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(just requesting changes to make sure this doesn’t land as-is)

doc/api/perf_hooks.md Outdated Show resolved Hide resolved
@sam-github
Copy link
Contributor

I could use some convincing that this time-based sampling approach is useful, or that I'm missing the point of the code.

Loop delays tend to be irregular, and to only block a single "turn" of the loop, don't they? Won't this approach miss those delays?

It seems to be much more useful to measure every loop, and track max and min (and possibly average) over some time interval.

addaleax
addaleax previously approved these changes Jan 8, 2019
src/node_perf.cc Outdated Show resolved Hide resolved
src/node_perf.cc Outdated Show resolved Hide resolved
@jasnell
Copy link
Member Author

jasnell commented Jan 8, 2019

@sam-github ... @mafintosh can likely better describe the rationale behind this particular algorithm (given that he's the one that wrote it :-) ...) ... For the most part, however, this is giving the accumulated delay if the length of time between iterations exceeds the threshold.

@jasnell
Copy link
Member Author

jasnell commented Jan 8, 2019

I've added some documentation. Essentially, this measures the accumulated delay if the previous turn of the event loop takes longer than a threshold determined by the resolution.

@sam-github ... what you're suggesting is a different way of measuring that we can also do relatively easily.

doc/api/perf_hooks.md Outdated Show resolved Hide resolved
doc/api/perf_hooks.md Outdated Show resolved Hide resolved
@jasnell
Copy link
Member Author

jasnell commented Jan 8, 2019

Attempted to clarify a different way

@jasnell
Copy link
Member Author

jasnell commented Jan 9, 2019

doc/api/perf_hooks.md Outdated Show resolved Hide resolved
doc/api/perf_hooks.md Outdated Show resolved Hide resolved
doc/api/perf_hooks.md Outdated Show resolved Hide resolved

The accumulated event loop delay (`delay`) is determined by:

* Calculating a `threshold` in milliseconds as `5e6 + (resolution * 1e6)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my unit conversions are right, isn't this an obscure way of saying the threshold is 1.5 times the resolution?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

@sam-github
Copy link
Contributor

Essentially, this measures the accumulated delay

Does it, though? The docs say it samples at resolution milliseconds, so it only measures periodically, making it easy to miss, for example, occaisonal calls to sync APIs.

Its not at all clear to me why the threshold is scaled from the sampling period, or why the sampling period is called resolution.

@mafintosh -- comments?

@jasnell
Copy link
Member Author

jasnell commented Jan 9, 2019

Does it, though? The docs say it samples at resolution milliseconds, so it only measures periodically, making it easy to miss, for example, occaisonal calls to sync APIs.

Well, yes, using a sampling approach means the data will miss some things but it also allows it to work with significantly less overhead. We certainly could also do the min/max/average calculation on every event loop tick but at a definitely higher overhead cost.

If you think that it's worthwhile, I can implement both approaches configurable via the options.

// Sampling-method
performance.startEventLoopTimer({ resolution: 10 })

// Per-event-tick method
performance.startEventLoopTimer({ resolution: 0 })

We can collect the min, max, rolling average in addition to the accumulated delay.

@addaleax
Copy link
Member

@nodejs/diagnostics Any thoughts on this?

@devsnek
Copy link
Member

devsnek commented Jan 28, 2019

why is this on the performance object instead of the perf_hooks export?

@gireeshpunathil
Copy link
Member

I guess the most common production workloads will only be interested to see a pattern (average delay over time), its steady state values, and outliers if any; not necessarily a high precision data that is accurate at the loop iteration level. So sampling method that strikes a balance between useful information and performance efficiency looks good to me.

On the contrary, having so many options to configure may lead to:

  • lot more usability issues, mis-configurations and questions
  • unwanted comparison at micro level only to spend time on debugging false positives

In short, I am good with the changes as is.

Add a sampling-based event loop delay monitor.

```js
const { monitorEventLoopDelay } = require('perf_hooks');

const h = monitorEventLoopDelay();

h.enable();

h.disable();

console.log(h.percentiles);
console.log(h.min);
console.log(h.max);
console.log(h.mean);
console.log(h.stddev);
console.log(h.percentile(50));
```
@jasnell
Copy link
Member Author

jasnell commented Feb 7, 2019

Turns out there's a patch @mcollina didn't tell me about ;-D and since he's hanging out in Hawaii right now I gotta give him a hard time about it :-)

New CI: https://ci.nodejs.org/job/node-test-pull-request/20631/

@Flarna
Copy link
Member

Flarna commented Feb 7, 2019

@jasnell I wonder a little bit about your test results as it seems that 1000ms tasks in the event loop would be not detected as that. Most likely caused by my test script which didn't allow to set busy time on command line, just "test modes". Updated script and retested also in other environments.

If event loop is busy with tasks longer then the 10ms sample interval it's stable detected in all setups. But measurement of an idle node.js app results in worse results compared to a node.js app busy with 10ms tasks even on the physical linux box.

Which size of "detectable delays" had you in mind here? Maybe the docs should include some hints on the reachable resolution?
I think some testing on other platforms should be also done to check if there are other limitations.
Or is the sample I use to simple and results in misleading interpretations from my side?

The VM on my private notebook is for sure not a relevant production setup (maybe delays are caused by CPU sleep states,...) so I think these results are not that important.

Linux VM on my old private notebook

start NaN
h.min=10.027008ms
h.max=161.349631ms
h.mean=13.187902367634125ms
h.stddev=5.629075695044025ms
start NaN
h.min=10.018816ms
h.max=32.292863ms
h.mean=11.474917361897475ms
h.stddev=2.308021168909449ms
start 0
h.min=9.232384ms
h.max=56.688639ms
h.mean=10.035087593038822ms
h.stddev=0.9816034647940286ms
start 0
h.min=9.33888ms
h.max=15.638527ms
h.mean=10.002511242161441ms
h.stddev=0.1461050645134329ms
start 10
h.min=10.010624ms
h.max=31.686655ms
h.mean=10.068520733131923ms
h.stddev=0.6172102376239719ms
start 10
h.min=10.002432ms
h.max=25.968639ms
h.mean=10.043363193569993ms
h.stddev=0.42167699874933723ms
start 100
h.min=100.007936ms
h.max=111.935487ms
h.mean=100.13551274666666ms
h.stddev=0.9452201842299545ms
start 100
h.min=100.007936ms
h.max=112.394239ms
h.mean=100.11191978666666ms
h.stddev=0.8513607073454165ms

Linux VM on my work PC

start NaN
h.min=10.010624ms
h.max=18.563071ms
h.mean=11.065823704905938ms
h.stddev=0.7107936502571325ms
start NaN
h.min=10.018816ms
h.max=15.474687ms
h.mean=11.278423034223392ms
h.stddev=0.759150300218026ms
start 0
h.min=9.0112ms
h.max=19.365887ms
h.mean=10.017919914495659ms
h.stddev=0.36377668699754595ms
start 0
h.min=9.125888ms
h.max=20.742143ms
h.mean=10.02488607486631ms
h.stddev=0.4253760072540978ms
start 10
h.min=10.010624ms
h.max=18.874367ms
h.mean=10.07886864516129ms
h.stddev=0.4418468141594953ms
start 10
h.min=10.010624ms
h.max=21.381119ms
h.mean=10.092213232839837ms
h.stddev=0.5521641804813799ms
start 100
h.min=100.007936ms
h.max=104.660991ms
h.mean=100.09597269333332ms
h.stddev=0.43897114293681383ms
start 100
h.min=100.007936ms
h.max=111.083519ms
h.mean=100.12764842666667ms
h.stddev=0.7444034454799928ms

Native Linux on a collegues PC

start NaN
h.min=10.027008ms
h.max=13.008895ms
h.mean=10.126542735989196ms
h.stddev=0.07465781045266873ms
start NaN
h.min=10.0352ms
h.max=19.038207ms
h.mean=10.136660093274756ms
h.stddev=0.17323677977705812ms
start 0
h.min=9.289728ms
h.max=10.092543ms
h.mean=10.000237177725909ms
h.stddev=0.015177624171784065ms
start 0
h.min=9.29792ms
h.max=10.166271ms
h.mean=9.999846562187397ms
h.stddev=0.016481384728421315ms
start 10
h.min=10.002432ms
h.max=12.623871ms
h.mean=10.012928427378965ms
h.stddev=0.08814869597128885ms
start 10
h.min=10.002432ms
h.max=12.967935ms
h.mean=10.012034905206942ms
h.stddev=0.08011965038652569ms
start 100
h.min=100.007936ms
h.max=103.350271ms
h.mean=100.05949098666666ms
h.stddev=0.228135157621938ms
start 100
h.min=100.007936ms
h.max=102.760447ms
h.mean=100.05796181333334ms
h.stddev=0.20355572563408403ms

Updated Script

const { monitorEventLoopDelay } = require('perf_hooks');

let delay = +process.argv[2];
const duration = 1000 * 30;
const ns2ms = 1000 * 1000;

function busy() {
  if (delay > 0) {
    const now = process.hrtime();
    let d;
    do {
      d = process.hrtime(now);
    } while (d[1] / ns2ms < delay);
  }
  setImmediate(busy);
}

const h = monitorEventLoopDelay({ resolution: 10 });

function endMeasurement() {
  h.disable();
  console.log(`h.min=${h.min / ns2ms}ms`);
  console.log(`h.max=${h.max / ns2ms}ms`);
  console.log(`h.mean=${h.mean / ns2ms}ms`);
  console.log(`h.stddev=${h.stddev / ns2ms}ms`);
  process.exit();
}

function start() {
  console.log(`start ${delay}`);
  h.enable();
  if (delay >= 0) {
    busy();
  }
}

start();
setTimeout(endMeasurement, duration);

@jasnell
Copy link
Member Author

jasnell commented Feb 7, 2019

Resume CI due to unrelated failure: https://ci.nodejs.org/job/node-test-pull-request/20637/

@jasnell
Copy link
Member Author

jasnell commented Feb 7, 2019

@nodejs/build ... any ideas here?

10:26:58 Makefile:449: recipe for target 'clear-stalled' failed
10:26:58 make[1]: *** [clear-stalled] Error 123
10:26:58 Makefile:533: recipe for target 'run-ci' failed
10:26:58 make: *** [run-ci] Error 2
10:26:59 Build step 'Conditional steps (multiple)' marked build as failure
10:26:59 Performing Post build task...
10:26:59 Match found for : : True
10:26:59 Logical operation result is TRUE
10:26:59 Running script  : #/bin/bash

https://ci.nodejs.org/job/node-test-commit-arm/21996/nodes=ubuntu1604-arm64/console

@sam-github
Copy link
Contributor

Like @mcollina, #25378 (comment), I agree that loop health is critical for production monitoring.

@jasnell There hasn't been any attempt at convincing me that sampling has been seen effective in the wild, see #25378 (comment) (no comment from mafintosh), and this question is open, #25378 (comment)

I've found @bnoordhuis 's approach of measuring every loop, but only recording min/max/avg to be low overhead (its a timer/check, a comparison, and some math per loop, the overhead is dwarfed by the actual code that runs per loop, and most loops are not going to exceed the min/max anyhow), and very useful.

But, if I'm such a big fan of an alternative approach, I should submit my own code to do it :-). Its you who are doing the work here, and any loop health stats are better than none. This approach doesn't prevent someone trying another approach in the future, so I've no objection.

@addaleax
Copy link
Member

addaleax commented Feb 8, 2019

@jasnell jasnell added the semver-minor PRs that contain new features and should be released in the next minor version. label Feb 8, 2019
jasnell added a commit that referenced this pull request Feb 8, 2019
Add a sampling-based event loop delay monitor.

```js
const { monitorEventLoopDelay } = require('perf_hooks');

const h = monitorEventLoopDelay();

h.enable();

h.disable();

console.log(h.percentiles);
console.log(h.min);
console.log(h.max);
console.log(h.mean);
console.log(h.stddev);
console.log(h.percentile(50));
```

PR-URL: #25378
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
jasnell added a commit that referenced this pull request Feb 8, 2019
From:
mcollina/native-hdr-histogram@c63e97151dcff9b9aed1d8ea5e4f5964c69be32fideps:

PR-URL: #25378
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@jasnell
Copy link
Member Author

jasnell commented Feb 8, 2019

Landed in bcdd228 and d999b55

@jasnell jasnell closed this Feb 8, 2019
addaleax pushed a commit that referenced this pull request Feb 8, 2019
Add a sampling-based event loop delay monitor.

```js
const { monitorEventLoopDelay } = require('perf_hooks');

const h = monitorEventLoopDelay();

h.enable();

h.disable();

console.log(h.percentiles);
console.log(h.min);
console.log(h.max);
console.log(h.mean);
console.log(h.stddev);
console.log(h.percentile(50));
```

PR-URL: #25378
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
addaleax pushed a commit that referenced this pull request Feb 8, 2019
From:
mcollina/native-hdr-histogram@c63e97151dcff9b9aed1d8ea5e4f5964c69be32fideps:

PR-URL: #25378
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
@targos targos mentioned this pull request Feb 14, 2019
@BethGriggs
Copy link
Member

@jasnell, Should this land in v10.x? Please add the lts-watch label if so

@BethGriggs
Copy link
Member

Could you please raise a backport PR? This change doesn't land cleanly on v10.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. perf_hooks Issues and PRs related to the implementation of the Performance Timing API. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.