Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf_hooks: add ReqWrap latency monitoring #26556

Closed
wants to merge 1 commit into from

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Mar 10, 2019

Similar to the recently added event loop relay, this adds a perf_hooks.monitorRequestWrapLatency() API that tracks the length of time between dispatching a ReqWrap and when the callback is invoked. This effectively tracks how long libuv is taking to perform requested actions. No differentiation is made between types of actions. The idea here is to provide a pulse on the general health of the application. If the histogram skews high, then it's taking libuv a longer time to process individual requests.

const { monitorRequestWrapLatency } = require('perf_hooks');
const m = monitorRequestWrapLatency();
m.enable();
// Do stuff
m.disable();

console.log(m.min);
console.log(m.max);
console.log(m.mean);
console.log(m.stddev);
console.log(m.percentile(50));
console.log(m.percentile(99));
console.log(m.percentiles);  // Map

Also outputs histogram data to the trace event log.

/cc @mcollina @mafintosh @nodejs/diagnostics

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@jasnell jasnell added the semver-minor PRs that contain new features and should be released in the next minor version. label Mar 10, 2019
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Mar 10, 2019
@jasnell jasnell added the perf_hooks Issues and PRs related to the implementation of the Performance Timing API. label Mar 10, 2019
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

doc/api/perf_hooks.md Outdated Show resolved Hide resolved
src/node_perf.cc Outdated Show resolved Hide resolved
@BridgeAR
Copy link
Member

@jasnell
Copy link
Member Author

jasnell commented Mar 10, 2019

Heh. Looks like ci was a bust ;) will review those today

doc/api/tracing.md Show resolved Hide resolved
doc/api/perf_hooks.md Outdated Show resolved Hide resolved
doc/api/perf_hooks.md Outdated Show resolved Hide resolved
doc/api/perf_hooks.md Show resolved Hide resolved
@vsemozhetbyt
Copy link
Contributor

Is the omission of resolution option intended for this method. in other respects similar to the perf_hooks.monitorEventLoopDelay([options]) method?

@vsemozhetbyt vsemozhetbyt added performance Issues and PRs related to the performance of Node.js. trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. labels Mar 10, 2019
@jasnell
Copy link
Member Author

jasnell commented Mar 10, 2019

Yes, there is no resolution because this monitor is not timer or sampling based. Once enabled, it's on for all ReqWrap instances

src/node_perf.cc Outdated Show resolved Hide resolved
@Flarna
Copy link
Member

Flarna commented Mar 11, 2019

I think it would be nice to include the number of measurements done into the resulting histogram.

lib/perf_hooks.js Outdated Show resolved Hide resolved
doc/api/perf_hooks.md Outdated Show resolved Hide resolved
doc/api/perf_hooks.md Outdated Show resolved Hide resolved
doc/api/perf_hooks.md Outdated Show resolved Hide resolved
@jasnell
Copy link
Member Author

jasnell commented Mar 11, 2019

@jasnell
Copy link
Member Author

jasnell commented Mar 11, 2019

@jasnell
Copy link
Member Author

jasnell commented Mar 12, 2019

@Flarna @BridgeAR ... ok, please take a look. I update the API to accept a list of AsyncWrap provider types that the histogram will keep track of. Multiple monitors can be created and enabled independently of one another to watch for different combinations of things.

src/node_perf.h Outdated Show resolved Hide resolved
src/node_perf.cc Outdated Show resolved Hide resolved
if (*TRACE_EVENT_API_GET_CATEGORY_GROUP_ENABLED(
TRACING_CATEGORY_NODE2(perf, reqwrap)) != 0) {
auto value = tracing::TracedValue::Create();
DumpToTracedValue(value.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that it is an overkill to trace the whole Histogram on every call to Record.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can tweak this later if necessary. How often would be acceptable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually I would expect that traces act on raw data therefore trace type and delta on every call of Record.
I would not expect a trace holding the whole histogram in the same category as it is redundant information.

But as you said on summit the requirements what should be traced are missing in general so feel free to keep this as it is and adapt once there is a more clear picture in this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could tweak this a bit to be sensitive to detail level... e.g. output type and delta for category node.perf.reqwrap and the full histogram on node.perf.reqwrap.detail or something similar, but for now, I'd like to leave it as is.

@Flarna
Copy link
Member

Flarna commented Mar 12, 2019

@jasnell Thanks! I like this solution.

The only drawback I see in adding filters is that the names of the async providers get to the public API surface even they are more an implementation detail. This is also the case in async_hooks and caused some discussions there in the past.
Not sure if this is a problem or can be handled in a proper way in docs.

@jasnell
Copy link
Member Author

jasnell commented Mar 12, 2019

The only other way I could think of doing this would be to group by types of activity e.g. monitorRequestWrapLatency({ fs: true, net: true }) but we would have to be able to reasonably group the various types. There's some existing overlap there so I'm not sure if it's feasible. The only reason I went with the current approach is because we already make those names public in the async_hooks API.

@jasnell
Copy link
Member Author

jasnell commented Mar 12, 2019

Adds a histogram that tracks ReqWrap latency (defined as the
length of time it takes from when the reqwrap is dispatched
to when the callback is invoked)
@jasnell
Copy link
Member Author

jasnell commented Mar 13, 2019

CI is good.

@jasnell jasnell added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Mar 13, 2019
@jasnell
Copy link
Member Author

jasnell commented Mar 13, 2019

@Flarna @addaleax @BridgeAR ... can I ask you to take another look... should be ready to go

inline void Histogram::GetMin(
Histogram* histogram,
const FunctionCallbackInfo<Value>& args) {
double value = static_cast<double>(histogram->Min());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Min() returns int_max which results in min: 9223372036854776000 if no measurement was done (e.g. after reset()) which looks not that nice. Maybe set min/max to NAN in case count is 0?

@BridgeAR
Copy link
Member

BridgeAR commented Apr 4, 2019

This needs a rebase.

@BridgeAR BridgeAR removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 4, 2019
@jasnell
Copy link
Member Author

jasnell commented Apr 4, 2019

Yep, I'll be getting back to this next week.

@jasnell
Copy link
Member Author

jasnell commented Oct 16, 2019

Sigh... looks like it took longer than a week to get back to this ;-) ... will be updating it soon

@fhinkel
Copy link
Member

fhinkel commented Nov 3, 2019

Should this remain open?

@jasnell
Copy link
Member Author

jasnell commented Nov 4, 2019 via email

@BridgeAR
Copy link
Member

Ping @jasnell

@Qard
Copy link
Member

Qard commented Dec 22, 2019

Took a quick pass at rebasing this but it seems to conflict with #29317, which makes ELDHistogram a HandleWrap now. I'm not sure how to resolve that without undoing that change as HistogramBase and RWLHistogram don't have the timer and therefore don't make sense as HandleWraps. However, HistogramBase extends from BaseObject, which means the ELDHistogram subclass couldn't independently extend from HandleWrap without introducing a diamond-of-death problem. 🤔

@jasnell
Copy link
Member Author

jasnell commented Dec 22, 2019

I'll be picking this back up in the next couple of weeks.

@jasnell
Copy link
Member Author

jasnell commented Feb 3, 2020

Going to go ahead and close this. I want to return to this but do not have time at present

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. perf_hooks Issues and PRs related to the implementation of the Performance Timing API. performance Issues and PRs related to the performance of Node.js. semver-minor PRs that contain new features and should be released in the next minor version. trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants