
Make --perf-basic-prof toggleable #150

Closed
mike-kaufman opened this issue Feb 12, 2018 · 29 comments

@mike-kaufman
Contributor

Goal is to make it possible to enable --perf-basic-prof dynamically at runtime. We need to resolve the fact that this is not a fully supported V8 feature.

@mmarchini
Contributor

Current implementation is available here if anyone is interested: nodejs/node@v6.12.3...mmarchini:v6.12.3-perf-basic-prof-toggling

@mmarchini
Contributor

Talking to @hashseed during the break, he suggested that we could expose a public API to capture code events, which would allow userland modules to deal with those code events as necessary (for example, we could have a userland module capable of generating the same /tmp/perf-XXX.map generated by --perf-basic-prof today). This way we have a platform-agnostic approach to the current issue. A public API would also solve the runtime toggling problem.

> Need to resolve the fact that this is not a fully supported V8 feature.

--perf-basic-prof doesn't support Interpreted Frames because the call stack with Interpreted Frames looks like this:

[image: call stack with interpreted frames today]

The suggested approach so far is to (somehow) add a "fake" frame to the stack which points to a JIT-function with the purpose of keeping track of those interpreted function calls. This would essentially restore the previous functionality to Linux perf and any other stack recorder (DTrace, eBPF profile, etc.), so it's also platform-agnostic. The end result would look like this:

[image: call stack with the suggested "fake" frames restored]

I'll work on a proof-of-concept version for this. @hashseed raised some concerns that this approach may bring a considerable overhead, but once we have a proof-of-concept we can optimize it.

@hashseed
Member

CodeEventListener is an internal interface that we use to implement various backends for profiling implementations internally. It does make sense to expose this in the V8 API and move the backends to Node modules, thereby shifting the responsibility for their maintenance.

My concern here is that code events are fairly low-level and the data we want to pass through it may change as we change the way V8 implements code objects. That would cause the ABI to change.

@mmarchini
Contributor

I think we can come up with some higher-level API/format. For instance, we only need three values to create a proper /tmp/perf-XXXX.map file: function begin address, function size, function name. Maybe having an API which only reports those values (and maybe some more we agree on) could keep the ABI stable?

@hashseed
Member

That might be a good idea. Can you maybe collect data to design this API?

@mmarchini
Contributor

I'll look into it.

@ofrobots
Contributor

In addition to the function name, please include the function URL as well. Historically V8 was designed to not care much about JS files, but for the Node.js use case the filename is quite important.

@hashseed
Member

Good point. In CodeEventListener, the filename is extracted from the script, which is referenced by the shared function info.

Obviously we do not really want to expose shared function infos this way, so we'd have to extract this information before passing through the API.

@mmarchini
Contributor

Makes sense. I'll look into the SharedFunctionInfo class to see if there are other values we could pass through the API.

Maybe we could rename this issue to something like "Cross-platform call stack introspection"?

@mmarchini
Contributor

mmarchini commented Feb 13, 2018

For the record: during the discussion about profiling at the Summit we mentioned that maybe in the future we could have a VM independent API to collect code-creation events. We also talked about creating a userland module for Linux perf support once V8 provides this API, and transfer that to the foundation once it's mature. cc/ @mhdawson @hashseed

@mmarchini
Contributor

mmarchini commented Feb 13, 2018

Roadmap:

  • Collect data for the public CodeEventListener API
  • Introduce CodeEventListener public API to V8
  • Create userland module to provide Linux perf support to Node.js (similar to Java's perf-map-agent)
  • Do some implementation tests on restoring interpreted frames to call stack collectors (like Linux perf)

P.S.: Roadmap was moved to #148

@joyeecheung
Member

By the way: we usually don't create such a module inside the foundation from scratch. Instead, we create it in another namespace and iterate on it, then transfer it into the foundation when it's mature/ready.

@mmarchini
Contributor

Updated my comment, thanks for pointing this out @joyeecheung

@mmarchini
Contributor

I collected data from several profilers to understand what we need for a public API for CodeEventListener. Here's a list of external profilers I investigated:

  • Linux perf
  • eBPF profile
  • DTrace
  • OSX Instruments
  • Windows xperf

I also got some data from several npm packages and tutorials using those profilers or the data collected by them:

Except for xperf, all other profilers have at least one tutorial or npm package using --perf-basic-prof to generate Flamegraphs and other useful insights. It's also worth noting that even xperf could leverage --perf-basic-prof + xperf_to_collapsedstacks + stackvis to generate Flamegraphs.

Based on those new findings, I think we could either keep --perf-basic-prof (with some improvements) since it already provides all the data needed by external profilers to get insights from stack samples, or we could write the public API for CodeEventListener which gives the same information we have today with --perf-basic-prof.

If we decide to keep --perf-basic-prof, we could probably give it a better name and an option to choose the output file location. We should also provide an API to enable/disable it at runtime, plus documentation and tests to make sure it doesn't break in future versions.

If we decide to go with the CodeEventListener's public API approach, it should be able to collect code objects created before the listener starts to collect data. Here's the minimal public API needed to provide the same information available today on --perf-basic-prof:

  • function start address: first address of the instructions for this function/builtin
  • function size: size (in bytes) of this function/builtin
  • function name: name of this function/builtin
  • function script: the script where this JS function was declared. Only relevant for JIT/interpreted functions; empty for builtins.
  • function script line & column: the line and column in the script where this JS function was declared. Only relevant for JIT/interpreted functions; empty for builtins.
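Put together, a record carrying these five fields could be rendered into an extended map line. The object shape below is an assumption about what the public API might expose, not an existing interface:

```javascript
// Hypothetical code-event record with the five proposed fields, rendered
// into a perf-map-style line. The record shape is an assumption.
function formatCodeEvent(ev) {
  // Builtins carry no script, so only append position info when present.
  const pos = ev.script ? ` ${ev.script}:${ev.line}:${ev.column}` : '';
  return `${ev.startAddress.toString(16)} ${ev.size.toString(16)} ${ev.name}${pos}`;
}

console.log(formatCodeEvent({
  startAddress: 0x2a8e4c5060c0, size: 0xc8,
  name: 'LazyCompile:~fib', script: 'fib.js', line: 3, column: 14
}));
// → 2a8e4c5060c0 c8 LazyCompile:~fib fib.js:3:14

console.log(formatCodeEvent({
  startAddress: 0x55d1a0, size: 0x40, name: 'Builtin:ArrayPush'
}));
// → 55d1a0 40 Builtin:ArrayPush
```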

@hashseed
Member

Any plans about inlined functions in optimized code?

@mike-kaufman
Contributor Author

mike-kaufman commented Feb 21, 2018

A few questions to make sure I understand things here. Sorry if I'm being dense.

  1. Is the proposal here to have "fake stack frames" for the interpreter still valid? Or is this obviated by the CodeEventListener API?

  2. Does the CodeEventListener API pump data through trace_event macros? Or some other mechanism? How does this intersect w/ other efforts around tracing macros (e.g., goal to support LTTNG, ETW, DTrace)?

  3. I think what's being proposed here is that traditional call stack sampling will not identify interpreter frames, w/out something that is able to listen to the events pumped through the CodeEventListener, thus the need for the "user-land module for perf" described above. Is this accurate?

Thanks,

Mike

@mmarchini
Contributor

> Any plans about inlined functions in optimized code?

@hashseed I haven't thought about that yet. I'll probably take a look at it after the interpreted frames issue is addressed.

@mmarchini
Contributor

> Is the proposal here to have "fake stack frames" for the interpreter still valid? Or is this obviated by the CodeEventListener API?

I see these two as separate (but complementary) issues. The "fake stack frames" proposal is to fix the problem we have today with V8 which makes interpreted functions indistinguishable in the call stack, since they are all executed through InterpreterEntryTrampoline and BytecodeHandlers. In other words, they are not visible to stack samplers, no matter what V8 provides through a separate API, and the proposal to fix this is not directly related to an API or to --perf-basic-prof.

The second issue is "how" V8 can provide necessary information to allow external profilers to resolve unknown symbols (JIT code, Builtins, etc.). Today this is possible with --perf-basic-prof.

That being said, it's worth noting that --perf-basic-prof works exactly as expected today with TurboFan. The problem is not --perf-basic-prof, but the fact that there's no way to distinguish between interpreted function calls on the stack.

> Does the CodeEventListener API pump data through trace_event macros? Or some other mechanism? How does this intersect w/ other efforts around tracing macros (e.g., goal to support LTTNG, ETW, DTrace)?

I don't have opinions on this yet, but CodeEventListener (the private class V8 has today) does not use trace_events.

> I think what's being proposed here is that traditional call stack sampling will not identify interpreter frames, w/out something that is able to listen to the events pumped through the CodeEventListener, thus the need for the "user-land module for perf" described above.

I think the answer to the first question also answered this one. If not please let me know so I can elaborate a little more.

@mike-kaufman
Contributor Author

Thanks @mmarchini.

@gireeshpunathil
Member

Should this remain open? [I am trying to chase dormant issues to closure]

@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions bot added the stale label Jul 18, 2020
@mmarchini
Contributor

This can be implemented in userland today via a new API, so I'm closing this.

@vmarchaud
Contributor

@mmarchini I'm curious which new API you are referring to? CodeEventListener?

@mmarchini
Contributor

mmarchini commented Jul 18, 2020

Yes (the public API name might be different, I don't remember and I'm on my phone right now), and there's a module to enable/disable Linux perf maps during runtime: linux-perf. The API also gives flexibility to write to other formats.
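For reference, usage of that module is roughly the following sketch. The `start`/`stop` names follow the linux-perf module's documented interface as I recall it; the try/catch stub is only there so the sketch stays self-contained where the module isn't installed:

```javascript
// Sketch of typical linux-perf module usage: start() begins writing
// /tmp/perf-<pid>.map, stop() stops writing it. The fallback stub below
// is illustration-only, for environments without the module installed.
let linuxPerf;
try {
  linuxPerf = require('linux-perf');
} catch (err) {
  linuxPerf = { start: () => false, stop: () => false };  // illustration-only stub
}

linuxPerf.start();
// ... run the code you want `perf record` to be able to symbolize ...
linuxPerf.stop();
```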

@mmarchini
Contributor

Closed, but feel free to keep commenting if you have any questions (or to open another issue).

@RafaelGSS
Member

I'm a bit confused about the resolution of this issue.

  • Is the core idea still in progress? If so, is there a roadmap for this "feature" so other contributors can help?

@mmarchini
Contributor

@RafaelGSS it is no longer in progress; this is addressed by https://www.npmjs.com/package/linux-perf.

@camillobruni

Looking at the V8 implementation a bit, is it correct to assume that v8::CodeEventListener is no longer used/necessary in Node?

If that's the case I can clean up the V8 internal API

@Qard
Member

Qard commented Apr 12, 2022

The public interface of that, v8::CodeEventHandler, is used in the previously mentioned linux-perf module, and I'm also using it for my external CPU profiler.
