
Make --perf-basic-prof toggleable #150

Closed
mike-kaufman opened this issue Feb 12, 2018 · 29 comments

@mike-kaufman
Contributor

Goal is to make it possible to enable --perf-basic-prof dynamically at runtime. We need to resolve the fact that this is not a fully supported V8 feature.

@mmarchini
Contributor

Current implementation is available here if anyone is interested: nodejs/node@v6.12.3...mmarchini:v6.12.3-perf-basic-prof-toggling

@mmarchini
Contributor

Talking to @hashseed during the break, he suggested that we could expose a public API to capture code events, which would allow userland modules to deal with those code events as necessary (for example, we could have a userland module capable of generating the same /tmp/perf-XXX.map generated by --perf-basic-prof today). This way we have a platform-agnostic approach to the current issue. A public API would also solve the runtime toggling problem.

> Need to resolve the fact that this is not a fully supported V8 feature.

--perf-basic-prof doesn't support Interpreted Frames because the call stack with Interpreted Frames looks like this:

[image: call stack with interpreted frames today]

The suggested approach so far is to (somehow) add a "fake" frame to the stack which points to a JIT-function with the purpose of keeping track of those interpreted function calls. This would essentially restore the previous functionality to Linux perf and any other stack recorder (DTrace, eBPF profile, etc.), so it's also platform-agnostic. The end result would look like this:

[image: call stack with the suggested "fake" frames restored]

I'll work on a proof-of-concept version for this. @hashseed raised some concerns that this approach may bring a considerable overhead, but once we have a proof-of-concept we can optimize it.

@hashseed
Member

CodeEventListener is an internal interface that we use to implement various backends for profiling implementations internally. It does make sense to expose this in the V8 API and move the backends to Node modules, thereby shifting the responsibility for their maintenance.

My concern here is that code events are fairly low-level and the data we want to pass through it may change as we change the way V8 implements code objects. That would cause the ABI to change.

@mmarchini
Contributor

I think we can come up with some higher-level API/format. For instance, we only need three values to create a proper /tmp/perf-XXXX.map file: function begin address, function size, function name. Maybe having an API which only reports those values (and maybe some more we agree on) could keep the ABI stable?

@hashseed
Member

That might be a good idea. Can you maybe collect data to design this API?

@mmarchini
Contributor

I'll look into it.

@ofrobots
Contributor

In addition to the function name, please include the function URL as well. Historically V8 was designed to not care much about JS files, but for the Node.js use case the filename is quite important.

@hashseed
Member

Good point. In CodeEventListener, the filename is extracted from the script, which is referenced by the shared function info.

Obviously we do not really want to expose shared function infos this way, so we'd have to extract this information before passing through the API.

@mmarchini
Contributor

Makes sense. I'll look into the SharedFunctionInfo class to see if there are other values we could pass through the API.

Maybe we could rename this issue to something like "Cross-platform call stack introspection"?

@mmarchini
Contributor

mmarchini commented Feb 13, 2018

For the record: during the discussion about profiling at the Summit we mentioned that maybe in the future we could have a VM independent API to collect code-creation events. We also talked about creating a userland module for Linux perf support once V8 provides this API, and transfer that to the foundation once it's mature. cc/ @mhdawson @hashseed

@mmarchini
Contributor

mmarchini commented Feb 13, 2018

Roadmap:

  • Collect data for the public CodeEventListener API
  • Introduce CodeEventListener public API to V8
  • Create userland module to provide Linux perf support to Node.js (similar to Java's perf-map-agent)
  • Do some implementation tests on restoring interpreted frames to call stack collectors (like Linux perf)

P.S.: Roadmap was moved to #148

@joyeecheung
Member

By the way: we usually don't create such a module inside the foundation from scratch. Instead, we create it in another namespace and iterate on it, then transfer it into the foundation when it's mature/ready.

@mmarchini
Contributor

Updated my comment, thanks for pointing this out @joyeecheung

@mmarchini
Contributor

I collected data from several profilers to understand what we need for a public API for CodeEventListener. Here's a list of external profilers I investigated:

  • Linux perf
  • eBPF profile
  • DTrace
  • OSX Instruments
  • Windows xperf

I also got some data from several npm packages and tutorials using those profilers or the data collected by them:

Except for xperf, all other profilers have at least one tutorial or npm package using --perf-basic-prof to generate Flamegraphs and other useful insights. It's also worth noting that even xperf could leverage --perf-basic-prof + xperf_to_collapsedstacks + stackvis to generate Flamegraphs.

Based on those new findings, I think we could either keep --perf-basic-prof (with some improvements) since it already provides all the data needed by external profilers to get insights from stack samples, or we could write the public API for CodeEventListener which gives the same information we have today with --perf-basic-prof.

If we decide to keep --perf-basic-prof, we could probably give it a better name and an option to choose the output file location. We should also provide an API to enable/disable it at runtime, plus documentation and tests to make sure it doesn't break in future versions.

If we decide to go with the CodeEventListener's public API approach, it should be able to collect code objects created before the listener starts to collect data. Here's the minimal public API needed to provide the same information available today on --perf-basic-prof:

  • function start address: first address of the instructions for this function/builtin
  • function size: size (in bytes) of this function/builtin
  • function name: name of this function/builtin
  • function script: the script where this JS function was declared. Only relevant for JIT/interpreted functions; empty for builtins.
  • function script line & column: the line and column in the script where this JS function was declared. Only relevant for JIT/interpreted functions; empty for builtins.
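Put together, a record carrying these five fields could be rendered into an extended map line. The object shape below is an assumption about what the public API might expose, not an existing interface:

```javascript
// Hypothetical code-event record with the five proposed fields, rendered
// into a perf-map-style line. The record shape is an assumption.
function formatCodeEvent(ev) {
  // Builtins carry no script, so only append position info when present.
  const pos = ev.script ? ` ${ev.script}:${ev.line}:${ev.column}` : '';
  return `${ev.startAddress.toString(16)} ${ev.size.toString(16)} ${ev.name}${pos}`;
}

console.log(formatCodeEvent({
  startAddress: 0x2a8e4c5060c0, size: 0xc8,
  name: 'LazyCompile:~fib', script: 'fib.js', line: 3, column: 14
}));
// → 2a8e4c5060c0 c8 LazyCompile:~fib fib.js:3:14

console.log(formatCodeEvent({
  startAddress: 0x55d1a0, size: 0x40, name: 'Builtin:ArrayPush'
}));
// → 55d1a0 40 Builtin:ArrayPush
```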

@hashseed
Member

Any plans about inlined functions in optimized code?

@mike-kaufman
Contributor Author

mike-kaufman commented Feb 21, 2018

A few questions to make sure I understand things here. Sorry if I'm being dense.

  1. Is the proposal here to have "fake stack frames" for the interpreter still valid? Or is this obviated by the CodeEventListener API?

  2. Does the CodeEventListener API pump data through trace_event macros? Or some other mechanism? How does this intersect w/ other efforts around tracing macros (e.g., goal to support LTTNG, ETW, DTrace)?

  3. I think what's being proposed here is that traditional call stack sampling will not identify interpreter frames, w/out something that is able to listen to the events pumped through the CodeEventListener, thus the need for the "user-land module for perf" described above. Is this accurate?

Thanks,

Mike

@mmarchini
Contributor

> Any plans about inlined functions in optimized code?

@hashseed I haven't thought about that yet. I'll probably take a look at it after the interpreted frames issue is addressed.

@mmarchini
Contributor

> Is the proposal here to have "fake stack frames" for the interpreter still valid? Or is this obviated by the CodeEventListener API?

I see these two as separate (but complementary) issues. The "fake stack frames" proposal is to fix the problem we have today with V8 which makes interpreted functions indistinguishable in the call stack, since they are all executed through InterpreterEntryTrampoline and BytecodeHandlers. In other words, they are not visible to stack samplers, no matter what V8 provides through a separate API, and the proposal to fix this is not directly related to an API or to --perf-basic-prof.

The second issue is "how" V8 can provide necessary information to allow external profilers to resolve unknown symbols (JIT code, Builtins, etc.). Today this is possible with --perf-basic-prof.

That being said, it's worth noting that --perf-basic-prof works exactly as expected today with TurboFan. The problem is not --perf-basic-prof, but the fact that there's no way to distinguish between interpreted function calls on the stack.

> Does the CodeEventListener API pump data through trace_event macros? Or some other mechanism? How does this intersect w/ other efforts around tracing macros (e.g., goal to support LTTNG, ETW, DTrace)?

I don't have opinions on this yet, but CodeEventListener (the private class V8 has today) does not use trace_events.

> I think what's being proposed here is that traditional call stack sampling will not identify interpreter frames, w/out something that is able to listen to the events pumped through the CodeEventListener, thus the need for the "user-land module for perf" described above.

I think the answer to the first question also answered this one. If not please let me know so I can elaborate a little more.

@mike-kaufman
Contributor Author

Thanks @mmarchini.

@gireeshpunathil
Member

Should this remain open? [I am trying to chase dormant issues to closure]

@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions bot added the stale label Jul 18, 2020
@mmarchini
Contributor

This can be implemented in userland today via a new API, so I'm closing this.

@vmarchaud
Contributor

@mmarchini I'm curious which new API you are referring to? CodeEventListener?

@mmarchini
Contributor

mmarchini commented Jul 18, 2020

Yes (the public API name might be different, I don't remember and I'm on my phone right now), and there's a module to enable/disable Linux perf maps during runtime: linux-perf. The API also gives flexibility to write to other formats.
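For reference, usage of that module is roughly the following sketch. The `start`/`stop` names follow the linux-perf module's documented interface as I recall it; the try/catch stub is only there so the sketch stays self-contained where the module isn't installed:

```javascript
// Sketch of typical linux-perf module usage: start() begins writing
// /tmp/perf-<pid>.map, stop() stops writing it. The fallback stub below
// is illustration-only, for environments without the module installed.
let linuxPerf;
try {
  linuxPerf = require('linux-perf');
} catch (err) {
  linuxPerf = { start: () => false, stop: () => false };  // illustration-only stub
}

linuxPerf.start();
// ... run the code you want `perf record` to be able to symbolize ...
linuxPerf.stop();
```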

@mmarchini
Contributor

Closed, but feel free to keep commenting if you have any questions (or to open another issue).

@RafaelGSS
Member

I'm a bit confused about the resolution of this issue.

  • Is the core idea still in progress? If so, is there a roadmap for this "feature" so other contributors can help?

@mmarchini
Contributor

@RafaelGSS it is no longer in progress; this is addressed by https://www.npmjs.com/package/linux-perf.

@camillobruni

Looking at the V8 implementation a bit, is it correct to assume that v8::CodeEventListener is no longer used/necessary in Node?

If that's the case I can clean up the V8 internal API

@Qard
Member

Qard commented Apr 12, 2022

The public interface of that, v8::CodeEventHandler, is used in the previously mentioned linux-perf module, and I'm also using it for my external CPU profiler.
