# Node CPU Profiling Roadmap #148
Great explanation and summary of the current scenario. Suggestion: we could add "Write tests for those profiling tools" to the roadmap, as I think one of the goals is to have these tools as first-class citizens in the future.
Another suggestion: a better standardized format for the profiles, and tools that convert legacy formats to it. The current v8 CPU profile format is quite limiting: since it's JSON, it cannot be concatenated easily and has to be held in memory as a whole before being serialized.
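For reference, the current format follows the DevTools `Profiler.Profile` JSON shape; here is a minimal sketch (the field names come from the protocol, the values are invented), which also shows why streaming or concatenating profiles is awkward:

```js
// Illustrative .cpuprofile contents. samples/timeDeltas grow for the whole
// session, and the object only becomes valid JSON once serialized as a whole.
const profile = {
  nodes: [ // flattened call tree; children reference node ids
    { id: 1, callFrame: { functionName: '(root)', scriptId: '0', url: '', lineNumber: -1, columnNumber: -1 }, hitCount: 0, children: [2] },
    { id: 2, callFrame: { functionName: 'hello', scriptId: '42', url: 'file:///app/hello.js', lineNumber: 6, columnNumber: 15 }, hitCount: 3 }
  ],
  startTime: 0,               // microseconds
  endTime: 300,
  samples: [2, 2, 2],         // node id hit by each sample
  timeDeltas: [100, 100, 100] // microseconds between consecutive samples
}
```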
In 0x (flamegraph tool) I've re-implemented stack tracing on top of the v8 profiler.

Pros:

Cons:
What I would love is the ability to cross-reference --prof stacks with OS-level tracing stacks. If we can do that, then the perf deprecation and the bytecode handler problem (and any other problems) would be resolved with --prof, while lower-level tracing could be used to complement --prof.
In terms of Node internal implementation, absolutely agree on tests. The way we currently expose …
We recently added production-grade continuous sampling to our product. For that we had to find a sweet spot between sampling frequency and duration. Additionally, there was a memory leak in v8::CpuProfiler (nodejs/node#14894) we discovered. As already mentioned, this approach won't give you native frames; still, we considered it sufficient for many use cases. @Hollerberg, who implemented it, will be at the summit, and we are happy to discuss the validity of our approach and be part of an initiative to improve CPU sampling capabilities in Node.
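For context, the sampling-frequency side of that trade-off is tunable through the inspector protocol. A minimal in-process sketch (the 1 ms interval and 10 s duration are arbitrary values, not a recommendation):

```js
const inspector = require('inspector')
const fs = require('fs')

const session = new inspector.Session()
session.connect()

session.post('Profiler.enable', () => {
  // Sampling interval is in microseconds and must be set before Profiler.start.
  session.post('Profiler.setSamplingInterval', { interval: 1000 }, () => {
    session.post('Profiler.start', () => {
      setTimeout(() => {
        session.post('Profiler.stop', (err, { profile }) => {
          if (!err) fs.writeFileSync('sample.cpuprofile', JSON.stringify(profile))
          session.disconnect()
        })
      }, 10000) // profile for 10 seconds, then persist the result
    })
  })
})
```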
We use the CPU profiler in the profiling agent here, which is an experimental way to continuously collect data from production Node.js apps. Some notes on that experience so far:
I believe it is important that Node.js have first-class tooling for problem investigation, and CPU profiling is part of that along with tracing, debugging, heap dumps, node-report, and being able to introspect core files. Ideally, a base level of core tooling would be available with the runtime without having to install additional components, as that can often be an issue in production. That's a long way of saying that I support your effort and will look to see how I can help with getting CI testing and other supporting pieces in place, and also how I might be able to pull in people from IBM to help collaborate.
For reference: #150
Another issue that @bmeurer mentioned to me is that there is no way for Linux perf or any other perf tool outside of V8 to discern an inlined function from its caller, since both occupy a single stack frame. The CpuProfiler doesn't provide this either, but @FranziH is addressing this.
## Updates from the Diagnostics Summit

### External profilers

External profilers are working well on Node 6, even though they are not officially supported. They also work on Node 8+, but the information they collect can be misleading due to the introduction of Turbofan and Ignition. Also, the current method used by most external profilers to resolve JITed functions (…). To support external profilers in the future, we need two things:

### Interpreted Frames

After the introduction of Turbofan and Ignition, Interpreted Frames on the stack don't reflect JavaScript function calls, since only the interpreter appears in the stack (an illustrative stack is sketched below, after this comment). As a consequence, understanding the data collected by external profilers can be tricky, and the data can be misleading: there's no way to distinguish between different JavaScript function calls while they are running in interpreted mode. As soon as those functions are compiled and optimized by Turbofan, they appear in the stack as before.

During the Summit, we came up with three different approaches to get more meaningful information on the call stack for external profilers. All of them must be implemented in the V8 interpreter, and they basically change the current flow to add a unique stack frame for each JS function.

**Intermediate function before calling the interpreter.** Add an intermediate frame to the stack which points to a JIT function, with the purpose of keeping track of which JS function is being called.

**Duplicate the interpreter code for each JS function.** Copy the InterpreterEntryTrampoline code for each Interpreted Function; this way each Interpreted Frame will have a unique entry point. Apparently, ChakraCore is implemented this way.

**Change the stack at runtime to replace the interpreter with a unique address representing the JS function.** Hack the call stack at runtime, replacing InterpreterEntryTrampoline's return address with the address of a small JIT function which will return to InterpreterEntryTrampoline later.

### API with information to resolve JIT function addresses

External profilers can't resolve names for JITed functions (including V8 Builtins) without help from the runtime. Also, most of the time those names are resolved with post-processing tools. Today we have …

### V8 Profiler & CpuProfiler

V8 builtin profilers are officially supported, but they can only sample JavaScript frames. We discussed the possibility of adding native frames to those profilers in the future. Some challenges are: 1) sampling frames from other threads; 2) sampling syscall frames.

### Roadmap
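To make the Interpreted Frames problem from the update above concrete, here is an invented example of how an external profiler might render a Node 8+ stack while all the sampled JS functions run in interpreted mode; the real function names are simply not there:

```
__libc_start_main
node::Start
Builtin:InterpreterEntryTrampoline   <- some JS function, but which one?
Builtin:InterpreterEntryTrampoline   <- some JS function, but which one?
Builtin:InterpreterEntryTrampoline   <- some JS function, but which one?
```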
Related issue: #150
Thanks for the great write-up of the discussion from the summit.
OP said:
AFAIK, …
Let me try to rephrase this: perf-based tools are not officially supported because they were never intended for public use. As a result, they did not get any attention when we launched TurboFan and Ignition in V8 5.8. That being said, they worked fine for a long time before 5.8, and if fixed, I expect they will stay functional for a long time. I don't expect major changes like TurboFan/Ignition to occur often.
In addition to what @hashseed said:
The time span here is in years, since a new compiler/interpreter won't replace Turbofan/Ignition overnight. Also, it won't happen by surprise: we knew Turbofan and Ignition were coming years before they landed in Node.js core.
I believe when this happens it will take the same upgrade path as in the past: the new compiler/interpreter is introduced into the codebase, and the old one is replaced in steps until it can be removed. That gives us time to help make the new compiler/interpreter work well with external profilers (especially if we want to provide Tier 1 support for external profilers in Node.js, which I think we should).
@hashseed so is the recommended path forward from the V8 team to use the perf-based tools as opposed to creating a new API? If so, is there some level of commitment to keeping them working (along the lines of what we have now with V8 running Node.js tests)?
I recommend using the CpuProfiler, since it is the one that both Chrome DevTools and GCP Stackdriver use. But I can see that the current feature set makes it unattractive in some use cases.

Perf-based tools are not designed with public use in mind, and I cannot make a hard commitment to support them in the long run. What I want to avoid is being in the situation of having to block changes to V8 because they may break perf-based profiling in some way. I don't expect this to happen soon or often, though. Also, the breakage we are talking about, introduced by Ignition, is something I hear no other popular interpreted language runtime supports either.

That being said, we currently don't even have any test case for perf-based profiling, again due to the nature of them being ad-hoc tools not designed for public use. I would welcome test cases in Node.js so that we could at least notice when they break, and make informed decisions on whether to fix them if the effort to do so is reasonable. The current situation is that we could not honestly claim to offer official support even if we wanted to.
## CPU Profiling deep dive WG meeting

**Time:** UTC Thu 15-Mar-2018 19:00 (07:00 PM)
**Agenda**
@hekike I expect this deep dive to focus only on external profiling (e.g. via perf), not in-process profiling (e.g. using the V8 CpuProfiler APIs).
I'm travelling so I won't be able to make that time. Any chance we could do it Thurs/Friday instead?
What I've heard from former Chrome DevTools developers is that the current version of V8 expects two cores to perform well, since GC, parsing, compilation, etc. all happen on separate threads now (not to mention the threads spawned by Node.js). So folks running on a single core might already be taking a significant performance hit.
IMO, based on how things are implemented today, this should be done on a worker thread.
Agreed.
Apparently we ran similar tests but got very different results. I'll try with express instead of fastify to see if there are any changes. Which hardware did you use to run these tests? What parameters on autocannon?
hello.js

```js
const inspector = require('inspector')
const express = require('express')

const app = express()

// Expose the inspector WebSocket endpoint for an external client.
inspector.open(3333, '127.0.0.1', false)

app.get('/hello', (req, res) => {
  res.send('Hello World!')
})

app.listen(8080)
```

profile.js

```js
const WebSocket = require('isomorphic-ws')

const ws = new WebSocket(process.env.WS)
let id = 0 // inspector protocol messages should carry unique ids

ws.onmessage = ({ data }) => {
  // console.log(data) // only used to validate everything works
}

ws.onopen = () => {
  ws.send(JSON.stringify({
    id: ++id,
    method: 'Profiler.enable'
  }))
  setInterval(() => {
    // start and immediately stop a profile every 5 seconds
    ws.send(JSON.stringify({
      id: ++id,
      method: 'Profiler.start'
    }))
    ws.send(JSON.stringify({
      id: ++id,
      method: 'Profiler.stop'
    }))
  }, 5000)
}
```
Out of curiosity, why do you recommend using a worker thread for this? So far my tests showed no difference between using a worker thread or not. This is especially true if the underlying implementation is already using a thread.
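For what it's worth, newer Node.js versions make the worker-thread variant straightforward. A minimal sketch, assuming Node.js >= 12.11, where `session.connectToMainThread()` is available inside workers:

```js
const { Worker, isMainThread } = require('worker_threads')
const inspector = require('inspector')
const fs = require('fs')

if (isMainThread) {
  new Worker(__filename) // run the profiler off the main thread
  // ... the application keeps doing its work here ...
} else {
  const session = new inspector.Session()
  session.connectToMainThread() // attach to the main thread's isolate
  session.post('Profiler.enable', () => {
    session.post('Profiler.start', () => {
      setTimeout(() => {
        session.post('Profiler.stop', (err, { profile }) => {
          if (!err) fs.writeFileSync('main.cpuprofile', JSON.stringify(profile))
        })
      }, 5000) // arbitrary 5-second profile window
    })
  })
}
```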
The problem I reported in #23070 was based on a real crash with a real application from a customer. That application had ~230000 functions. I increased it to 1M in my reproducer to get a memory pattern similar to the real application, which requires a few hundred MB already without the profiler. At that time the memory overhead was the main problem, which looks far better now. But it still seems that the amount of work to be done by the v8 main thread depends on the number of functions, and it is synchronous, most likely because functions could otherwise move in memory. I agree that so many functions is not the typical setup. The main issue for us was that there is no mechanism to estimate the overhead nor to limit it, so we stopped using the profiler.
Those are valid use cases, but most of the improvements, as I mentioned above, should be implemented in V8. So I guess we need to start discussing how to get this upstream.
Added to the agenda so we can discuss how to proceed.
Removed from the agenda until someone has time to work on it (reach out to V8 folks, gather data, etc.).
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
I would like to discuss it and help. Re-adding to the agenda.
Hi folks. As previously discussed in WG meetings: since this issue was created in 2018, a lot of work has been done and a lot of context was covered here. It seems fair to close this issue and create another one about the performance of …
Closing in favor of #444
We'd like to get feedback on the status of the profiling tools landscape in Node.js today. In particular, we want to get alignment on a roadmap that will provide a free, open-source, and cross-platform set of tools that are part of the Node/V8 API (i.e. maintained across LTS versions) and that can provide a powerful suite to debug and diagnose Node.js issues in production.
Production Challenges
There are some challenges that are unique to debugging and diagnosing issues in production, specifically for large, critical production deployments. In particular, here are some of the constraints due to the production nature of these deployments:
Profiling
One of the most useful methodologies for optimizing CPU performance in a running application is to sample the CPU stack frames (CPU profiling) and then visualize the samples, typically using a flamegraph. This technique shows hot code paths on the CPU -- which gives you the opportunity to optimize the relevant source code.
This approach can be used in production with OS-level profilers such as perf, DTrace, SystemTap, and eBPF with very low overhead. These profilers, however, lack the information to resolve JS frames, resulting in unhelpful memory addresses for them. V8 solves this problem by dumping a mapping of native frame addresses to JS source files and line numbers.
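As an illustration of that mapping: running node with V8's `--perf-basic-prof` flag makes the runtime append one line per JITed code object (start address, code size, and name) to `/tmp/perf-<pid>.map`, which perf(1) picks up to symbolize JIT frames. The addresses and names below are invented:

```
3ef414c0 398 LazyCompile:~hello /srv/app/hello.js:7
3ef41a60 1d0 Builtin:InterpreterEntryTrampoline
3ef41c80 f4 Stub:CEntryStub
```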
It's important to mention here that we need access to all stack frames, whether native (v8, libc, syscalls, libuv, native modules) or JS. Problems can occur anywhere in the stack, and we want to be able to profile Node with complete stack frames. E.g., we heavily use gRPC -- which is a native module -- so without access to native frames we would not be able to get visibility into this critical part of our application.
There are a few issues with this implementation:

- perf(1) support is now deprecated in V8 and will not be supported starting in Node 8 -- which effectively means we're losing the ability to profile JS stacks.

We'd like to contribute and collaborate on a set of comprehensive, cross-platform, and open-source CPU profiling tools with the Node and V8 teams. The V8 team has advised us that they plan to support the v8 profiler and the v8 CpuProfiler API going forward, and we want to unlock CPU profiling capabilities in Node using these supported frameworks.
Roadmap:
We’re looking for feedback and alignment with the community on this subject before proceeding with the design and implementation -- please let us know your thoughts.