Proposed new strategic initiative: revamping tracing/metrics collection #853

jasnell · 2020-04-23T19:40:49Z

Node.js currently uses a number of different mechanisms for tracking performance metrics internally.

Trace events
DTrace / ETW probe points
Perf_hooks
process.memoryUsage()
process.cpuUsage()
process.resourceUsage()
v8.getHeapStatistics()
v8.getHeapCodeStatistics()
and so on.
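To illustrate how scattered these entry points are, here is a minimal sketch that gathers several of them into one object. `collectSnapshot` is a hypothetical helper name used only for illustration, not a Node.js API, and the choice of fields is an assumption about what a "complete view" might include.

```javascript
const v8 = require('v8');
const { performance } = require('perf_hooks');

// Hypothetical helper (not part of Node.js): pulls metrics from several
// unrelated built-in APIs into a single plain object.
function collectSnapshot() {
  return {
    memory: process.memoryUsage(),        // rss, heapTotal, heapUsed, external, ...
    cpu: process.cpuUsage(),              // user and system CPU time, in microseconds
    resources: process.resourceUsage(),   // maxRSS, fsRead, fsWrite, ...
    heap: v8.getHeapStatistics(),         // total_heap_size, used_heap_size, ...
    heapCode: v8.getHeapCodeStatistics(), // code_and_metadata_size, ...
    uptimeMs: performance.now(),          // milliseconds since the process time origin
  };
}

const snapshot = collectSnapshot();
console.log(Object.keys(snapshot));
```

Each field comes from a different module or object and has its own shape and units, which is the inconsistency described here.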

These use a number of divergent mechanisms internally with very little consistency, making it complicated and cumbersome for someone to take a complete system-wide view of the metrics.

Further, Worker Threads make it even more difficult because some metrics become thread-specific (e.g. process.memoryUsage()) while others are process-wide.

Lastly, some of the mechanisms (DTrace, ETW, and the trace events implementation) are under-supported and problematic. The trace events implementation, for instance, will often abort under load when running worker threads because it has not yet been made fully thread-safe. The team at Google that had been working on the implementation is no longer engaged and has moved on to other things, so the code has largely sat unfinished.

I have started investigating a top-down overhaul of the metrics collection mechanisms in Node with the intent of providing a single, clear, coherent subsystem for per-process and per-isolate metrics tracking and reporting that will support multiple targets and use cases with a much cleaner implementation. A key goal will be to make it easier and more reliable to attach various analytics tools on top of Node.js (e.g. clinic.js, N|Solid, APMs, etc.) without having to rely on hacks or building custom versions of the runtime. I also want to increase the visibility/observability of various key components of the platform and modernize metrics collection and reporting for tools such as Prometheus.

This will be a large effort that will take some time to get right and will require input from a number of folks. I'm still working through some work plan details now but I wanted to at least provide some notification that I was starting this effort.

/cc @nodejs/diagnostics @mmarchini @addaleax @sam-github @mcollina

mhdawson · 2020-04-24T17:05:53Z

I'm definitely supportive of an effort on the Diagnostics side. Do you want to submit a PR to add it to the strategic initiatives list? It would be great to have a top-level issue that can be used to hold references to the ongoing/completed work.

One other thing to consider is how current metrics feed into reporting through Prometheus, and whether there is anything we should be making available that can be exposed through modules like prom-client.
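As a rough illustration of that point, the sketch below renders the numeric fields of `process.memoryUsage()` as gauges in the Prometheus text exposition format. It deliberately does not use prom-client; `toPrometheusText` and the `nodejs_memory` prefix are hypothetical names chosen for this example, not an existing API or an agreed naming scheme.

```javascript
// Hypothetical helper: formats the numeric fields of an object as
// Prometheus gauges in the text exposition format.
function toPrometheusText(prefix, values) {
  const lines = [];
  for (const [key, value] of Object.entries(values)) {
    if (typeof value !== 'number') continue; // skip non-numeric fields
    // Convert camelCase keys (e.g. heapUsed) to snake_case (heap_used).
    const snake = key.replace(/[A-Z]/g, (c) => `_${c.toLowerCase()}`);
    // Replace any character that is not valid in a metric name.
    const name = `${prefix}_${snake}`.replace(/[^a-zA-Z0-9_]/g, '_');
    lines.push(`# TYPE ${name} gauge`);
    lines.push(`${name} ${value}`);
  }
  return lines.join('\n') + '\n';
}

// The prefix is illustrative only.
console.log(toPrometheusText('nodejs_memory', process.memoryUsage()));
```

A real integration would more likely register these values with a client library and serve them over HTTP; this only shows the shape of the data a scraper would read.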

sam-github · 2020-04-24T17:21:24Z

Would make a good collab summit topic, too.

legendecas · 2021-06-24T02:45:32Z

Since this proposal has been open for over a year, I'd like to ask whether there has been any further discussion on it. I'm also wondering whether the diagnostics team could get involved, or lead the discussion and follow-up actions on this topic, since most of the areas it covers are related to diagnostic tools.

mhdawson · 2022-06-28T21:21:17Z

This has been open for almost a year since the last comment. I think we should likely close unless we can find a champion for the initiative. Otherwise related discussion can take place in the diagnostics wg.

@jasnell unless you are still planning to work on this as announced in the original post, is it OK if I close this issue?

Jamlee · 2022-06-29T12:33:01Z

Please add HTTP, HTTPS, and HTTP/2 performance metrics. 😄

mhdawson · 2023-04-05T16:00:11Z

@jasnell I think this was meant as an FYI, and since it's been almost a year and a half since the last activity, it can be closed. Please let me know if you think that was not the right thing to do.

jasnell mentioned this issue Apr 23, 2020

[discuss] event loop idle metrics nodejs/node#33026

Closed

jasnell mentioned this issue Apr 28, 2020

Node.js Technical Steering Committee (TSC) Meeting 2020-04-29 #856

Closed

mmarchini mentioned this issue May 4, 2020

Discuss usage and support of eBPF nodejs/diagnostics#386

Closed

vmarchaud mentioned this issue Jul 5, 2020

Using diagnostic-channel for instrumentation open-telemetry/opentelemetry-js#1263

Open

2 tasks

mhdawson closed this as completed Apr 5, 2023
