-
Notifications
You must be signed in to change notification settings - Fork 351
fix: improve runtime metrics #6344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: improve runtime metrics #6344
Conversation
The current implementation for runtime.node.event_loop.utilization always calculated a diff from the former entry and replaced the former entry with that diff. That in itself worked for the first two calls, but not afterwards, since we now receive a diff of the diff and that is incorrect. It is now replaced by directly calculating the utilization instead. The overall implementation is also a tad faster by removing some calls to native side as well as using simpler logic. Code for unsupported Node.js versions was also removed as well as adding a comment about the overall metric types and where to find those.
Overall package sizeSelf size: 11.92 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB | | @datadog/native-appsec | 10.1.0 | 20.37 MB | 20.37 MB | | @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB | | @datadog/pprof | 5.9.0 | 9.77 MB | 10.1 MB | | @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB | | protobufjs | 7.5.4 | 2.95 MB | 5.6 MB | | @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB | | @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB | | import-in-the-middle | 1.14.2 | 122.36 kB | 850.93 kB | | lru-cache | 10.4.3 | 804.3 kB | 804.3 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 7.0.5 | 63.38 kB | 63.38 kB | | istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB | | rfdc | 1.4.1 | 27.15 kB | 27.15 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | shell-quote | 1.8.3 | 23.74 kB | 23.74 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | semifies | 1.0.0 | 15.84 kB | 15.84 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | ttl-set | 1.0.0 | 4.61 kB | 9.69 kB | | mutexify | 1.4.0 | 5.71 kB | 8.74 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6344 +/- ##
==========================================
+ Coverage 83.76% 83.99% +0.22%
==========================================
Files 477 477
Lines 20060 20071 +11
==========================================
+ Hits 16804 16858 +54
+ Misses 3256 3213 -43 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
BenchmarksBenchmark execution time: 2025-08-29 12:29:43 Comparing candidate commit eb5a4aa in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 1268 metrics, 55 unstable metrics. |
Some conditions were not correctly handled before. For example did we still send out metrics in case options deactivated the gc or eventloop metrics. They were just always zero in that case. On top of that did we check for old Node.js support. That is not needed and was removed. The time was miscalulcated for the CPU metrics in the former commit. The divisor had to become smaller, not bigger. This is also fixed.
This also improves the code a bit
The config was using filters when they weren't needed. This is now a simple loop instead. The received config had to be truthy, otherwise it would have failed in the code afterwards.
This enables event loop runtime metrics in case the native module is not available. The metrics are aligned with the non-native ones besides having a standard deviation instead of a median. We could consider aligning this later.
szegedi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks okay; some improvement ideas in the individual comments.
In addition it does not propagate confusing Node.js event loop delay metrics (zero values).
Things fixed / improved: * runtime.node.event_loop.utilization is now working as expected (the metric was formerly calculating the diff on each iteration on top of the diff instead of on the actual former value) * runtime.node.gc.pause* is now only sending data, if the metric contains a count > 0 * Stop tracking gc metrics on native side when it is deactivated to track those while event loop tracking is also deactivated * Slightly faster implementation for a couple of metrics * Implemented a best effort fallback mode for the event loop delay in case the native library can’t be loaded * Cleaning up code that was there for old unsupported Node.js versions * Created lots of tests that are now checking that values are in a correct range (almost all lines are now fully covered and the tests are significantly more robust about actually finding issues) * Refactored dogstatsd to use private properties instead of underscored * Improved client config in dogstatsd to only use a single loop instead of multiple filters * Made sure calling runtimeMetrics.start() multiple times without calling stop in-between will automatically clean up former tracking
Things fixed / improved: * runtime.node.event_loop.utilization is now working as expected (the metric was formerly calculating the diff on each iteration on top of the diff instead of on the actual former value) * runtime.node.gc.pause* is now only sending data, if the metric contains a count > 0 * Stop tracking gc metrics on native side when it is deactivated to track those while event loop tracking is also deactivated * Slightly faster implementation for a couple of metrics * Implemented a best effort fallback mode for the event loop delay in case the native library can’t be loaded * Cleaning up code that was there for old unsupported Node.js versions * Created lots of tests that are now checking that values are in a correct range (almost all lines are now fully covered and the tests are significantly more robust about actually finding issues) * Refactored dogstatsd to use private properties instead of underscored * Improved client config in dogstatsd to only use a single loop instead of multiple filters * Made sure calling runtimeMetrics.start() multiple times without calling stop in-between will automatically clean up former tracking
The current implementation for runtime.node.event_loop.utilization always calculated a diff from the former entry and replaced the former entry with that diff. That in itself worked for the first two calls, but not afterwards, since we now receive a diff of the diff and that is incorrect.
It is now replaced by directly calculating the utilization instead.
The overall implementation is also a tad faster by removing some calls to native side as well as using process.hrtime.bigint() instead of process.hrtime().
Code for unsupported Node.js versions was also removed as well as adding a comment about the overall metric types and where to find those.