Skip to content

Conversation

@BridgeAR
Copy link
Collaborator

@BridgeAR BridgeAR commented Aug 26, 2025

The current implementation for runtime.node.event_loop.utilization always calculated a diff from the former entry and replaced the former entry with that diff. That in itself worked for the first two calls, but not afterwards, since we now receive a diff of the diff and that is incorrect.
It is now replaced by directly calculating the utilization instead.

The overall implementation is also a tad faster by removing some calls to native side as well as using process.hrtime.bigint() instead of process.hrtime().

Code for unsupported Node.js versions was also removed as well as adding a comment about the overall metric types and where to find those.

The current implementation for runtime.node.event_loop.utilization
always calculated a diff from the former entry and replaced the
former entry with that diff. That in itself worked for the first
two calls, but not afterwards, since we now receive a diff of the
diff and that is incorrect.
It is now replaced by directly calculating the utilization instead.

The overall implementation is also a tad faster by removing some
calls to native side as well as using simpler logic.

Code for unsupported Node.js versions was also removed as well as
adding a comment about the overall metric types and where to find
those.
@github-actions
Copy link

github-actions bot commented Aug 26, 2025

Overall package size

Self size: 11.92 MB
Deduped: 111.53 MB
No deduping: 111.87 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB | | @datadog/native-appsec | 10.1.0 | 20.37 MB | 20.37 MB | | @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB | | @datadog/pprof | 5.9.0 | 9.77 MB | 10.1 MB | | @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB | | protobufjs | 7.5.4 | 2.95 MB | 5.6 MB | | @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB | | @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB | | import-in-the-middle | 1.14.2 | 122.36 kB | 850.93 kB | | lru-cache | 10.4.3 | 804.3 kB | 804.3 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 7.0.5 | 63.38 kB | 63.38 kB | | istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB | | rfdc | 1.4.1 | 27.15 kB | 27.15 kB | | dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | shell-quote | 1.8.3 | 23.74 kB | 23.74 kB | | limiter | 1.1.5 | 23.17 kB | 23.17 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | semifies | 1.0.0 | 15.84 kB | 15.84 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | ttl-set | 1.0.0 | 4.61 kB | 9.69 kB | | mutexify | 1.4.0 | 5.71 kB | 8.74 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@codecov
Copy link

codecov bot commented Aug 26, 2025

Codecov Report

❌ Patch coverage is 95.53571% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.99%. Comparing base (907d8e8) to head (94eaf7f).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...es/dd-trace/src/runtime_metrics/runtime_metrics.js 95.04% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6344      +/-   ##
==========================================
+ Coverage   83.76%   83.99%   +0.22%     
==========================================
  Files         477      477              
  Lines       20060    20071      +11     
==========================================
+ Hits        16804    16858      +54     
+ Misses       3256     3213      -43     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Aug 26, 2025

⚠️ Tests

⚠️ Warnings

❄️ 1 New flaky test detected

cypress@14.5.4 esm intelligent test runner can skip tests received by the intelligent test runner API and still reports code coverage from integration-tests/cypress/cypress.spec.js (Datadog)
Timeout

ℹ️ Info

🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: eb5a4aa | Docs | Was this helpful? Give us feedback!

@pr-commenter
Copy link

pr-commenter bot commented Aug 26, 2025

Benchmarks

Benchmark execution time: 2025-08-29 12:29:43

Comparing candidate commit eb5a4aa in PR branch BridgeAR/2025-08-25-fix-eventloop-utilization-metric with baseline commit 7830469 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1268 metrics, 55 unstable metrics.

Some conditions were not correctly handled before. For example did
we still send out metrics in case options deactivated the gc or
eventloop metrics. They were just always zero in that case.

On top of that did we check for old Node.js support. That is not
needed and was removed.

The time was miscalulcated for the CPU metrics in the former commit.
The divisor had to become smaller, not bigger. This is also fixed.
The config was using filters when they weren't needed. This is now
a simple loop instead.

The received config had to be truthy, otherwise it would have failed
in the code afterwards.
This enables event loop runtime metrics in case the native module
is not available.
The metrics are aligned with the non-native ones besides having a
standard deviation instead of a median. We could consider aligning
this later.
@BridgeAR BridgeAR changed the title fix: runtime.node.event_loop.utilization is now handled correct fix: improve runtime metrics Aug 27, 2025
@BridgeAR BridgeAR marked this pull request as ready for review August 28, 2025 14:11
@BridgeAR BridgeAR requested a review from a team as a code owner August 28, 2025 14:11
@BridgeAR BridgeAR requested a review from szegedi August 28, 2025 14:12
Copy link
Contributor

@szegedi szegedi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks okay; some improvement ideas in the individual comments.

szegedi
szegedi previously approved these changes Aug 29, 2025
In addition it does not propagate confusing Node.js event loop
delay metrics (zero values).
@BridgeAR BridgeAR merged commit 156f35a into master Aug 29, 2025
695 checks passed
@BridgeAR BridgeAR deleted the BridgeAR/2025-08-25-fix-eventloop-utilization-metric branch August 29, 2025 16:29
dd-octo-sts bot pushed a commit that referenced this pull request Aug 30, 2025
Things fixed / improved:

* runtime.node.event_loop.utilization is now working as expected (the metric was formerly calculating the diff on each iteration on top of the diff instead of on the actual former value)
* runtime.node.gc.pause* is now only sending data, if the metric contains a count > 0
* Stop tracking gc metrics on native side when it is deactivated to track those while event loop tracking is also deactivated
* Slightly faster implementation for a couple of metrics
* Implemented a best effort fallback mode for the event loop delay in case the native library can’t be loaded
* Cleaning up code that was there for old unsupported Node.js versions
* Created lots of tests that are now checking that values are in a correct range (almost all lines are now fully covered and the tests are significantly more robust about actually finding issues)
* Refactored dogstatsd to use private properties instead of underscored
* Improved client config in dogstatsd to only use a single loop instead of multiple filters
* Made sure calling runtimeMetrics.start() multiple times without calling stop in-between will automatically clean up former tracking
@dd-octo-sts dd-octo-sts bot mentioned this pull request Aug 30, 2025
szegedi pushed a commit that referenced this pull request Sep 2, 2025
Things fixed / improved:

* runtime.node.event_loop.utilization is now working as expected (the metric was formerly calculating the diff on each iteration on top of the diff instead of on the actual former value)
* runtime.node.gc.pause* is now only sending data, if the metric contains a count > 0
* Stop tracking gc metrics on native side when it is deactivated to track those while event loop tracking is also deactivated
* Slightly faster implementation for a couple of metrics
* Implemented a best effort fallback mode for the event loop delay in case the native library can’t be loaded
* Cleaning up code that was there for old unsupported Node.js versions
* Created lots of tests that are now checking that values are in a correct range (almost all lines are now fully covered and the tests are significantly more robust about actually finding issues)
* Refactored dogstatsd to use private properties instead of underscored
* Improved client config in dogstatsd to only use a single loop instead of multiple filters
* Made sure calling runtimeMetrics.start() multiple times without calling stop in-between will automatically clean up former tracking
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants