[PROF-10589] Add internal metrics for GVL profiling performance #3993

ivoanjo · 2024-10-11T10:34:18Z

What does this PR do?

This PR introduces the following new internal metrics/counters for the profiler:

// How many times we skipped the after_gvl_running sampling
unsigned int gvl_dont_sample;
// Min/max/total wall-time spent on gvl sampling
uint64_t gvl_sampling_time_ns_min;
uint64_t gvl_sampling_time_ns_max;
uint64_t gvl_sampling_time_ns_total;

Motivation:

These metrics get automatically reported together with profilers, and allow us to keep an eye on/investigate/debug the performance of GVL profiling.

Additional Notes:

I needed to refactor a bit the way we trigger GVL profiling samples; the change is similar to what we were doing with regular cpu/wall-time samples.

How to test the change?

This change includes test coverage.

…or GVL" This is useful because our main control for "Waiting for GVL" is the waiting_for_gvl_threshold_ns, and thus we can use this stat to monitor how many events are being skipped as being under that threshold. Note that regular (periodic) samples can still observe the "Waiting for GVL" state; but there will not be any special samples created to represent the beginning or end of "Waiting for GVL".

…stead of calling ThreadContext directly This structure is similar to the one we're already using for `sample_from_postponed_job` => `rescued_sample_from_postponed_job` and enables us to have more control over the parameters sent to the ThreadContext collector.

…gvl_running` function Now that the `CpuAndWallTimeWorker` does not call `thread_context_collector_sample_after_gvl_running` directly from a `rb_rescue`, it can use the version of the method with extra arguments directly.

…uring directly This is similar to what we do for `thread_context_collector_sample` and makes it easier for the CpuAndWallTimeWorker to keep stats on how long sampling took without needing to read the clock more than needed.

…amples

pr-commenter · 2024-10-11T11:09:01Z

Benchmarks

Benchmark execution time: 2024-10-11 12:03:44

Comparing candidate commit b2b331b in PR branch ivoanjo/prof-10589-gvl-profiling-internal-metrics with baseline commit 58b17f0 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics.

Flaked in CI with... ``` 1) Datadog::Profiling::Collectors::CpuAndWallTimeWorker#start when main thread is sleeping but a background thread is working when GVL profiling is enabled when 'Waiting for GVL' periods are below waiting_for_gvl_threshold_ns does not trigger extra samples Failure/Error: expect(cpu_and_wall_time_worker.stats.fetch(:gvl_dont_sample)) .to be > 100 # Arbitrary, on my machine I see 250k on a run expected: > 100 got: 5 ```

codecov-commenter · 2024-10-11T11:48:56Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.86%. Comparing base (58b17f0) to head (b2b331b).
Report is 2 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3993      +/-   ##
==========================================
- Coverage   97.87%   97.86%   -0.02%     
==========================================
  Files        1314     1314              
  Lines       78652    78667      +15     
  Branches     3909     3909              
==========================================
+ Hits        76978    76984       +6     
- Misses       1674     1683       +9

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ivoanjo added 5 commits October 11, 2024 10:44

Introduce internal stats to track time spent in extra GVL profiling s…

7bba03e

…amples

ivoanjo requested review from a team as code owners October 11, 2024 10:34

AlexJF approved these changes Oct 11, 2024

View reviewed changes

ivoanjo merged commit cdfc685 into master Oct 11, 2024
248 checks passed

ivoanjo deleted the ivoanjo/prof-10589-gvl-profiling-internal-metrics branch October 11, 2024 15:35

github-actions bot added this to the 2.5.0 milestone Oct 11, 2024

ivoanjo added profiling Involves Datadog profiling dev/internal Other internal work that does not need to be included in the changelog labels Oct 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PROF-10589] Add internal metrics for GVL profiling performance #3993

[PROF-10589] Add internal metrics for GVL profiling performance #3993

ivoanjo commented Oct 11, 2024

pr-commenter bot commented Oct 11, 2024 •

edited

Loading

codecov-commenter commented Oct 11, 2024

[PROF-10589] Add internal metrics for GVL profiling performance #3993

[PROF-10589] Add internal metrics for GVL profiling performance #3993

Conversation

ivoanjo commented Oct 11, 2024

pr-commenter bot commented Oct 11, 2024 • edited Loading

Benchmarks

codecov-commenter commented Oct 11, 2024

Codecov Report

pr-commenter bot commented Oct 11, 2024 •

edited

Loading