Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-10589] Add internal metrics for GVL profiling performance #3993

Merged
merged 6 commits into from
Oct 11, 2024

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Oct 11, 2024

What does this PR do?

This PR introduces the following new internal metrics/counters for the profiler:

// How many times we skipped the after_gvl_running sampling
unsigned int gvl_dont_sample;
// Min/max/total wall-time spent on gvl sampling
uint64_t gvl_sampling_time_ns_min;
uint64_t gvl_sampling_time_ns_max;
uint64_t gvl_sampling_time_ns_total;

Motivation:

These metrics get automatically reported together with profilers, and allow us to keep an eye on/investigate/debug the performance of GVL profiling.

Additional Notes:

I needed to refactor a bit the way we trigger GVL profiling samples; the change is similar to what we were doing with regular cpu/wall-time samples.

How to test the change?

This change includes test coverage.

…or GVL"

This is useful because our main control for "Waiting for GVL" is the
waiting_for_gvl_threshold_ns, and thus we can use this stat to monitor
how many events are being skipped as being under that threshold.

Note that regular (periodic) samples can still observe the
"Waiting for GVL" state; but there will not be any special samples
created to represent the beginning or end of "Waiting for GVL".
…stead of calling ThreadContext directly

This structure is similar to the one we're already using for
`sample_from_postponed_job` => `rescued_sample_from_postponed_job` and
enables us to have more control over the parameters sent to the
ThreadContext collector.
…gvl_running` function

Now that the `CpuAndWallTimeWorker` does not call
`thread_context_collector_sample_after_gvl_running` directly from a
`rb_rescue`, it can use the version of the method with extra
arguments directly.
…uring directly

This is similar to what we do for `thread_context_collector_sample` and
makes it easier for the CpuAndWallTimeWorker to keep stats on how long
sampling took without needing to read the clock more than needed.
@ivoanjo ivoanjo requested review from a team as code owners October 11, 2024 10:34
@pr-commenter
Copy link

pr-commenter bot commented Oct 11, 2024

Benchmarks

Benchmark execution time: 2024-10-11 12:03:44

Comparing candidate commit b2b331b in PR branch ivoanjo/prof-10589-gvl-profiling-internal-metrics with baseline commit 58b17f0 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics.

Flaked in CI with...

```
  1) Datadog::Profiling::Collectors::CpuAndWallTimeWorker#start when main thread is sleeping but a background thread is working when GVL profiling is enabled when 'Waiting for GVL' periods are below waiting_for_gvl_threshold_ns does not trigger extra samples
     Failure/Error:
       expect(cpu_and_wall_time_worker.stats.fetch(:gvl_dont_sample))
         .to be > 100 # Arbitrary, on my machine I see 250k on a run

       expected: > 100
            got:   5
```
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.86%. Comparing base (58b17f0) to head (b2b331b).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3993      +/-   ##
==========================================
- Coverage   97.87%   97.86%   -0.02%     
==========================================
  Files        1314     1314              
  Lines       78652    78667      +15     
  Branches     3909     3909              
==========================================
+ Hits        76978    76984       +6     
- Misses       1674     1683       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ivoanjo ivoanjo merged commit cdfc685 into master Oct 11, 2024
248 checks passed
@ivoanjo ivoanjo deleted the ivoanjo/prof-10589-gvl-profiling-internal-metrics branch October 11, 2024 15:35
@github-actions github-actions bot added this to the 2.5.0 milestone Oct 11, 2024
@ivoanjo ivoanjo added profiling Involves Datadog profiling dev/internal Other internal work that does not need to be included in the changelog labels Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev/internal Other internal work that does not need to be included in the changelog profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants