-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-10589] Add internal metrics for GVL profiling performance #3993
[PROF-10589] Add internal metrics for GVL profiling performance #3993
Conversation
…or GVL" This is useful because our main control for "Waiting for GVL" is the waiting_for_gvl_threshold_ns, and thus we can use this stat to monitor how many events are being skipped as being under that threshold. Note that regular (periodic) samples can still observe the "Waiting for GVL" state; but there will not be any special samples created to represent the beginning or end of "Waiting for GVL".
…stead of calling ThreadContext directly This structure is similar to the one we're already using for `sample_from_postponed_job` => `rescued_sample_from_postponed_job` and enables us to have more control over the parameters sent to the ThreadContext collector.
…gvl_running` function Now that the `CpuAndWallTimeWorker` does not call `thread_context_collector_sample_after_gvl_running` directly from a `rb_rescue`, it can use the version of the method with extra arguments directly.
…uring directly This is similar to what we do for `thread_context_collector_sample` and makes it easier for the CpuAndWallTimeWorker to keep stats on how long sampling took without needing to read the clock more than needed.
BenchmarksBenchmark execution time: 2024-10-11 12:03:44 Comparing candidate commit b2b331b in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics. |
Flaked in CI with... ``` 1) Datadog::Profiling::Collectors::CpuAndWallTimeWorker#start when main thread is sleeping but a background thread is working when GVL profiling is enabled when 'Waiting for GVL' periods are below waiting_for_gvl_threshold_ns does not trigger extra samples Failure/Error: expect(cpu_and_wall_time_worker.stats.fetch(:gvl_dont_sample)) .to be > 100 # Arbitrary, on my machine I see 250k on a run expected: > 100 got: 5 ```
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3993 +/- ##
==========================================
- Coverage 97.87% 97.86% -0.02%
==========================================
Files 1314 1314
Lines 78652 78667 +15
Branches 3909 3909
==========================================
+ Hits 76978 76984 +6
- Misses 1674 1683 +9 ☔ View full report in Codecov by Sentry. |
What does this PR do?
This PR introduces the following new internal metrics/counters for the profiler:
Motivation:
These metrics get automatically reported together with profilers, and allow us to keep an eye on/investigate/debug the performance of GVL profiling.
Additional Notes:
I needed to refactor a bit the way we trigger GVL profiling samples; the change is similar to what we were doing with regular cpu/wall-time samples.
How to test the change?
This change includes test coverage.