-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-7251] Exclude disabled profiling sample value types from output #2634
Merged
ivoanjo
merged 5 commits into
master
from
ivoanjo/prof-7251-exclude-disabled-profiling-types
Feb 22, 2023
Merged
[PROF-7251] Exclude disabled profiling sample value types from output #2634
ivoanjo
merged 5 commits into
master
from
ivoanjo/prof-7251-exclude-disabled-profiling-types
Feb 22, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
A lot of the knowledge for the values was spread out everywhere -- the collectors had to care about a lot of libdatadog details that they were getting from the `stack_recorder.h` header. As part of the work to make some values optional, I've replaced the previous `metric_values` array with a very concrete struct, and moved the knowledge about how to turn it into the array inside the `StackRecorder`.
… configurable In the previous commit the `sample_values` struct was introduced, which abstracted how values are passed to libdatadog away from everywhere else in the profiler, and centralized this into the `StackRecorder`. In this commit, I reimplemented the `record_sample` function to, instead of using a hardcoded position for every value type, rely on two extra indirections: * `state->position_for` * `state->enabled_values_count` By default (e.g. when all profile types are enabled), this new strategy behaves exactly as before. The interesting thing happens when some profile types are disabled (via the constructor). When profile types are disabled, the two indirections above are reconfigured: `enabled_values_count` becomes less than `ALL_VALUE_TYPES_COUNT`, and `position_for` is updated to account for this as well. In pratice, the `position_for` array is treated as if it was a hashmap -- the key is a given profile type, and the value is the position that libdatadog is expected it to be written to. Thus, converting a `sample_values` to an array for libdatadog is as simple as ```c metric_values[position_for[CPU_TIME_VALUE_ID]] = values.cpu_time_ns; metric_values[position_for[CPU_SAMPLES_VALUE_ID]] = values.cpu_samples; metric_values[position_for[WALL_TIME_VALUE_ID]] = values.wall_time_ns; metric_values[position_for[ALLOC_SAMPLES_VALUE_ID]] = values.alloc_samples; ``` The trick here, is that when certain profile_types are disabled their `position_for` is changed, so they are put at the end of the `metric_values` array. For instance, when we disable both `CPU_TIME_VALUE` and `ALLOC_SAMPLES_VALUE` the `position_for` "hashmap" will look something like ```ruby { CPU_SAMPLES_VALUE_ID: 0, WALL_TIME_VALUE_ID: 1, CPU_TIME_VALUE_ID: 2, ALLOC_SAMPLES_VALUE_ID: 3, } ``` And thus, given ```ruby { 'cpu-time' => 123, 'cpu-samples' => 456, 'wall-time' => 789, 'alloc-samples' => 4242 } ``` We will produce a `metrics_values` array with ``` +-----+-----+-----+------+ | 456 | 789 | 123 | 4242 | +-----+-----+-----+------+ ``` ...but we'll tell libdatadog to only use the first 2 positions of the array, which contain the values for the enabled profile types! To be honest, this was more boilerplate than I wanted, but I'm happy that most of the complexity lies in `_native_initialize` around the creation of `position_for` and the values list for libdatadog, and everywhere else still looks kinda sane.
…om `StackRecorder` I plan to remove the default values for the `StackRecorder` `cpu_time_enabled:` and `alloc_samples_enabled:` args. Thus, this refactors all tests and one benchmark where the `StackRecorder` was being manually created to make sure it provides its own values for those arguments. Why remove the default values? I want to force callers in production code to be explicit about their decisions on the profile types in use, as this is important to get right so the profiling backend can distinguish which types of profiles are enabled. Thus, I want to avoid accidentally enabling or disabling some profile type because that was the default on the constructor.
**What does this PR do?**: Up until now, the profiler always included all sample value types in the resulting profile, even if some value is was expected to always be zero. (Why would a value always be zero? The `cpu-time` value is only available on Linux, and would always be zero on macOS; and we plan to have a way to disable allocation profiling, and thus `alloc-samples` will also be zero.) This PR: * Abstracts away from all components other than the `StackRecorder` which sample value types are enabled, by introducing a new `sample_values` struct that every other component can use instead of having to deal with the old array and other implementation details. * Extends the `StackRecorder` to allow `cpu-time` and `alloc-samples` to be disabled * Updates the profiler component creation to configure `StackRecorder` correctly. **Motivation**: Having the extra sample value types always being included as we did so far is mostly harmless. But, it does make one thing difficult -- detection of which profile types are actually enabled in an application, in general. Thus, the main reason for doing this change is so that the Datadog profiling backend can easily detect which profile types are enabled, and which aren't, so it can display better guidance and error messages to customers. **Additional Notes**: N/A **How to test the change?**: The change includes test coverage. To manually validate this, you can download a pprof profile via the Datadog UX and confirm that only the enabled sample value types are available in it, for instance by running `go tool pprof -raw rubyprofile.pprof` and observing the columns shown after the `Samples:` header -- disabled value types will not show up there.
In this case it's equivalent, but `Qfalse` and `Qnil` are different in Ruby, so let's make sure we actually test for `Qtrue`.
Codecov Report
@@ Coverage Diff @@
## master #2634 +/- ##
=======================================
Coverage 98.07% 98.07%
=======================================
Files 1153 1153
Lines 63126 63164 +38
Branches 2813 2813
=======================================
+ Hits 61908 61950 +42
+ Misses 1218 1214 -4
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
marcotc
approved these changes
Feb 21, 2023
ivoanjo
deleted the
ivoanjo/prof-7251-exclude-disabled-profiling-types
branch
February 22, 2023 10:08
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR do?:
Up until now, the profiler always included all sample value types in the resulting profile, even if some value is was expected to always be zero.
(Why would a value always be zero? The
cpu-time
value is only available on Linux, and would always be zero on macOS; and we plan to have a way to disable allocation profiling, and thusalloc-samples
will also be zero.)This PR:
Abstracts away from all components other than the
StackRecorder
which sample value types are enabled, by introducing a newsample_values
struct that every other component can use instead of having to deal with the old array and other implementation details.Extends the
StackRecorder
to allowcpu-time
andalloc-samples
to be disabledUpdates the profiler component creation to configure
StackRecorder
correctly.Motivation:
Having the extra sample value types always being included as we did so far is mostly harmless.
But, it does make one thing difficult -- detection of which profile types are actually enabled in an application, in general.
Thus, the main reason for doing this change is so that the Datadog profiling backend can easily detect which profile types are enabled, and which aren't, so it can display better guidance and error messages to customers.
Additional Notes:
N/A
How to test the change?:
The change includes test coverage.
To manually validate this, you can download a pprof profile via the Datadog UX and confirm that only the enabled sample value types are available in it, for instance by running
go tool pprof -raw rubyprofile.pprof
and observing the columns shown after theSamples:
header -- disabled value types will not show up there.