Add a way to limit the number of samples we keep for buffered metrics #283

Merged: remeh merged 11 commits into DataDog:master from iksaif:corentin.chary/max-samples-and-distrib-rates on Oct 16, 2023
Conversation
blemale approved these changes on Sep 28, 2023
iksaif commented on Sep 28, 2023
remeh reviewed on Oct 10, 2023
iksaif force-pushed the corentin.chary/max-samples-and-distrib-rates branch from 925f953 to 14adef5 on October 11, 2023 09:06
Sampling rates are an inefficient mechanism for sampling distributions because they require the user to dynamically compute the sampling rate in order to effectively limit the load induced by distributions. This adds `WithMaxSamplesPerContext(int)`, which limits the number of samples we keep per context to a fixed number that is high enough to stay statistically relevant. The sampling is done using Vitter's Algorithm R, which randomly selects values with linearly decreasing probability; this algorithm is commonly used in instrumentation libraries (such as codahale). (See http://www.cs.umd.edu/~samir/498/vitter.pdf.) Additionally, this fixes the computation of the `rate` for buffered metrics. This is important because the rate is forwarded to the agent and passed down to the sketches, to make sure that we can still compute the count of events.
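For illustration, here is a minimal sketch of what per-context reservoir sampling with Algorithm R looks like, together with the effective-rate computation the description mentions. The `reservoir` type and its method names are hypothetical, not the actual datadog-go implementation:

```go
package main

import (
	"fmt"
	"math/rand"
)

// reservoir keeps at most a fixed number of samples for one metric
// context, using Vitter's Algorithm R: once the reservoir is full,
// each new value replaces a uniformly random slot with probability
// capacity/seen, so every observed value has an equal chance of
// being in the final sample.
type reservoir struct {
	samples []float64
	seen    int64 // total number of values observed
}

func newReservoir(capacity int) *reservoir {
	return &reservoir{samples: make([]float64, 0, capacity)}
}

func (r *reservoir) add(v float64) {
	r.seen++
	if len(r.samples) < cap(r.samples) {
		r.samples = append(r.samples, v)
		return
	}
	// Pick a random index in [0, seen); keep v only if it falls
	// inside the reservoir. This gives the capacity/seen probability.
	if j := rand.Int63n(r.seen); j < int64(cap(r.samples)) {
		r.samples[j] = v
	}
}

// rate is the effective sampling rate to forward to the agent, so the
// sketches can recover the true event count: count ≈ kept / rate.
func (r *reservoir) rate() float64 {
	if r.seen == 0 {
		return 1
	}
	return float64(len(r.samples)) / float64(r.seen)
}

func main() {
	res := newReservoir(100)
	for i := 0; i < 10000; i++ {
		res.add(float64(i))
	}
	fmt.Println(len(res.samples), res.rate()) // 100 0.01
}
```

With a capacity of 100 and 10,000 observed values, the reservoir keeps exactly 100 samples and reports an effective rate of 0.01, which is what allows the agent to reconstruct the original count of events.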
iksaif force-pushed the corentin.chary/max-samples-and-distrib-rates branch from 3a1df67 to 1ee2618 on October 12, 2023 12:57
carlosroman approved these changes on Oct 13, 2023
remeh approved these changes on Oct 13, 2023
Here's the result on an application sending ~10,000 samples per second per distribution context:

[Charts: Agent CPU, dogstatsd Bytes/sec, and the impact on the application itself]