Add a way to limit the number of samples we keep for buffered metrics #283

iksaif · 2023-09-27T08:48:26Z

Sampling rates are an inefficient mechanism for sampling distributions because they require the user to dynamically compute the sampling rate in order to effectively limit the load induced by distributions.

This adds WithMaxSamplesPerContext(int), which limits the number of samples we keep per context to a fixed number that should be high enough to stay statistically relevant.

The sampling is done using Vitter's Algorithm R (reservoir sampling), which keeps each incoming value with a probability that decreases as more samples are seen. This is a commonly used algorithm in instrumentation libraries (such as codahale); see http://www.cs.umd.edu/~samir/498/vitter.pdf
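For readers unfamiliar with reservoir sampling, here is a minimal sketch of Algorithm R in Go. It is not the code from this PR: the `reservoir` type, its fields, and `newReservoir` are hypothetical names chosen for illustration, and the real implementation also has to deal with concurrency.

```go
package main

import (
	"fmt"
	"math/rand"
)

// reservoir is a hypothetical type for illustration only. It keeps at most
// maxSamples values from a stream whose length is not known in advance.
type reservoir struct {
	maxSamples   int
	totalSamples int64 // every value offered, kept or not
	samples      []float64
	rng          *rand.Rand
}

func newReservoir(maxSamples int, seed int64) *reservoir {
	return &reservoir{
		maxSamples: maxSamples,
		samples:    make([]float64, 0, maxSamples),
		rng:        rand.New(rand.NewSource(seed)),
	}
}

// add offers one value to the reservoir (Algorithm R).
func (r *reservoir) add(v float64) {
	r.totalSamples++
	// While the reservoir is not full, every value is kept.
	if len(r.samples) < r.maxSamples {
		r.samples = append(r.samples, v)
		return
	}
	// Draw a slot in [0, totalSamples). The value replaces an existing
	// sample only if the slot falls inside the reservoir, which happens
	// with probability maxSamples/totalSamples.
	if i := r.rng.Int63n(r.totalSamples); i < int64(r.maxSamples) {
		r.samples[i] = v
	}
}

func main() {
	r := newReservoir(5, 42)
	for i := 0; i < 10000; i++ {
		r.add(float64(i))
	}
	fmt.Println("kept:", len(r.samples), "seen:", r.totalSamples)
}
```

Note that the random index is drawn against the total number of samples seen so far, not against the reservoir size, which is what gives every value in the stream the same chance of ending up in the final sample.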

Additionally, this fixes the computation of the rate for buffered metrics. This is important because the rate is forwarded to the agent and passed down to the sketches so that we can still compute the count of events.
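To illustrate why the rate matters, here is a small sketch under the assumption that the effective rate is the number of kept samples divided by the number of observed samples. The function names `effectiveRate` and `estimatedCount` are hypothetical and are not taken from the client or the agent.

```go
package main

import (
	"fmt"
	"math"
)

// effectiveRate returns the fraction of observed samples that were kept.
// Assumption for this sketch: when nothing was dropped, the rate is 1.
func effectiveRate(kept int, total int64) float64 {
	if total <= 0 || int64(kept) >= total {
		return 1.0
	}
	return float64(kept) / float64(total)
}

// estimatedCount shows how a receiver could recover the original number of
// events from the number of values it received and the rate sent with them.
func estimatedCount(kept int, rate float64) int64 {
	return int64(math.Round(float64(kept) / rate))
}

func main() {
	kept, total := 100, int64(10000)
	rate := effectiveRate(kept, total)
	fmt.Printf("rate=%g estimated count=%d\n", rate, estimatedCount(kept, rate))
}
```

Without a correct rate, the receiver would only see the kept samples and would undercount the events.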

Here's the result on an application sending ~10,000 samples per second per distribution context:

Agent CPU

dogstatsd Bytes/sec

The impact on the application itself:



iksaif marked this pull request as ready for review September 28, 2023 08:13

blemale approved these changes Sep 28, 2023


iksaif commented Sep 28, 2023


remeh reviewed Oct 10, 2023


iksaif force-pushed the corentin.chary/max-samples-and-distrib-rates branch from 925f953 to 14adef5 Compare October 11, 2023 09:06

iksaif added 11 commits October 12, 2023 14:57

Add documentation and changelog entries

764e1dc

Fix a bug when we have less samples than the max

9996ccf

Make sure we send the rate over the wire for aggregated metrics

39482b3

Make sure we compute correctly the bytes we need for metadata

2ffc632

Make sure we don't sample aggregated metrics twice

af256fe

Fix randInt63() call, we need to use the total number of samples here

74fce6f

Add a comment to explain why totalSamples is increased with an atomic

2af6d6e

metrics: fix data-race

a0823f6

address review comments

59d053b

Fix data race

1ee2618

iksaif force-pushed the corentin.chary/max-samples-and-distrib-rates branch from 3a1df67 to 1ee2618 Compare October 12, 2023 12:57

carlosroman self-requested a review October 13, 2023 12:07

carlosroman approved these changes Oct 13, 2023


remeh approved these changes Oct 13, 2023


remeh merged commit aafbe8f into DataDog:master Oct 16, 2023
23 checks passed
