Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/datadog] Add support for sketches #34662

Merged
merged 14 commits into from
Sep 9, 2024

Conversation

carrieedwards
Copy link
Contributor

@carrieedwards carrieedwards commented Aug 13, 2024

Description:
This PR adds support for translating Datadog sketches into Exponential Histograms.

Follow up of #33631, #33957 and #34180.

The full version of the code can be found in the cedwards/datadog-metrics-receiver-full branch, or in Grafana Alloy: https://github.com/grafana/alloy/tree/main/internal/etc/datadogreceiver

Link to tracking Issue:
#18278

Testing:
Unit tests, as well as an end-to-end test, have been added.


negativeBuckets, positiveBuckets, zeroCount := mapSketchBucketsToHistogramBuckets(sketch.K, sketch.N)

dp.SetZeroCount(zeroCount)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the zero threshold line up?

Copy link
Contributor

@krajorama krajorama Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should set the minimum value of sketch boundaries, which is I think: e^((1-1338)/(1/gamma)), which is about 9.941..E-10
EDIT: the value is ok, but the math is actually e^((1-1338)/(1/ln(gamma)))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

Copy link
Contributor

@fedetorres93 fedetorres93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 147 to 148
// In some cases, the exponential histogram index that is mapped from the Sketch index corresponds to a bucket that
// has a lower bound that is higher than the Sketch bucket's lower bound. In this case, it is necessary to start

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this possible? We find a histogram index that covers the lower bound with sketchLowerBoundToHistogramIndex(sketchLowerBound) so the histogram bucket's lower bound shouldn't ever be higher than the sketch's lower bound

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right Fiona. We're taking the Frexp of the sketch lower bound (B). There's two possibilities. Either the frac part is 0.5 , in which case it was a power of two and then we use that as the upper bound of the exponential bucket , the lower being lower obviously (for example if you calculate Frexp(2)-> (0.5, 2)-> we return 31, which is the (1.957, 2] bucket.

Otherwise we take the floor(log_2(B)*32) . When you calculate the bucket boundary: 2^(floor(log_2(B)*32)/32) it's going to give a lower number than 2^(log_2(B)*32/32) which is the original number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I think the reason I implemented it to start in the bucket below the histogram index mapped from the sketch index was in case of floating point rounding errors. I can update it to start at the histogram index that is computed from the sketch index, though

Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I think we do need some tests that specify the expected exponential buckets and their counters to help review of the mapSketchBucketsToHistogramBuckets bit and make sure that all cases are covered. I can think of these:

  1. There's 1 sketch bucket and it fits inside an exponential bucket.
  2. There's 1 sketch bucket and it spans two exponential buckets.
  3. There's 2 sketch buckets and they overlap with a common exponential bucket.
  4. Is it possible to have more overlap?

receiver/datadogreceiver/internal/translator/sketches.go Outdated Show resolved Hide resolved
receiver/datadogreceiver/internal/translator/sketches.go Outdated Show resolved Hide resolved

negativeBuckets, positiveBuckets, zeroCount := mapSketchBucketsToHistogramBuckets(sketch.K, sketch.N)

dp.SetZeroCount(zeroCount)
Copy link
Contributor

@krajorama krajorama Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should set the minimum value of sketch boundaries, which is I think: e^((1-1338)/(1/gamma)), which is about 9.941..E-10
EDIT: the value is ok, but the math is actually e^((1-1338)/(1/ln(gamma)))

// bucket that has a range covering that lower bound
// See: https://opentelemetry.io/docs/specs/otel/metrics/data-model/#all-scales-use-the-logarithm-function
func sketchLowerBoundToHistogramIndex(value float64) int {
if frac, exp := math.Frexp(value); frac == 0.5 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optimization for later: client golang uses lookup table after Frexp is called to get the bucket: https://github.com/prometheus/client_golang/blob/989a6d0a1a2543c2d0d8b9a8c524eced9c81fd3d/prometheus/histogram.go#L59

Find the lowest index in the table where frac is higher. E.g.: if value == 1.98, then frac == 0.99, which finds the last index, index=31 and since exp==1, the exponential bucket index is 31+(exp-1) = 31, the bucket is (1.957, 2].

receiver/datadogreceiver/receiver_test.go Outdated Show resolved Hide resolved
Comment on lines 147 to 148
// In some cases, the exponential histogram index that is mapped from the Sketch index corresponds to a bucket that
// has a lower bound that is higher than the Sketch bucket's lower bound. In this case, it is necessary to start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right Fiona. We're taking the Frexp of the sketch lower bound (B). There's two possibilities. Either the frac part is 0.5 , in which case it was a power of two and then we use that as the upper bound of the exponential bucket , the lower being lower obviously (for example if you calculate Frexp(2)-> (0.5, 2)-> we return 31, which is the (1.957, 2] bucket.

Otherwise we take the floor(log_2(B)*32) . When you calculate the bucket boundary: 2^(floor(log_2(B)*32)/32) it's going to give a lower number than 2^(log_2(B)*32/32) which is the original number.

@jpkrohling jpkrohling changed the title [chore] [receiver/datadog] Add support for sketches [receiver/datadog] Add support for sketches Aug 27, 2024
Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this needs a chagenlog entry, given that this feature (metrics handling) is already exposed to end users.

if index < 0 {
index = -index
}
return math.Exp((float64(index-agentSketchOffset) / (1 / math.Log(gamma))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns +Inf over an index value of 47096, just copy paste the code bellow into go playground (https://go.dev/play/p/2mnSdR9G7RS).
Causing the mapSketchBucketsToHistogramBuckets to go into an infinite loop.


import (
	"fmt"
	"math"
)

func main() {
	relativeAccuracy := 1.0 / 128
	gamma := 1 + 2*relativeAccuracy
	var agentSketchOffset int32 = 1338

	var index int32 = 47096

	lowerBound := math.Exp((float64(index-agentSketchOffset) / (1 / math.Log(gamma))))
	fmt.Printf("%g\n", lowerBound)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

@krajorama

This comment was marked as resolved.

dp.SetMin(sketch.Min)
dp.SetMax(sketch.Max)
dp.SetScale(scale)
dp.SetZeroThreshold(math.Exp(float64(1-agentSketchOffset) / (1 / gamma))) // See https://github.com/DataDog/sketches-go/blob/7546f8f95179bb41d334d35faa281bfe97812a86/ddsketch/mapping/logarithmic_mapping.go#L48
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was making the right calculation but commenting wrong . Also see line 266 down below and playground.

Suggested change
dp.SetZeroThreshold(math.Exp(float64(1-agentSketchOffset) / (1 / gamma))) // See https://github.com/DataDog/sketches-go/blob/7546f8f95179bb41d334d35faa281bfe97812a86/ddsketch/mapping/logarithmic_mapping.go#L48
dp.SetZeroThreshold(math.Exp(float64(1-agentSketchOffset) / (1 / math.Log(gamma)))) // See https://github.com/DataDog/sketches-go/blob/7546f8f95179bb41d334d35faa281bfe97812a86/ddsketch/mapping/logarithmic_mapping.go#L48

Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from technical point of view. There are opportunities for optimizations for sure, but I think more than enough is there to do some field testing. Good job!

carrieedwards and others added 14 commits September 6, 2024 09:34
* Add output verification to TestDatadogMetricsV1_EndToEnd

Signed-off-by: Federico Torres <federico.torres@grafana.com>

* Add output verification to TestDatadogMetricsV2_EndToEnd

Signed-off-by: Federico Torres <federico.torres@grafana.com>

* Add output verification to TestDatadogSketches_EndToEnd

Signed-off-by: Federico Torres <federico.torres@grafana.com>

* Add output verification to TestDatadogServices_EndToEnd

Signed-off-by: Federico Torres <federico.torres@grafana.com>

* fmt

Signed-off-by: Federico Torres <federico.torres@grafana.com>

* Refactor output verifications in E2E receiver tests

Signed-off-by: Federico Torres <federico.torres@grafana.com>

* Add TestConvertBucketLayout

Signed-off-by: Federico Torres <federico.torres@grafana.com>

* Add TestMapSketchBucketsToHistogramBuckets

Signed-off-by: Federico Torres <federico.torres@grafana.com>

---------

Signed-off-by: Federico Torres <federico.torres@grafana.com>
…ckets

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@jpkrohling jpkrohling merged commit 5e26464 into open-telemetry:main Sep 9, 2024
156 checks passed
@github-actions github-actions bot added this to the next release milestone Sep 9, 2024
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this pull request Sep 12, 2024
**Description:**
This PR adds support for translating Datadog sketches into Exponential
Histograms.

Follow up of open-telemetry#33631, open-telemetry#33957 and open-telemetry#34180.

The full version of the code can be found in the
`cedwards/datadog-metrics-receiver-full` branch, or in Grafana Alloy:
https://github.com/grafana/alloy/tree/main/internal/etc/datadogreceiver

**Link to tracking Issue:** 
open-telemetry#18278 

**Testing:** 
Unit tests, as well as an end-to-end test, have been added.

---------

Signed-off-by: Federico Torres <federico.torres@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Federico Torres <federico.sa.torres@gmail.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this pull request Oct 4, 2024
**Description:**
This PR adds support for translating Datadog sketches into Exponential
Histograms.

Follow up of open-telemetry#33631, open-telemetry#33957 and open-telemetry#34180.

The full version of the code can be found in the
`cedwards/datadog-metrics-receiver-full` branch, or in Grafana Alloy:
https://github.com/grafana/alloy/tree/main/internal/etc/datadogreceiver

**Link to tracking Issue:** 
open-telemetry#18278 

**Testing:** 
Unit tests, as well as an end-to-end test, have been added.

---------

Signed-off-by: Federico Torres <federico.torres@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Federico Torres <federico.sa.torres@gmail.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants