The current implementation does not merge histograms correctly. The sum is calculated as follows:
sum = count * (existingMean + currentMean) / 2, where count is the merged count (existing + incoming)
Now let's say that the existing histogram has a count of 1000 and a mean of 10000, and the incoming histogram has a count of 100 and a mean of 1000:
newCount = 1000 + 100 = 1100
newMean = (10000 + 1000) / 2 = 5500
newSum = 1100 * newMean = 6_050_000
Now the next incoming histogram has a count of 1000 and a mean of 100:
newCount = 1100 + 1000 = 2100
newMean = (5500 + 100) / 2 = 2800
newSum = 2100 * 2800 = 5_880_000
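The old rule can be replayed with a small Python sketch to reproduce the numbers above. The function name `merge_old` and its parameters are hypothetical, chosen only to mirror the formula; this is not the project's actual code.

```python
# Hypothetical sketch of the current (buggy) merge rule; names are illustrative only.
def merge_old(count, mean, incoming_count, incoming_mean):
    new_count = count + incoming_count
    # Unweighted average of the two means: ignores how many observations each side holds.
    new_mean = (mean + incoming_mean) / 2
    # The sum is re-derived from the merged count and the averaged mean.
    new_sum = new_count * new_mean
    return new_count, new_mean, new_sum


# Existing histogram: count 1000, mean 10000.
count, mean = 1000, 10000
# First incoming histogram: count 100, mean 1000.
count, mean, sum_1 = merge_old(count, mean, 100, 1000)
# Second incoming histogram: count 1000, mean 100.
count, mean, sum_2 = merge_old(count, mean, 1000, 100)

print(sum_1, sum_2)  # 6050000.0 5880000.0 (the sum went down)
```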
By definition, the sum of a histogram is supposed to behave like a counter. A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.
In the example above, the sum decreased (5_880_000 < 6_050_000), which should never happen.
You can see it happening with real values:
With this PR, the computation of the sum changes to the following:
sum = existingSum + (currentMean * count)
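A short derivation of why this rule is exact, assuming the incoming histogram's mean is its own sum divided by its own count. The symbols S (sum), n (count) and x̄ (mean) are introduced here for illustration:

```math
\begin{aligned}
S_{\text{new}} &= S_{\text{existing}} + \bar{x}_{\text{current}} \, n_{\text{current}} \\
&= S_{\text{existing}} + \frac{S_{\text{current}}}{n_{\text{current}}} \, n_{\text{current}} \\
&= S_{\text{existing}} + S_{\text{current}}
\end{aligned}
```

The merged sum is therefore the total of all recorded values, and it can only grow as long as the recorded values are non-negative (S_current ≥ 0). The old rule instead multiplies the merged count by an unweighted average of the two means, which ignores how many observations each side contributed.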
Taking the previous example, we would have:
previousSum = 1000 * 10_000 = 10_000_000
newSum = previousSum + (1000 * 100) = 10_100_000
previousSum = 10_100_000
newSum = previousSum + (100 * 1000) = 10_200_000
The sum is increasing, and the computation is correct: 100 observations with a mean value of 1000 contribute the same amount to the sum as 1000 observations with a mean value of 100.
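The corrected worked example can be replayed with a small Python sketch. As before, `merge_new` and its parameters are hypothetical names mirroring the formula `sum = existingSum + (currentMean * count)`, not the project's actual code.

```python
# Hypothetical sketch of the merge rule proposed in this PR; names are illustrative only.
def merge_new(existing_count, existing_sum, incoming_count, incoming_mean):
    new_count = existing_count + incoming_count
    # Add the incoming histogram's own total (mean * count) to the running sum.
    new_sum = existing_sum + incoming_mean * incoming_count
    return new_count, new_sum


# Existing histogram: count 1000, mean 10000, so its sum is 10_000_000.
count, total = 1000, 1000 * 10000
# First incoming histogram: count 100, mean 1000.
count, sum_1 = merge_new(count, total, 100, 1000)
# Second incoming histogram: count 1000, mean 100.
count, sum_2 = merge_new(count, sum_1, 1000, 100)

print(count, sum_1, sum_2)  # 2100 10100000 10200000
# The merged mean is derived from sum and count, so it is weighted by each side's count.
merged_mean = sum_2 / count
```

In this sketch the mean no longer needs to be carried as merge state: it can be derived from the count and the sum whenever it is needed.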