Sum values in unwrapped rate aggregation instead of treating them as counter #6361
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+ ingester 0%
- distributor -0.3%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
- logql -0.2%
+ loki 0%
- ingester -0.1%
- distributor -0.3%
+ querier 0%
+ querier/queryrange 0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
- logql -0.2%
+ loki 0%
@liguozhong Wanted to give you a heads-up because this change probably affects you.
Thanks, I got it.
Few nits.
Btw, do you mind renaming the PR title to something like what you have in the CHANGELOG? In the changelog it is described as "Sum values in unwrapped rate aggregation instead of treating them as counter".
Let me make two separate pull requests. Then the PR title will also match the changes.
+ ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
- logql -0.2%
+ loki 0%
lgtm! (the jsonnet failure isn't related to your change)
This PR reverts the implementation done in #5013 to the original implementation that sums the extracted values from the log lines instead of treating them like a Prometheus counter metric.
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
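To make the difference between the two behaviours concrete, here is a minimal Go sketch run on the constant-value data used in the tests. `rateSum` follows the description above: sum the extracted values and divide by the range in seconds. `rateCounter` is a simplified counter-style calculation with reset handling but without Prometheus' extrapolation, so it is an illustration only and not the code from #5013. The `sample` type and both function names are hypothetical.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair extracted by `unwrap`.
type sample struct {
	tsSeconds int64
	value     float64
}

// rateSum: add up every extracted value in the range and divide by the
// range length in seconds (the behaviour this PR restores).
func rateSum(samples []sample, rangeSeconds float64) float64 {
	var sum float64
	for _, s := range samples {
		sum += s.value
	}
	return sum / rangeSeconds
}

// rateCounter: simplified counter semantics (hypothetical, not Loki's code).
// Only increases between consecutive samples count; a drop is treated as a
// counter reset. No extrapolation to the range boundaries is done.
func rateCounter(samples []sample, rangeSeconds float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	var increase float64
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1].value, samples[i].value
		if cur < prev {
			increase += cur // reset: the counter restarted from zero
		} else {
			increase += cur - prev
		}
	}
	return increase / rangeSeconds
}

func main() {
	// 15 samples (46s..60s), each with a constant unwrapped value of 1.
	var samples []sample
	for ts := int64(46); ts <= 60; ts++ {
		samples = append(samples, sample{tsSeconds: ts, value: 1})
	}
	fmt.Println(rateSum(samples, 30))     // 0.5
	fmt.Println(rateCounter(samples, 30)) // 0
}
```

With a constant value of 1, a counter never increases, so a counter-style rate is 0, whereas summing gives 15 / 30 = 0.5, the value the updated test expects.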
So, I don't fully understand this PR. For one, why are all the values we assert against in the tests changing? Does this actually change computation behavior? Was the previous implementation incorrect?
// SUM(n=47, 61, 1) = 15
// 15 / 30 = 0.5
promql.Vector{promql.Sample{Point: promql.Point{T: 60 * 1000, V: 0.5}, Metric: labels.Labels{labels.Label{Name: "app", Value: "foo"}}}},
I'm having some difficulty understanding these tests 😕
What does `SUM(n=47, 61, 1)` mean? I assume this is the result of the `newSeries` method on the `data` property, but I don't understand how these values were calculated 🤔
Regarding `SUM(n=47, 61, 1)`: it just means the sum of the values from item 47 to item 61, where the value is a constant 1, like https://www.wolframalpha.com/input?i2d=true&i=Sum%5B1%2C%7Bn%2C47%2C61%7D%5D
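Written out as a worked equation, the notation is a sum of the constant 1 over 15 indices, and the expected test value is that sum divided by the 30s range of the query:

$$
\sum_{n=47}^{61} 1 = 61 - 47 + 1 = 15, \qquad \frac{15}{30\,\mathrm{s}} = 0.5\ \mathrm{s}^{-1}
$$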
Regarding why 47, 61, and 1: we have to look at the output of the `newSeries()` function. It creates a stream `{app="foo"}` with 300 samples starting at timestamp 46e9 (46s) and ending at timestamp 345e9 (345s). The sample value is a constant 1.
So the query now looks at the time range from ts=30s to ts=60s, where the lower bound is not included. There are 15 items (item 47 to item 61) from the generated series that match.
Hope this helps.
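The explanation above can be reproduced with a small Go sketch: generate 300 one-per-second samples starting at 46s with a constant value of 1, keep only those in the half-open window (30s, 60s], then sum them and divide by 30. The `generate` and `selectRange` helpers and the `point` type are hypothetical stand-ins, not Loki's `newSeries` test helper. The sketch counts by timestamp (46s through 60s) rather than by the item numbering used in the comment; the count is 15 either way.

```go
package main

import "fmt"

// point is a hypothetical sample: a timestamp in seconds and a value.
type point struct {
	tsSeconds int64
	value     float64
}

// generate returns n samples, one per second starting at startSeconds,
// each carrying the same constant value.
func generate(n int, startSeconds int64, value float64) []point {
	points := make([]point, 0, n)
	for i := 0; i < n; i++ {
		points = append(points, point{tsSeconds: startSeconds + int64(i), value: value})
	}
	return points
}

// selectRange keeps samples with start < ts <= end (lower bound excluded).
func selectRange(points []point, start, end int64) []point {
	var out []point
	for _, p := range points {
		if p.tsSeconds > start && p.tsSeconds <= end {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	series := generate(300, 46, 1)
	fmt.Println(series[0].tsSeconds, series[len(series)-1].tsSeconds) // 46 345

	window := selectRange(series, 30, 60)
	var sum float64
	for _, p := range window {
		sum += p.value
	}
	fmt.Println(len(window), sum, sum/30) // 15 15 0.5
}
```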
Thank you! 🙏 This helps a lot.
Would it be possible to add this information in the comments to make the tests easier to understand? e.g.
// create a stream {app="foo"} with 300 samples starting at 46s and ending at 345s with a constant value of 1
[][]logproto.Series{
{newSeries(testSize, offset(46, constantValue(1)), `{app="foo"}`)},
},
// query between the time range from ts=30s and ts=60s where the lower bound is not included
[]SelectSampleParams{
{&logproto.SampleQueryRequest{Start: time.Unix(30, 0), End: time.Unix(60, 0), Selector: `rate({app="foo"} | unwrap foo[30s])`}},
},
// SUM(n=47, 61, 1) = 15 - there are 15 samples (from 47 to 61) matched from the generated series
// 15 / 30 = 0.5
promql.Vector{promql.Sample{Point: promql.Point{T: 60 * 1000, V: 0.5}, Metric: labels.Labels{labels.Label{Name: "app", Value: "foo"}}}},
Docs in this PR look good to me.
…counter (#6361)
* Revert unwrapped rate aggregation to previous implementation
  This PR reverts the implementation done in #5013 to the original implementation that sums the extracted values from the log lines instead of treating them like a Prometheus counter metric.
  Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
* Move changelog entry
  Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
* Remove unused/dead code
  Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
* Clean changelog
  Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
(cherry picked from commit b315ed0)
…counter (#6361) (#6555)
* Revert unwrapped rate aggregation to previous implementation
  This PR reverts the implementation done in #5013 to the original implementation that sums the extracted values from the log lines instead of treating them like a Prometheus counter metric.
  Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
* Move changelog entry
  Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
* Remove unused/dead code
  Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
* Clean changelog
  Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
(cherry picked from commit b315ed0)
Co-authored-by: Christian Haudum <christian.haudum@gmail.com>
What this PR does / why we need it:
This PR implements the first part of the RFC described in #6351
It reverts `rate()` to its implementation prior to #5013. That means it calculates the per-second rate from the sum of all extracted values.
Which issue(s) this PR fixes:
#6351
Checklist
CHANGELOG.md
.docs/sources/upgrading/_index.md