Implement query pushdown for a subset of aggregations #4917
Conversation
Nice, good start! Looks generally good, but I don't think topk/bottomk can be pushed down easily 🤔 Essentially the top from node X might not be the top from part Y.
pkg/store/prometheus.go
Outdated
```go
Method: "GET",
}

matrix, _, err := p.client.QueryRange(ctx, p.base, r.ToPromQL(), r.MinTime, r.MaxTime, r.QueryHints.Step/1000, queryOpts)
```
Is it always a range query?
I think it needs to be because of lookback, but I can double check.
Looks like we can always use a range query; we just need to use a positive step for instant queries.
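A sketch of the step handling this implies (the function and field names are illustrative, not the PR's actual code):

```go
package main

import "fmt"

// effectiveStep converts a step hint in milliseconds into the seconds value a
// range-query API expects. Instant queries carry no step (0), so we substitute
// a positive value; a single-point range only needs the step to be valid.
// Illustrative sketch, not the PR's actual code.
func effectiveStep(stepMillis int64) int64 {
	step := stepMillis / 1000
	if step <= 0 {
		step = 1
	}
	return step
}

func main() {
	fmt.Println(effectiveStep(30000)) // 30: step from a range-query hint
	fmt.Println(effectiveStep(0))     // 1: instant query, fall back to a positive step
}
```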
cc @GiedriusS what if we start slowly with this in main?
We could also add a flag to the sidecar to toggle this on and off, something like a feature flag.
I think we might want to add some tests as well to make sure we don't introduce a regression. I'll try to do that by EOW.
Related to #305
Could we reuse the same `--enable-feature` pattern for this instead of a separate flag?
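For illustration, the `--enable-feature` pattern usually boils down to parsing a comma-separated list into a set that the rest of the process consults. A minimal sketch (the flag and the `evaluate-queries` feature name come from this PR; the helper itself is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// parseFeatures splits a comma-separated --enable-feature value into a set.
// Hypothetical helper mirroring the repeatable feature-flag pattern.
func parseFeatures(raw string) map[string]bool {
	feats := map[string]bool{}
	for _, f := range strings.Split(raw, ",") {
		if f = strings.TrimSpace(f); f != "" {
			feats[f] = true
		}
	}
	return feats
}

func main() {
	feats := parseFeatures("evaluate-queries")
	fmt.Println(feats["evaluate-queries"]) // true
	fmt.Println(feats["some-other-feature"]) // false
}
```

The advantage over a dedicated boolean flag is that later experimental features can be added without growing the CLI surface.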
I love this work - solid PR. LGTM, just small nits. Also we are missing docs, but we can add them in a separate PR!
```go
return err
}

matrix = make(model.Matrix, 0, len(vector))
```
TODO to myself: Add issue about remote read understanding aggr or streamed PromQL
Played around with this locally - looks good modulo @bwplotka's comments. I'm not sure how I dismissed Bartek's review with a commit that I didn't even author or push 😂 Let's play with it behind a feature flag and see how we can further improve upon this base 👍
pkg/store/prometheus.go
Outdated
```go
}
matrix = result
} else {
vector, _, err := p.client.QueryInstant(s.Context(), p.base, r.ToPromQL(), time.Unix(r.MaxTime/1000, 0), opts)
```
Suggested change:

```diff
- vector, _, err := p.client.QueryInstant(s.Context(), p.base, r.ToPromQL(), time.Unix(r.MaxTime/1000, 0), opts)
+ vector, _, err := p.client.QueryInstant(s.Context(), p.base, r.ToPromQL(), timestamp.Time(r.MaxTime), opts)
```
@fpetkovski any update on this? I'd like to merge this as a good base so that we can continue further pushdown-related work.
Ah sorry, I saw it was approved and didn't see the comments that followed. I'll address them today so that we can merge it 👍
@GiedriusS all comments should be addressed. The e2e tests seem to be failing, but I don't think the failure is related to this PR.
Certain aggregations can be executed safely on leaf nodes without worrying about data duplication or overlap. One such example is the max function, which can be computed on local data by the leaves before it is computed globally by the querier.

This commit implements local aggregation in the Prometheus sidecar for all functions which are safe to execute locally. The feature can be enabled by passing the `--enable-feature evaluate-queries` flag to the sidecar.

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
💪🏽
```go
"max_over_time",
"min",
"min_over_time",
"group",
```
Why not `count()` btw? I guess some others would also work, but I guess you're just starting with a few...
Because then the PromQL engine goes ahead and calculates `count()` again on the already-correct, pushed-down results, thus corrupting them. I have suggested something here but it got no attention, so I haven't pursued it further. An alternative is to edit the AST of the query, but that was rejected as well.
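The underlying issue is that `max` is idempotent under re-aggregation while `count` is not, so the engine's second pass is harmless for one and corrupting for the other. A toy illustration with hypothetical per-leaf values (`maxOf` and `count` stand in for the PromQL aggregations):

```go
package main

import "fmt"

// maxOf stands in for the PromQL max aggregation.
func maxOf(vals []float64) float64 {
	m := vals[0]
	for _, v := range vals[1:] {
		if v > m {
			m = v
		}
	}
	return m
}

// count stands in for the PromQL count aggregation.
func count(vals []float64) float64 { return float64(len(vals)) }

func main() {
	leafA := []float64{4, 7} // hypothetical series values on one leaf
	leafB := []float64{9, 2} // ...and on another

	// max is safe: re-applying it over the pushed-down per-leaf maxima
	// still yields the global maximum.
	fmt.Println(maxOf([]float64{maxOf(leafA), maxOf(leafB)})) // 9, the true global max

	// count is not: the engine counts the two per-leaf counts and gets 2,
	// when the correct merge of pushed-down counts would be sum (2+2=4).
	fmt.Println(count([]float64{count(leafA), count(leafB)})) // 2, not 4
}
```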
Ah right, I had assumed there was already some facility for changing the outer operation (in the `count` case, to `sum`), but that's not required for min/max/group.

The comment on prometheus/prometheus#10101 is interesting though. Indeed, if the same series can exist on two leaves (I think this is only the case for HA deduplication in Thanos, right?), then pushing `rate()` or even `count()` down in a correct way would be impossible: if you do the count across two HA replicas and get 10 on each, you don't know whether you really counted the same 10 series both times, counted entirely different ones, or some overlapping sets.
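A small sketch of that ambiguity (with hypothetical series names): two pairs of replicas report identical per-replica counts, yet their deduplicated counts differ, so no merge function over the leaf counts alone can be correct.

```go
package main

import "fmt"

// dedupedCount counts distinct series names across two HA replicas,
// which is what count() over deduplicated data would see.
func dedupedCount(a, b []string) int {
	seen := map[string]struct{}{}
	for _, s := range a {
		seen[s] = struct{}{}
	}
	for _, s := range b {
		seen[s] = struct{}{}
	}
	return len(seen)
}

func main() {
	// Both scenarios look identical to the querier: each replica reports a count of 2.
	fmt.Println(dedupedCount([]string{"a", "b"}, []string{"a", "b"})) // 2: replicas hold the same series
	fmt.Println(dedupedCount([]string{"a", "b"}, []string{"c", "d"})) // 4: replicas hold disjoint series
}
```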
```go
"max",
"max_over_time",
"min",
"min_over_time",
```
May I ask why `avg_over_time` is not safe? Is it because in Thanos we do deduplication at query time? If we did write-time deduplication like Cortex, this would seem fine.
Yes, that's the reason. I believe it should be safe if you have unique data.
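A sketch of why query-time dedup breaks this (sample values are hypothetical): each replica averages only its own samples, so a replica with a scrape gap computes a locally consistent but globally wrong answer that the querier can no longer repair.

```go
package main

import "fmt"

// avg stands in for avg_over_time over one series' samples.
func avg(vals []float64) float64 {
	var sum float64
	for _, v := range vals {
		sum += v
	}
	return sum / float64(len(vals))
}

func main() {
	replica1 := []float64{1, 2, 4}      // missed the scrape that produced 100
	replica2 := []float64{1, 2, 100, 4} // has all samples

	// Query-time dedup would stitch the full sample stream together first,
	// then average it; that is the correct answer.
	deduped := []float64{1, 2, 100, 4}
	fmt.Println(avg(deduped)) // 26.75

	// Pushed down, replica 1 averages only its own gappy samples (~2.33),
	// and replica 2 gets a different answer again; the querier cannot tell
	// which, if either, matches the deduplicated series.
	fmt.Println(avg(replica1), avg(replica2))
}
```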