Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Frontend: add query instant tripperware for query sharding #5561

Merged
merged 9 commits into from
Aug 17, 2022

Conversation

yeya24
Copy link
Contributor

@yeya24 yeya24 commented Aug 2, 2022

Signed-off-by: Ben Ye ben.ye@bytedance.com

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

This pr adds a simple tripperware for sharding instant queries.

Verification

@@ -133,6 +137,97 @@ func (r *ThanosQueryRangeRequest) String() string { return "" }
// which is not used in thanos.
func (r *ThanosQueryRangeRequest) ProtoMessage() {}

type ThanosQueryInstantRequest struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a way to use the same object for instant and range queries? That way we wouldn't have to duplicate code for these two use cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to split the two types. Having them in the same struct is okay but we need a flag to maintain whether it is instant query and range query, and check the flag when needed.
Compared to having two types, this is not that clear and require other code, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, makes sense 👍

@fpetkovski
Copy link
Contributor

One question, do you think it makes sense to have separate sharding number for query range and query instant? It would mean we change --query-frontend.vertical-shards to

--query-range.vertical-shards
--query-instant.vertical-shards

which is a "breaking change", but we also haven't released the feature yet.

@yeya24
Copy link
Contributor Author

yeya24 commented Aug 2, 2022

One question, do you think it makes sense to have separate sharding number for query range and query instant? It would mean we change --query-frontend.vertical-shards to

--query-range.vertical-shards
--query-instant.vertical-shards

which is a "breaking change", but we also haven't released the feature yet.

What about using the same NumShards flag?

type Config struct {
	QueryRangeConfig
	LabelsConfig
	DownstreamTripperConfig

	CortexHandlerConfig    *transport.HandlerConfig
	CompressResponses      bool
	CacheCompression       string
	RequestLoggingDecision string
	DownstreamURL          string
	ForwardHeaders         []string
	NumShards              int
}

Now the NumShards is defined in the root level instead of inside QueryRangeConfig so I think it is totally okay to use it in instant queries.
Maintaining two flags are also okay but I cannot think of a usecase for setting vertical shards differently for different query types.

@fpetkovski
Copy link
Contributor

Got it. Let's start with a single flag and we can revisit if a use case comes up.

@yeya24
Copy link
Contributor Author

yeya24 commented Aug 3, 2022

After we vendor the cortex package internally, is there a way to import the cortex proto file we vendored? Tried multiple times but didn't find a way to get it working with existing genproto.sh script.

@GiedriusS
Copy link
Member

After we vendor the cortex package internally, is there a way to import the cortex proto file we vendored? Tried multiple times but didn't find a way to get it working with existing genproto.sh script.

Yeah, I think it hasn't been updated. You probably need to add -I option that points to the Cortex directory.

@yeya24
Copy link
Contributor Author

yeya24 commented Aug 9, 2022

Still cannot get it working by adding the Cortex dir to the proto dependency. Wondering what's the future plan on this.
Are we allowed to modify the vendored Cortex code for some use cases? We may need to build the Cortex proto ourselves.

@fpetkovski
Copy link
Contributor

Could you share the command that fails for you currently?

@yeya24
Copy link
Contributor Author

yeya24 commented Aug 13, 2022

Could you share the command that fails for you currently?

Thanks. I figured out a way to compile the Cortex protobuf instead and make changes on it directly. Not sure if it is a acceptable way but I got it working finally. Seems query response parsing and instant query sharding are working properly on my local tests.

I will add more tests soon.

body, _ := ioutil.ReadAll(r.Body)
return nil, httpgrpc.Errorf(r.StatusCode, string(body))
}
log, ctx := spanlogger.New(ctx, "ParseQueryInstantResponse") //nolint:ineffassign,staticcheck
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SA4006: this value of ctx is never used


Reply with "@sonatype-lift help" for info about LiftBot commands.
Reply with "@sonatype-lift ignore" to tell LiftBot to leave out the above finding from this PR.
Reply with "@sonatype-lift ignoreall" to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.

When talking to LiftBot, you need to refresh the page to see its response. Click here to get to know more about LiftBot commands.


Was this a good recommendation?
[ 🙁 Not relevant ] - [ 😕 Won't fix ] - [ 😑 Not critical, will fix ] - [ 🙂 Critical, will fix ] - [ 😊 Critical, fixing now ]

@yeya24 yeya24 force-pushed the query-instant-tripperware branch 2 times, most recently from 074ba82 to 2603a96 Compare August 13, 2022 18:03
Ben Ye added 7 commits August 13, 2022 11:14
Signed-off-by: Ben Ye <ben.ye@bytedance.com>
Signed-off-by: Ben Ye <ben.ye@bytedance.com>
Signed-off-by: Ben Ye <ben.ye@bytedance.com>
Signed-off-by: Ben Ye <ben.ye@bytedance.com>
Signed-off-by: Ben Ye <ben.ye@bytedance.com>
Signed-off-by: Ben Ye <ben.ye@bytedance.com>
Signed-off-by: Ben Ye <ben.ye@bytedance.com>
Ben Ye added 2 commits August 13, 2022 21:52
Signed-off-by: Ben Ye <ben.ye@bytedance.com>
Signed-off-by: Ben Ye <ben.ye@bytedance.com>
@yeya24
Copy link
Contributor Author

yeya24 commented Aug 15, 2022

This is ready for another round of review. @fpetkovski @GiedriusS

Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, my only concern is that we need to modify the vendored Cortex files. I am not sure though what's the strategy for keeping up to date with upstream Cortex.

Maybe @kakkoyun or @bwplotka have an idea?

@@ -7,7 +7,7 @@ package cortexpb;

option go_package = "cortexpb";

import "github.com/gogo/protobuf/gogoproto/gogo.proto";
import "gogoproto/gogo.proto";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can do this rewrite in go.mod or with some sed magic in the Makefile.

My concern is that the change can be lost the next time we update Cortex.

@yeya24
Copy link
Contributor Author

yeya24 commented Aug 17, 2022

This looks good, my only concern is that we need to modify the vendored Cortex files. I am not sure though what's the strategy for keeping up to date with upstream Cortex.

Yeah, I see your concern. I am not sure about this as well. If needed I can try to avoid modifying the Cortex code. But this causes more code duplication. For example, we need to copy Cortex unexposed functions like merging query stats, etc.

@kakkoyun
Copy link
Member

Even though it's not ideal for syncing up with the upstream Cortex, we'll need to modify the vendored Cortex files. If I'm not mistaken, @GiedriusS also working on a PR that modifies the vendored files. It'll be a half-automated process to upgrade Cortex in Thanos, but I don't see any other options. If someone has a brilliant idea to do this, please let us know :)

Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, just one more question from my side. Looking forward to using this!

return nil, err
}

if r.FormValue(queryv1.MaxSourceResolutionParam) == "auto" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does downsampling apply to instant queries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah from the instant query API I can see the param is used.

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, it will be a pain to keep Cortex fork up-to-date, though, however it's inevitable and I can't think of a better strategy at the moment. 💪 good work!

@GiedriusS GiedriusS merged commit 13a2eb8 into thanos-io:main Aug 17, 2022
@fpetkovski
Copy link
Contributor

Amazing stuff, one of our instant queries went from 36s to 8s.

@yeya24
Copy link
Contributor Author

yeya24 commented Aug 18, 2022

Whoops, I didn't expect the PR got merged so quickly. I haven't added some unit tests and changelog, too. I will add them in follow up prs.

One important thing I want to make sure is to have PromQL compatibility test for Thanos Query behind the query frontend to make sure no regression. Since the pr is already large I will do it in another pr, too.

@yeya24 yeya24 deleted the query-instant-tripperware branch August 18, 2022 06:23
GiedriusS pushed a commit to vinted/thanos that referenced this pull request Aug 25, 2022
…nos-io#5561)

* add query instant tripperware for query sharding

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

* fix lint

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

* working version for instant query sharding

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

* tidy up

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

* lint

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

* fix unit test

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

* remove ioutil

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

* fix E2E test

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

* add e2e test for instant query sharding

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

Signed-off-by: Ben Ye <ben.ye@bytedance.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants