Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: Tenant pods fan out to cancel SQL queries #71503

Merged
merged 1 commit into from
Oct 19, 2021
Merged

server: Tenant pods fan out to cancel SQL queries #71503

merged 1 commit into from
Oct 19, 2021

Conversation

matthewtodd
Copy link
Contributor

Partially addresses #70832.

Previously, the tenant /_status/cancel_query endpoint was not
available over HTTP and could not cancel a query not running locally.
This change opens it up, so that we can call it from the cloud console,
and makes sure that that we can cancel queries when a tenant has more
than one SQL pod running.

Release note (api change): The tenant /_status/cancel_query endpoint
was made available over HTTP and augmented to fan out requests in a
multi-pod world.

@matthewtodd matthewtodd requested review from dhartunian, rafiss and a team October 13, 2021 01:42
@matthewtodd matthewtodd requested a review from a team as a code owner October 13, 2021 01:42
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as the tests are fixed :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dhartunian and @rafiss)

Copy link
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 7 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian, @matthewtodd, and @rafiss)


pkg/ccl/serverccl/tenant_status_test.go, line 600 at r2 (raw file):

	cancelQueryReq := serverpb.CancelQueryRequest{QueryID: query.ID}
	cancelQueryResp := serverpb.CancelQueryResponse{}
	err = httpPod1.PostJSON("/_status/cancel_query/"+session.NodeID.String(), &cancelQueryReq, &cancelQueryResp)

hmm, is the NodeID args here necessary? We would just fanout the RPC anyway right ?


pkg/ccl/serverccl/tenant_status_test.go, line 602 at r2 (raw file):

	err = httpPod1.PostJSON("/_status/cancel_query/"+session.NodeID.String(), &cancelQueryReq, &cancelQueryResp)
	require.NoError(t, err)
	require.Equal(t, true, cancelQueryResp.Canceled, cancelQueryResp.Error)

super small nit: perhaps we can format the error message a bit, e.g. "expected query to be cancelled, but encountered unexpected error %s"

Copy link
Contributor Author

@matthewtodd matthewtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Azhng, @dhartunian, and @rafiss)


pkg/ccl/serverccl/tenant_status_test.go, line 600 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

hmm, is the NodeID args here necessary? We would just fanout the RPC anyway right ?

Good call. It had been necessary before I switched to fanout (I was jamming a SQLInstanceID into that field), but, yes, now that we're fanning out, I've replaced it with a 0 here.


pkg/ccl/serverccl/tenant_status_test.go, line 602 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

super small nit: perhaps we can format the error message a bit, e.g. "expected query to be cancelled, but encountered unexpected error %s"

Done.

Copy link
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Azhng, @dhartunian, @matthewtodd, and @rafiss)


pkg/ccl/serverccl/tenant_status_test.go, line 600 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

Good call. It had been necessary before I switched to fanout (I was jamming a SQLInstanceID into that field), but, yes, now that we're fanning out, I've replaced it with a 0 here.

Ah nice. Hmm just wondering is it possible we can just leave it blank? I think 0 is an invalid SQLInstanceID right ?

Copy link
Contributor Author

@matthewtodd matthewtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Azhng, @dhartunian, and @rafiss)


pkg/ccl/serverccl/tenant_status_test.go, line 600 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Ah nice. Hmm just wondering is it possible we can just leave it blank? I think 0 is an invalid SQLInstanceID right ?

Good question! I just looked, and the grpc gateway won't let us leave it blank. I did wonder about putting some "meaningful" value in here, but I didn't have any ideas I liked, since the handler actually ignores this field altogether. Any thoughts?

Copy link
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dhartunian, @matthewtodd, and @rafiss)


pkg/ccl/serverccl/tenant_status_test.go, line 600 at r2 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

Good question! I just looked, and the grpc gateway won't let us leave it blank. I did wonder about putting some "meaningful" value in here, but I didn't have any ideas I liked, since the handler actually ignores this field altogether. Any thoughts?

Oooof 😢 , this is unfortunate.

Actually, i'm not entirely sure if node_Id needs to be in the path param 🤔 . /_status/resetsqlstats has NodeID in the request payload but it doesn't have it specified in the path param. This is because it's a POST endpoint and we directly pass in the request protobuf from the caller. So perhaps that would be a way for us to drop the node_id field from the URL. Also I think for /_status/statements endpoint we have been using query params instead of the path params so that node_id will become optional.

I guess that means we also need to update the frontend code to use those as well which can be a bit annoying 😅 . So I will defer that decision to you

Copy link
Contributor Author

@matthewtodd matthewtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Azhng, @dhartunian, and @rafiss)


pkg/ccl/serverccl/tenant_status_test.go, line 600 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Oooof 😢 , this is unfortunate.

Actually, i'm not entirely sure if node_Id needs to be in the path param 🤔 . /_status/resetsqlstats has NodeID in the request payload but it doesn't have it specified in the path param. This is because it's a POST endpoint and we directly pass in the request protobuf from the caller. So perhaps that would be a way for us to drop the node_id field from the URL. Also I think for /_status/statements endpoint we have been using query params instead of the path params so that node_id will become optional.

I guess that means we also need to update the frontend code to use those as well which can be a bit annoying 😅 . So I will defer that decision to you

Agreed! Yes, that would work. I don't see a design reason why the node_id was put in the URL. But since that's where it also lives for the dedicated (non-tenant) status server, I suggest we not change it in this pull request. SG?

Copy link
Contributor

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @Azhng, @dhartunian, and @rafiss)

Partially addresses #70832.

Previously, the tenant `/_status/cancel_query` endpoint was not
available over HTTP and could not cancel a query not running locally.
This change opens it up, so that we can call it from the cloud console,
and makes sure that that we can cancel queries when a tenant has more
than one SQL pod running.

Release note (api change): The tenant `/_status/cancel_query` endpoint
was made available over HTTP and augmented to fan out requests in a
multi-pod world.
@matthewtodd
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 19, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants