Why BigQuery service.query_job calls service.insert_job? #22781

Closed
alanhala opened this issue Aug 15, 2023 · 3 comments
Labels
api: bigquery Issues related to the BigQuery API.

Comments

@alanhala
Contributor

alanhala commented Aug 15, 2023

I'm using the gem in my project and I want to make a synchronous query using this endpoint: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query. I realized that calling bigquery.query_job(sql) internally uses the endpoint https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/insert instead. Isn't that unexpected? Here's the method:

    def query_job query_job_gapi
      execute backoff: true do
        service.insert_job @project, query_job_gapi
      end
    end

@dazuma
Member

dazuma commented Aug 17, 2023

That's a really good question. It looks like it's been that way for years, but it doesn't seem correct to me. Digging into this one a bit more...

dazuma added the "api: bigquery" label Aug 17, 2023
@dazuma
Member

dazuma commented Aug 18, 2023

So I did some research on this. The Service#query_job method that you cite simply inserts a normal asynchronous QueryJob representing the query, and it is implemented correctly for that purpose. If you want synchronous behavior, it's simplest just to make the asynchronous call and wait for it to complete. There's a convenience method that does just that: https://cloud.google.com/ruby/docs/reference/google-cloud-bigquery/latest/Google-Cloud-Bigquery-Project#Google__Cloud__Bigquery__Project_query_instance_
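A minimal sketch of the "make the asynchronous call and wait" pattern described above, written against a duck-typed client so it runs without credentials. The helper name `run_query_and_wait` is hypothetical; `query_job`, `wait_until_done!`, `failed?`, `error`, and `data` are the google-cloud-bigquery methods the client and job are assumed to respond to.

```ruby
# Hypothetical helper: inserts a query job, waits for it, returns its rows.
# `client` is assumed to behave like Google::Cloud::Bigquery::Project.
def run_query_and_wait client, sql
  job = client.query_job sql   # inserts an asynchronous QueryJob (jobs.insert)
  job.wait_until_done!         # blocks, polling until the job has finished
  raise "Query failed: #{job.error.inspect}" if job.failed?
  job.data                     # fetches the first page of results
end

# Usage with the real gem (assumed setup, needs credentials):
#   require "google/cloud/bigquery"
#   bigquery = Google::Cloud::Bigquery.new
#   rows = run_query_and_wait bigquery, "SELECT 1 AS x"
```

With the real gem, the convenience method linked above (`bigquery.query sql`) is the intended way to get this behavior in a single call from the caller's side.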

Currently, the clients intentionally do not use the v2/jobs/query endpoint for synchronous jobs. This is because the performance implications are subtle, and getting the usage of that endpoint right is tricky. (See googleapis/python-bigquery#589 for a discussion around this in the Python client.)

dazuma closed this as completed Aug 18, 2023
@alanhala
Contributor Author

But why? There's even a service method for that... It doesn't seem intuitive, in my opinion. Why have a method with the exact same name that acts in the opposite way to the one in the service?

> If you want synchronous behavior, it's simplest just to make the asynchronous call and wait for it to complete.

Yes, but adding HTTP calls to poll for a result adds a lot of overhead to the operation, and since there is an operation for running the query synchronously, why not support it? An operation that could be a single request and response now takes 3 HTTP requests for the exact same response. What am I missing here?

The method you linked in the comment even does one extra API call when waiting for the results:

query_results_gapi = service.job_query_results job_id, location: location, max: 0

So if the job succeeds, it doesn't get the response right there, and instead you have to call it again to get the data.
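For comparison, a rough sketch of what calling jobs.query directly might look like through the generated low-level client (the google-apis-bigquery_v2 gem). This goes around the google-cloud-bigquery wrapper and is an assumption, not something the gem does. The helper name `query_sync` is hypothetical. It shows one detail that is easy to miss: the response can come back with `job_complete` false, in which case further getQueryResults calls are still needed.

```ruby
# Hypothetical helper: one jobs.query request, with a fallback when the
# job has not finished within the request's timeout.
# `service` is assumed to behave like Google::Apis::BigqueryV2::BigqueryService.
def query_sync service, project_id, query_request
  response = service.query_job project_id, query_request   # jobs.query
  # If the query finished in time, this loop is skipped: one HTTP request.
  until response.job_complete
    ref = response.job_reference
    # jobs.getQueryResults; assumed to wait server-side up to its timeout,
    # so no client-side sleep is added here.
    response = service.get_job_query_results project_id, ref.job_id,
                                             location: ref.location
  end
  response.rows   # first page only; pagination and errors are not handled
end

# Usage with the real client (assumed setup, after configuring authorization):
#   require "google/apis/bigquery_v2"
#   service = Google::Apis::BigqueryV2::BigqueryService.new
#   request = Google::Apis::BigqueryV2::QueryRequest.new(
#     query: "SELECT 1 AS x", use_legacy_sql: false
#   )
#   rows = query_sync service, "my-project", request
```

The sketch leaves out paging, error inspection, and any upper bound on waiting, which is part of why getting this endpoint right is less simple than a single request and response.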
