Why does BigQuery `service.query_job` call `service.insert_job`? #22781

alanhala · 2023-08-15T19:05:44Z

I'm using the gem in my project and I want to make a synchronous query using this endpoint: https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query. I realized that calling bigquery.query_job(sql) internally uses the endpoint https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/insert instead. Isn't that unexpected? Here's the method:

google-cloud-ruby/google-cloud-bigquery/lib/google/cloud/bigquery/service.rb

Lines 416 to 420 in b4b683e

    
    def query_job query_job_gapi
      execute backoff: true do
        service.insert_job @project, query_job_gapi
      end
    end


dazuma · 2023-08-17T18:14:35Z

That's a really good question. It looks like it's been that way for years, but it doesn't seem correct to me. Digging into this one a bit more...

dazuma · 2023-08-18T02:18:46Z

So I did some research on this. The Service#query_job method that you cite simply inserts a normal asynchronous QueryJob representing the query, and it is implemented correctly for that purpose. If you want synchronous behavior, it's simplest just to make the asynchronous call and wait for it to complete. There's a convenience method that does just that: https://cloud.google.com/ruby/docs/reference/google-cloud-bigquery/latest/Google-Cloud-Bigquery-Project#Google__Cloud__Bigquery__Project_query_instance_
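The "insert the job, then wait" pattern described above might look roughly like the sketch below. `run_query_sync` is a hypothetical helper name, not part of the gem; it assumes `bigquery` is a `Google::Cloud::Bigquery::Project` and relies only on `query_job`, `wait_until_done!`, `failed?`, `error`, and `data`.

```ruby
# Hypothetical helper (not part of the gem): runs a query to completion
# using the asynchronous job API.
# `bigquery` is assumed to be a Google::Cloud::Bigquery::Project, e.g. from
#   require "google/cloud/bigquery"
#   bigquery = Google::Cloud::Bigquery.new
def run_query_sync bigquery, sql
  job = bigquery.query_job sql   # inserts an asynchronous QueryJob
  job.wait_until_done!           # blocks until the job has finished
  raise "Query failed: #{job.error.inspect}" if job.failed?
  job.data                       # returns the query results
end
```

In most cases the linked `Project#query` convenience method should make a helper like this unnecessary, since it wraps the same steps in a single call.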

Currently, the clients intentionally do not use the v2/jobs/query endpoint for synchronous jobs. This is because the performance implications are subtle, and getting the usage of that endpoint right is tricky. (See googleapis/python-bigquery#589 for a discussion around this in the Python client.)

alanhala · 2023-08-22T19:36:47Z

But why? There's even a service method for that... It doesn't seem intuitive, in my opinion. Why have a method with the same name that behaves in the opposite way from the one in the service?

> If you want synchronous behavior, it's simplest just to make the asynchronous call and wait for it to complete.

Yes, but polling for a result adds extra HTTP calls and a lot of overhead to the operation. Since there is an operation for running the query synchronously, why not support it? What could be a single request and response is now three HTTP requests for the exact same response. What am I missing here?

The method you linked in the comment even makes one extra API call when waiting for the results:

    query_results_gapi = service.job_query_results job_id, location: location, max: 0

So if the job succeeds, it doesn't get the response right there, and instead you have to call it again to get the data.
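To make the request-count argument concrete, here is a toy model that only counts calls. Everything in it (`CountingService`, `fetch_rows`, `run_async`, `run_sync`) is hypothetical and stands in for the HTTP layer; it is not the gem's code, and it assumes the synchronous query completes within a single request.

```ruby
# Hypothetical stand-in for the HTTP layer; it only records which
# requests were made and returns canned data.
class CountingService
  attr_reader :requests

  def initialize
    @requests = []
  end

  # Models jobs.insert: creates the job and returns immediately.
  def insert_job sql
    @requests << :insert
    { job_id: "job_1", sql: sql }
  end

  # Models the wait step (max: 0 asks for no rows).
  def job_query_results job_id, max: nil
    @requests << :get_query_results
    { job_id: job_id, complete: true }
  end

  # Models the follow-up request that actually reads the rows.
  def fetch_rows job_id
    @requests << :fetch_rows
    [{ n: 1 }]
  end

  # Models jobs.query, assuming the query finishes within one request.
  def query sql
    @requests << :query
    [{ n: 1 }]
  end
end

# Asynchronous path: insert, wait, then fetch.
def run_async service, sql
  job = service.insert_job sql
  service.job_query_results job[:job_id], max: 0
  service.fetch_rows job[:job_id]
end

# Synchronous path: a single request.
def run_sync service, sql
  service.query sql
end
```

Under these assumptions, `run_async` records three requests and `run_sync` records one, for the same rows.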

dazuma added the api: bigquery label Aug 17, 2023

dazuma closed this as completed Aug 18, 2023
