BigQuery: populate RowIterator.total_rows after running a query job #6117

yan-hic · 2018-09-26T22:50:08Z

Request: make the total_rows property public on QueryJob(), alongside the other statistics.

shollyman · 2018-09-27T18:29:06Z

My initial preference would be against exposing this directly as a property of the QueryJob, as it's part of the jobs.getQueryResults / tabledata.list response and not a member of the query stats metadata.

Is the interest in fetching the table statistics for the results without the extra fetch of destination table metadata, or are you interested in it related to progress while fetching row data?

yan-hic · 2018-09-27T19:10:47Z

The interest is to determine how many rows a query, when completed, has generated.
Fetching the destination table metadata does not work if records are appended to an existing table: I would get the table's total number of rows, not the number of rows added.

Note that totalRows exists in both jobs.getQueryResults and jobs.query. There is no need to call the former (except if the job is inserted), so the request is not unreasonable. Also, totalBytesProcessed (https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#totalBytesProcessed) is already exposed as a property of QueryJob(), so why not total_rows?

shollyman · 2018-09-27T20:38:48Z

Thanks for clarifying the interest is in the effective delta.

For the case where you run a query with a destination table and a WRITE_APPEND disposition, the results (and total_rows) will represent the whole table. For mutation DML queries (e.g. UPDATE/DELETE), num_dml_affected_rows may help, but we should consider exposing something like the statistics.load.outputRows equivalent for query jobs (likely contingent on statement type).

If that sounds like what you're after, I'll request it for inclusion in the backend response. There's insufficient information in the existing response to correctly report the delta for the append case.
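
For readers following along, here is a minimal sketch of the num_dml_affected_rows suggestion above. It assumes `job` is a finished `QueryJob` from the google-cloud-bigquery client, which exposes `statement_type` and `num_dml_affected_rows`. The helper name `rows_changed` and the set of statement types are illustrative, not part of the library. It does not solve the WRITE_APPEND case, which, as noted, the existing response cannot report.

```python
# Sketch only: `job` is assumed to be a finished google.cloud.bigquery QueryJob.
# The helper name and the statement-type set are illustrative assumptions.
DML_STATEMENT_TYPES = {"INSERT", "UPDATE", "DELETE", "MERGE"}


def rows_changed(job):
    """Return the rows modified by a finished DML query job, or None otherwise."""
    if job.statement_type in DML_STATEMENT_TYPES:
        # Populated by the backend for mutation DML statements.
        return job.num_dml_affected_rows
    # Not a DML statement (e.g. SELECT): no affected-row count is reported.
    return None


# Usage (needs credentials and the google-cloud-bigquery package; the table
# name below is hypothetical):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   job = client.query("UPDATE mydataset.mytable SET flag = TRUE WHERE TRUE")
#   job.result()  # wait for the job to finish
#   print(rows_changed(job))
```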

yan-hic · 2018-09-27T21:10:10Z

Let's cancel the request; I was only suggesting making the existing, and perfectly working, _query_results.total_rows a public property of QueryJob() or another object. I thought QueryJob.result() (https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.job.QueryJob.result.html#google.cloud.bigquery.job.QueryJob.result) would work, but it doesn't: myqueryjob.result().total_rows is None.
This was reported on SO too: https://stackoverflow.com/questions/48799898/bigquery-py-client-library-v0-28-fetch-result-from-table-query-job

I will just use the private _query_results.total_rows.
Thanks.
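
A minimal sketch of the workaround described above, assuming the behavior reported in this thread: `result().total_rows` may be None, while the private `_query_results.total_rows` holds the value. `_query_results` is a private attribute and may change or disappear between releases. The helper name `total_rows_from_job` is hypothetical.

```python
def total_rows_from_job(job):
    """Best-effort row count for a query job (sketch, relies on a private attribute)."""
    rows = job.result()  # blocks until the query job finishes
    if rows.total_rows is not None:
        # The public value is available; prefer it.
        return rows.total_rows
    # Fall back to the private attribute mentioned in this thread.
    # It is not part of the public API, so guard against it being absent.
    query_results = getattr(job, "_query_results", None)
    return getattr(query_results, "total_rows", None)
```

The function returns None when neither source provides a count, so callers can tell "unknown" apart from zero rows.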

tswast · 2019-01-30T20:29:38Z

Let's write the total_rows value from the QueryResults to the RowIterator so that it's available without an additional (and awkward) API request to tabledata.list, like the one in the code sample at #7217.
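
A minimal sketch of what this proposal enables, assuming a google-cloud-bigquery release in which the RowIterator returned by `QueryJob.result()` has `total_rows` populated before iteration. The helper name `count_result_rows` is hypothetical; `client` is assumed to be a `bigquery.Client`.

```python
def count_result_rows(client, sql):
    """Run `sql` and return the size of its result set without iterating rows."""
    query_job = client.query(sql)  # starts the query job
    rows = query_job.result()      # waits for completion, returns a RowIterator
    # Assumption: total_rows is set from the query results response, so no
    # separate tabledata.list call is needed just to read the count.
    return rows.total_rows


# Usage (needs credentials and the google-cloud-bigquery package):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   print(count_result_rows(client, "SELECT 1 AS x UNION ALL SELECT 2"))
```

As discussed earlier in the thread, for a query writing to a destination table with WRITE_APPEND this count reflects the whole table, not the appended rows.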

yan-hic · 2019-01-30T20:33:28Z

Thanks, Tim! To add some context, I have also reported unexpected behavior of the underlying totalRecords API property here

JustinBeckwith added the "triage me" label Sep 27, 2018

tseaver added the "type: feature request" and "api: bigquery" labels and removed the "triage me" label Sep 27, 2018

yan-hic closed this as completed Sep 27, 2018

yan-hic mentioned this issue Jan 30, 2019

Add sample for fetching total_rows from query results. #7217

Merged

tswast reopened this Jan 30, 2019

tswast changed the title ~~BigQuery: make QueryJob._query_results.total_rows public~~ BigQuery: populate RowIterator.total_rows after running a query job Mar 28, 2019

tswast self-assigned this Mar 28, 2019

tswast mentioned this issue Mar 30, 2019

Make total_rows available on RowIterator before iteration #7622

Merged

tswast closed this as completed in #7622 Apr 16, 2019
