BigQuery: populate RowIterator.total_rows after running a query job #6117

Closed
yan-hic opened this issue Sep 26, 2018 · 6 comments · Fixed by #7622

Labels
api: bigquery Issues related to the BigQuery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@yan-hic

yan-hic commented Sep 26, 2018

Request: make the total_rows property public on QueryJob(), alongside the other stats.

@JustinBeckwith JustinBeckwith added the triage me I really want to be triaged. label Sep 27, 2018
@shollyman
Contributor

My initial preference would be against exposing this directly as a property of the QueryJob, as it's part of the jobs.getQueryResults / tabledata.list response and not a member of the query stats metadata.

Is the interest in fetching the table statistics for the results without the extra fetch of destination table metadata, or are you interested in it for tracking progress while fetching row data?

@tseaver tseaver added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. api: bigquery Issues related to the BigQuery API. and removed triage me I really want to be triaged. labels Sep 27, 2018
@yan-hic
Author

yan-hic commented Sep 27, 2018

The interest is to determine how many rows a query, when completed, has generated.
Fetching the destination table metadata is not valid if records are appended to an existing table - I would get the total number of rows, not the addition.

Note that totalRows exists in both jobs.getQueryResults and jobs.query. There is no need to call the former (except when the job is inserted), so the request is not unreasonable. Also, totalBytesProcessed (https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#totalBytesProcessed) made it in as a property of QueryJob(), so why not total_rows?

@shollyman
Contributor

Thanks for clarifying the interest is in the effective delta.

For the case where you run a query with a destination table and a WRITE_APPEND disposition, the results (and total_rows) will represent the whole table. For mutation DML queries (e.g. UPDATE/DELETE), num_dml_affected_rows may help, but we should consider exposing something like the statistics.load.outputRows equivalent for query jobs (likely contingent on statement type).

If that sounds like what you're after, I'll request it for inclusion in the backend response. There's insufficient information in the existing response to correctly report the delta for the append case.
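
As a rough illustration of the distinction drawn above, the sketch below prefers num_dml_affected_rows when the job reports one and otherwise falls back to the result set's total_rows. The helper name rows_affected_or_returned is hypothetical, and the SimpleNamespace objects are stand-ins for a finished QueryJob so the snippet runs without credentials. It does not solve the WRITE_APPEND case: there, total_rows would still describe the whole destination table.

```python
from types import SimpleNamespace


def rows_affected_or_returned(query_job):
    """Return the DML-affected row count if present, else the result row count.

    `query_job` is assumed to behave like google.cloud.bigquery.job.QueryJob.
    """
    rows = query_job.result()  # waits for the query to finish
    affected = query_job.num_dml_affected_rows
    if affected is not None:
        # UPDATE/DELETE-style statements report how many rows they touched.
        return affected
    # For other statements, fall back to the size of the result set.
    return rows.total_rows


# Stand-ins for real jobs (hypothetical values, for illustration only).
dml_job = SimpleNamespace(
    result=lambda: SimpleNamespace(total_rows=0),
    num_dml_affected_rows=12,
)
select_job = SimpleNamespace(
    result=lambda: SimpleNamespace(total_rows=40),
    num_dml_affected_rows=None,
)
```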

@yan-hic
Author

yan-hic commented Sep 27, 2018

Let's cancel the request; I was only suggesting making the existing - and perfectly working - _query_results.total_rows a public property of QueryJob() or another object. I thought https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.job.QueryJob.result.html#google.cloud.bigquery.job.QueryJob.result would work, but it doesn't: total_rows is None (myqueryjob.result().total_rows).
This was reported on SO too: https://stackoverflow.com/questions/48799898/bigquery-py-client-library-v0-28-fetch-result-from-table-query-job

I will just use the private _query_results.total_rows.
Thanks.
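
The workaround described in this comment could be sketched roughly as follows: try the public total_rows on the iterator returned by result(), and only if it is None fall back to the private _query_results attribute named above. The helper name query_total_rows is hypothetical, _query_results is a private attribute that may change or disappear between library versions, and the SimpleNamespace object is a stand-in for a real QueryJob so the snippet runs offline.

```python
from types import SimpleNamespace


def query_total_rows(query_job):
    """Best-effort row count for a finished query job.

    `query_job` is assumed to behave like google.cloud.bigquery.job.QueryJob.
    """
    rows = query_job.result()  # blocks until the query completes
    if rows.total_rows is not None:
        return rows.total_rows
    # Private attribute mentioned in this thread; not a stable API.
    cached = getattr(query_job, "_query_results", None)
    return getattr(cached, "total_rows", None)


# Stand-in reproducing the behavior reported here: the iterator's
# total_rows is None while the private query results carry the count.
fake_job = SimpleNamespace(
    result=lambda: SimpleNamespace(total_rows=None),
    _query_results=SimpleNamespace(total_rows=7),
)
```

Checking the public attribute first means the private fallback is skipped in any version where result().total_rows is populated.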

@tswast
Contributor

tswast commented Jan 30, 2019

Let's write the total_rows value from the QueryResults to the RowIterator so that it's available without an additional (and awkward) API request to tabledata.list, like the one I make in the code sample at #7217.

@yan-hic
Author

yan-hic commented Jan 30, 2019

Thanks, Tim! For additional context, I have also reported unexpected behavior of the underlying totalRecords API property here

@tswast tswast changed the title from "BigQuery: make QueryJob._query_results.total_rows public" to "BigQuery: populate RowIterator.total_rows after running a query job" Mar 28, 2019
@tswast tswast self-assigned this Mar 28, 2019

5 participants