Move BigQuery list_ methods to use iterators #2565

dhermes · 2016-10-19T09:02:59Z

Follow up to #2561. This only covers Dataset.list_tables() and Table.fetch_data(). It may also be "correct" to use an Iterator in

QueryResults.fetch_data
QueryResults.run (via QueryResults._build_resource)

but I wanted to get eyes on this change first / opinion from the original author (@tseaver) before moving forward.

scripts/run_pylint.py

@@ -72,7 +72,7 @@
 }
 TEST_RC_REPLACEMENTS = {
 'FORMAT': {
- 'max-module-lines': 1960,
+ 'max-module-lines': 2000,


bigquery/google/cloud/bigquery/table.py

+ page_token=page_token, max_results=max_results,
+ page_start=_rows_page_start)
+ iterator.schema = self._schema
+ # Over-ride the key used to retrieve the next page token.


bigquery/google/cloud/bigquery/_helpers.py

@@ -86,20 +86,34 @@ def _string_from_json(value, _):
 }


+def _row_from_json(row, schema):
+ """Convert JSON row data to row w/ appropriate types.


bigquery/google/cloud/bigquery/_helpers.py

+ """Convert JSON row data to row w/ appropriate types.
+
+ :type row: dict
+ :param row:
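
For readers unfamiliar with the wire format, here is a minimal sketch of what a per-row helper like this might do. It assumes the BigQuery REST row shape (`{'f': [{'v': ...}, ...]}`) and, for simplicity, represents the schema as a plain list of `(name, field_type)` tuples. The converter table and the `sketch_*` names are hypothetical and are not the code in this PR.

```python
# Hypothetical converters keyed by field type; cell values arrive as strings.
_CONVERTERS = {
    'INTEGER': int,
    'FLOAT': float,
    'BOOLEAN': lambda value: value.lower() in ('t', 'true', '1'),
    'STRING': lambda value: value,
}


def sketch_row_from_json(row, schema):
    """Convert one JSON row into a tuple of typed Python values.

    ``row`` is assumed to look like ``{'f': [{'v': '1'}, {'v': 'abc'}]}``
    and ``schema`` is assumed to be a list of ``(name, field_type)`` tuples.
    """
    converted = []
    for (_, field_type), cell in zip(schema, row['f']):
        value = cell['v']
        # NULL cells come back as None and are passed through unchanged.
        if value is not None:
            value = _CONVERTERS[field_type](value)
        converted.append(value)
    return tuple(converted)


def sketch_rows_from_json(rows, schema):
    """Apply the single-row helper to every row of a response."""
    return [sketch_row_from_json(row, schema) for row in rows]
```

Isolating the single-row logic this way is what lets an iterator convert each item lazily instead of converting a whole response up front.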


bigquery/google/cloud/bigquery/table.py

+ return _row_from_json(resource, iterator.schema)
+
+
+# pylint: disable=unused-argument


bigquery/unit_tests/test_table.py

- max_results=MAX,
- page_token=TOKEN)
+ iterator = table.fetch_data(
+ client=client2, max_results=MAX, page_token=TOKEN)


dhermes · 2016-10-19T15:56:33Z

@tseaver Can you weigh in here? I'm especially curious how you feel about fetch_data (both in Table and QueryResults) and QueryResults.run.

tseaver · 2016-10-20T16:31:56Z

I'm especially curious how you feel about fetch_data (both in Table and QueryResults) and QueryResults.run.

+0 for adding it to fetch_data: those methods currently return a 3-tuple, row_data, total_rows, page_token, which means we would have to choose what to do with the total_rows count.

-1 for adding it to run: that doesn't return row data, paged or not.

dhermes · 2016-10-20T16:36:14Z

+0 for adding it to fetch_data: those methods currently return a 3-tuple, row_data, total_rows, page_token, which means we would have to choose what to do with the total_rows count.

Check out the implementation: I put total_rows on iterator.page.

bigquery/google/cloud/bigquery/table.py

+ """
+ total_rows = response.get('totalRows')
+ if total_rows is not None:
+ page.total_rows = int(total_rows)


tseaver · 2016-10-20T17:15:14Z

LGTM, once you decide about my question about where total_rows should be exposed.

dhermes · 2016-10-20T17:25:28Z

@tseaver I'll put total_rows on both once back at keyboard. Now that you've seen the implementation, does it seem appropriate to change QueryResults.fetch_data over as well?

dhermes · 2016-10-20T18:22:51Z

@tseaver I started writing the code to put total_rows on the Iterator instance and realized something. If while paging, the first request has totalRows: 1200 and the second request somehow has the key missing, then on the second page we'd either have to del iterator.total_rows or iterator.total_rows = None or have an invalid value.

Given this scenario, it seems like my_iter.page.total_rows is the only real correct place for total_rows since it is actually a page-specific value (it may even change between pages if the backend is processing more rows between requests?)
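
The scenario can be made concrete with a small sketch. The response dicts below are made up, and `FakePage` / `pages_from_responses` are hypothetical stand-ins rather than the library's classes. The point is only that a per-page attribute cannot go stale when a later response omits `totalRows`.

```python
class FakePage(object):
    """Hypothetical stand-in for a page of results."""

    def __init__(self, rows, total_rows):
        self.rows = rows
        self.total_rows = total_rows


def pages_from_responses(responses):
    """Yield one FakePage per raw API response dict."""
    for response in responses:
        total_rows = response.get('totalRows')
        if total_rows is not None:
            # The diff wraps the value in int(), so it is assumed to be a string.
            total_rows = int(total_rows)
        yield FakePage(response.get('rows', []), total_rows)


responses = [
    {'rows': ['r1', 'r2'], 'totalRows': '1200'},
    {'rows': ['r3']},  # Key missing on the second page.
]
seen = [page.total_rows for page in pages_from_responses(responses)]
```

Here `seen` is `[1200, None]`: each page reports exactly what its own response said, with no need to delete or reset an attribute on a longer-lived object.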

tseaver · 2016-10-20T19:12:11Z

@dhermes For queries, the docs say totalRows is only present on a page of results if the query has already completed: we can therefore presume that it will not change.

For tables, the docs say only that totalRows represents "[t]he total number of rows in the complete table." We would have to ask the back-end team whether the page_token encodes a "timestamped" cursor which would make totalRows constant across pages, even if new values were inserted into the table in the meanwhile.

tseaver · 2016-10-25T15:01:23Z

@fhoffa Can you chime in on how stable the totalRows value is for tabledata?


dhermes · 2016-11-01T06:21:18Z

@tseaver I put total_rows only on the Iterator and just allowed it to be None if missing in the response.
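
A minimal sketch of that final shape, assuming a `page_start`-style hook like the one passed in the `table.py` diff. `SketchIterator`, its list-of-responses plumbing, and the two-argument hook signature are hypothetical and much simpler than the real `Iterator` class; only the `totalRows` handling mirrors the diff.

```python
class SketchIterator(object):
    """Hypothetical iterator that exposes ``total_rows`` on itself."""

    def __init__(self, responses, page_start=None):
        self._responses = iter(responses)  # Stand-in for real API calls.
        self._page_start = page_start
        self.total_rows = None

    def __iter__(self):
        for response in self._responses:
            if self._page_start is not None:
                self._page_start(self, response)
            for row in response.get('rows', ()):
                yield row


def rows_page_start(iterator, response):
    """Update ``iterator.total_rows`` from the latest response.

    The attribute is reset to None when ``totalRows`` is absent, so it
    never holds a stale count from an earlier page.
    """
    total_rows = response.get('totalRows')
    if total_rows is not None:
        total_rows = int(total_rows)
    iterator.total_rows = total_rows
```

With this arrangement a caller reads `iterator.total_rows` after fetching a page and has to be prepared for `None`.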

dhermes added the api: bigquery label Oct 19, 2016

dhermes assigned daspecster and tseaver Oct 19, 2016

googlebot added the cla: yes label Oct 19, 2016

daspecster reviewed Oct 19, 2016

tseaver reviewed Oct 20, 2016

tseaver added the backend label Oct 25, 2016

dhermes added 4 commits October 31, 2016 22:30

Making BigQuery dataset.list_tables() into an iterator.

5b0f9c8

Refactoring _rows_from_json BigQuery helper.

092dd09

In particular, isolating the logic useful to work on a single row.

Making BigQuery table.fetch_data() into an iterator.

0e7b01f

Rebase fixes.

d3a6dff

dhermes force-pushed the bigquery-iterators-2 branch from e439de8 to d3a6dff November 1, 2016 05:53

Review feedback: moving total_rows from Page to Iterator.

76fa05c

Also fixing BigQuery system test in the process.

tseaver approved these changes Nov 1, 2016

dhermes merged commit 4b4023f into googleapis:master Nov 1, 2016

dhermes deleted the bigquery-iterators-2 branch November 1, 2016 15:55

dhermes mentioned this pull request Nov 14, 2016

Upgrading core to version to 0.21.0. #2733

Merged

richkadel pushed a commit to richkadel/google-cloud-python that referenced this pull request May 6, 2017

Merge pull request googleapis#2565 from dhermes/bigquery-iterators-2

9d10874

Move BigQuery list_ methods to use iterators

theacodes unassigned daspecster Sep 28, 2018
