feat: add `job_id`, `location`, `project`, and `query_id` properties on `RowIterator` #1733

tswast · 2023-11-17T20:59:43Z

These can be used to recover the original job metadata when RowIterator is the result of a QueryJob. This will be especially important after #1722 where query_and_wait() won't return an actual QueryJob object.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Towards internal issue 305260214 and related to #589
🦕

…on `RowIterator` These can be used to recover the original job metadata when `RowIterator` is the result of a `QueryJob`.

google/cloud/bigquery/job/query.py

Linchin · 2023-11-17T22:15:05Z

google/cloud/bigquery/table.py

@@ -1723,7 +1766,7 @@ def to_arrow_iterable(

        bqstorage_download = functools.partial(
            _pandas_helpers.download_arrow_bqstorage,
-            self._project,
+            self._bqstorage_project,


Looks like we now have two possible project values, does this mean a RowIterator can correspond to different projects for storage APIs compared to other APIs?

Linchin · 2023-11-17T22:18:39Z

google/cloud/bigquery/table.py

+    @property
+    def _bqstorage_project(self) -> Optional[str]:
+        """GCP Project ID where BQ Storage API will bill to (if applicable)."""
+        client = self.client


I wonder what's the difference between self.client.project and self._project. When the client isn't of storage type, does self.client.project still reflect the storage project id?

The difference in other contexts is usually described as "billing project" versus "data project". I've renamed this private property to reflect that. This is easiest to understand in the context of public datasets.

I have permission to query tables in the bigquery-public-data project and even download the data in bulk with the BigQuery Storage Read API, but I don't have permission to run queries in that project or start a BigQuery Storage Read API session from that project. Instead, I have start the query or bq storage read session in my own project and reference the tables in the other project.

Wow I see! so self._project is the project for the data/table, and self.client.project a.k.a. billing project is the user's own project. So I guess this billing project doesn't just apply to storage?

That's true, though for queries we're using jobs.getQueryResults, so the project will always be the billing project since that's where temp results for queries are stored.

tabledata.list is free, but technically we should be overriding the x-goog-user-project header so that the quota is charged to the billing project. (I don't think we're actually doing that, though)

google/cloud/bigquery/table.py

Co-authored-by: Lingqing Gan <lingqing.gan@gmail.com>

chalmerlowe · 2023-11-19T13:19:24Z

tests/unit/test_table.py

@@ -2113,6 +2113,38 @@ def test_constructor_with_dict_schema(self):
        ]
        self.assertEqual(iterator.schema, expected_schema)

+    def test_job_id_missing(self):
+        rows = self._make_one()
+        self.assertIsNone(rows.job_id)


@tswast

Just curious about testing philosophy.
For this collection of tests that are pretty much the same (i.e. all the tests that .assertIsNone()) is there a benefit in your mind to having each test broken out separately versus providing a parameterized list of inputs and expected outputs?

I matched the existing pattern in this test. I do think we could probably combine a lot of these into a single parametrized test.

I worry slightly about the required getattr(property_name) being too complex for new contributors to understand. In general tests should avoid anything even as complex as an if statement. https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html In this case, getattr is a grey area. I wouldn't necessarily count it as "logic"

…on `RowIterator` (googleapis#1733) * feat: add `job_id`, `location`, `project`, and `query_id` properties on `RowIterator` These can be used to recover the original job metadata when `RowIterator` is the result of a `QueryJob`. * rename bqstorage_project to billing project * Update google/cloud/bigquery/table.py Co-authored-by: Lingqing Gan <lingqing.gan@gmail.com> --------- Co-authored-by: Lingqing Gan <lingqing.gan@gmail.com>

feat: add job_id, location, project, and query_id properties …

1f9dd1e

…on `RowIterator` These can be used to recover the original job metadata when `RowIterator` is the result of a `QueryJob`.

tswast requested review from a team as code owners November 17, 2023 20:59

tswast requested a review from farhan0102 November 17, 2023 20:59

product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery API. labels Nov 17, 2023

tswast requested review from Linchin and chalmerlowe and removed request for farhan0102 November 17, 2023 21:00

Linchin reviewed Nov 17, 2023

View reviewed changes

google/cloud/bigquery/job/query.py Show resolved Hide resolved

Linchin reviewed Nov 17, 2023

View reviewed changes

rename bqstorage_project to billing project

18abf49

Linchin reviewed Nov 17, 2023

View reviewed changes

google/cloud/bigquery/table.py Outdated Show resolved Hide resolved

Update google/cloud/bigquery/table.py

1a07c4e

Co-authored-by: Lingqing Gan <lingqing.gan@gmail.com>

Linchin approved these changes Nov 18, 2023

View reviewed changes

Linchin merged commit 494f275 into main Nov 18, 2023
20 checks passed

Linchin deleted the b305260214-rowiterator-job-metadata branch November 18, 2023 00:44

release-please bot mentioned this pull request Nov 18, 2023

chore(main): release 3.14.0 #1709

Merged

chalmerlowe reviewed Nov 19, 2023

View reviewed changes

This was referenced Dec 5, 2023

December 04, 2023 kitta65/bq-extension-vscode#257

Closed

December 04, 2023 kitta65/prettier-plugin-bq#264

Closed

December 04, 2023 kitta65/bq2cst#274

Closed

This was referenced Dec 19, 2023

December 18, 2023 kitta65/bq-extension-vscode#262

Closed

December 18, 2023 kitta65/prettier-plugin-bq#269

Closed

December 18, 2023 kitta65/bq2cst#279

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add `job_id`, `location`, `project`, and `query_id` properties on `RowIterator` #1733

feat: add `job_id`, `location`, `project`, and `query_id` properties on `RowIterator` #1733

tswast commented Nov 17, 2023 •

edited

Loading

Linchin Nov 17, 2023

Linchin Nov 17, 2023

tswast Nov 17, 2023 •

edited

Loading

Linchin Nov 17, 2023

tswast Nov 17, 2023

tswast Nov 17, 2023

chalmerlowe Nov 19, 2023

tswast Nov 20, 2023

feat: add job_id, location, project, and query_id properties on RowIterator #1733

feat: add job_id, location, project, and query_id properties on RowIterator #1733

Conversation

tswast commented Nov 17, 2023 • edited Loading

Linchin Nov 17, 2023

Choose a reason for hiding this comment

Linchin Nov 17, 2023

Choose a reason for hiding this comment

tswast Nov 17, 2023 • edited Loading

Choose a reason for hiding this comment

Linchin Nov 17, 2023

Choose a reason for hiding this comment

tswast Nov 17, 2023

Choose a reason for hiding this comment

tswast Nov 17, 2023

Choose a reason for hiding this comment

chalmerlowe Nov 19, 2023

Choose a reason for hiding this comment

tswast Nov 20, 2023

Choose a reason for hiding this comment

feat: add `job_id`, `location`, `project`, and `query_id` properties on `RowIterator` #1733

feat: add `job_id`, `location`, `project`, and `query_id` properties on `RowIterator` #1733

tswast commented Nov 17, 2023 •

edited

Loading

tswast Nov 17, 2023 •

edited

Loading