fix: `read_gbq` supports extreme DATETIME values such as `0001-01-01 00:00:00` #444
Conversation
deps: require google-cloud-bigquery 1.26.1 or later
This reverts commit 2a76982.
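For context on why these values are tricky at all, a minimal sketch (plain pandas, no BigQuery involved): pandas' default datetime64[ns] dtype only spans roughly 1677–2262, so a DATETIME such as 0001-01-01 00:00:00 overflows unless the column is left as Python objects.

```python
# Minimal sketch: datetime64[ns] cannot hold 0001-01-01, but an
# object-dtype column can carry the raw datetime.datetime value.
import datetime

import pandas as pd

extreme = datetime.datetime(1, 1, 1, 0, 0, 0)
print(pd.Timestamp.min)  # ~1677-09-21, the datetime64[ns] lower bound

try:
    pd.Timestamp(extreme)
except pd.errors.OutOfBoundsDatetime as exc:
    print(f"out of bounds for datetime64[ns]: {exc}")

# Keeping the column as dtype=object preserves the value exactly.
series = pd.Series([extreme], dtype="object")
print(series[0])  # 0001-01-01 00:00:00
```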
I think we need to remove the code at pandas_gbq/gbq.py line 583 (as of e13abaf).
ci/requirements-3.7-0.24.2.conda
 fastavro
 flake8
 numpy==1.16.6
-google-cloud-bigquery==1.11.1
+google-cloud-bigquery==1.26.1
Needed for the `date_as_object` parameter.
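Roughly how that parameter comes into play (a sketch, assuming ambient credentials and a default project; the `date_as_object` argument to `to_dataframe()` is available from the google-cloud-bigquery 1.26 series onward):

```python
# Sketch: date_as_object=True keeps out-of-bounds dates intact as
# Python datetime.date objects in an object-dtype column.
from google.cloud import bigquery

client = bigquery.Client()  # assumes ambient credentials and project
job = client.query("SELECT DATE '0001-01-01' AS d")
df = job.to_dataframe(date_as_object=True)
print(df["d"][0], df["d"].dtype)  # 0001-01-01 object
```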
"FLOAT": np.dtype(float), | ||
"GEOMETRY": "object", |
I'm not sure how the changes here to non-datetime-related mappings relate to this PR.
If these changes are intentional, then the comment above seems to require a corresponding update to docs/reading.rst.
`object` types were removed because that's the default anyway.
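To make that concrete, an illustrative mapping (not pandas-gbq's exact table): anything omitted from the dict falls back to an object column, so explicit "object" entries are redundant.

```python
# Illustrative BigQuery-to-pandas dtype mapping; types missing from
# the dict (STRING, GEOMETRY, ...) fall back to Python object columns.
import numpy as np

BQ_TO_PANDAS = {
    "FLOAT": np.dtype(float),
    "INTEGER": "Int64",    # nullable integer dtype
    "BOOLEAN": "boolean",  # nullable boolean dtype
}

def dtype_for(bq_type):
    # None signals "leave the column as object dtype".
    return BQ_TO_PANDAS.get(bq_type)
```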
@@ -28,12 +28,13 @@
 "pandas >=0.24.2",
 "pyarrow >=3.0.0, <7.0dev",
 "pydata-google-auth",
-"google-api-core >=1.14.0",
+"google-api-core >=1.21.0",
Seems unrelated, and unmotivated by anything in the changelog for that release.
Correct. Needed to update due to updating the minimum google-cloud-bigquery, though.
We do use google-api-core directly, so I think it still makes sense to include it here.
-google-api-core==1.14.0
-google-auth==1.4.1
+google-api-core==1.21.0
+google-auth==1.18.0
Doesn't match the minimum constraint in setup.py.
Updated setup.py. Needed this version due to the updated google-api-core (via google-cloud-bigquery).
LGTM, though all this version checking feels mildly terrifying.
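For readers wondering what that version checking amounts to, a hypothetical sketch (the helper and constant names are mine, not pandas-gbq's actual code): features that depend on a newer google-cloud-bigquery are switched on by comparing the installed version against the minimum that introduced them.

```python
# Hypothetical runtime feature gate keyed on the installed
# google-cloud-bigquery version; not pandas-gbq's actual code.
import google.cloud.bigquery
from packaging import version

BIGQUERY_INSTALLED = version.parse(google.cloud.bigquery.__version__)
MIN_DATE_AS_OBJECT = version.parse("1.26.1")  # this PR's minimum

def supports_date_as_object():
    """True if to_dataframe() can be called with date_as_object."""
    return BIGQUERY_INSTALLED >= MIN_DATE_AS_OBJECT
```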
for field in schema_fields:
    column = str(field["name"])
    # This method doesn't modify ARRAY/REPEATED columns.
Does this imply a TODO for later, or is the nature of pandas such that arrays are just always an object that gets no special processing?
Potential TODO, but such a low priority that I don't think it's worth calling out. Now that we have https://github.com/googleapis/python-db-dtypes-pandas, we have more flexibility to create dtypes that are more efficient than Python object columns. Though in this case, I'm not sure we'd have a better approach than https://github.com/xhochy/fletcher.
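For concreteness, skipping REPEATED columns could look something like this (the field dicts are assumed shapes, not pandas-gbq's exact code); BigQuery marks array columns with mode REPEATED, and they arrive in pandas as object columns holding lists:

```python
# Assumed-shape sketch: REPEATED (ARRAY) fields are skipped and left
# as object columns of Python lists; scalar fields get real dtypes.
schema_fields = [
    {"name": "id", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "tags", "type": "STRING", "mode": "REPEATED"},
]

for field in schema_fields:
    column = str(field["name"])
    if field.get("mode", "NULLABLE").upper() == "REPEATED":
        continue  # leave array columns untouched
    print(f"would cast {column} to a scalar dtype")
```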
Yeah, for sure... I'd very much like to give our pandas-gbq users as wide a set of versions as possible, though. Those folks are often stuck in (notebook) environments with some core dependencies locked.
🤖 I have created a release *beep* *boop*

## [0.17.0](v0.16.0...v0.17.0) (2022-01-19)

### ⚠ BREAKING CHANGES

* use nullable Int64 and boolean dtypes if available (#445)

### Features

* accepts a table ID, which downloads the table without a query ([#443](#443)) ([bf0e863](bf0e863))
* use nullable Int64 and boolean dtypes if available ([#445](#445)) ([89078f8](89078f8))

### Bug Fixes

* `read_gbq` supports extreme DATETIME values such as `0001-01-01 00:00:00` ([#444](#444)) ([d120f8f](d120f8f))
* `to_gbq` allows strings for DATE and floats for NUMERIC with `api_method="load_parquet"` ([#423](#423)) ([2180836](2180836))
* allow extreme DATE values such as `datetime.date(1, 1, 1)` in `load_gbq` ([#442](#442)) ([e13abaf](e13abaf))
* avoid iteritems deprecation in pandas prerelease ([#469](#469)) ([7379cdc](7379cdc))
* use data project for destination in `to_gbq` ([#455](#455)) ([891a00c](891a00c))

### Miscellaneous Chores

* release 0.17.0 ([#470](#470)) ([29ac8c3](29ac8c3))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Fixes #365 🦕