fix!: use nullable Int64 and boolean dtypes in to_dataframe #786
Merged: gcf-merge-on-green merged 17 commits into googleapis:v3 from tswast:b144712110-nullable-pandas-types on Aug 16, 2021
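In rough terms, the change means to_dataframe now maps BigQuery BOOL and INT64 columns to pandas' nullable "boolean" and "Int64" extension dtypes by default, so NULL values can be represented inside those columns as pandas.NA. A minimal sketch of what that looks like; the query and column names are illustrative, not taken from this PR:

from google.cloud import bigquery

client = bigquery.Client()
df = client.query("SELECT TRUE AS bool_col, 1 AS int64_col").to_dataframe()

# With nullable extension dtypes, missing values can be stored as pandas.NA
# without changing the column dtype.
print(df.dtypes["bool_col"])   # boolean
print(df.dtypes["int64_col"])  # Int64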
Changes from all commits
Commits (17):
76d88f4  feat!: use nullable types like float and Int64 by default in `to_data… (tswast)
f2223e9  add test data for all scalar columns (tswast)
07ed871  add test data for all scalar columns (tswast)
66ce732  Merge branch 'b144712110-nullable-pandas-types' of github.com:tswast/… (tswast)
21d4369  update tests with expected dtypes (tswast)
69a747f  add expected types, REST test (tswast)
4f78e6d  use dtype defaults for "easy" cases (tswast)
62a57bd  Merge remote-tracking branch 'upstream/v3' into b144712110-nullable-p… (tswast)
d53aa68  add interval (tswast)
d17e637  Merge remote-tracking branch 'upstream/v3' into b144712110-nullable-p… (tswast)
6ceff2c  WIP: split TIME and DATE into separate issues (tswast)
18152d9  WIP: unit tests (tswast)
2e957cd  add tests, update minimum pandas version (tswast)
8f90c51  add unit test for repeated fields (tswast)
187a950  Merge branch 'v3' into b144712110-nullable-pandas-types (plamut)
3155dab  Address docs nits (tswast)
189404c  Merge remote-tracking branch 'origin/b144712110-nullable-pandas-types… (tswast)
@@ -567,7 +567,7 @@ def test_query_results_to_dataframe(bigquery_client):
     for _, row in df.iterrows():
         for col in column_names:
             # all the schema fields are nullable, so None is acceptable
-            if not row[col] is None:
+            if not pandas.isna(row[col]):
                 assert isinstance(row[col], exp_datatypes[col])
@@ -597,7 +597,7 @@ def test_query_results_to_dataframe_w_bqstorage(bigquery_client):
     for index, row in df.iterrows():
         for col in column_names:
             # all the schema fields are nullable, so None is acceptable
-            if not row[col] is None:
+            if not pandas.isna(row[col]):
                 assert isinstance(row[col], exp_datatypes[col])
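Both hunks above swap an `is None` check for `pandas.isna`, since the nullable extension dtypes introduced here surface missing values as pandas.NA (and floating-point columns as NaN), neither of which is None. A minimal sketch of the distinction, independent of this PR's code:

import pandas

# With the nullable "Int64" dtype, a missing value becomes pandas.NA, not None.
s = pandas.Series([1, None], dtype="Int64")

print(s[1] is None)       # False: the element is pandas.NA
print(pandas.isna(s[1]))  # True: isna() covers NA, NaN, NaT, and None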
@@ -795,3 +795,71 @@ def test_list_rows_max_results_w_bqstorage(bigquery_client):
    dataframe = row_iterator.to_dataframe(bqstorage_client=bqstorage_client)

    assert len(dataframe.index) == 100


@pytest.mark.parametrize(
    ("max_results",), ((None,), (10,),)  # Use BQ Storage API.  # Use REST API.
)
def test_list_rows_nullable_scalars_dtypes(bigquery_client, scalars_table, max_results):
    df = bigquery_client.list_rows(
        scalars_table, max_results=max_results,
    ).to_dataframe()

    assert df.dtypes["bool_col"].name == "boolean"
    assert df.dtypes["datetime_col"].name == "datetime64[ns]"
    assert df.dtypes["float64_col"].name == "float64"
    assert df.dtypes["int64_col"].name == "Int64"
    assert df.dtypes["timestamp_col"].name == "datetime64[ns, UTC]"

    # object is used by default, but we can use "datetime64[ns]" automatically
    # when data is within the supported range.
    # https://github.com/googleapis/python-bigquery/issues/861
    assert df.dtypes["date_col"].name == "object"

    # object is used by default, but we can use "timedelta64[ns]" automatically
    # https://github.com/googleapis/python-bigquery/issues/862
    assert df.dtypes["time_col"].name == "object"

    # decimal.Decimal is used to avoid loss of precision.
    assert df.dtypes["bignumeric_col"].name == "object"
    assert df.dtypes["numeric_col"].name == "object"

    # pandas uses Python string and bytes objects.
    assert df.dtypes["bytes_col"].name == "object"
    assert df.dtypes["string_col"].name == "object"


@pytest.mark.parametrize(
    ("max_results",), ((None,), (10,),)  # Use BQ Storage API.  # Use REST API.
)
def test_list_rows_nullable_scalars_extreme_dtypes(
    bigquery_client, scalars_extreme_table, max_results
):
    df = bigquery_client.list_rows(
        scalars_extreme_table, max_results=max_results
    ).to_dataframe()

    # Extreme values are out-of-bounds for pandas datetime64 values, which use
    # nanosecond precision. Values before 1677-09-21 and after 2262-04-11 must
    # be represented with object.
    # https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
    assert df.dtypes["date_col"].name == "object"
    assert df.dtypes["datetime_col"].name == "object"
    assert df.dtypes["timestamp_col"].name == "object"

    # These pandas dtypes can handle the same ranges as BigQuery.
    assert df.dtypes["bool_col"].name == "boolean"
    assert df.dtypes["float64_col"].name == "float64"
    assert df.dtypes["int64_col"].name == "Int64"

    # object is used by default, but we can use "timedelta64[ns]" automatically
    # https://github.com/googleapis/python-bigquery/issues/862
    assert df.dtypes["time_col"].name == "object"

    # decimal.Decimal is used to avoid loss of precision.
    assert df.dtypes["numeric_col"].name == "object"
    assert df.dtypes["bignumeric_col"].name == "object"

    # pandas uses Python string and bytes objects.
    assert df.dtypes["bytes_col"].name == "object"
    assert df.dtypes["string_col"].name == "object"

Inline review comment on the first `bigquery_client.list_rows(` call: Note to self: I'll need to exclude the INTERVAL column next time we sync with master.
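The object fallback documented in test_list_rows_nullable_scalars_extreme_dtypes follows from pandas' nanosecond-precision timestamp limits; a small illustrative sketch, not part of the PR's own code:

import pandas

# datetime64[ns] can only represent dates between roughly 1677-09-21 and 2262-04-11.
print(pandas.Timestamp.min)
print(pandas.Timestamp.max)

# Values outside that window (e.g. BigQuery's DATE minimum of 0001-01-01) cannot
# be stored as datetime64[ns], so such columns are left as dtype "object".
try:
    pandas.Timestamp("0001-01-01")
except pandas.errors.OutOfBoundsDatetime as exc:
    print(type(exc).__name__)  # OutOfBoundsDatetime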
Review comment: (nit) Since we're already at it, there's at least one other occurrence of "python" not capitalized (line 69), which can also be fixed.