Bug: dataframe_to_json_generator doesn't support pandas.NA type #729

Closed
LinuxChristian opened this issue Jun 29, 2021 · 3 comments · Fixed by #750
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@LinuxChristian
Contributor

Environment details

  • OS type and version: Ubuntu 20.04.2
  • Python version: 3.9.4
  • pip version: 21.1.2
  • google-cloud-bigquery version: 2.20.0

Steps to reproduce

  1. Run the code example below

Code example

import pandas
from google.cloud import bigquery

df = pandas.DataFrame({
    "series_a": [1, 2, pandas.NA]
})

json_iter = bigquery._pandas_helpers.dataframe_to_json_generator(df)
for row in json_iter:
    print(row)

Stack trace

{'series_a': 1}
{'series_a': 2}
Traceback (most recent call last):
  File "/home/christian/code/bug_example.py", line 11, in <module>
    for row in json_iter:
  File "/home/christian/miniconda3/envs/data-services-prod/lib/python3.9/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 783, in dataframe_to_json_generator
    if value != value:
  File "pandas/_libs/missing.pyx", line 360, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

Suggested fix

Starting with pandas 1.0, an experimental pandas.NA value (a singleton) is available to represent scalar missing values, as opposed to numpy.nan. Comparing the value with itself (value != value) returns pandas.NA rather than a boolean, and using that result as an if condition raises a TypeError because pandas.NA does not support conversion to a boolean.

I am planning to make a PR that replaces the value != value check on _pandas_helpers.py#L783 with the pandas.isna function, but wanted to check whether there is a better solution before I submit a patch.
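
To make the difference concrete, here is a minimal sketch comparing the self-comparison idiom with pandas.isna on three scalar values. The helper names self_comparison_check and isna_check are hypothetical and exist only for this illustration; they are not part of google-cloud-bigquery.

```python
import numpy
import pandas


def self_comparison_check(value):
    """NaN idiom: NaN is the only float that is unequal to itself."""
    # For pandas.NA, `value != value` evaluates to pandas.NA, and the
    # `if` statement then fails when it tries to convert NA to a boolean.
    if value != value:
        return True
    return False


def isna_check(value):
    """Missing-value check that accepts both numpy.nan and pandas.NA."""
    return bool(pandas.isna(value))


results = {}
for label, value in [("int", 1), ("nan", numpy.nan), ("NA", pandas.NA)]:
    try:
        old = self_comparison_check(value)
    except TypeError as exc:
        # Record the failure instead of stopping the loop.
        old = f"TypeError: {exc}"
    results[label] = (old, isna_check(value))
    print(label, results[label])
```

Both checks agree on the integer and on numpy.nan; only pandas.NA makes the self-comparison fail, while pandas.isna reports it as missing.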

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jun 29, 2021
@plamut plamut added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Jun 30, 2021
@tswast
Contributor

tswast commented Jul 8, 2021

Yes, I agree that's the problematic line. I'd like NaN to continue to serve the same purpose, but happy to also support NA.
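
As a rough sketch of what supporting both could look like, the generator below drops NaN and NA cells alike by using pandas.isna. The function name rows_without_missing is hypothetical, and this is an illustration under the assumption that every cell holds a scalar, not the library's actual implementation or the merged fix.

```python
import numpy
import pandas


def rows_without_missing(dataframe):
    """Yield one dict per row, omitting cells that are NaN or pandas.NA.

    Hypothetical helper for illustration only. It assumes scalar cells:
    for a list or array cell, pandas.isna returns an array, which cannot
    be used directly as an `if` condition.
    """
    for row in dataframe.itertuples(index=False, name=None):
        output = {}
        for column, value in zip(dataframe.columns, row):
            # pandas.isna is True for numpy.nan, pandas.NA, None, and NaT.
            if pandas.isna(value):
                continue
            output[column] = value
        yield output


df = pandas.DataFrame(
    {
        "series_a": [1, 2, pandas.NA],
        "series_b": [1.5, numpy.nan, 3.0],
    }
)

rows = list(rows_without_missing(df))
for row in rows:
    print(row)
```

With this approach NaN keeps its existing meaning (the cell is omitted from the row) and NA is treated the same way.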

@plamut
Contributor

plamut commented Jul 10, 2021

I can probably have a closer look at this some time next week.

@LinuxChristian
Contributor Author

I completely forgot to submit my PR after opening this issue, sorry about that. I looked through the code and switching to pandas.isna seemed like the obvious solution since the function only works for pandas.DataFrame objects.
