
BUG: bq_to_arrow_field in _pandas_helpers.py always sets pyarrow.field to nullable #1998

Closed
xyloid opened this issue Aug 17, 2024 · 0 comments · Fixed by #1999
Labels
api: bigquery (Issues related to the googleapis/python-bigquery API.)
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

xyloid (Contributor) commented Aug 17, 2024

The current implementation of bq_to_arrow_field always creates the pyarrow.field with nullable=True. However, a BigQuery array (a field in REPEATED mode) cannot be NULL, so the following error occurs when uploading a pandas DataFrame to a BigQuery table that contains a REPEATED field.

Stack trace

Traceback (most recent call last):
  File "/project_path/field_trie.py", line 426, in <module>
    main()
  File "/project_path/field_trie.py", line 363, in main
    job = client.load_table_from_dataframe(df, table)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/client.py", line 2793, in load_table_from_dataframe
    _pandas_helpers.dataframe_to_parquet(
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 669, in dataframe_to_parquet
    arrow_table = dataframe_to_arrow(dataframe, bq_schema)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 611, in dataframe_to_arrow
    bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
  File "/../lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 319, in bq_to_arrow_array
    return pyarrow.ListArray.from_pandas(series, type=arrow_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 1115, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 339, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Cannot append scalar of type struct<order_id: int64, order_date: date32[day], event_timestamps: list<item: timestamp[us, tz=UTC]> not null, items: list<item: struct<item_id: int64, item_name: string, item_price: double>> not null> to builder for type struct<order_id: int64, order_date: date32[day], event_timestamps: list<item: timestamp[us, tz=UTC]>, items: list<item: struct<item_id: int64, item_name: string, item_price: double>>>
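The mismatch above is between list fields marked "not null" and list fields that are nullable. A minimal sketch of the mode-aware nullability this report asks for, assuming the fix is to derive the nullable flag from the BigQuery field mode instead of hard-coding True. SchemaFieldStub and arrow_nullable_for are hypothetical names (a stand-in for google.cloud.bigquery.SchemaField and for the relevant logic inside bq_to_arrow_field); this is not the code merged in #1999. Only REPEATED fields become non-nullable here, since a BigQuery array can never be NULL; NULLABLE and REQUIRED fields keep the current nullable=True behavior. The sketch uses only the standard library, so the pyarrow.field call appears only as a comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SchemaFieldStub:
    """Hypothetical stand-in for google.cloud.bigquery.SchemaField (name and mode only)."""
    name: str
    mode: Optional[str] = "NULLABLE"


def arrow_nullable_for(field: SchemaFieldStub) -> bool:
    """Return the nullable flag a pyarrow.field for this BigQuery column should get.

    Assumption: only REPEATED columns become non-nullable, because a BigQuery
    ARRAY itself can never be NULL. NULLABLE and REQUIRED columns keep the
    current behavior (nullable=True). A missing mode is treated as NULLABLE.
    """
    return (field.mode or "NULLABLE").upper() != "REPEATED"


# In a real helper, the flag would be passed along as, e.g.:
#   pyarrow.field(field.name, arrow_type, nullable=arrow_nullable_for(field))
schema = [
    SchemaFieldStub("order_id"),
    SchemaFieldStub("event_timestamps", mode="REPEATED"),
    SchemaFieldStub("items", mode="repeated"),  # mode comparison is case-insensitive
]
for f in schema:
    print(f.name, arrow_nullable_for(f))
```

With the column names from the trace, order_id stays nullable while event_timestamps and items are marked non-nullable, which would line up with the "not null" list types on the data side of the error message.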
product-auto-label bot added the api: bigquery label Aug 17, 2024
Linchin added the priority: p2 and type: bug labels Aug 19, 2024