BigQuery integration incorrect for list type columns #16938
Labels: A-io-parquet (Area: reading/writing Parquet files), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)
Reproducible example
If I write a `list[int]`-type field (let's say `linked_ids`) from Polars to BQ with the example code (sketched below), what you get is a record named "linked_ids", within which is a record named "list", within which is a repeated integer field named "item"!
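For reference, a minimal sketch of the kind of upload code in question, modeled on the Parquet-via-`BytesIO` pattern shown in the Polars user guide; the DataFrame contents and the destination name are hypothetical:

```python
import io

import polars as pl
from google.cloud import bigquery

# Hypothetical frame with a list-type column.
df = pl.DataFrame({"id": [1, 2], "linked_ids": [[10, 11], [12]]})

client = bigquery.Client()

with io.BytesIO() as stream:
    df.write_parquet(stream)
    stream.seek(0)
    job = client.load_table_from_file(
        stream,
        destination="my_dataset.my_table",  # hypothetical destination
        job_config=bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.PARQUET,
        ),
    )
job.result()

# Resulting BigQuery schema for `linked_ids`: a RECORD containing a
# RECORD named "list" containing a REPEATED INTEGER named "item",
# rather than a plain ARRAY<INT64> column.
```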
I can of course create this manually: create a BigQuery table with an integer column and an array-of-integers column, and write to it.
Log output
No response
Issue description
I can see that this comes from `pyarrow.parquet.write_table`, but it causes a major discontinuity between Polars and BigQuery. It's well known that list field types are one of Polars' main USPs, so this is an equally major hurdle to hit when uploading to a database that is compatible with array-type fields.
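For context, the record-in-record shape corresponds to Parquet's standard three-level list encoding, which pyarrow has historically emitted with the group/field names `list` and `item`. A minimal sketch of the writer side:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table(
    {"linked_ids": pa.array([[1, 2], [3]], type=pa.list_(pa.int64()))}
)
pq.write_table(table, "linked_ids.parquet")

# The Parquet schema uses the standard three-level list encoding,
# roughly:
#
#   optional group linked_ids (LIST) {
#     repeated group list {
#       optional int64 item;   # may be named "element" depending on
#     }                        # writer version/options
#   }
#
# A loader that maps this nesting literally (rather than recognising
# the LIST annotation) produces the record-within-record-within-
# repeated-field shape described above.
```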
Expected behavior
I'd like some way to upload list-type columns, and am unclear on what the best approach here is.
To my understanding, the point of streaming Parquet is compression and automatic schema application; here the schema is essentially being lost in transit.
Perhaps the right approach is to go via JSON [without schema interference]? 🤔
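One possible workaround, assuming google-cloud-bigquery's `ParquetOptions.enable_list_inference` flag behaves as documented (it asks BigQuery to collapse the LIST-annotated three-level encoding into a plain repeated column):

```python
from google.cloud import bigquery

# Assumption: enable_list_inference makes BigQuery interpret
# LIST-annotated Parquet groups as ARRAY<...> columns instead of
# mapping the nesting literally.
parquet_options = bigquery.format_options.ParquetOptions()
parquet_options.enable_list_inference = True

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    parquet_options=parquet_options,
)
# Pass this job_config to client.load_table_from_file() as in the
# sketch above.
```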
I appreciate this is in a dependency, but it's also being suggested as the way to use this library, so I think it falls within the remit of Polars development to consider how to make it work. I hope it's not seen as out of place to raise it in this issue tracker; for me it's a major usability concern.
Installed versions