BigQuery: 'load_table_from_dataframe' raises OSError with STRUCT / RECORD columns. #9024
Comments
@dabasmoti Is there another exception being raised when that |
@tseaver - No, |
I'm getting the same issue on mac.
Do you have write permissions to those temp directories? We originally started using tempfiles because fastparquet does not support in-memory file objects, but I wonder if there are systems in which tempfiles cannot be created? |
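To answer the write-permission question quickly, a small probe along these lines could be run on the affected machine. This is only a sketch: `can_create_tempfile` is a hypothetical helper name, not part of any library, and it checks only that a file can be created, written, and removed in the temp directory, not how the BigQuery client itself handles its temporary Parquet file.

```python
import os
import tempfile


def can_create_tempfile(directory=None):
    """Return True if a temp file can be created, written, and removed.

    Hypothetical helper for diagnosis only. `directory` defaults to the
    platform temp directory reported by tempfile.gettempdir().
    """
    directory = directory or tempfile.gettempdir()
    try:
        # mkstemp raises an OSError subclass if the directory is missing
        # or not writable.
        fd, path = tempfile.mkstemp(suffix=".parquet", dir=directory)
        try:
            os.write(fd, b"probe")
        finally:
            os.close(fd)
        os.remove(path)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("temp dir:", tempfile.gettempdir())
    print("writable:", can_create_tempfile())
```

If this prints `writable: True`, missing permissions on the temp directory are unlikely to be the cause.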
Note: |
@tswast - Which version should I use? |
I should mention that the error occurs only when a dataframe column contains dict values. |
I am running as admin. |
0.14.1 and 0.13.0 are good releases of pyarrow.
Thank you for mentioning this. STRUCT / RECORD columns are not yet supported by the pandas connector (https://github.com/googleapis/google-cloud-python/issues/8191). Neither are ARRAY / REPEATED columns, unfortunately (https://github.com/googleapis/google-cloud-python/issues/8544). Those issues are currently blocked on improvements to the Parquet file serialization logic.
@plamut Can you investigate this further? Hopefully pyarrow can provide an exception that we can catch when trying to write a table with unsupported data types to a parquet file. If no exception is thrown, perhaps we need to check for these and raise a ValueError? |
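The "check for these and raise a ValueError" idea could look roughly like the sketch below. It is not the library's implementation: `find_nested_columns` and `check_no_nested_columns` are hypothetical names, and treating any dict or list value as an unsupported STRUCT / ARRAY column is an assumption made for illustration. The functions accept any iterable of (name, values) pairs, so they work with `dataframe.items()` on a pandas DataFrame as well as with a plain dict.

```python
def find_nested_columns(columns):
    """Return names of columns holding at least one dict or list value.

    `columns` is any iterable of (name, values) pairs, for example
    `dataframe.items()` on a pandas DataFrame or `some_dict.items()`.
    """
    nested = []
    for name, values in columns:
        if any(isinstance(value, (dict, list)) for value in values):
            nested.append(name)
    return nested


def check_no_nested_columns(columns):
    """Raise ValueError if any column contains dict or list values."""
    nested = find_nested_columns(columns)
    if nested:
        raise ValueError(
            "Columns with STRUCT / RECORD or ARRAY / REPEATED values are "
            "not supported: {}".format(", ".join(str(n) for n in nested))
        )


if __name__ == "__main__":
    data = {
        "uid_first": ["1001", "1001", "1001"],
        "agg_col": [{"page_type": 1}, {"record_type": 1}, {"other": 0}],
    }
    try:
        check_no_nested_columns(data.items())
    except ValueError as exc:
        print(exc)
```

Failing early with a message that names the offending columns would be easier to act on than an OSError raised later during serialization.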
TL;DR - I was able to reproduce the reported behavior. Using the posted code and the following dataframe:
data = {
    "uid_first": "1001",
    "agg_col": [
        {"page_type": 1},
        {"record_type": 1},
        {"non_consectutive_home": 0},
    ]
}
df = pandas.DataFrame(data=data)
I got the following traceback in Python 3.6:
Trying the same with Python 2.7, I only got the second part of the traceback, i.e. the That was with We could try catching this error in Edit:
More recent versions of |
@plamut - I am using Python 3.7. |
@dabasmoti I see, let me try with Python 3.7, too, just in case ... although the outcome should probably be the same. Update:
... which is then followed by the |
pyarrow 0.14.0
pandas 0.24.2
Windows 10
Hi,
I am trying to load a dataframe into BigQuery that looks like this:
The agg_col column is a list of dicts.
I also tried a dict.
Schema config:
Load command:
The error message: