BigQuery: load_table_from_dataframe should use a temporary file #7543
Labels: `api: bigquery`, `type: bug`
`load_table_from_dataframe` currently uses `BytesIO` when it serializes a pandas DataFrame to parquet before uploading it via a load job. This violates the contract of `to_parquet`, which expects a filepath; `BytesIO` happens to work with the pyarrow engine but not with fastparquet. A more minor reason to serialize to disk is that dataframes can sometimes be quite large, so spilling to disk is preferable to filling up memory.

Note: the function should clean up after itself by removing the temp file after the load job completes.
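For reference, here is a minimal sketch of how the load path could spill to a temporary file instead of `BytesIO`. The helper name and the wiring through `load_table_from_file` are illustrative assumptions, not the library's actual implementation:

```python
import os
import tempfile

from google.cloud import bigquery


def load_dataframe_via_tempfile(client, dataframe, table_id):
    """Hypothetical helper: serialize the dataframe to a temporary parquet
    file on disk, upload it with a load job, then remove the file."""
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = bigquery.SourceFormat.PARQUET

    # delete=False so the path can be reopened for reading after to_parquet
    # writes it (a second open handle on the same file fails on Windows).
    tmp = tempfile.NamedTemporaryFile(suffix=".parquet", delete=False)
    tmp.close()
    try:
        # to_parquet receives a real filepath, so either the pyarrow or the
        # fastparquet engine can be used.
        dataframe.to_parquet(tmp.name)
        with open(tmp.name, "rb") as source_file:
            job = client.load_table_from_file(
                source_file, table_id, job_config=job_config
            )
        return job.result()  # wait for the load job to complete
    finally:
        # Clean up the temp file whether the load succeeded or failed.
        os.remove(tmp.name)
```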