BigQuery: use union categorical to concatenate pages in to_dataframe when categorical dtype is requested #8044
Comments
@tswast I know you have some stuff in work related to the |
If the query results are large, there are multiple pages to parse. Since we construct a dataframe for each page and then concatenate them, the results may not be as expected.

Per: http://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html#concatenation

When two categorical series are concatenated with different types, they get converted to the "object"

I'd love it if there were a way to support this feature without having to add special logic for different

Related: per googleapis/python-bigquery-pandas#275 (comment), there are more immediate ways that we can decrease peak memory usage. I've filed #8107 to track that issue separately.
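The concatenation behaviour described above can be reproduced without BigQuery. The following is only a minimal sketch: `page1`, `page2`, and the helper `concat_pages` are hypothetical names invented for illustration and are not part of `google-cloud-bigquery` or `pandas-gbq`. It assumes each page arrives as a dataframe whose string column was already cast to `category`, and uses `pandas.api.types.union_categoricals` to rebuild the column after concatenation.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Two hypothetical result pages; each page only sees some of the values,
# so the per-page categories differ.
page1 = pd.DataFrame({"name": pd.Series(["a", "b", "a"], dtype="category")})
page2 = pd.DataFrame({"name": pd.Series(["b", "c"], dtype="category")})

# Plain concatenation: because the categories differ, the column
# is no longer categorical in the result.
naive = pd.concat([page1, page2], ignore_index=True)


def concat_pages(pages, categorical_columns):
    """Concatenate pages, then rebuild the listed columns as categoricals.

    Hypothetical helper for illustration only.
    """
    pages = list(pages)
    combined = pd.concat(pages, ignore_index=True)
    for column in categorical_columns:
        # union_categoricals merges the category sets of all pages.
        combined[column] = union_categoricals([page[column] for page in pages])
    return combined


merged = concat_pages([page1, page2], ["name"])
print(naive["name"].dtype)   # a non-categorical dtype
print(merged["name"].dtype)  # category
```

Note that this sketch still builds the intermediate non-categorical column inside `pd.concat`, so it illustrates the dtype fix rather than the best achievable peak memory usage.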
I updated the issue title to reflect that this issue is specifically for the categorical dtype. #10027 which refactors I suggest we at least add tests with categorical dtypes before closing this issue.
FWIW, the peak memory usage can be cut down by a lot using the bqstorage API (benchmark from today). I will have a look at the remaining scope of this issue (the dtypes).
I am working on an enhancement for pandas-gbq, where we would like to reduce the memory footprint in pandas through downcasting. See googleapis/python-bigquery-pandas#275 for the full details.

I have made a first working version and am getting erratic behaviour in the conversion of STRING to pandas category. It seems to be related to the size of the query results.

See the attached notebook for the code example.
pandas-gbq-bugtracing.ipynb.zip