Pandas casting int64 to float64, misrepresenting value #8225
Comments
I've been wrestling with something similar lately (unrelated data wrangling), and ended up having to bypass pandas completely, as having Nones in columns messed up the
I was able to fix this by passing a dtype constructed based on the cursor description, but then PyArrow fails to serialize the resulting Pandas dataframe, sigh:
This is still an issue for non-Presto databases.
Support for PyArrow serialization of Pandas Int64 dtypes is currently merged to master in both repos, but not yet released on PyPI: pandas-dev/pandas@34fff1f
It also requires converting the Pandas dataframe to an Arrow Table prior to serialization:
I have the following data being returned by Presto (single column, 6 rows):
Due to the missing data (None), Pandas infers the type as float64, converting the value to a wrong id:
The number then shows up as 1239162456494753800 in SQL Lab.
Here's the Pandas documentation on this:
Note that if the missing data is filtered the value is inferred as an int64, and it shows up correctly in SQL Lab:
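The inference described above can be reproduced with a small sketch. The id below is hypothetical (the original data is not shown here); it was picked because it is the smallest positive integer that float64 cannot represent exactly.

```python
import pandas as pd

# Hypothetical id (not the one from this issue): 2**53 + 1 is the smallest
# positive integer that float64 cannot represent exactly.
big_id = 2**53 + 1

# Rows shaped like a DB-API fetchall() result for a single column.
with_missing = pd.DataFrame([(big_id,), (None,)], columns=["id"])
without_missing = pd.DataFrame([(big_id,)], columns=["id"])

print(with_missing["id"].dtype)     # float64: the None forces a float column
print(without_missing["id"].dtype)  # int64: no missing data, no cast

# The float64 column no longer holds the original id.
print(int(with_missing["id"].iloc[0]) == big_id)     # False
print(int(without_missing["id"].iloc[0]) == big_id)  # True
```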
The solution is to pass a dtype argument when creating the Pandas data frame, built from the cursor description. I'm working on a fix for this.
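A rough sketch of that idea, not the actual fix: the type-code mapping and the frame_from_cursor helper are hypothetical, and the cursor description is assumed to follow the DB-API layout where each entry starts with (name, type_code, ...). Each column is built as its own Series so that a dtype can be chosen per column, and integer columns get pandas' nullable Int64 dtype.

```python
import pandas as pd

# Hypothetical mapping from DB-API type codes to pandas dtypes (an assumption
# for illustration). "Int64" is pandas' nullable integer dtype.
TYPE_CODE_TO_DTYPE = {"bigint": "Int64"}


def frame_from_cursor(description, rows):
    """Build a DataFrame column by column, using an explicit dtype when known."""
    columns = {}
    for index, column in enumerate(description):
        name, type_code = column[0], column[1]
        values = [row[index] for row in rows]
        # Unknown type codes map to None, which leaves inference to pandas.
        columns[name] = pd.Series(values, dtype=TYPE_CODE_TO_DTYPE.get(type_code))
    return pd.DataFrame(columns)


# Hypothetical cursor description and rows: one bigint column with a missing value.
description = [("id", "bigint", None, None, None, None, None)]
rows = [(2**53 + 1,), (None,)]

df = frame_from_cursor(description, rows)
print(df["id"].dtype)                      # Int64
print(int(df["id"].iloc[0]) == 2**53 + 1)  # True: the id is preserved
```

Building the column with an explicit integer dtype keeps the values as integers from the start, so they never pass through float64.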