BUG: Collection of inconsistencies in .astype conversions #37626
Comments
pls run these in master and not an older version
My mistake. I updated the post.
What are the expected behaviours for […]? For the former I would have assumed that this raises unless the […]. For the latter, this line ensures that only zeros and ones can be cast from […].
Another one: `pd.Series([1, 2, None], dtype="Int64").to_numpy().astype(float)` (throws error)
Another one:
>>> import pandas as pd  # version 1.3.2 (current release)
>>> pd.DataFrame({"x": [1, pd.NA]}, index = [1])['x'].astype(float)
TypeError: float() argument must be a string or a number, not 'NAType'
I have a use case where (automatic) casting between the following pandas dtypes is necessary: `bool`, `boolean`, `int64`, `Int64`, `float64`, `object` and `string`. Note that `boolean`, `Int64` and `string` are the new pandas 1.0 nullable dtypes.

The default approach for this would be `series.astype(target_dtype)`, with `target_dtype` given as one of the above dtype names as a string. This works (provided there are no issues with missing values) except for the inconsistencies below:

`pd.NA` to `float`:

Summary: Calling `float(pd.NA)` raises a `TypeError` (as does `float(None)`). While `np.array([None, "1"]).astype("float")` works (and gets called here), the same call with `pd.NA` fails.

Edit: Fixed with #37974.
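The NumPy-level difference described above can be reproduced with a small sketch. The helper `try_cast` is a hypothetical name used only for illustration; it returns the exception instead of raising, so that both cases can be compared side by side.

```python
import numpy as np
import pandas as pd


def try_cast(values, dtype="float"):
    """Cast an object array with NumPy; return the result or the exception raised."""
    try:
        return np.array(values, dtype=object).astype(dtype)
    except (TypeError, ValueError) as exc:
        return exc


# None is treated as a missing value by NumPy's object-to-float cast.
with_none = try_cast([None, "1"])

# pd.NA has no float conversion, so the same cast is expected to fail.
with_na = try_cast([pd.NA, "1"])

print(with_none)
print(repr(with_na))
```

This only exercises the plain NumPy cast; whether `Series.astype(float)` itself succeeds depends on the pandas version in use.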
`object`/`string` to `Int64` (nullable)

Summary: `object` columns cannot be cast to `Int64` directly. Casting through an intermediate dtype behaves as follows:

- `object` -> `string` -> `Int64` works.
- `object` -> `float` -> `Int64` works if the data does not contain `pd.NA` (see above).
- `object` -> `int64` -> `Int64` works if the data does not contain missing values.

Related: #25472 (comment)
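A rough way to survey these paths on a given pandas installation is sketched below. `cast_chain` is a hypothetical helper, not part of pandas, and the sample data is made up for illustration. The outcomes are printed rather than assumed, since they differ between pandas versions.

```python
import pandas as pd


def cast_chain(series, *dtypes):
    """Apply astype for each dtype in order; return the result or the exception raised."""
    try:
        for dtype in dtypes:
            series = series.astype(dtype)
        return series
    except Exception as exc:  # report any failure rather than stopping the survey
        return exc


# Illustrative data: one object column without and one with a missing value.
no_missing = pd.Series([1, 2, 3], dtype="object")
with_missing = pd.Series([1, 2, pd.NA], dtype="object")

paths = [("Int64",), ("string", "Int64"), ("float", "Int64"), ("int64", "Int64")]
for path in paths:
    for name, data in [("no missing", no_missing), ("with pd.NA", with_missing)]:
        outcome = cast_chain(data, *path)
        # Show the final dtype on success, or the exception type on failure.
        label = outcome.dtype if isinstance(outcome, pd.Series) else type(outcome).__name__
        print(f"object -> {' -> '.join(path)} ({name}): {label}")
```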
`string`/`object` to `bool` or `boolean`

Summary: Casting `string` or `object` columns to `bool` or `boolean` behaves strangely. I am not sure what the expected behaviour for `string`/`object` to `bool`/`boolean` should be. It would be nice to have consistent behaviour.

- `string`/`object` -> `bool` works if there are no missing values, but yields only `True`.
- `string`/`object` -> `boolean` raises.

`int` (non-nullable) to `boolean`

Summary: Casting from (non-nullable) `int64` to (nullable) `boolean` raises.

- `int64` -> `Int64` -> `boolean` works.
- `int64` -> `bool` -> `boolean` works as long as there are no missing values.

Related: #37614
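The two boolean cases can be checked with a sketch along these lines. `try_astype` is a hypothetical helper and the sample values are made up; the direct casts to `boolean` are captured rather than asserted because their outcome may depend on the pandas version.

```python
import pandas as pd


def try_astype(series, dtype):
    """Return the cast values as a list, or the name of the exception raised."""
    try:
        return series.astype(dtype).tolist()
    except Exception as exc:  # report any failure instead of raising
        return type(exc).__name__


strings = pd.Series(["True", "False", "no"], dtype="object")
ints = pd.Series([0, 1, 2], dtype="int64")

# object -> bool uses Python truthiness, so every non-empty string becomes True.
strings_as_bool = try_astype(strings, "bool")

# Direct casts to the nullable boolean dtype; the outcome may depend on the pandas version.
strings_as_boolean = try_astype(strings, "boolean")
ints_as_boolean = try_astype(ints, "boolean")

# The detour via bool maps every non-zero integer to True.
ints_via_bool = ints.astype("bool").astype("boolean").tolist()

print(strings_as_bool, strings_as_boolean, ints_as_boolean, ints_via_bool)
```

Note that the detour via `bool` silently turns the value `2` into `True`, which may or may not be what a direct `int64` -> `boolean` cast should do.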
While separate issues exist for the first and last report, I thought it would be useful to have a collection of these in one place, which I did not find.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Output of `pd.show_versions()`