API/BUG: Series(floating, dtype=intlike) ignores dtype, DataFrame casts #40110
Comments
No idea what the reason would be if it was intentional. So I would be fine with changing it. I am only wondering whether there is some way to deprecate it (which doesn't seem easy without introducing an extra keyword just to manage the deprecation...), since it is a breaking change (actual numbers would change) |
It seems this changed behaviour before, as a long time ago we actually honored the dtype (I checked pd 0.20) |
xref #26919 |
Tried changing this to only disallow float-int casting that included NaNs, but it turns out we explicitly test that Is there reasoning/precedent for why we would treat list[float] differently from ndarray[float]? cc @jreback |
only recollection is that we didn't want to silently cast floats to ints like numpy (but agreed this is a bit odd that we allow for array but not list) |
any preference for changing/deprecating one or the other? |
yeah i think we should raise |
hmm not what i expected. in most cases we think of |
update: Index behaves like Series in this case |
This actually gets worse:
The latter behavior we have a specific test for added in #21456 @pandas-dev/pandas-core thoughts on making this consistent? |
Instead of raising, when the specified dtype could result in data loss (e.g. float -> int), does a warning make sense? That way it is visible to interactive users and effectively an error for a production job running with -Werror or in a test framework. |
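The warn-instead-of-raise idea can be sketched with the standard `warnings` module and plain NumPy. The helper name `cast_with_warning` and the choice of `UserWarning` are assumptions made for illustration; this is not an existing pandas function.

```python
import warnings

import numpy as np


def cast_with_warning(values, dtype="int64"):
    """Hypothetical helper: cast like NumPy, but warn when values change."""
    arr = np.asarray(values, dtype="float64")
    result = arr.astype(dtype)
    # Compare element-wise; any mismatch means the cast lost information.
    if not (result == arr).all():
        warnings.warn("float -> int cast changed values", UserWarning, stacklevel=2)
    return result


# Interactive use: the warning is recorded and the truncated result is returned.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    truncated = cast_with_warning([1.5, 2.0])

# Strict use: promoting warnings to errors (as running with -Werror does)
# turns the same call into an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        cast_with_warning([1.5, 2.0])
        raised = False
    except UserWarning:
        raised = True
```

With this approach the lossy result is still produced by default, so callers who never look at warnings would keep getting truncated values.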
@jreback for the most part we like to have |
I would still have this axiom hold. My point is that (and maybe this is a bigger change / deprecation) casting to ints (not extension) should fail if we have NaN (I think we do this now in all cases) or if the values are not exactly the same. This would be 'safe' casting. IOW, I view the numpy behavior as really bad actually (if you want to round/floor/ceil, then that should be explicit). So I think that both construction and astype should fail here. |
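A minimal sketch of the "safe" casting rule described above, using only NumPy. The helper name `safe_float_to_int` is hypothetical and is not what pandas implements; it only illustrates the rule: refuse NaN, and refuse any value that does not survive the cast unchanged.

```python
import numpy as np


def safe_float_to_int(values, dtype="int64"):
    """Hypothetical helper: cast floats to ints only when nothing is lost."""
    arr = np.asarray(values, dtype="float64")
    # NaN and inf have no integer representation, so refuse them outright.
    if not np.isfinite(arr).all():
        raise ValueError("cannot safely cast non-finite values to integer")
    result = arr.astype(dtype)
    # Round-trip check: every element must compare equal after the cast.
    # This looks at each value, so it costs an extra pass over the data.
    if not (result == arr).all():
        raise ValueError("cannot safely cast: values are not exactly integral")
    return result


ok = safe_float_to_int([1.0, 2.0])  # integral floats are accepted
```

Passing `[1.5, 2.0]` or a list containing NaN to this helper raises `ValueError`, so rounding or truncation has to be requested explicitly by the caller.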
totally agreed. the case in question here is where we have e.g. |
I think we should raise on this as well. This is a big departure from numpy, but IMHO directly casting these to int is totally unexpected and should raise (whether by constructor dtype or via .astype). |
OK, I've got a branch almost ready that changes the Series constructor (though not DataFrame, which is a separate branch to ...). I'd be OK with deprecating instead of directly changing, too. |
yeah i think have to deprecate |
I don't have a strong opinion here, but one question: What's the performance impact of verifying / validating that we should raise? Does this have to look at each value, or is it equivalent to |
It is not equivalent to
I've opened numpy/numpy#19146 to discuss how to implement this (well, the uint analogue of this) more efficiently. |
In most contexts, Series is strict about dtype, so it will always either return the given dtype or raise. DataFrame is the opposite, often silently ignoring dtype (xref #24435) (I think on the theory that dtype may be intended to apply to some columns but not others).
With floating data and integer dtypes, it's the opposite:
We have exactly one test that is broken if we change the latter behavior, and that is mostly by coincidence. There are other bugs (e.g. Series(bigints, dtype="int8") silently overflowing) that would be easier to fix if maintaining this behavior weren't a consideration.
Is this intentional? cc @jreback @jorisvandenbossche @TomAugspurger
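The silent overflow mentioned above can be reproduced at the NumPy level, which is shown here instead of the pandas constructor because pandas behavior differs between versions. The array contents and the `fits_in` helper are illustrative assumptions, and the helper assumes a non-empty integer array.

```python
import numpy as np

# Illustrative data: both values are outside the int8 range (-128..127).
bigints = np.array([200, 300], dtype="int64")

# NumPy's integer-to-integer astype wraps around without any warning.
wrapped = bigints.astype("int8")


def fits_in(arr, dtype):
    """Hypothetical bounds check: True if every value fits in the target dtype."""
    info = np.iinfo(dtype)
    # A min/max scan is enough for integer input; no round-trip comparison needed.
    return bool(arr.min() >= info.min and arr.max() <= info.max)


can_cast = fits_in(bigints, "int8")
```

A bounds check like this only needs the minimum and maximum of the data, which is cheaper than the element-wise round-trip comparison that float input requires.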