EA: Should IntegerNA support Inf, -Inf? #28423
Comments
inf / -inf are necessarily floating point, right? Or do you have some mask-based approach in mind (which seems hard to do if we want to switch to a bit mask)? |
The approach I have in mind depends on the fact that the mask is uint8. If we switch to an actual bitmask, that wouldn't work. |
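To make the uint8 idea concrete, here is a minimal sketch of how a byte-per-element mask could carry more than two states. The codes `VALID`, `NA`, `POS_INF`, `NEG_INF` and the helper `to_float` are hypothetical names invented for illustration; this is not how pandas' IntegerArray is implemented, which treats the mask as plain booleans.

```python
import numpy as np

# Hypothetical state codes stored in a uint8 mask (an assumption for this
# sketch; the real IntegerArray mask only distinguishes valid from missing).
VALID, NA, POS_INF, NEG_INF = 0, 1, 2, 3


def to_float(values, mask):
    """Materialize an (int64 values, uint8 mask) pair as a float64 array."""
    out = values.astype("float64")
    out[mask == NA] = np.nan
    out[mask == POS_INF] = np.inf
    out[mask == NEG_INF] = -np.inf
    return out


values = np.array([1, 2, 0, 0], dtype="int64")
mask = np.array([VALID, NA, POS_INF, NEG_INF], dtype="uint8")
result = to_float(values, mask)
print(result)
```

A one-bit-per-element mask can only encode two states per element, which is why this scheme would not survive a switch to a true bitmask.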
I don't think integer should support this. Inf/-Inf is a float concept, it makes IMO no sense to have that in an integer array conceptually. |
-1 on this as it would be very very odd for an integer array to support a naturally float concept (inf). |
There is nothing float-specific about inf conceptually/mathematically. What you're describing is an artifact of the existing implementations, and getting around those is the point of IntegerNA. |
For me the main point of IntegerNA is to provide support for missing values, not necessarily for adding concepts like infinity or non-existent numbers. But I also don't really understand the reason. You say that it would make it easier to keep IntegerNA arithmetic in sync with Series arithmetic. Can you explain this? Cases where you end up with infinity (eg division) should normally end up as float anyway? |
Good question. In
(See also: #27829.) If IntegerArray handled inf and -inf in the mask, then the second case here could avoid casting to float. |
That's a bug I would say. But note that |
Reading your comment fully now. Why would we want to avoid casting to float? I think divisions should always give float? (at least that's the rule in numpy, I think it is good to follow that to have predictable types) |
Fair enough, then consider floordiv. ATM the Int64 case gives back an all-zero Int64 result (will have to track down what's causing that) while the Series[int64] result is the same as for the truediv example. |
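For context on the two division cases, the following sketch shows how plain NumPy int64 arrays behave when divided by zero. It uses only NumPy, not pandas, so it says nothing about what `Int64` or `Series[int64]` return in any particular pandas version; the variable names are illustrative.

```python
import warnings

import numpy as np

num = np.array([1, -1, 0], dtype="int64")
den = np.array([0, 0, 0], dtype="int64")

with warnings.catch_warnings():
    # NumPy emits RuntimeWarnings for division by zero; silence them here
    # so the example runs cleanly.
    warnings.simplefilter("ignore", RuntimeWarning)
    true_div = num / den    # true division always produces floats
    floor_div = num // den  # floor division keeps the integer dtype

print(true_div.dtype, true_div)
print(floor_div.dtype, floor_div)
```

True division has a float result that can represent inf, -inf and nan, while integer floor division has no such values available and NumPy falls back to zeros.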
I think that making _mask a bitmask is more important than supporting +/-inf.
|
Agreed. |
I had a PR (closed) which created a MaskArray backed by pyarrow which does exactly this (use a bit mask) |
If we had pyarrow wouldn't we just use their implementation of IntegerNA? |
In theory yes, but there are too many missing methods, so only the storage is useful at this point (not a lot else). |
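As a rough illustration of the byte mask versus bitmask trade-off discussed here, the sketch below packs a NumPy boolean mask into bits with `np.packbits` and unpacks it again. It is not the MaskArray from the closed PR and does not use pyarrow; the little-endian bit order is chosen to mirror Arrow's bitmap layout, and note that Arrow's validity bits mark valid entries whereas this mask marks missing ones.

```python
import numpy as np

# One byte per element, True meaning "missing" (the byte-mask layout).
byte_mask = np.array(
    [False, True, False, False, True, False, False, False, True, False]
)

# Pack eight flags into each byte, least-significant bit first.
bit_mask = np.packbits(byte_mask, bitorder="little")

# Unpack again, trimming the padding bits added to fill the last byte.
roundtrip = np.unpackbits(
    bit_mask, count=byte_mask.size, bitorder="little"
).astype(bool)

print(byte_mask.nbytes, bit_mask.nbytes)
```

The packed form uses roughly an eighth of the memory, at the cost of bit twiddling on every access and no room for extra per-element states.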
Personally, I am not sure a bitmask is necessarily worth it without the advantages of going full pyarrow memory (e.g. it would make compatibility with future numpy masked dtypes more difficult; we would need our own implementation, ..).
Yep, floordiv is indeed an example of undefined behaviour in this case for ints. The zeros come from numpy (which also raises a warning in that case), I would say that the |
It uses a boolean mask with 8 bits under the hood, so it wouldn't be too tough to implement. Doing this would make it easier to keep IntegerNA arithmetic in sync with Series arithmetic.