EA: Should IntegerNA support Inf, -Inf? #28423

jbrockmendel · 2019-09-13T01:41:20Z

It uses a boolean mask with 8 bits under the hood, so it wouldn't be too tough to implement. Doing this would make it easier to keep IntegerNA arithmetic in sync with Series arithmetic.

TomAugspurger · 2019-09-13T02:45:29Z

inf / -inf are necessarily floating point, right? Or do you have some mask-based approach in mind (which seems hard to do if we want to switch to a bit mask)?

jbrockmendel · 2019-09-13T02:50:59Z

The approach I have in mind depends on the fact that the mask is uint8. If we switch to an actual bitmask, that wouldn't work.
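To make the distinction concrete, here is a minimal pure-Python sketch of the idea, not pandas internals. It assumes a hypothetical encoding in which each one-byte mask entry holds one of four codes; the names `VALID`, `NA`, `POS_INF`, `NEG_INF` and the helper `decode` are illustrative only and do not exist in pandas. A packed bitmask keeps a single bit per element, so it can only record "missing or not".

```python
import math

# Hypothetical per-element codes (assumption): the real mask only stores True/False.
VALID, NA, POS_INF, NEG_INF = 0, 1, 2, 3


def decode(values, mask):
    """Combine integer values with a one-byte-per-element mask."""
    special = {NA: None, POS_INF: math.inf, NEG_INF: -math.inf}
    return [v if m == VALID else special[m] for v, m in zip(values, mask)]


values = [7, 0, 0, 0]
byte_mask = bytearray([VALID, NA, POS_INF, NEG_INF])  # one byte per element
decoded = decode(values, byte_mask)

# Packing to one bit per element collapses NA, +inf and -inf into "not valid".
bit_mask = sum((m != VALID) << i for i, m in enumerate(byte_mask))
```

With one byte per element the three special states stay distinguishable; after packing, elements 1 to 3 all carry the same single set bit.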

jorisvandenbossche · 2019-09-13T09:25:36Z

I don't think integer should support this. Inf/-Inf is a float concept, it makes IMO no sense to have that in an integer array conceptually.

jreback · 2019-09-13T12:22:09Z

-1 on this, as it would be very odd for an integer array to support a naturally float value (inf).

jbrockmendel · 2019-09-13T14:32:14Z

Inf/-Inf is a float concept
a naturally float value (inf).

There is nothing float-specific about inf conceptually/mathematically. What you're describing is an artifact of the existing implementations, and getting around those is the point of IntegerNA.

jorisvandenbossche · 2019-09-13T14:38:25Z

For me the main point of IntegerNA is to provide support for missing values, not necessarily for adding concepts like infinity or non-existent numbers.

But I also don't really understand the reason. You say that it would make it easier to keep IntegerNA arithmetic in sync with Series arithmetic. Can you explain this? Cases where you end up with infinity (eg division) should normally end up as float anyway?

jbrockmendel · 2019-09-13T15:54:36Z

You say that it would make it easier to keep IntegerNA arithmetic in sync with Series arithmetic. Can you explain this? Cases where you end up with infinity (eg division) should normally end up as float anyway?

Good question.

In IntegerArray._maybe_mask_result entries with np.inf or -np.inf are masked to be treated as nan. As a result, we get the following behavior:

>>> ser = pd.Series(range(-1, 2))
>>> (ser / 0)._values
array([-inf,  nan,  inf])

>>> ser2 = ser.astype("Int64")
>>> (ser2 / 0)._values
array([nan, nan, nan])

(See also: #27829.) If IntegerArray handled inf and -inf in the mask, then the second case here could avoid casting to float.
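The effect described above can be reproduced with plain numpy. This is a simplified sketch of the masking step, assuming only what the comment states (inf entries are treated as nan); it is not the actual body of `IntegerArray._maybe_mask_result`, and the variable names are illustrative.

```python
import numpy as np

values = np.arange(-1, 2)  # [-1, 0, 1], like ser above

# True division of integers by zero yields floats; silence the warnings.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = values / 0  # [-inf, nan, inf]

# Assumed simplification: inf entries are flagged as missing alongside nan,
# so the sign information carried by -inf / inf is lost.
mask = np.isnan(raw) | np.isinf(raw)
masked = np.where(mask, np.nan, raw)  # [nan, nan, nan]
```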

jorisvandenbossche · 2019-09-13T16:08:29Z

That's a bug, I would say. But note that ser2 / 0 is not an Int64 Series but a float64 Series, so I don't see the relationship with Int64 needing to hold inf to get the same result.

jorisvandenbossche · 2019-09-13T16:10:36Z

the second case here could avoid casting to float.

Reading your comment fully now. Why would we want to avoid casting to float? I think divisions should always give float? (at least that's the rule in numpy, I think it is good to follow that to have predictable types)

jbrockmendel · 2019-09-13T16:24:09Z

I think divisions should always give float?

Fair enough, then consider floordiv. ATM the Int64 case gives back an all-zero Int64 result (will have to track down what's causing that) while the Series[int64] result is the same as for the truediv example.

TomAugspurger · 2019-09-13T20:46:22Z

I think that making _mask a bitmask is more important than supporting +/- inf.


jbrockmendel · 2019-09-13T20:50:12Z

I think that making _mask a bitmask is more important than supporting +/- inf.

Agreed.

jreback · 2019-09-13T21:03:26Z

I had a PR (closed) which created a MaskArray backed by pyarrow which does exactly this (use a bit mask)

jbrockmendel · 2019-09-13T21:11:33Z

MaskArray backed by pyarrow

If we had pyarrow wouldn't we just use their implementation of IntegerNA?

jreback · 2019-09-13T21:28:19Z

MaskArray backed by pyarrow

If we had pyarrow wouldn't we just use their implementation of IntegerNA?

in theory yes, but there are too many missing methods

so the storage is useful at this point (not a lot else)

jorisvandenbossche · 2019-09-14T08:12:50Z

Personally, I am not sure a bitmask is necessarily worth it without the advantages of going full pyarrow memory (e.g. it would make compatibility with future numpy masked dtypes more difficult; we would need our own implementation, ...).
But anyway, that's a different discussion (worth having somewhere else), and even without the argument of future use of bitmask, I personally don't think trying to support the concept of infinity is worth the added complexity just for the current mask-based implementation, certainly given that I don't know of any other programming language that does this.

Fair enough, then consider floordiv. ATM the Int64 case gives back an all-zero Int64 result (will have to track down what's causing that) while the Series[int64] result is the same as for the truediv example.

Yep, floordiv is indeed an example of undefined behaviour in this case for ints. The zeros come from numpy (which also raises a warning in that case). I would say that the Series[int64] behaviour is a bug (maybe that behaviour was intentional in the past, but I would rather keep the result integer).
But still, I don't think this corner case of floordiv by 0 is worth the added complexity of giving the mask a non-binary meaning.
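For reference, the numpy behaviour mentioned here can be checked directly. The snippet below is a small sketch using only numpy and the standard `warnings` module; the variable names are illustrative.

```python
import warnings

import numpy as np

ints = np.arange(-1, 2, dtype=np.int64)  # [-1, 0, 1]

# Record warnings instead of printing them, and make sure numpy is
# configured to warn (its default) on division by zero.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with np.errstate(divide="warn"):
        result = ints // 0

# The result stays int64 and is filled with zeros; numpy reports the
# division by zero through a RuntimeWarning rather than an exception.
print(result, result.dtype)
print([str(w.message) for w in caught])
```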

jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Sep 13, 2019

jorisvandenbossche added the Needs Discussion Requires discussion from core team before further action label Sep 17, 2019

jbrockmendel closed this as completed Sep 18, 2019

jorisvandenbossche mentioned this issue Oct 7, 2019

ROADMAP: Consistent missing value handling with new NA scalar #28095

Open
