feat: add `Series|Expr.is_finite` method #1341

FBruzzesi · 2024-11-09T20:21:24Z

What type of PR is this? (check all applicable)

Related issues

Related issue #
Closes [Enh]: Add support for Series|Expr.is_finite #1297

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below.

As mentioned in the issue itself, pandas and dask treat NaNs and nulls as the same. Even worse, np.isfinite returns False for nulls on non-nullable backends and <NA> on nullable backends. I made the opinionated choice to be consistent across the different pandas backends and always return False for nulls and NaNs. I hope the warning in the docstring is enough.

tests/expr_and_series/is_finite_test.py

FBruzzesi · 2024-11-12T21:34:52Z

narwhals/_pandas_like/series.py

+        return self._from_native_series(
+            np.isfinite(self._native_series) & ~self._native_series.isna()
+        )


Here is an opinionated choice: NA is not finite.
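For readers following along, here is a minimal sketch of what the diff above does, written against plain pandas rather than the Narwhals internals. The helper name `is_finite_nulls_false` and the sample values are hypothetical; only `np.isfinite(...) & ~....isna()` comes from the diff.

```python
import numpy as np
import pandas as pd


def is_finite_nulls_false(s: pd.Series) -> pd.Series:
    """Hypothetical helper mirroring the diff: missing values count as not finite."""
    # np.isfinite yields False (classic dtypes) or <NA> (nullable dtypes) for
    # missing values; and-ing with ~isna() forces those positions to False.
    return np.isfinite(s) & ~s.isna()


# Classic NumPy-backed float64: None is stored as NaN.
classic = pd.Series([float("inf"), 2.0, None])
# Nullable extension dtype: None is stored as <NA>.
nullable = pd.Series([float("inf"), 2.0, None], dtype="Float64")

result_classic = is_finite_nulls_false(classic).tolist()
result_nullable = is_finite_nulls_false(nullable).tolist()

print(result_classic)
print(result_nullable)
```

Under this approach both dtype backends agree on the values, at the cost of never reporting a null in the output.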

🤔 not sure, wouldn't we want to preserve null values?

Behavior differs across pandas dtype backends. Let me come back with an example.

Hmm, actually, for classical pandas types we wouldn't have the option of returning a nullable boolean (if we want to preserve the dtype backend).

🤔 gonna think about this a little longer

These would be the output:

data = [float("nan"), float("inf"), 2.0, None]
s = pd.Series(data)
np.isfinite(s)
0    False
1    False
2     True
3    False
dtype: bool

np.isfinite(s.convert_dtypes(dtype_backend="numpy_nullable"))
0     <NA>
1    False
2     True
3     <NA>
dtype: boolean

np.isfinite(s.convert_dtypes(dtype_backend="pyarrow"))
0    False
1    False
2     True
3    False
dtype: bool

While for polars:

pl.Series(data).is_finite()
shape: (4,)
Series: '' [bool]
[
    false
    false
    true
    null
]

@MarcoGorelli considering the new page we have on booleans, how would you move forward with this?
I am asking just to discuss it. I am OK with keeping the inconsistencies between different pandas dtype backends and letting the user handle those. At the same time, I would aim at some sort of unification towards the polars behavior, although in this specific context it seems unfeasible.

does it work to do (s > float('-inf')) & (s < float('inf'))?

like this we'd preserve nulls for nullable dtype backends, and we'd get False for the classical numpy ones, which would be in line with the rest of the boolean document
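A minimal sketch of the comparison-based suggestion above, again on plain pandas Series rather than the Narwhals internals. The function name `is_finite_via_comparisons` and the sample values are hypothetical; the expression itself is the one proposed in the comment.

```python
import pandas as pd


def is_finite_via_comparisons(s: pd.Series) -> pd.Series:
    """Hypothetical helper: finite means strictly between -inf and +inf."""
    return (s > float("-inf")) & (s < float("inf"))


# Classic float64: NaN compares False on both sides, so missing values give False.
classic = pd.Series([float("nan"), float("inf"), 2.0, None])
classic_out = is_finite_via_comparisons(classic)

# Nullable Float64: comparisons with <NA> give <NA>, so nulls are preserved.
nullable = pd.Series([float("inf"), 2.0, None], dtype="Float64")
nullable_out = is_finite_via_comparisons(nullable)

print(classic_out)
print(nullable_out)
```

The output dtype follows the input backend: plain `bool` for classic dtypes and nullable `boolean` for the nullable ones, which is what lets the null survive in the second case.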

Looks good, I will resolve conflicts with main and adjust

EdAbati

Thanks! Given the difference with NaN/Null across backends, your implementation looks good to me 👌👌

(backends arguing about nulls)

MarcoGorelli · 2024-11-16T11:40:59Z

just noticed that we're already somewhat inconsistent when the resulting column is boolean:

In [6]: nw.from_native(pd.DataFrame(data)).select(nw.col('a')>1).to_native()
Out[6]: 
       a
0  False
1  False
2   True

In [7]: nw.from_native(pl.DataFrame(data)).select(nw.col('a')>1).to_native()
Out[7]: 
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ bool  │
╞═══════╡
│ false │
│ null  │
│ true  │
└───────┘

This may be fine, not sure there's too much we can do to work around this, but we probably just need a page alongside https://narwhals-dev.github.io/narwhals/other/column_names/ to explain what to expect from boolean columns
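The same difference can be reproduced with pandas alone, using the nullable dtype as a stand-in for null-preserving behavior. This is only a sketch: the `data` used in the session above is not shown, so the values below are an assumption chosen to be consistent with the printed output.

```python
import pandas as pd

# Hypothetical values, assumed to resemble the `data` in the session above.
values = [0.0, None, 2.0]

classic = pd.Series(values)                    # float64: None becomes NaN
nullable = pd.Series(values, dtype="Float64")  # None becomes <NA>

classic_mask = classic > 1    # NaN > 1 evaluates to False
nullable_mask = nullable > 1  # <NA> > 1 evaluates to <NA>

# Filling the nulls is one way to line the nullable result up with the classic one.
unified_mask = nullable_mask.fillna(False)

print(classic_mask.tolist())
print(nullable_mask.isna().tolist())
print(unified_mask.tolist())
```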

MarcoGorelli · 2024-11-17T10:10:11Z

I'd put up a page about booleans - if we can agree on it, then I think it unblocks this PR #1392

MarcoGorelli

thanks @FBruzzesi !

FBruzzesi added 5 commits November 1, 2024 19:13

is-finite for eager

89fe3f4

add dask and test

cc3e722

Merge branch 'main' into feat/is-finite

7aff971

pandas treat nulls as nan

f603f6b

rm dask from series warning

28d329a

github-actions bot added the enhancement label Nov 9, 2024

xfail py38 pandas_pyarrow

46f7baa

FBruzzesi commented Nov 12, 2024

FBruzzesi commented Nov 12, 2024

FBruzzesi added 2 commits November 12, 2024 22:36

pin numpy instead

d6afe01

nevermind its pandas version to pin

40a894d

EdAbati approved these changes Nov 12, 2024

FBruzzesi added 2 commits November 18, 2024 09:21

feedback adjustments

34d354e

Marco's way

139233f

MarcoGorelli approved these changes Nov 18, 2024

MarcoGorelli merged commit 2784596 into main Nov 18, 2024
22 checks passed

MarcoGorelli deleted the feat/is-finite branch November 18, 2024 10:26

FBruzzesi mentioned this pull request Nov 18, 2024

refactored CheckNumericMixin azukds/tubular#339

Merged

FBruzzesi commented Nov 9, 2024

FBruzzesi Nov 12, 2024

MarcoGorelli Nov 13, 2024

FBruzzesi Nov 13, 2024

MarcoGorelli Nov 13, 2024

FBruzzesi Nov 13, 2024 • edited

FBruzzesi Nov 17, 2024

MarcoGorelli Nov 17, 2024

FBruzzesi Nov 18, 2024

EdAbati left a comment

MarcoGorelli commented Nov 16, 2024

MarcoGorelli commented Nov 17, 2024

MarcoGorelli left a comment
