-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking βSign up for GitHubβ, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Series|Expr.is_finite
method
#1341
Conversation
narwhals/_pandas_like/series.py
Outdated
return self._from_native_series( | ||
np.isfinite(self._native_series) & ~self._native_series.isna() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a opinionated choice that na is not finite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π€ no sure, wouldn't we want to preserve null values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Behavior is different for different pandas backend dtype. Let me come back with an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm actually, for classical pandas types, we wouldn't have the option of returning a nullable boolean (if we want to preserve the dtype backend)
π€ gonna think about this a little longer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These would be the output:
data = [float("nan"), float("inf"), 2.0, None]
s = pd.Series(data)
np.isfinite(s)
0 False
1 False
2 True
3 False
dtype: bool
np.isfinite(s.convert_dtypes(dtype_backend="numpy_nullable"))
0 <NA>
1 False
2 True
3 <NA>
dtype: boolean
np.isfinite(s.convert_dtypes(dtype_backend="pyarrow"))
0 False
1 False
2 True
3 False
dtype: bool
While for polars:
pl.Series(data).is_finite()
shape: (4,)
Series: '' [bool]
[
false
false
true
null
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli considering the new page we have on booleans, how would you move forward with this?
I am asking just to discuss it, I am ok with keeping the inconsistencies between different pandas dtype backends, and let the use handle those. At the same time, I would aim at some sort of unification towards polars behavior, although in this specific context it seems unfeasible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it work to do (s > float('-inf')) & (s < float('inf'))
?
like this we'd preserve nulls for nullable dtype backends, and we'd get False
for the classical numpy ones, which would be in line with the rest of the boolean document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I will resolve conflicts with main and adjust
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just noticed that we're already somewhat inconsistent when the resulting column is boolean: In [6]: nw.from_native(pd.DataFrame(data)).select(nw.col('a')>1).to_native()
Out[6]:
a
0 False
1 False
2 True
In [7]: nw.from_native(pl.DataFrame(data)).select(nw.col('a')>1).to_native()
Out[7]:
shape: (3, 1)
βββββββββ
β a β
β --- β
β bool β
βββββββββ‘
β false β
β null β
β true β
βββββββββ This may be fine, not sure there's too much we can do to work around this, but we probably just need a page alongside https://narwhals-dev.github.io/narwhals/other/column_names/ to explain what to expect from boolean columns |
I'd put up a page about booleans - if we can agree on it, then I think it unblocks this PR #1392 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @FBruzzesi !
What type of PR is this? (check all applicable)
Related issues
Series|Expr.is_finite
#1297Checklist
If you have comments or can explain your changes, please do so below.
As mentioned in the issue itself, pandas and dask treat nan's and null's as same. Actually, even worse, for non nullable backend,
np.isfinite
returns False and for nullable-backends will return<NA>
. I made the opinionated choice to be consistent across different pandas backends and always return False for nulls and nans. I hope the warning in the docstring is enough