Skip to content

Series doesn't implement floor/ceil ops (+EA support needed) #26892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ghost opened this issue Jun 16, 2019 · 13 comments
Closed

Series doesn't implement floor/ceil ops (+EA support needed) #26892

ghost opened this issue Jun 16, 2019 · 13 comments

Comments

@ghost
Copy link

ghost commented Jun 16, 2019

In [19]: import pandas as pd
    ...: s=pd.Series([1,2])
    ...: s.floor() 
AttributeError: 'Series' object has no attribute 'floor'
@ghost ghost changed the title Numeric Series support common floor/ceil operations Series doesn't implement floor/ceil ops Jun 17, 2019
@TomAugspurger
Copy link
Contributor

I'd prefer users just .apply(np.floor). I don't think these are broadly useful enough to warrant a method, especially as they don't work for all dtypes.

@ghost
Copy link
Author

ghost commented Jun 18, 2019

Reasonable, but another case to consider for EA.

s=pd.Series(DecimalArray(to_decimal([1,2,3])))
s.apply(np.floor)

pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3705         with np.errstate(all='ignore'):
   3706             if isinstance(f, np.ufunc):
-> 3707                 return f(self)
   3708 
   3709             # row-wise access

AttributeError: 'decimal.Decimal' object has no attribute 'floor'

I think this calls __array__ on the array, which converts ito tobject,
and then the np.floor ufunc sees an object, and tries to all floor on it,
which fails. and yet math.floor(decimal) returns an int, (which might
be also be cast to decimal by the EA).

So I think right now, s.apply(np.floor) doesn't work for EA,
and can't without changes to pandas.

@ghost ghost changed the title Series doesn't implement floor/ceil ops Series doesn't implement floor/ceil ops (+EA support needed) Jun 18, 2019
@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 18, 2019 via email

@ghost
Copy link
Author

ghost commented Jun 18, 2019

Well, I was using Decimal as a concrete example. EA of its "kind" aren't handled gracfully in this situation
in that s.apply(np.floor) works for numpy dtypes but not EA.

More generally, a user could apply a floor function appropriate for their data.

Yes, but in the case of something like pint, that means accessing an internal array
two attribute levels deep, and casting back to the proper EA, and wrapping by a Series ctor,
or something about as cumbersome. I though the purpose of EA was to have them as
seamless as "native" pandas dtypes.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 18, 2019 via email

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 18, 2019 via email

@ghost
Copy link
Author

ghost commented Jun 18, 2019

A specific EA may or may not work with NumPy’s API.

whether or it chooses to implement a numpy api is a separate issue. Whether pandas actually invokes that implementation when the series method is called is another. And I'm talking about
the latter.

In particular, all series methods/ops which work by calling np.asarray(self) or relying on the default EA.__array__ (which calls np.asarray(self.array)) Series.__array__, fail this test. an EA may override __array__ but numpy enforces ndarray type on the result, so the dtype information is necessarily lost.

I think apply fails this test. And not only np.floor(s) fail for an EA, but also np.cos(s) and s.apply(np.cos). The other trig functions are the same.

@TomAugspurger
Copy link
Contributor

Does apply call asarray?

@ghost
Copy link
Author

ghost commented Jun 19, 2019

Series.apply(ufunc) calls ufunc(self) which invokes Series.__array__ to get something it get work on, which calls np.asarray(self.array, dtype). The result is an object ndarray.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 19, 2019 via email

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 19, 2019

As a short demo, consider the following incomplete Series.__array_ufunc__.

diff --git a/pandas/core/series.py b/pandas/core/series.py
index eaef1f525..2bc1dfe1a 100644
--- a/pandas/core/series.py
+++ b/pandas/core/series.py
@@ -701,6 +701,11 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
     # ----------------------------------------------------------------------
     # NDArray Compat
 
+    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
+        arrays = tuple(x.array for x in inputs)
+        result = getattr(ufunc, method)(*arrays)
+        return self._constructor(result, index=self.index).__finalize__(self)
+
     def __array__(self, dtype=None):
         """
         Return the values as a NumPy array.

The basic idea is to extract the arrays from the series, apply the ufunc, and box the result in a Series. For Series[Sparse], we dispatch to SparseArray which implements __array_ufunc__

In [1]: import pandas as pd; import numpy as np

In [2]: s = pd.Series(pd.SparseArray([-1, 0, 1]))

In [3]: np.sin(s)
Out[3]:
0   -0.841471
1    0.000000
2    0.841471
dtype: Sparse[float64, 0.0]

In [4]: s.apply(np.sin)
Out[4]:
0   -0.841471
1    0.000000
2    0.841471
dtype: Sparse[float64, 0.0]

On master, that's dense.

@ghost
Copy link
Author

ghost commented Jun 19, 2019

Yes, that's fine in that if the EA implements its own __array_ufunc__ it can take over. But I'm very unhappy with the patchwork Frankenstein way in which different classes of methods rely on different mechanisms for EA authors to implement them. While other (cumprod, for example) some simply don't allow it. See how horrible it looks when documented: #26918.

I've opened #26935 suggesting something better needs to take shape.

@TomAugspurger
Copy link
Contributor

See how horrible it looks when documented

Please don't dismiss the work we've put into it.

@ghost ghost closed this as completed Jul 8, 2019
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant