-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shall we create a QuantityArray class? #1128
Comments
I read through #875, #764 and #753 yesterday and found these super helpful background resources. Unfortunately they made things seem very complicated and I couldn't see an obvious way forward. The big conflict I saw was that, on the one hand, numpy's separation of scalar and array classes is seen as a 'design mistake', whereas on the other hand, not having the separation is causing headaches with Pandas (particularly with how Pandas decides whether something is iterable or not). At risk of poking the hornet's nest, perhaps the simplest solution is to enquire in Pandas about whether https://github.com/pandas-dev/pandas/blob/4743cc91719146167eefb43865413fc0b0aef1d4/pandas/_libs/lib.pyx#L992 could be updated so that it checks |
@znicholls Those discussions are indeed useful. I am pinging the people that were involved. @jthielen @crusaderky @shoyer @keewis |
Please feel free to invite more. Anyone with contacts in the pandas community? |
As discussed before, I am against splitting the class in two. I simply see no benefit. |
@andrewgsavage has done much more work on this than me, but maybe these thoughts are helpful nonetheless... The big problem is with how pandas decides whether something is iterable or not. As @hgrecco shows here, pandas determines whether something cdef inline bint c_is_list_like(object obj, bint allow_sets) except -1:
return (
isinstance(obj, abc.Iterable)
# we do not count strings/unicode/bytes as list-like
and not isinstance(obj, (str, bytes))
# exclude zero-dimensional numpy arrays, effectively scalars
and not (util.is_array(obj) and obj.ndim == 0)
# exclude sets if allow_sets is False
and not (allow_sets is False and isinstance(obj, abc.Set))
) At the moment, all The problem is that the A numpy array suffers from the same issue which is where the So, the options seem to be:
I'm happy to open an issues in Pandas if people think that's a good option (it certainly looks the easier path to me). |
I would definitely suggest bringing up this issue in the pandas GitHub
issues.
…On Sat, Jul 4, 2020 at 7:32 PM Zeb Nicholls ***@***.***> wrote:
@andrewgsavage <https://github.com/andrewgsavage> has done much more work
on this than me, but maybe these thoughts are helpful nonetheless...
The big problem is with how pandas decides whether something is iterable
or not. As @hgrecco <https://github.com/hgrecco> shows here
<hgrecco/pint-pandas#33 (comment)>,
pandas determines whether something is_list_like here
<https://github.com/pandas-dev/pandas/blob/4743cc91719146167eefb43865413fc0b0aef1d4/pandas/_libs/lib.pyx#L992>
(code copied below).
cdef inline bint c_is_list_like(object obj, bint allow_sets) except -1:
return (
isinstance(obj, abc.Iterable)
# we do not count strings/unicode/bytes as list-like
and not isinstance(obj, (str, bytes))
# exclude zero-dimensional numpy arrays, effectively scalars
and not (util.is_array(obj) and obj.ndim == 0)
# exclude sets if allow_sets is False
and not (allow_sets is False and isinstance(obj, abc.Set))
)
At the moment, all Quantity instances have an __iter__ method. It is
designed to raise a TypeError if the quantity isn't actually iterable
(code here
<https://github.com/hgrecco/pint/blob/4ab68ba0a00f96ed68fdf084233b61d68127c461/pint/quantity.py#L242>
).
The problem is that the __iter__ method is defined, regardless of whether
the Quantity instance is actually iterable or not. This is a problem
because it means that, in all cases, isinstance(quantity_instance,
abc.Iterable) returns True (even if quantity_instance isn't actually
iterable and would return a TypeError as soon as one tried to iterate
with it).
A numpy array suffers from the same issue which is where the not
(util.is_array(obj) and obj.ndim == 0) in Panda's is_list_like is
important, because it excludes scalars. The problem is that
util.is_array(obj) returns False if obj is anything other than np.ndarray.
As a result, we can't just add an ndim attribute to Quantity because the obj.ndim
== 0 will never be evaluated anyway.
So, the options seem to be:
1. Split Quantity into scalar and array so that scalars don't have an
__iter__ method and Pandas correctly recognises them as not being
list-like.
2. See if the Pandas API can be updated in some way so that
zero-dimensional, non-numpy arrays are also correctly recognised as not
being iterable.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1128 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJJFVSAGA32KFVLRQTVPFLRZ7Q23ANCNFSM4OOG6UDQ>
.
|
Thanks @shoyer. Whilst others haven't commented yet, for reference I think the most relevant Pandas issue is pandas-dev/pandas#24702 but it's not the same as the one we're having here |
I had a quick play to see what it would take to fix this. It looks like a simple change, but more experienced eyes would be very welcome! See draft PR here pandas-dev/pandas#35127 |
I just want to add that magic methods are always looked up in the class (not in the instance) and therefore it is not possible to add it or remove it depending on the type of |
Ya that seems the easiest path and is preferable to splitting classes up, go for it! |
Ok done in pandas-dev/pandas#35131 and pandas-dev/pandas#35127. I'm not sure if we need to ping anyone to get those moving? |
Looks like they're moving. @andrewgsavage we got a question about places where |
We should remain aligned to the decision of the core numpy developers where they chose to use the same class for scalar and vectorized ndarrays. Pandas should be changed to use >>> class C:
... def __iter__(self):
... # Match behaviour of numpy.ndarray.__iter__
... raise TypeError("iteration over scalar object")
>>> isinstance(C(), abc.Iterable)
True
>>> numpy.iterable(C())
False |
Having fiddled around with pandas-dev/pandas#35127 a bit more, and tried using that branch with pint-pandas, it's clear that we'd need to update the behaviour of pandas'
|
While there seem to be some arguments against a split Why shouldn't we have
Note in particular that a failure in To clarify, if Here is a bit of code to verify those findings above: class MockArray:
def __init__(self, values):
self._values = values
def __repr__(self):
return f"{type(self).__name__}({self._values})"
class MockArrayStrip(MockArray):
def __array__(self, t=None):
return np.asarray(self._values)
class MockArrayRaise(MockArray):
def __array__(self, t=None):
raise TypeError("forbidden")
for mock in [
MockArray(1.),
MockArrayStrip(1.),
MockArrayRaise(1.),
]:
for orig in [mock, [mock]]:
for dt in [None, "float", "object"]:
try:
out = np.asarray(orig,dtype=dt)
except Exception as ex:
out = str(ex)
print(f"np.asarray({orig!r:21},dtype={dt!r:8}) -> {out!r}") |
I have just released
Pint-Pandas 0.1
So first, let me publicly thank @andrewgsavage, @znicholls and @sigvaldm for the awesome work you have done to code, discuss and push this package. Congratulations!I have also released
Pint 0.14
to support a few last minute changes inPint-Pandas
.Pint-Pandas
has exposed (one more time) an old discussion: Do we need aQuantityScalar
andQuantityArray
classes? This was something on the table at the beginning of Pint. The conclusion was to avoid it for simplicity and we found a very nice way to do it without sacrifizing functionality. From very early releases we were able to support a large number of NumPy functions without requiering numpy and to have a nice way to write array quantities ([1, 2, 3] * ureg.meter
). We also made the choice to support but doe not require numpy. I still think this was the right call and we should try to stay on that path.Pint-Pandas has exposed a some cases in which having a
QuantityArray
seems to be unavoidable to fully support Pandas functionality with Pint. The question is ... shall we do it? If yes, how to do it in a way which is transparent to the normal user?PS.- Thanks again to the people behind Pint-Pandas.
The text was updated successfully, but these errors were encountered: