Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantity.copy and Quantity.mean not idempotent #678

Closed
dschien opened this issue Aug 21, 2018 · 7 comments
Closed

Quantity.copy and Quantity.mean not idempotent #678

dschien opened this issue Aug 21, 2018 · 7 comments
Labels
numpy Numpy related bug/enhancement

Comments

@dschien
Copy link

dschien commented Aug 21, 2018

Hi,

for some reason I need to use pandas dataframes with multiindex, instead of numpy arrays.
These dataframes have the structure

time        samples
2016-01-01  0          2
            1          2
2016-02-01  0          2
            1          2
2016-03-01  0          2
            1          2
2016-04-01  0          2
            1          2
2016-05-01  0          2
            1          2
2016-06-01  0          2
            1          2
2016-07-01  0          2
            1          2
2016-08-01  0          2
            1          2
2016-09-01  0          2
            1          2
2016-10-01  0          2
            1          2
2016-11-01  0          2
            1          2
2016-12-01  0          2
            1          2
2017-01-01  0          2
            1          2
Name: test, dtype: int64

Everything works, except mean() and copy() alter the Quantity object - which I would not expect at all.

import pandas as pd
import numpy as np

ureg = UnitRegistry(auto_reduce_dimensions=False)
ureg.default_format = '~H'
Q_ = ureg.Quantity

x = 2
values = np.full((len(pd.date_range('2016-01-01', '2016-02-01', freq='MS')), 2), x)
df = pd.DataFrame(values.ravel())
iterables = [pd.date_range('2016-01-01', '2016-02-01', freq='MS'), range(0, 2)]
index_names = ['time', 'samples']
_multi_index = pd.MultiIndex.from_product(iterables, names=index_names)

df.set_index(_multi_index, inplace=True)

d = df
assert d.index.names == ['time', 'samples']

e = Q_(d, ureg.meters)
assert e.m.index.names == ['time', 'samples']

c = e.copy()
assert e.m.index.names == ['time', 'samples']
e.mean()
assert e.m.index.names == ['time', 'samples']

The above fails at the penultimate and final assertions with AttributeError: 'numpy.ndarray' object has no attribute 'index'. In other words, copy and mean alter the Quantity instance. That is a problem for me.

@dschien
Copy link
Author

dschien commented Aug 22, 2018

The offending line in question is in quantity.py where the magnitude is forced to ndarray.

@hgrecco
Copy link
Owner

hgrecco commented Aug 22, 2018

Try to change the line to make a copy a operate on that instead of working on the same object and run the tests. By the way, there is a very interesting PR for pandas and Pint #672

@dschien
Copy link
Author

dschien commented Aug 22, 2018

@hgrecco I am not quite sure what you are suggesting.
I am making the copy of my quantity.

e = Q_(d, ureg.meters)
e.copy()
assert e.m.index.names == ['time', 'samples']
>>> AttributeError: 'numpy.ndarray' object has no attribute 'index'.

The handling of copy in __getattr__ converts to ndarray and re-assigns the value to the instance.

self._magnitude = _to_magnitude(self._magnitude, True)

I looked at a couple of the methods handled through __getattr__ in this way, for all of them, I would have expected to return a new instance of quantity or some other result, but none of them should change the instance.

Even if it is necessary to force the magnitude to an ndarray - why no apply the method in question to a copy?

@hgrecco
Copy link
Owner

hgrecco commented Aug 22, 2018

sorry if I was not clear. I meant to avoid that reassignment.

@dschien
Copy link
Author

dschien commented Sep 12, 2018

@hgrecco - this tripped me up again. It is a really subtle bug. I still am not sure what reassignment you refer to in your most recent response.

I don't believe you meant, I should avoid calling c = e.copy() or e.mean(), correct? So, what reassignment should I avoid?

@hgrecco
Copy link
Owner

hgrecco commented Dec 3, 2019

Revisit after #905

@hgrecco
Copy link
Owner

hgrecco commented Dec 24, 2019

I am closing this as it is taken care by #925

@hgrecco hgrecco closed this as completed Dec 24, 2019
bors bot added a commit that referenced this issue Dec 27, 2019
957: Add parameterized test for type immutability r=hgrecco a=jthielen

As discussed in #925, this adds a parameterized test to verify that the internal type is not mutated under common operations (as encountered in #399, #481, #509, #622, #678).

- [x] Closes #925, Closes #481
- [x] Executed ``black -t py36 . && isort -rc . && flake8`` with no errors
- [x] The change is fully covered by automated unit tests
- ~~Documented in docs/ as appropriate~~
- [x] Added an entry to the CHANGES file


Co-authored-by: Jon Thielen <github@jont.cc>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
numpy Numpy related bug/enhancement
Projects
None yet
Development

No branches or pull requests

2 participants