WIP pandas #672

andrewgsavage · 2018-08-08T18:58:27Z

Initial commits

```
>>> from pint.pandas_interface import PintArray
>>> import pandas as pd
>>> df = pd.DataFrame({"address": PintArray([1, 2, 3])})
>>> df
           address
0  1 dimensionless
1  2 dimensionless
2  3 dimensionless
>>> df.dtypes
address    Pint
dtype: object
>>> df['address']
0    1 dimensionless
1    2 dimensionless
2    3 dimensionless
Name: address, dtype: Pint
>>> df['address'].values.data
<Quantity([1 2 3], 'dimensionless')>
```

but unfortunately:

```
>>> df['address'].values
<pint.pandas_interface.PintArray object at 0x117517f28>
```
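The session above suggests that the wrapper does not hold one Quantity per row. Instead it holds a single Quantity, made of one array of magnitudes and one shared unit, which is reachable through `.values.data`. The following stdlib-only sketch illustrates that storage idea. `QuantityColumn` and its methods are hypothetical names, units are plain strings, and none of this is pint's or pandas' actual API:

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple, Union


@dataclass
class QuantityColumn:
    """Hypothetical container: one magnitude list plus one unit shared by all rows."""

    magnitudes: List[float]
    unit: str

    @classmethod
    def from_sequence(cls, items: Sequence[Tuple[float, str]]) -> "QuantityColumn":
        """Build from (magnitude, unit) scalars; all of them must share one unit."""
        units = {u for _, u in items}
        if len(units) > 1:
            raise ValueError(f"mixed units: {sorted(units)}")
        unit = units.pop() if units else "dimensionless"
        return cls([m for m, _ in items], unit)

    def __len__(self) -> int:
        return len(self.magnitudes)

    def __getitem__(self, i: Union[int, slice]):
        # A scalar index returns one (magnitude, unit) pair; a slice stays a column.
        if isinstance(i, slice):
            return QuantityColumn(self.magnitudes[i], self.unit)
        return self.magnitudes[i], self.unit

    def __iter__(self) -> Iterator[Tuple[float, str]]:
        # Iterating yields scalar quantities, like iterating over `b` in the notebook.
        return ((m, self.unit) for m in self.magnitudes)


col = QuantityColumn.from_sequence([(1, "meter"), (2, "meter"), (2, "meter"), (3, "meter")])
print(col)       # QuantityColumn(magnitudes=[1, 2, 2, 3], unit='meter')
print(col[1])    # (2, 'meter')
print(col[1:3])  # QuantityColumn(magnitudes=[2, 2], unit='meter')
```

One side effect of sharing a single unit across the column is that a column cannot mix, say, metres and seconds. `from_sequence` rejects such input.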


andrewgsavage · 2018-08-08T19:05:46Z

Most things seem to work well. Here are my tests:

```python
import pandas as pd
import pint
import numpy as np

pd.__version__
```

```
'0.24.0.dev0+369.gbb451e89f'
```

```python
pint.__version__
```

```
'0.9.dev0'
```

```python
ureg=pint.UnitRegistry()
Q_ = ureg.Quantity

b=Q_([1,2,2,3],"m")
c=pint.QuantityArray._from_sequence([item for item in b])
c
```

```
<QuantityArray [<Quantity([1 2 2 3], 'meter')>]>
```

```python
[item for item in b]
```

```
[1 <Unit('meter')>, 2 <Unit('meter')>, 2 <Unit('meter')>, 3 <Unit('meter')>]
```

```python
c.data
```

$$\begin{pmatrix}1 & 2 & 2 & 3\end{pmatrix}\ \mathrm{meter}$$

```python
d=c*2
d.data
```

```
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 2 <class 'int'>
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 2 <class 'int'>
```

$$\begin{pmatrix}2 & 4 & 4 & 6\end{pmatrix}\ \mathrm{meter}$$

```python
df=pd.DataFrame({"a":c,"b":c})
df
```

```
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
```


|   | a | b |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 2 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |

```python
df.a.dtype
```

```
<pint.pandas_array.QuantityType at 0x24585efe898>
```

```python
df.a.values
```

```
<QuantityArray [<Quantity([1 2 2 3], 'meter')>]>
```

```python
s=df.a*df.b
s
```

```
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 0    1
1    2
2    2
3    3
Name: b, dtype: Quantity <class 'pandas.core.series.Series'>
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 0    1
1    2
2    2
3    3
Name: b, dtype: Quantity <class 'pandas.core.series.Series'>

c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
```

```
0    1
1    4
2    4
3    9
dtype: Quantity
```
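The product is computed element by element, and the units multiply too: metre × metre = metre². The displayed result above has had its units stripped, so this is not visible there. The following is a minimal sketch of that unit bookkeeping. It assumes units are represented as a hypothetical `{base_unit: exponent}` dict, which is not pint's internal representation:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: a unit is a mapping {base_unit: exponent}.
Unit = Dict[str, int]


def multiply_units(a: Unit, b: Unit) -> Unit:
    """Multiply two units by adding their exponents; drop exponents that cancel to 0."""
    total = Counter(a)
    total.update(b)  # Counter.update adds the counts (exponents) key by key
    return {name: exp for name, exp in total.items() if exp != 0}


def elementwise_mul(
    left: Tuple[List[float], Unit], right: Tuple[List[float], Unit]
) -> Tuple[List[float], Unit]:
    """Multiply two equal-length (magnitudes, unit) columns element by element."""
    lmags, lunit = left
    rmags, runit = right
    if len(lmags) != len(rmags):
        raise ValueError("operands must have the same length")
    mags = [x * y for x, y in zip(lmags, rmags)]
    return mags, multiply_units(lunit, runit)


a = ([1, 2, 2, 3], {"meter": 1})
print(elementwise_mul(a, a))  # ([1, 4, 4, 9], {'meter': 2})
```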

```python
# this rightly shouldn't work
j=df.a**df.b
```

```
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 0    1
1    2
2    2
3    3
Name: b, dtype: Quantity <class 'pandas.core.series.Series'>
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 0    1
1    2
2    2
3    3
Name: b, dtype: Quantity <class 'pandas.core.series.Series'>

c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
```

```
---------------------------------------------------------------------------
DimensionalityError                       Traceback (most recent call last)
<ipython-input-62-d171c5132096> in <module>()
----> 1 j=df.a**df.b

~\Anaconda2\envs\py36\lib\site-packages\pandas\core\ops.py in wrapper(left, right)
   1161         elif (is_extension_array_dtype(left) or
   1162                 is_extension_array_dtype(right)):
-> 1163             return dispatch_to_extension_op(op, left, right)
   1164 
   1165         elif is_datetime64_dtype(left) or is_datetime64tz_dtype(left):

~\Anaconda2\envs\py36\lib\site-packages\pandas\core\ops.py in dispatch_to_extension_op(op, left, right)
   1081         new_right = right
   1082 
-> 1083     res_values = op(new_left, new_right)
   1084     res_name = get_op_result_name(left, right)
   1085 

c:\users\a\0ipython\units\git\pint\pandas_array.py in _binop(self, other)
    614             # If the operator is not defined for the underlying objects,
    615             # a TypeError should be raised
--> 616             res = op(lvalues,rvalues)# [op(a, b) for (a, b) in zip(lvalues, rvalues)]
    617 #             res =[op(a, b) for (a, b) in zip(lvalues, rvalues)]
    618 

c:\users\a\0ipython\units\git\pint\quantity.py in __pow__(self, other)
   1041                 elif np.size(other) > 1:
   1042                     raise DimensionalityError(self._units, 'dimensionless',
-> 1043                                               extra_msg='Quantity array exponents are only allowed '
   1044                                                         'if the base is dimensionless')
   1045 

DimensionalityError: Cannot convert from 'meter' to 'dimensionless'Quantity array exponents are only allowed if the base is dimensionless
```
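The error is expected. Raising a metre column to the exponents `[1, 2, 2, 3]` would give each element a different unit (m, m², m², m³), and a column with one shared unit cannot hold that. The following is a minimal sketch of the rule, reusing the same hypothetical `{base_unit: exponent}` dict for units. It is not pint's implementation:

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical unit representation: {base_unit: exponent}; {} means dimensionless.
Unit = Dict[str, float]


def power(
    mags: Sequence[float], unit: Unit, exponents: Sequence[float]
) -> Tuple[List[float], Unit]:
    """Raise a (magnitudes, unit) column to per-element exponents."""
    if len(exponents) != len(mags):
        raise ValueError("exponent array must match the base length")
    if unit and len(set(exponents)) > 1:
        # Each element would need its own unit, which a shared-unit column cannot represent.
        raise ValueError("array exponents are only allowed if the base is dimensionless")
    new_mags = [m ** e for m, e in zip(mags, exponents)]
    if not unit:
        return new_mags, {}  # dimensionless base: any exponents are fine
    shared = exponents[0] if exponents else 1  # every exponent equals this one
    return new_mags, {name: exp * shared for name, exp in unit.items()}


print(power([1, 2, 2, 3], {}, [1, 2, 2, 3]))            # ([1, 4, 4, 27], {})
print(power([1, 2, 2, 3], {"meter": 1}, [2, 2, 2, 2]))  # ([1, 4, 4, 9], {'meter': 2})
```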

```python
e=df.a.values + (ureg.cm * [5, 5, 5, 5])
e.data
```

```
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> [5 5 5 5] centimeter <class 'pint.quantity.build_quantity_class.<locals>.Quantity'>
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> [5 5 5 5] centimeter <class 'pint.quantity.build_quantity_class.<locals>.Quantity'>

c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
```

$$\begin{pmatrix}1.05 & 2.05 & 2.05 & 3.05\end{pmatrix}\ \mathrm{meter}$$
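Here the right-hand operand is converted into the left-hand unit before the addition, so 1 m + 5 cm = 1.05 m. The following is a minimal sketch of that conversion step. It assumes a hypothetical hard-coded table of length factors relative to the metre, whereas pint derives these factors from its unit definitions:

```python
from typing import List, Sequence, Tuple

# Hypothetical conversion table: metres per unit (pint computes these itself).
TO_METRE = {"meter": 1.0, "centimeter": 0.01, "kilometer": 1000.0}


def add_lengths(
    left: Tuple[Sequence[float], str], right: Tuple[Sequence[float], str]
) -> Tuple[List[float], str]:
    """Add two length columns, expressing the result in the left operand's unit."""
    lmags, lunit = left
    rmags, runit = right
    if len(lmags) != len(rmags):
        raise ValueError("operands must have the same length")
    factor = TO_METRE[runit] / TO_METRE[lunit]  # converts right-unit values to left unit
    return [x + y * factor for x, y in zip(lmags, rmags)], lunit


mags, unit = add_lengths(([1, 2, 2, 3], "meter"), ([5, 5, 5, 5], "centimeter"))
print([round(m, 10) for m in mags], unit)  # [1.05, 2.05, 2.05, 3.05] meter
```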

```python
type(c.data)
```

```
pint.quantity.build_quantity_class.<locals>.Quantity
```

```python
df.a
```

```
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
```

```
0    1
1    2
2    2
3    3
Name: a, dtype: Quantity
```

```python
# why is this different to above?!
# (h is created in a later cell: h=c._concat_same_type([c,c]))
df=pd.DataFrame(h)
df
```

```
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
```


|   | 0 |
|---|---|
| 0 | 1 meter |
| 1 | 2 meter |
| 2 | 2 meter |
| 3 | 3 meter |
| 4 | 1 meter |
| 5 | 2 meter |
| 6 | 2 meter |
| 7 | 3 meter |

```python
type(df[0].values)
```

```
numpy.ndarray
```

```python
# At least cyberpandas has the same issue
from cyberpandas import IPArray

df=pd.DataFrame(IPArray(['192.168.1.1', '192.168.1.10']))
df[0].dtype
```

```
dtype('O')
```

```python
df = pd.DataFrame({"address": IPArray(['192.168.1.1', '192.168.1.10'])})
df.address.dtype
```

```
<cyberpandas.ip_array.IPType at 0x24586685048>
```

```python
df=pd.DataFrame({"a":pint.QuantityArray(Q_([1,2,2,3],"m")),"b":pint.QuantityArray(Q_([5,12,52,53],"m"))})
# df['c']=df.a*df.b
df
```

```
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
```


|   | a | b |
|---|---|---|
| 0 | 1 | 5 |
| 1 | 2 | 12 |
| 2 | 2 | 52 |
| 3 | 3 | 53 |

```python
# switching the order in pandas.core.internals.concat.is_uniform_join_units from
# all(not ju.is_na or ju.block.is_extension for ju in join_units) and
# to
# all(ju.block.is_extension or not ju.is_na for ju in join_units) and
# fixes this one
pd.concat([df,df], axis=0)
```

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-71-d8590489f527> in <module>()
----> 1 pd.concat([df,df], axis=0)

~\Anaconda2\envs\py36\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    225                        verify_integrity=verify_integrity,
    226                        copy=copy, sort=sort)
--> 227     return op.get_result()
    228 
    229 

~\Anaconda2\envs\py36\lib\site-packages\pandas\core\reshape\concat.py in get_result(self)
    422             new_data = concatenate_block_managers(
    423                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 424                 copy=self.copy)
    425             if not self.copy:
    426                 new_data._consolidate_inplace()

~\Anaconda2\envs\py36\lib\site-packages\pandas\core\internals\managers.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   2028                 values = values.view()
   2029             b = b.make_block_same_class(values, placement=placement)
-> 2030         elif is_uniform_join_units(join_units):
   2031             b = join_units[0].block.concat_same_type(
   2032                 [ju.block for ju in join_units], placement=placement)

~\Anaconda2\envs\py36\lib\site-packages\pandas\core\internals\concat.py in is_uniform_join_units(join_units)
    366         # no blocks that would get missing values (can lead to type upcasts)
    367         # unless we're an extension dtype.
--> 368         all(not ju.is_na or ju.block.is_extension for ju in join_units) and
    369         # no blocks with indexers (as then the dimensions do not fit)
    370         all(not ju.indexers for ju in join_units) and

~\Anaconda2\envs\py36\lib\site-packages\pandas\core\internals\concat.py in <genexpr>(.0)
    366         # no blocks that would get missing values (can lead to type upcasts)
    367         # unless we're an extension dtype.
--> 368         all(not ju.is_na or ju.block.is_extension for ju in join_units) and
    369         # no blocks with indexers (as then the dimensions do not fit)
    370         all(not ju.indexers for ju in join_units) and

pandas\_libs\properties.pyx in pandas._libs.properties.CachedProperty.__get__()

~\Anaconda2\envs\py36\lib\site-packages\pandas\core\internals\concat.py in is_na(self)
    163         chunk_len = max(total_len // 40, 1000)
    164         for i in range(0, total_len, chunk_len):
--> 165             if not isna(values_flat[i:i + chunk_len]).all():
    166                 return False
    167 

AttributeError: 'bool' object has no attribute 'all'
```
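The traceback arises because `or` evaluates its left operand first. In the original order, `not ju.is_na` is computed for every join unit, including extension blocks for which `is_na` breaks. With the extension check first, `or` short-circuits and `is_na` is never evaluated for extension blocks, and both orders give the same boolean result whenever `is_na` succeeds. The following is a self-contained illustration. `JoinUnit` is a hypothetical stand-in, not the pandas class, and `ju.is_extension` simplifies pandas' `ju.block.is_extension`:

```python
class JoinUnit:
    """Hypothetical stand-in for a pandas join unit."""

    def __init__(self, is_extension: bool) -> None:
        self.is_extension = is_extension

    @property
    def is_na(self) -> bool:
        # Mimic the failure seen in the traceback for extension blocks.
        if self.is_extension:
            raise AttributeError("'bool' object has no attribute 'all'")
        return False


units = [JoinUnit(True), JoinUnit(True)]

# Original order: is_na is evaluated first and raises for extension blocks.
try:
    all(not ju.is_na or ju.is_extension for ju in units)
except AttributeError as exc:
    print("original order fails:", exc)

# Swapped order: `or` short-circuits, so is_na is skipped for extension blocks.
print(all(ju.is_extension or not ju.is_na for ju in units))  # True
```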

```python
h=c._concat_same_type([c,c])
df=pd.DataFrame({"a":h})
df
```

```
concatting [<QuantityArray [<Quantity([1 2 2 3], 'meter')>]>, <QuantityArray [<Quantity([1 2 2 3], 'meter')>]>]

c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
```


|   | a |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1 |
| 5 | 2 |
| 6 | 2 |
| 7 | 3 |

```python
# changing list(new_right) to list(right) in dispatch_to_extension_op fixes that
df.a==h
```

```
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-73-e4e3b83fb054> in <module>()
----> 1 df.a==h

~\Anaconda2\envs\py36\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
   1358               (is_extension_array_dtype(other) and
   1359                not is_scalar(other))):
-> 1360             return dispatch_to_extension_op(op, self, other)
   1361 
   1362         elif isinstance(other, ABCSeries):

~\Anaconda2\envs\py36\lib\site-packages\pandas\core\ops.py in dispatch_to_extension_op(op, left, right)
   1072             new_right = list(new_right)
   1073         elif is_extension_array_dtype(right) and type(left) != type(right):
-> 1074             new_right = list(new_right)
   1075         else:
   1076             new_right = right

UnboundLocalError: local variable 'new_right' referenced before assignment
```
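The `UnboundLocalError` comes from a branch that reads `new_right` before anything has been assigned to it, when the intended input is `right` itself. Below is a simplified, hypothetical reconstruction of only that branch with the fix applied. The real pandas function handles more cases, and `is_extension` here is a stand-in for `is_extension_array_dtype`:

```python
from typing import Any, Callable


def is_extension(obj: Any) -> bool:
    """Hypothetical stand-in for pandas' is_extension_array_dtype check."""
    return getattr(obj, "is_extension", False)


def dispatch(op: Callable[[Any, Any], Any], left: Any, right: Any) -> Any:
    """Simplified sketch of the right-operand branch of dispatch_to_extension_op."""
    if is_extension(right) and type(left) is not type(right):
        new_right = list(right)  # fixed: was list(new_right), read before assignment
    else:
        new_right = right
    return op(left, new_right)


class FakeExtensionArray(list):
    """Hypothetical extension array: a list flagged as an extension type."""

    is_extension = True


result = dispatch(lambda a, b: [x == y for x, y in zip(a, b)],
                  [1, 2, 3], FakeExtensionArray([1, 5, 3]))
print(result)  # [True, False, True]
```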


This should mean that all the tests pass again. Given pint's focus on avoiding dependencies, we shouldn't import the pandas interface by default (as it depends on pandas). If users want it, they'll have to do `from pint.pandas_array import QuantityArray` which is a little longer than `from pint import QuantityArray` but I think that's ok as it's a specialised usage.
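A related pattern is a guarded import, sketched here with the standard library only. The package checks whether the optional dependency is installed before importing the module that needs it, so the core package still imports cleanly without that dependency. The helper name `load_optional` is hypothetical and not part of pint. Only the module name `pint.pandas_array` comes from the text above:

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Optional


def load_optional(interface_module: str, dependency: str) -> Optional[ModuleType]:
    """Import interface_module only if the top-level package `dependency` is installed."""
    if importlib.util.find_spec(dependency) is None:
        return None  # dependency missing: skip quietly instead of failing at import time
    return importlib.import_module(interface_module)


# In a package that ships the interface this might be load_optional("pint.pandas_array", "pandas").
print(load_optional("json", "a_package_that_is_not_installed"))  # None
print(load_optional("json", "json").__name__)  # json
```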


hgrecco · 2018-08-09T21:36:36Z

I have a few comments:
1.- Can you briefly describe the concept behind the way you store Pint Quantities in pandas? The code is quite clear, but I think describing it in words will help others debug the code (and also provide a short text to include in the docs). Maybe you can already write it in docs format.
2.- We need to provide a way to use this even without having pandas installed. Maybe you can look at the way the matplotlib code was added.
3.- It is looking good and is a very valuable addition to Pint!


znicholls · 2018-08-28T20:05:55Z

@andrewgsavage probably best to close this?

andrewgsavage · 2018-08-28T20:21:16Z

ya

znicholls and others added 9 commits August 6, 2018 02:24

Initial hacks to look at adding pandas interface

f82e5d4

Getting somewhere, no more errors...

18a2d4c

Pass 2 more tests

22e798f

Get all the pandas tests I can going

4d55b3d

Before trying to remove data thing

23e6db9

Write down my env setup

590eebb

A minor improvement

c00ca17

Initial commits

763745a

Initial commits

Fixed UnitStripWarnings

8f99144

Was making unnecessarily doing np.array(<Quantity>)

znicholls mentioned this pull request Aug 9, 2018

WIP: add pandas interface #671

Closed

znicholls and others added 6 commits August 9, 2018 11:55

Merge pull request #1 from znicholls/patch-1

b5a0d6c

Remove pandas_array import from __init__.py

Start again, basically

25b2bcd

Check array size dimensions when performing arithermetic/compartive o…

eab4ccb

…perations Add check to ensure array sizes of RHS and LHS match Prevented typeerrors when performing operation with a single value quantity

Clean up, ready to start building

a8b5988

Fix dimensionality error message

b14fb07

Fixes great DimensionalityError that occurs when you pow with a single value quantity. DimensionalityError: Cannot convert from 'dimensionless' to 'dimensionless'

znicholls and others added 11 commits August 10, 2018 02:11

42 tests passed, that is a start

2a08a2a

Add some more thoughts about how to do this sensibly

16a0e85

52 passed

d627722

More passing than failing, that is something. arithmetic ops next loo…

49340c5

…ks painful

Add docs

031da9d

Add documentation

Pass a few more tests

d8c5dbc

Over 100 tests passed, seems to be going ok...

172ea64

Only missing and setitem to go

a52992e

Pass all the pandas interface tests

02ad7f3

Add DF accessors, remove docs

433789b

Removing docs as I haven't worked out how to set them up properly to past tests. DF accessors added to ease going between QAs and numerical arrays.

All the stuff I needed to work out why PintArray.values looked weird

4298340

andrewgsavage and others added 6 commits August 20, 2018 23:01

divmod wasnt too hard :o

249c6f9

the last test?

a7d044c

Series accessors

c04101e

Merge pull request #1 from andrewgsavage/master

64411e2

WIP: fix pandas tests

Tidy up tests

efc6049

Tweak based on andrew comments

e8120e4

hgrecco mentioned this pull request Aug 22, 2018

Quantity.copy and Quantity.mean not idempotent #678

Closed

andrewgsavage and others added 20 commits August 22, 2018 17:02

Merge pull request #2 from znicholls/tidy-up-tests

f1295a2

Tidy up tests

Remove unitstrips from isna()

77afffd

remove unitstrip warnining for ops

236e1b6

fix value_counts and it's test

ad1d381

revert the na_Cmp

8e1027e

revert binop warning check

9d1355f

SetItem warning debugging

dc48856

Testmethods

04a2115

fixes TestGetItem hitting unitstrips when comparing nans

3986738

Merge remote-tracking branch 'upstream/master'

ff1980d

Add example and tidy up source a tiny bit

20f1e87

Merge pull request #4 from znicholls/add-example-notebooks

d3b7d7a

Add example and tidy up source a tiny bit

change to __array__ seems to work

fd91a14

Merge branch 'master' into master

21f8a91

get __array__ working as expected

de6cd77

Tidy up

9840c32

forgot to save a file

cdbf864

Get tests passing again

3a88a6b

Change to using pandas master branch

ad6deb3

Merge pull request #4 from znicholls/asavage

680ff04

Get tests passing again

andrewgsavage closed this Aug 28, 2018
