WIP pandas #672

Closed
wants to merge 87 commits into from

andrewgsavage
Collaborator

Initial commits

znicholls and others added 9 commits August 6, 2018 02:24
>>> from pint.pandas_interface import PintArray
>>> import pandas as pd

>>> df = pd.DataFrame({"address": PintArray([1, 2, 3])})
>>> df
          address
0 1 dimensionless
1 2 dimensionless
2 3 dimensionless

>>> df.dtypes
address    Pint
dtype: object

>>> df['address']
0   1 dimensionless
1   2 dimensionless
2   3 dimensionless
Name: address, dtype: Pint

>>> df['address'].values.data
<Quantity([1 2 3], 'dimensionless')>

But, unfortunately:

>>> df['address'].values
<pint.pandas_interface.PintArray object at 0x117517f28>
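The `<... object at 0x...>` output is Python's default `object.__repr__`, which is what you get when a class defines no `__repr__` of its own. A minimal sketch of the fix, using a hypothetical `MagnitudeArray` wrapper (a list of magnitudes plus a unit string, not the actual PintArray internals) whose repr is modelled on pint's `<Quantity(...)>` style:

```python
class NoReprArray:
    """Stores magnitudes and units but defines no __repr__ of its own."""

    def __init__(self, magnitudes, units):
        self.magnitudes = list(magnitudes)
        self.units = units


class MagnitudeArray(NoReprArray):
    """Same storage, plus a readable repr in the style of pint's Quantity."""

    def __repr__(self):
        return f"<{type(self).__name__}({self.magnitudes}, {self.units!r})>"


# Falls back to object.__repr__, i.e. "<...NoReprArray object at 0x...>"
print(repr(NoReprArray([1, 2, 3], "dimensionless")))
# <MagnitudeArray([1, 2, 3], 'dimensionless')>
print(repr(MagnitudeArray([1, 2, 3], "dimensionless")))
```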
Initial commits
@andrewgsavage
Collaborator Author

andrewgsavage commented Aug 8, 2018

Most things seem to work well. Here are my tests:

import pandas as pd 
import pint
import numpy as np
pd.__version__
'0.24.0.dev0+369.gbb451e89f'
pint.__version__
'0.9.dev0'
ureg=pint.UnitRegistry()
Q_ = ureg.Quantity
b=Q_([1,2,2,3],"m")
c=pint.QuantityArray._from_sequence([item for item in b])
c
<QuantityArray [<Quantity([1 2 2 3], 'meter')>]>
[item for item in b]
[1 <Unit('meter')>, 2 <Unit('meter')>, 2 <Unit('meter')>, 3 <Unit('meter')>]
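`_from_sequence` collapses these scalar quantities into one array-valued Quantity, which is what the `QuantityArray` repr above shows. A rough sketch of that collapsing step in plain Python; `ScalarQuantity` and `from_sequence` are hypothetical stand-ins, and mixed units are rejected here for simplicity, whereas a real implementation might convert them to a common unit instead:

```python
from typing import List, NamedTuple, Sequence, Tuple


class ScalarQuantity(NamedTuple):
    """Hypothetical stand-in for a scalar pint Quantity."""

    magnitude: float
    units: str


def from_sequence(scalars: Sequence[ScalarQuantity]) -> Tuple[List[float], str]:
    """Collapse scalar quantities into one (magnitudes, units) pair.

    Simplification: mixed units raise instead of being converted.
    """
    if not scalars:
        raise ValueError("cannot infer units from an empty sequence")
    units = scalars[0].units
    if any(q.units != units for q in scalars):
        raise ValueError("all elements must share the same units")
    return [q.magnitude for q in scalars], units


items = [ScalarQuantity(m, "meter") for m in (1, 2, 2, 3)]
print(from_sequence(items))  # ([1, 2, 2, 3], 'meter')
```

Storing one array magnitude with a single unit, rather than an object array of scalar quantities, keeps arithmetic vectorised and the unit check in one place.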
c.data

<Quantity([1 2 2 3], 'meter')>

d=c*2
d.data
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 2 <class 'int'>
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 2 <class 'int'>

<Quantity([2 4 4 6], 'meter')>

df=pd.DataFrame({"a":c,"b":c})
df
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
   a  b
0  1  1
1  2  2
2  2  2
3  3  3
df.a.dtype
<pint.pandas_array.QuantityType at 0x24585efe898>
df.a.values
<QuantityArray [<Quantity([1 2 2 3], 'meter')>]>
s=df.a*df.b
s
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 0    1
1    2
2    2
3    3
Name: b, dtype: Quantity <class 'pandas.core.series.Series'>
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 0    1
1    2
2    2
3    3
Name: b, dtype: Quantity <class 'pandas.core.series.Series'>
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
0    1
1    4
2    4
3    9
dtype: Quantity
# this rightly shouldn't work
j=df.a**df.b
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 0    1
1    2
2    2
3    3
Name: b, dtype: Quantity <class 'pandas.core.series.Series'>
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> 0    1
1    2
2    2
3    3
Name: b, dtype: Quantity <class 'pandas.core.series.Series'>
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)

---------------------------------------------------------------------------

DimensionalityError                       Traceback (most recent call last)

<ipython-input-62-d171c5132096> in <module>()
----> 1 j=df.a**df.b


~\Anaconda2\envs\py36\lib\site-packages\pandas\core\ops.py in wrapper(left, right)
   1161         elif (is_extension_array_dtype(left) or
   1162                 is_extension_array_dtype(right)):
-> 1163             return dispatch_to_extension_op(op, left, right)
   1164 
   1165         elif is_datetime64_dtype(left) or is_datetime64tz_dtype(left):


~\Anaconda2\envs\py36\lib\site-packages\pandas\core\ops.py in dispatch_to_extension_op(op, left, right)
   1081         new_right = right
   1082 
-> 1083     res_values = op(new_left, new_right)
   1084     res_name = get_op_result_name(left, right)
   1085 


c:\users\a\0ipython\units\git\pint\pandas_array.py in _binop(self, other)
    614             # If the operator is not defined for the underlying objects,
    615             # a TypeError should be raised
--> 616             res = op(lvalues,rvalues)# [op(a, b) for (a, b) in zip(lvalues, rvalues)]
    617 #             res =[op(a, b) for (a, b) in zip(lvalues, rvalues)]
    618 


c:\users\a\0ipython\units\git\pint\quantity.py in __pow__(self, other)
   1041                 elif np.size(other) > 1:
   1042                     raise DimensionalityError(self._units, 'dimensionless',
-> 1043                                               extra_msg='Quantity array exponents are only allowed '
   1044                                                         'if the base is dimensionless')
   1045 


DimensionalityError: Cannot convert from 'meter' to 'dimensionless'Quantity array exponents are only allowed if the base is dimensionless
e=df.a.values + (ureg.cm * [5,5,5,5])
e.data
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> [5 5 5 5] centimeter <class 'pint.quantity.build_quantity_class.<locals>.Quantity'>
binop <QuantityArray [<Quantity([1 2 2 3], 'meter')>]> <class 'pint.pandas_array.QuantityArray'> [5 5 5 5] centimeter <class 'pint.quantity.build_quantity_class.<locals>.Quantity'>
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
<Quantity([1.05 2.05 2.05 3.05], 'meter')>

type(c.data)
pint.quantity.build_quantity_class.<locals>.Quantity
df.a
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
0    1
1    2
2    2
3    3
Name: a, dtype: Quantity
# why is this different from above?! (h is defined further down: h=c._concat_same_type([c,c]))
df=pd.DataFrame(h)
df
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
         0
0  1 meter
1  2 meter
2  2 meter
3  3 meter
4  1 meter
5  2 meter
6  2 meter
7  3 meter
type(df[0].values)
numpy.ndarray
# At least cyberpandas has the same issue
from cyberpandas import IPArray

df=pd.DataFrame(IPArray(['192.168.1.1', '192.168.1.10']))
df[0].dtype
dtype('O')
df = pd.DataFrame({"address": IPArray(['192.168.1.1', '192.168.1.10'])})
df.address.dtype
<cyberpandas.ip_array.IPType at 0x24586685048>
df=pd.DataFrame({"a":pint.QuantityArray(Q_([1,2,2,3],"m")),"b":pint.QuantityArray(Q_([5,12,52,53],"m"))})
# df['c']=df.a*df.b
df
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
   a   b
0  1   5
1  2  12
2  2  52
3  3  53
# switching the order of
# all(not ju.is_na or ju.block.is_extension for ju in join_units) and
# to
# all(ju.block.is_extension or not ju.is_na for ju in join_units) and
# fixes this one
pd.concat([df,df], axis=0)
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-71-d8590489f527> in <module>()
----> 1 pd.concat([df,df], axis=0)


~\Anaconda2\envs\py36\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    225                        verify_integrity=verify_integrity,
    226                        copy=copy, sort=sort)
--> 227     return op.get_result()
    228 
    229 


~\Anaconda2\envs\py36\lib\site-packages\pandas\core\reshape\concat.py in get_result(self)
    422             new_data = concatenate_block_managers(
    423                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 424                 copy=self.copy)
    425             if not self.copy:
    426                 new_data._consolidate_inplace()


~\Anaconda2\envs\py36\lib\site-packages\pandas\core\internals\managers.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   2028                 values = values.view()
   2029             b = b.make_block_same_class(values, placement=placement)
-> 2030         elif is_uniform_join_units(join_units):
   2031             b = join_units[0].block.concat_same_type(
   2032                 [ju.block for ju in join_units], placement=placement)


~\Anaconda2\envs\py36\lib\site-packages\pandas\core\internals\concat.py in is_uniform_join_units(join_units)
    366         # no blocks that would get missing values (can lead to type upcasts)
    367         # unless we're an extension dtype.
--> 368         all(not ju.is_na or ju.block.is_extension for ju in join_units) and
    369         # no blocks with indexers (as then the dimensions do not fit)
    370         all(not ju.indexers for ju in join_units) and


~\Anaconda2\envs\py36\lib\site-packages\pandas\core\internals\concat.py in <genexpr>(.0)
    366         # no blocks that would get missing values (can lead to type upcasts)
    367         # unless we're an extension dtype.
--> 368         all(not ju.is_na or ju.block.is_extension for ju in join_units) and
    369         # no blocks with indexers (as then the dimensions do not fit)
    370         all(not ju.indexers for ju in join_units) and


pandas\_libs\properties.pyx in pandas._libs.properties.CachedProperty.__get__()


~\Anaconda2\envs\py36\lib\site-packages\pandas\core\internals\concat.py in is_na(self)
    163         chunk_len = max(total_len // 40, 1000)
    164         for i in range(0, total_len, chunk_len):
--> 165             if not isna(values_flat[i:i + chunk_len]).all():
    166                 return False
    167 


AttributeError: 'bool' object has no attribute 'all'
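The reordering suggested in the comment above the `pd.concat` call works because `or` short-circuits: with `ju.block.is_extension or not ju.is_na`, the failing `is_na` property is never evaluated for extension blocks, and the two expressions are otherwise logically equivalent. A self-contained sketch, assuming a hypothetical `JoinUnit` stand-in (not the pandas class; `ju.block.is_extension` is flattened to `ju.is_extension`) whose `is_na` raises the same `AttributeError` as the traceback for extension blocks:

```python
class JoinUnit:
    """Hypothetical stand-in for pandas' internal JoinUnit."""

    def __init__(self, is_extension):
        self.is_extension = is_extension

    @property
    def is_na(self):
        if self.is_extension:
            # Mimics the traceback: isna(...) returned a bool for the
            # extension values, and bool has no .all() method.
            raise AttributeError("'bool' object has no attribute 'all'")
        return False


def uniform_original(units):
    # Original order: is_na is evaluated first, even for extension blocks.
    return all(not ju.is_na or ju.is_extension for ju in units)


def uniform_reordered(units):
    # Reordered: extension blocks short-circuit before is_na is touched.
    return all(ju.is_extension or not ju.is_na for ju in units)


units = [JoinUnit(is_extension=True), JoinUnit(is_extension=True)]
print(uniform_reordered(units))  # True
try:
    uniform_original(units)
except AttributeError as exc:
    print("original order fails:", exc)
```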
h=c._concat_same_type([c,c])
df=pd.DataFrame({"a":h})
df
concatting [<QuantityArray [<Quantity([1 2 2 3], 'meter')>]>, <QuantityArray [<Quantity([1 2 2 3], 'meter')>]>]
c:\users\a\0ipython\units\git\pint\quantity.py:1343: UnitStrippedWarning: The unit of the quantity is stripped.
  warnings.warn("The unit of the quantity is stripped.", UnitStrippedWarning)
   a
0  1
1  2
2  2
3  3
4  1
5  2
6  2
7  3
# changing list(new_right) to list(right) in dispatch_to_extension_op fixes this
df.a==h
---------------------------------------------------------------------------

UnboundLocalError                         Traceback (most recent call last)

<ipython-input-73-e4e3b83fb054> in <module>()
----> 1 df.a==h


~\Anaconda2\envs\py36\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
   1358               (is_extension_array_dtype(other) and
   1359                not is_scalar(other))):
-> 1360             return dispatch_to_extension_op(op, self, other)
   1361 
   1362         elif isinstance(other, ABCSeries):


~\Anaconda2\envs\py36\lib\site-packages\pandas\core\ops.py in dispatch_to_extension_op(op, left, right)
   1072             new_right = list(new_right)
   1073         elif is_extension_array_dtype(right) and type(left) != type(right):
-> 1074             new_right = list(new_right)
   1075         else:
   1076             new_right = right


UnboundLocalError: local variable 'new_right' referenced before assignment
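In the traceback, the branch at line 1074 evaluates `list(new_right)` before `new_right` has been assigned on that path, so Python raises `UnboundLocalError`; converting `right` instead removes the error. A simplified, self-contained sketch of that branch (not the pandas source; `ExtArray` and `is_extension` are hypothetical stand-ins for an extension array and `is_extension_array_dtype`):

```python
class ExtArray(list):
    """Hypothetical extension-array stand-in."""


def is_extension(obj):
    """Hypothetical stand-in for is_extension_array_dtype."""
    return isinstance(obj, ExtArray)


def dispatch_buggy(left, right):
    if is_extension(right) and type(left) != type(right):
        new_right = list(new_right)  # bug: reads new_right before assignment
    else:
        new_right = right
    return new_right


def dispatch_fixed(left, right):
    if is_extension(right) and type(left) != type(right):
        new_right = list(right)  # fix: convert the incoming `right`
    else:
        new_right = right
    return new_right


print(dispatch_fixed([1, 2], ExtArray([1, 2])))  # [1, 2]
```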

Was unnecessarily doing np.array(<Quantity>)
znicholls and others added 6 commits August 9, 2018 11:55
This should mean that all the tests pass again.

Given pint's focus on avoiding dependencies, we shouldn't import the pandas interface by default (as it depends on pandas). If users want it, they'll have to do `from pint.pandas_array import QuantityArray` which is a little longer than `from pint import QuantityArray` but I think that's ok as it's a specialised usage.
Remove pandas_array import from __init__.py
…perations

Add check to ensure array sizes of RHS and LHS match
Prevented TypeErrors when performing an operation with a single-value quantity
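The size check and the single-value fix amount to validating operands before an element-wise operation. A sketch with plain lists, assuming a hypothetical `elementwise` helper (not the code in these commits) that broadcasts a scalar or a length-1 right-hand side and rejects any other length mismatch:

```python
import operator
from numbers import Number


def elementwise(op, lhs, rhs):
    """Apply op element-wise to list `lhs` and a scalar or list `rhs`."""
    if isinstance(rhs, Number):
        rhs = [rhs] * len(lhs)  # broadcast a scalar
    elif len(rhs) == 1:
        rhs = list(rhs) * len(lhs)  # treat a single-value sequence like a scalar
    elif len(rhs) != len(lhs):
        raise ValueError(f"operand lengths differ: {len(lhs)} vs {len(rhs)}")
    return [op(a, b) for a, b in zip(lhs, rhs)]


print(elementwise(operator.mul, [1, 2, 2, 3], 2))    # [2, 4, 4, 6]
print(elementwise(operator.add, [1, 2, 2, 3], [5]))  # [6, 7, 7, 8]
```

Raising on a mismatch matters because `zip` alone would silently truncate to the shorter operand.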
Fixes the great DimensionalityError that occurs when you raise to the power of a single-value quantity:
DimensionalityError: Cannot convert from 'dimensionless' to 'dimensionless'
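The rule behind this error is spelled out in pint's message in the earlier traceback: array exponents are only allowed if the base is dimensionless. One way to avoid the spurious error for a single-value exponent is to unwrap it to a scalar before applying that rule. A sketch under that assumption (`normalize_exponent` and the local `DimensionalityError` class are hypothetical stand-ins, not pint's API or the commit's actual change):

```python
from numbers import Number


class DimensionalityError(ValueError):
    """Hypothetical stand-in for pint's DimensionalityError."""


def normalize_exponent(base_units, exponent):
    """Return the exponent to use, or raise if an array exponent is not allowed."""
    if isinstance(exponent, Number):
        return exponent
    values = list(exponent)
    if len(values) == 1:
        return values[0]  # a single-value exponent behaves like a scalar
    if base_units != "dimensionless":
        raise DimensionalityError(
            "array exponents are only allowed if the base is dimensionless"
        )
    return values


print(normalize_exponent("meter", [2]))             # 2
print(normalize_exponent("dimensionless", [1, 2]))  # [1, 2]
```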
@hgrecco
Owner

hgrecco commented Aug 9, 2018

I have a few comments:
1.- Can you briefly describe the concept behind the way you store Pint Quantities in pandas? The code is quite clear, but I think describing it in words will help others debug the code (and also provide a short text to include in the docs). Maybe you can already write it in docs format.
2.- We need to provide a way to use this even without having pandas installed. Maybe you can look at the way the matplotlib code was added.
3.- It is looking good and a very valuable addition to Pint!
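For point 2, one common pattern is to keep pandas out of pint's module-level imports and load it only when a user opts in. A sketch under assumptions: `optional_import` and `setup_pandas_support` are hypothetical names, and this is not necessarily how pint's matplotlib support is structured:

```python
import importlib
import importlib.util


def optional_import(name):
    """Return the named top-level module if it is installed, otherwise None."""
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


def setup_pandas_support():
    """Hypothetical opt-in hook: fails only when pandas support is requested."""
    pandas = optional_import("pandas")
    if pandas is None:
        raise ImportError(
            "pandas support requires pandas; install it with `pip install pandas`"
        )
    # The extension-array module from this PR would be imported here,
    # e.g. `from pint import pandas_array`.
    return pandas
```

Because nothing at import time calls `setup_pandas_support`, users without pandas are unaffected; only code that explicitly requests pandas support sees the `ImportError`.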

@znicholls
Contributor

@andrewgsavage probably best to close this?

@andrewgsavage
Collaborator Author

ya
