ENH: support apply method for ExtensionArray backed Series #28955
Comments
I think this primarily comes down to pandas' dtype inference not being able to infer extension types from a list of objects yet.
In [10]: from pandas.tests.extension.decimal import DecimalArray, make_data
In [11]: import pandas as pd
In [12]: ser = pd.Series(DecimalArray(make_data()))
...: ser
Out[12]:
0 Decimal: 0.35594459693084556928255324237397871...
1 Decimal: 0.28919229388647194056716216437052935...
2 Decimal: 0.87055683509853509782772107428172603...
3 Decimal: 0.26522013357197371519191619881894439...
4 Decimal: 0.84871717478470365403353525834972970...
...
95 Decimal: 0.68471570624769151347521756179048679...
96 Decimal: 0.08382578509377813791303424295620061...
97 Decimal: 0.09951047765425147240136993787018582...
98 Decimal: 0.69957638105169761555401919395080767...
99 Decimal: 0.83568359682548865041695762556628324...
Length: 100, dtype: decimal
In [13]: ser.apply(lambda x: x + 1)
Out[13]:
0 1.355944596930845569282553242
1 1.289192293886471940567162164
2 1.870556835098535097827721074
3 1.265220133571973715191916199
4 1.848717174784703654033535258
...
95 1.684715706247691513475217562
96 1.083825785093778137913034243
97 1.099510477654251472401369938
98 1.699576381051697615554019194
99 1.835683596825488650416957626
Length: 100, dtype: object
The Unless we had some kind of |
we should just fix infer_dtype |
Probably, but that's a bit difficult since it's in Cython. It's not clear
to me how to allow EA authors to register something here.
|
I don't think it's very hard. We already have a function to construct from a sequence on the EA; we just need to call it. Once we have gone down the object path (meaning we have not inferred anything else), we need to check whether we have an EA scalar and call the appropriate constructor. |
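The idea in the comment above can be sketched without pandas. This is only an illustration under stated assumptions: `ToyDecimalArray`, `_SCALAR_REGISTRY`, and `maybe_infer_extension` are hypothetical names, not pandas APIs. The only part borrowed from pandas is the notion of a `_from_sequence` constructor on the array class, and how EA authors would register a scalar type is exactly the open question in this thread.

```python
from decimal import Decimal


class ToyDecimalArray:
    """Hypothetical stand-in for an ExtensionArray that holds Decimal scalars."""

    scalar_type = Decimal

    def __init__(self, values):
        self.values = list(values)

    @classmethod
    def _from_sequence(cls, scalars):
        # Mirrors the idea of a "construct from a sequence of scalars" hook.
        return cls(scalars)


# Hypothetical registry: scalar type -> array class able to hold it.
_SCALAR_REGISTRY = {ToyDecimalArray.scalar_type: ToyDecimalArray}


def maybe_infer_extension(values):
    """Called only after inference has fallen back to the object path.

    If every element is the same registered scalar type, build the matching
    array; otherwise return the values unchanged as a plain list.
    """
    values = list(values)
    if not values:
        return values
    first_type = type(values[0])
    array_cls = _SCALAR_REGISTRY.get(first_type)
    if array_cls is None or not all(type(v) is first_type for v in values):
        return values
    return array_cls._from_sequence(values)


decimals = maybe_infer_extension([Decimal("1.5"), Decimal("2.5")])
mixed = maybe_infer_extension([Decimal("1.5"), "text"])
```

Here `decimals` ends up as a `ToyDecimalArray`, while `mixed` stays a plain list because its elements do not share one registered scalar type.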
Or, we could let the ExtensionArray be responsible for inferring (for this specific case). Either by calling |
Sure, but this is trickier, as we don't know which EA. Actually, I think you are right for this case, @jorisvandenbossche: we need some logic in apply to handle a returned sequence of the same EA type (it would be very tricky to handle a different return type). |
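A minimal sketch of that "same type back" logic, again without pandas. `ToyUnitArray` and its `apply` method are hypothetical and only illustrate the control flow: try to rebuild the original array type from the results, and fall back to a plain list of objects when the results no longer fit.

```python
class ToyUnitArray:
    """Hypothetical stand-in for an extension array: ints tagged with a unit."""

    def __init__(self, values, unit):
        values = list(values)
        if not all(type(v) is int for v in values):
            raise TypeError("ToyUnitArray only holds int values")
        self.values = values
        self.unit = unit

    def apply(self, func):
        results = [func(v) for v in self.values]
        try:
            # Same-type case: the results still fit the original array type.
            return type(self)(results, self.unit)
        except TypeError:
            # Different return type: give up and return plain objects.
            return results


days = ToyUnitArray([1, 2, 3], unit="day")
shifted = days.apply(lambda v: v + 1)  # stays a ToyUnitArray
labels = days.apply(str)               # falls back to a list of str
```

Handling only the same-type case keeps the fallback simple; deciding what to do with a different but still extension-typed result is the harder problem the comment above points at.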
We actually have this in our code right now: "Need to figure out if we want ExtensionArray.map first. " pandas/pandas/core/arrays/datetimelike.py Lines 695 to 703 in 5b0bf23
|
apply method not implemented for ExtensionArray
Hmm, maybe close this in favor of that issue then (or consolidate the issues, as that one is slightly different). |
This might be useful in some of the groupby.apply/agg/etc stuff I'm working on. A lot of the corner cases involve ops being done on Categorical or IntegerArray and having to check whether we can/should cast back to the original dtype. |
@topper-123 does your recent apply/map work do anything to improve this? |
There is now in the main branch an |
The example in the OP now works as expected using main pandas and master pint-pandas branches.
import pandas as pd
import pint
import pint_pandas
ureg = pint.get_application_registry()
def g(x):
    return x + 1 * ureg.day
df = pd.DataFrame({'A': pd.Series([1, 2, 3, 4], dtype='pint[day]'), 'B': pd.Series([5, 6, 7, 8], dtype='pint[day]')})
res = df['A'].apply(g)
print(type(df['A'].values))
print(type(res.values))
<class 'pint_pandas.pint_array.PintArray'>
<class 'pint_pandas.pint_array.PintArray'> |
Great, so I think this can be closed now. If someone objects to that, just ping this thread and I will open it again. |
Code Sample, a copy-pastable example if possible
Problem description
I am experimenting with the pint-pandas project, which builds an ExtensionArray to be able to work with units on dataframes. The above code sample shows that the .apply method is not implemented for external extension arrays.
I am using the pint-pandas-plotting branch of the pint-pandas project, as this is the one that is compatible with pandas 0.25. I installed this branch by downloading it, navigating to the root directory, and running something along the lines of:
python setup.py -e .
Expected Output
The expected output for print(type(res.values)) should be a PintArray and not a numpy array of Quantity objects.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None