REGR: DataFrame.mean(numeric_only=True) raises AttributeError on v1.0.3 #33256

Closed
simonjayhawkins opened this issue Apr 3, 2020 · 5 comments · Fixed by #33761
Labels
ExtensionArray, Numeric Operations, Regression
Comments

@simonjayhawkins
Member

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>>
>>> import pandas as pd
>>>
>>> pd.__version__
'1.0.3'
>>>
>>> df_wide = pd.DataFrame(np.random.randint(1000, size=(1000, 100))).astype("Int64").copy()
>>>
>>> df_wide.mean(numeric_only=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\generic.py", line 11215, in stat_func
    f, name, axis=axis, skipna=skipna, numeric_only=numeric_only
  File "C:\Users\simon\pandas\pandas\core\frame.py", line 7896, in _reduce
    res = df._data.reduce(op, axis=1, skipna=skipna, **kwds)
  File "C:\Users\simon\pandas\pandas\core\internals\managers.py", line 351, in reduce
    bres = func(blk.values, *args, **kwargs)
  File "C:\Users\simon\pandas\pandas\core\nanops.py", line 69, in _f
    return f(*args, **kwargs)
  File "C:\Users\simon\pandas\pandas\core\nanops.py", line 102, in f
    if values.size == 0 and kwds.get("min_count") is None:
AttributeError: 'IntegerArray' object has no attribute 'size'
>>>
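The traceback above ends at an attribute lookup on the block's values. A minimal sketch of that failure mode follows, assuming only what the traceback shows: the guard reads `values.size`, and the array object passed in does not provide it. `MaskedIntsStandIn`, `guarded_reduction`, and `try_reduce` are hypothetical names for illustration, not pandas code.

```python
class MaskedIntsStandIn:
    """Hypothetical stand-in for an array type that defines __len__ but no .size."""

    def __init__(self, data, mask):
        self._data = list(data)
        self._mask = list(mask)

    def __len__(self):
        return len(self._data)


def guarded_reduction(values, **kwds):
    # Same shape as the check quoted in the traceback: it assumes an
    # ndarray-like object with a .size attribute.
    if values.size == 0 and kwds.get("min_count") is None:
        return None
    return sum(values._data)


def try_reduce(values):
    # Return the result, or the error text if the guard itself fails.
    try:
        return guarded_reduction(values)
    except AttributeError as err:
        return f"AttributeError: {err}"


message = try_reduce(MaskedIntsStandIn([1, 2, 3], [False, False, False]))
print(message)
```

A guard written against `len(values)` would not depend on the ndarray-only attribute, though whether that is the right fix for pandas is a question for the linked PR.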

Problem description

This is a regression from 0.25.3

0aa48f7 is the first bad commit
commit 0aa48f7
Author: jbrockmendel <jbrockmendel@gmail.com>
Date: Wed Jan 1 09:18:20 2020 -0800

PERF: perform reductions block-wise (#29847)

On master, the same call raises a different error, AttributeError: 'int' object has no attribute 'dtype':

>>> df_wide.mean(numeric_only=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\generic.py", line 11114, in stat_func
    func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
  File "C:\Users\simon\pandas\pandas\core\frame.py", line 7990, in _reduce
    res = df._data.reduce(blk_func)
  File "C:\Users\simon\pandas\pandas\core\internals\managers.py", line 362, in reduce
    bres = func(blk.values, *args, **kwargs)
  File "C:\Users\simon\pandas\pandas\core\frame.py", line 7985, in blk_func
    return op(values, axis=0, skipna=skipna, **kwds)
  File "C:\Users\simon\pandas\pandas\core\nanops.py", line 120, in f
    result = bn_func(values, axis=axis, **kwds)
  File "<__array_function__ internals>", line 6, in nanmean
  File "C:\Users\simon\Anaconda3\envs\pandas-dev\lib\site-packages\numpy\lib\nanfunctions.py", line 952, in nanmean
    avg = _divide_by_count(tot, cnt, out=out)
  File "C:\Users\simon\Anaconda3\envs\pandas-dev\lib\site-packages\numpy\lib\nanfunctions.py", line 219, in _divide_by_count
    return a.dtype.type(a / b)
AttributeError: 'int' object has no attribute 'dtype'

Expected Output

>>> import numpy as np
>>>
>>> import pandas as pd
>>>
>>> pd.__version__
'0.25.3'
>>>
>>> df_wide = pd.DataFrame(np.random.randint(1000, size=(1000, 100))).astype("Int64").copy()
>>>
>>> df_wide.mean(numeric_only=True)
0     520.057
1     507.735
2     501.618
3     506.590
4     501.500
       ...
95    507.594
96    483.273
97    506.330
98    497.068
99    508.118
Length: 100, dtype: float64
>>>
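The expected output is one float mean per column. As a rough pure-Python sketch of that reduction, assuming each nullable integer column is held as a list of values plus a boolean mask where `True` marks a missing entry (`masked_mean` and `column_means` are hypothetical helpers, not the pandas implementation):

```python
from typing import List, Optional, Sequence


def masked_mean(data: Sequence[int], mask: Sequence[bool]) -> Optional[float]:
    """Mean of the entries whose mask flag is False (True marks a missing value)."""
    valid = [x for x, missing in zip(data, mask) if not missing]
    if not valid:
        # No valid entries; a real library would likely return NaN or NA here.
        return None
    return sum(valid) / len(valid)


def column_means(
    columns: Sequence[Sequence[int]], masks: Sequence[Sequence[bool]]
) -> List[Optional[float]]:
    """Reduce each column independently, skipping masked entries."""
    return [masked_mean(col, mask) for col, mask in zip(columns, masks)]


columns = [[1, 2, 3, 4], [10, 20, 30, 40]]
masks = [[False, False, False, False], [False, True, False, False]]
means = column_means(columns, masks)
print(means)
```

The result is a float per column even though the inputs are integers, which matches the `dtype: float64` shown in the 0.25.3 output.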
@simonjayhawkins simonjayhawkins added Numeric Operations Arithmetic, Comparison, and Logical operations Regression Functionality that used to work in a prior pandas version NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Apr 3, 2020
@jorisvandenbossche
Member

I noticed this as well while working on #32867. Normally, this should also get solved in that PR.

@jorisvandenbossche jorisvandenbossche added ExtensionArray Extending pandas with custom dtypes or arrays. and removed NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Apr 3, 2020
@simonjayhawkins
Member Author

I was just looking at that PR and couldn't find an issue. Then, realising it was a regression, I felt that a separate issue should be raised. If we can get just the fix part of #32867 broken off, we could backport it if we do another patch release. (Sorry, I should have xrefed your work here originally.)

@jorisvandenbossche
Member

No problem, you're totally correct to open an issue about it.
I think it should be rather straightforward to separate a fix from that PR.

@jorisvandenbossche jorisvandenbossche added this to the 1.0.4 milestone Apr 3, 2020
@jbrockmendel
Member

@simonjayhawkins can you @ me on issues that I caused?

@simonjayhawkins
Member Author

Changing the milestone; xref #33300 to track.
