BUG: DataFrame[Sparse] quantile fails because SparseArray has no reshape #24600

Closed
jbrockmendel opened this issue Jan 3, 2019 · 5 comments · Fixed by #35236
Labels: good first issue, Needs Tests, Sparse
Comments

@jbrockmendel
Member

Tried to simplify `Block.quantile` so that it only has to handle the 2D case, by having `Series.quantile` dispatch to the `DataFrame` implementation. This produced failures in `test_quantile_sparse` in `pandas/tests/series/test_quantile.py`:

```python
import pandas as pd

ser = pd.Series([0., None, 1., 2.], dtype='Sparse[float]')
df = pd.DataFrame(ser)
```

```python
>>> ser.quantile(0.5)
1.0
>>> ser.quantile([0.5])
0.5    1.0
dtype: float64
>>> df.quantile(0.5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 7760, in quantile
    transposed=is_transposed)
  File "pandas/core/internals/managers.py", line 500, in quantile
    return self.reduction('quantile', **kwargs)
  File "pandas/core/internals/managers.py", line 432, in reduction
    axe, block = getattr(b, f)(axis=axis, axes=self.axes, **kwargs)
  File "pandas/core/internals/blocks.py", line 1530, in quantile
    result = _nanpercentile(values, qs * 100, axis=axis, **kw)
  File "pandas/core/internals/blocks.py", line 1484, in _nanpercentile
    mask = mask.reshape(values.shape)
AttributeError: 'SparseArray' object has no attribute 'reshape'
>>> df.quantile([0.5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 7760, in quantile
    transposed=is_transposed)
  File "pandas/core/internals/managers.py", line 500, in quantile
    return self.reduction('quantile', **kwargs)
  File "pandas/core/internals/managers.py", line 432, in reduction
    axe, block = getattr(b, f)(axis=axis, axes=self.axes, **kwargs)
  File "pandas/core/internals/blocks.py", line 1511, in quantile
    axis=axis, **kw)
  File "pandas/core/internals/blocks.py", line 1484, in _nanpercentile
    mask = mask.reshape(values.shape)
AttributeError: 'SparseArray' object has no attribute 'reshape'
```
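Both tracebacks end in `_nanpercentile` calling `mask.reshape(values.shape)`, which assumes a NumPy array that can be reshaped to 2D. A minimal user-side workaround sketch for affected versions, assuming pandas >= 0.25 (where the `DataFrame.sparse` accessor exists and every column is sparse): densify first, so the quantile runs on ordinary float64 blocks. This is not the change that closed the issue.

```python
import pandas as pd

ser = pd.Series([0.0, None, 1.0, 2.0], dtype="Sparse[float]")
df = pd.DataFrame(ser)

# Densify every sparse column; the result holds plain float64 columns,
# so the block-level quantile code works on a regular 2-D ndarray.
dense = df.sparse.to_dense()
result = dense.quantile(0.5)  # NaN is skipped, so this is the median of [0, 1, 2]
print(result)
```

The trade-off is memory: `to_dense()` materializes the fill values, so this only suits frames that fit in memory when dense.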

datetime64[ns, tz] breaks in a slightly different way (presumably all ExtensionBlocks will fail):

```python
import pandas as pd

dti = pd.date_range('2016-01-01', periods=3, tz='US/Pacific')

ser = pd.Series(dti)
df = pd.DataFrame(ser)
```

```python
>>> df.quantile(0.5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 7760, in quantile
    transposed=is_transposed)
  File "pandas/core/internals/managers.py", line 500, in quantile
    return self.reduction('quantile', **kwargs)
  File "pandas/core/internals/managers.py", line 473, in reduction
    values = _concat._concat_compat([b.values for b in blocks])
  File "pandas/core/dtypes/concat.py", line 174, in _concat_compat
    return np.concatenate(to_concat, axis=axis)
ValueError: need at least one array to concatenate
>>> df.quantile([0.5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/frame.py", line 7760, in quantile
    transposed=is_transposed)
  File "pandas/core/internals/managers.py", line 500, in quantile
    return self.reduction('quantile', **kwargs)
  File "pandas/core/internals/managers.py", line 473, in reduction
    values = _concat._concat_compat([b.values for b in blocks])
  File "pandas/core/dtypes/concat.py", line 174, in _concat_compat
    return np.concatenate(to_concat, axis=axis)
ValueError: need at least one array to concatenate
```
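This error comes from `np.concatenate` receiving an empty list. Presumably `DataFrame.quantile`, which defaulted to `numeric_only=True` at the time, dropped the tz-aware column, leaving `reduction` with no blocks to concatenate. The sketch below reproduces the NumPy error and shows a hypothetical guard, `concat_block_values`, that returns an empty float64 array instead of raising; it illustrates the failure mode and is not pandas' actual fix.

```python
import numpy as np


def concat_block_values(values_list):
    """Hypothetical helper: concatenate 1-D block values, tolerating an empty list."""
    if not values_list:
        # Every block was filtered out (e.g. no numeric columns), so there is
        # nothing to concatenate; return an empty float64 array instead of raising.
        return np.array([], dtype=np.float64)
    return np.concatenate(values_list)


# np.concatenate itself rejects an empty sequence, as in the traceback above.
try:
    np.concatenate([])
except ValueError as err:
    print(err)

print(concat_block_values([]))
print(concat_block_values([np.array([1.0]), np.array([2.0, 3.0])]))
```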

xref #24583

@jbrockmendel
Member Author

IntNA is also a catastrophe for quantile

@TomAugspurger
Contributor

Do you think this will need to be pushed down to the array for ExtensionArrays?

@jbrockmendel
Member Author

> Do you think this will need to be pushed down to the array for ExtensionArrays?

Pushing `quantile` itself down to the array? Probably not. For `SparseArray`, a patch is now in place that avoids the immediate problem. For IntNA, the problem looks to be in `_try_coerce_result`, which does not handle the result correctly. For `DatetimeTZBlock`, the problem is in `_concat._concat_compat`. It's eclectic: each block type fails in a different place.

I think we'll want to define `_try_coerce_result` (and possibly `_try_coerce_args`, not sure) in terms of `_holder._from_sequence` (and possibly `_holder._unbox_scalar` or something resembling `_scalar_from_string`).
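The coerce-args / coerce-result pattern discussed here can be illustrated with public APIs for the tz-aware case: drop to a plain int64 ndarray, reduce with NumPy, then box the result back into the original dtype. This is a sketch under stated assumptions (no `NaT` values; nanoseconds since the UTC epoch as the intermediate representation); `tz_quantile`, `EPOCH`, and `ONE_NS` are hypothetical names standing in for what `_try_coerce_args` / `_try_coerce_result` would do inside a block, not pandas internals.

```python
import numpy as np
import pandas as pd

# Hypothetical reference points for the int64 round trip.
EPOCH = pd.Timestamp("1970-01-01", tz="UTC")
ONE_NS = pd.Timedelta(1, unit="ns")


def tz_quantile(values: pd.DatetimeIndex, q: float) -> pd.Timestamp:
    """Hypothetical helper: quantile of tz-aware datetimes (assumes no NaT)."""
    # "Coerce args": express each timestamp as int64 nanoseconds since the UTC epoch.
    i8 = np.asarray((values.tz_convert("UTC") - EPOCH) // ONE_NS, dtype="int64")
    # Reduce with plain NumPy; np.quantile returns a float64 here.
    raw = np.quantile(i8, q)
    # "Coerce result": box the reduced value back into a Timestamp in the original tz.
    return (EPOCH + pd.Timedelta(int(round(raw)), unit="ns")).tz_convert(values.tz)


dti = pd.date_range("2016-01-01", periods=3, tz="US/Pacific")
print(tz_quantile(dti, 0.5))
```

Because the reduction goes through float64, interpolated results can lose sub-microsecond precision for present-day timestamps; a block-level implementation would need to decide how to round.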

@mroeschke added the Numeric Operations and Sparse labels on Jan 13, 2019
@pglopezamaya

Any news on a patch for the `'SparseArray' object has no attribute 'reshape'` error?

@mroeschke
Member

These cases appear to work on master. This could use a test:

```python
In [57]: ser = pd.Series([0., None, 1., 2.], dtype='Sparse[float]')
    ...: df = pd.DataFrame(ser)

In [58]: df.quantile(0.5)
Out[58]:
0    1.0
Name: 0.5, dtype: float64

In [59]: dti = pd.date_range('2016-01-01', periods=3, tz='US/Pacific')
    ...:
    ...: ser = pd.Series(dti)
    ...: df = pd.DataFrame(ser)

In [60]: df.quantile(0.5)
Out[60]: Series([], Name: 0.5, dtype: float64)

In [61]: pd.__version__
Out[61]: '1.1.0.dev0+1108.gcad602e16'
```
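A possible shape for the requested regression test, sketched from the two sessions above. The test names are hypothetical, and the sparse case compares values only, deliberately not pinning the dtype of the result. The tz-aware case passes `numeric_only=True` explicitly, matching the default at the time, so the column is excluded and the result should be empty rather than raising.

```python
import numpy as np
import pandas as pd


def test_quantile_sparse_frame():
    # GH 24600: DataFrame.quantile with a Sparse column used to raise
    # AttributeError: 'SparseArray' object has no attribute 'reshape'
    ser = pd.Series([0.0, None, 1.0, 2.0], dtype="Sparse[float]")
    df = pd.DataFrame(ser)
    result = df.quantile(0.5)
    assert result.name == 0.5
    assert list(result.index) == [0]
    # Compare values only; densify in case the result keeps a Sparse dtype.
    assert np.asarray(result, dtype="float64").tolist() == [1.0]


def test_quantile_tz_aware_numeric_only():
    # GH 24600: with numeric_only=True the tz-aware column is excluded,
    # so the result should be empty instead of raising ValueError.
    dti = pd.date_range("2016-01-01", periods=3, tz="US/Pacific")
    df = pd.DataFrame(pd.Series(dti))
    result = df.quantile(0.5, numeric_only=True)
    assert result.empty


if __name__ == "__main__":
    test_quantile_sparse_frame()
    test_quantile_tz_aware_numeric_only()
```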

@mroeschke added the good first issue and Needs Tests labels and removed the Numeric Operations, Sparse, and quantile labels on Apr 5, 2020
@jreback added this to the 1.1 milestone on Jul 16, 2020
@jreback added the Sparse label on Jul 16, 2020