
BUG: pandas.cut(data, ..., include_lowest=True) raises IndexError when data is masked array Float64 dtype #42817

@JonAnCla

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import numpy
import pandas

# this works
pandas.cut(pandas.Series(numpy.arange(10)), pandas.Series([3, 7]), include_lowest=True)

# this fails with an IndexError (stack trace below)
pandas.cut(pandas.Series(numpy.arange(10)).astype('Float64'), pandas.Series([3, 7]), include_lowest=True)
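Until this is fixed, one possible workaround is to give pandas.cut plain float64 data instead of the nullable Float64 dtype. The sketch below assumes that mapping missing values (pd.NA) to np.nan is acceptable for your use case. cut_nullable_float is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd


def cut_nullable_float(values: pd.Series, bins, **kwargs) -> pd.Series:
    """Hypothetical helper (not a pandas API): run pd.cut on a nullable
    Float64 Series by first converting it to plain float64, with pd.NA
    mapped to np.nan, while keeping the original index and name."""
    plain = pd.Series(
        values.to_numpy(dtype="float64", na_value=np.nan),
        index=values.index,
        name=values.name,
    )
    # With plain float64 input, the include_lowest comparison yields a
    # regular NumPy-compatible bool mask, as in the int64 case that works.
    return pd.cut(plain, bins, **kwargs)


s = pd.Series(np.arange(10)).astype("Float64")
binned = cut_nullable_float(s, pd.Series([3, 7]), include_lowest=True)
print(binned)
```

Values 3 through 7 land in the single bin and the others come back as NaN, the same result the int64 example gives.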

Software used

Tried pandas 1.2.0 through 1.3.1 (same error in each case)
Python 3.8.10
Ubuntu 20.04

Stack trace

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_220282/2616585682.py in <module>
----> 1 pandas.cut(pandas.Series(numpy.arange(10)).astype('Float64'), pandas.Series([3, 7]), include_lowest=True)

~/.pyenv/versions/3.8.10/envs/testvenv/lib/python3.8/site-packages/pandas/core/reshape/tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates, ordered)
    271             raise ValueError("bins must increase monotonically.")
    272 
--> 273     fac, bins = _bins_to_cuts(
    274         x,
    275         bins,

~/.pyenv/versions/3.8.10/envs/testvenv/lib/python3.8/site-packages/pandas/core/reshape/tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
    408 
    409     if include_lowest:
--> 410         ids[x == bins[0]] = 1
    411 
    412     na_mask = isna(x) | (ids == len(bins)) | (ids == 0)

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

From a quick look at line 410 in pdb, it seems that ids is a NumPy array, while x == bins[0] returns a pandas Series of the nullable "boolean" dtype, which cannot be used to index ids.
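To illustrate this diagnosis: comparing a Float64 Series produces a nullable boolean mask, and converting that mask to a plain NumPy bool array (treating <NA> as False) gives NumPy an index it always accepts. This is a sketch of one possible approach, not the change pandas actually made; the ids array here is a hypothetical stand-in for the one built in _bins_to_cuts. Whether indexing directly with the nullable mask raises can depend on the pandas version, so the sketch does not rely on that behaviour.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(10)).astype("Float64")
lowest_edge = 3  # plays the role of bins[0]

# Hypothetical stand-in for the integer ids array computed in _bins_to_cuts.
ids = np.zeros(len(x), dtype=np.intp)

# Comparing a Float64 Series yields a nullable "boolean" Series, not np.bool_.
mask = x == lowest_edge
print(mask.dtype)  # boolean

# Convert to a plain NumPy bool array; any <NA> entries become False,
# so missing values never match the lowest bin edge.
ids[mask.to_numpy(dtype=bool, na_value=False)] = 1
print(ids)
```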

Thanks!

Metadata

Assignees: No one assigned
Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests