BUG: Categorical.take with fill_value #23296

Closed
TomAugspurger opened this issue Oct 23, 2018 · 2 comments
Labels: Categorical (Categorical Data Type), Indexing (Related to indexing on series/frames, not to indexes themselves)

@TomAugspurger
Contributor

We need to translate the user-provided fill_value to the code for that category before taking.

In [1]: import pandas as pd

In [2]: c = pd.Categorical(['a', 'b', 'c'])

In [3]: c.take([0, 1, -1], fill_value='a', allow_fill=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-97f966c41cb2> in <module>
----> 1 c.take([0, 1, -1], fill_value='a', allow_fill=True)

~/sandbox/pandas/pandas/core/arrays/categorical.py in take_nd(self, indexer, allow_fill, fill_value)
   1806         codes = take(self._codes, indexer, allow_fill=allow_fill,
   1807                      fill_value=fill_value)
-> 1808         result = self._constructor(codes, dtype=self.dtype, fastpath=True)
   1809         return result
   1810

~/sandbox/pandas/pandas/core/arrays/categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)
    371
    372         if fastpath:
--> 373             self._codes = coerce_indexer_dtype(values, categories)
    374             self._dtype = self._dtype.update_dtype(dtype)
    375             return

~/sandbox/pandas/pandas/core/dtypes/cast.py in coerce_indexer_dtype(indexer, categories)
    603     length = len(categories)
    604     if length < _int8_max:
--> 605         return ensure_int8(indexer)
    606     elif length < _int16_max:
    607         return ensure_int16(indexer)

~/sandbox/pandas/pandas/_libs/algos_common_helper.pxi in pandas._libs.algos.ensure_int8()
    413             return arr
    414         else:
--> 415             return arr.astype(np.int8, copy=copy)
    416     else:
    417         return np.array(arr, dtype=np.int8)

ValueError: invalid literal for int() with base 10: 'a'
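
For illustration, here is a minimal sketch of the translation step described above, written against the public API only. The helper name `take_with_fill` is hypothetical and this is not the code pandas uses internally; it assumes `fill_value` is already one of the existing categories and that `-1` marks the positions to fill.

```python
import numpy as np
import pandas as pd


def take_with_fill(cat, indexer, fill_value):
    """Hypothetical helper: take from a Categorical, filling -1 slots.

    Sketch only; assumes fill_value is an existing category.
    """
    indexer = np.asarray(indexer)
    # Translate the user-facing value (e.g. 'a') into its integer code.
    fill_code = cat.categories.get_loc(fill_value)
    # Take from the integer codes, then overwrite the -1 positions
    # with the code of the fill value.
    taken = np.where(indexer == -1, fill_code, cat.codes[indexer])
    # Rebuild a Categorical with the same dtype from the new codes.
    return pd.Categorical.from_codes(taken, dtype=cat.dtype)


c = pd.Categorical(['a', 'b', 'c'])
result = take_with_fill(c, [0, 1, -1], fill_value='a')
```

The point is that the fill happens in code space, so the string never reaches the integer cast that fails in the traceback.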
TomAugspurger added the Indexing and Categorical labels Oct 23, 2018
TomAugspurger added this to the 0.24.0 milestone Oct 23, 2018
@TomAugspurger
Contributor Author

API discussion, which we've maybe had before: should we allow a fill_value that's outside of the original categories? I.e., should this be

In [2]: cat = pd.Categorical(['a', 'a', 'b'])

In [3]: cat.take([0, -1, -1], fill_value='d', allow_fill=True)
Out[3]:
[a, d, d]
Categories (3, object): [a, b, d]

or should it raise an error?

Right now, I think we should allow it, but I could see it going either way.

cc @jorisvandenbossche @jankatins.

@jorisvandenbossche
Member

I would think that take should not alter the dtype, so I would not allow it (that is, it should raise a TypeError).
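
A rough sketch of what that stricter behavior could look like, as a standalone check. The function name `validate_fill_value` and the error message are made up for illustration and are not taken from the pandas source; the sketch assumes a missing value (`None`/`NaN`) maps to the code `-1`.

```python
import pandas as pd


def validate_fill_value(cat, fill_value):
    """Hypothetical check: return the code for fill_value or raise.

    Sketch only; names and message are assumptions, not pandas internals.
    """
    # A missing fill value maps to the NA code and never changes the dtype.
    if pd.isna(fill_value):
        return -1
    # Anything outside the existing categories would require new
    # categories, i.e. a different dtype, so reject it.
    if fill_value not in cat.categories:
        raise TypeError(
            f"fill_value {fill_value!r} is not one of the categories"
        )
    return int(cat.categories.get_loc(fill_value))


cat = pd.Categorical(['a', 'a', 'b'])
code_b = validate_fill_value(cat, 'b')      # existing category
code_na = validate_fill_value(cat, None)    # missing value

try:
    validate_fill_value(cat, 'd')           # not a category
    raised = False
except TypeError:
    raised = True
```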

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Oct 23, 2018
TomAugspurger added a commit that referenced this issue Oct 23, 2018
* BUG: Handle fill_value in Categorical.take

Closes #23296

* no new categories

* revert add_categories