
Series.shift() doesn't work for categorical type #9416


Closed
lminer opened this issue Feb 4, 2015 · 2 comments · Fixed by #10497

Comments

@lminer

lminer commented Feb 4, 2015

Not sure if this is intentional, but Series.shift() won't run with categorical dtypes:

import pandas as pd

ser = pd.Series(['a', 'b', 'c', 'd'], dtype="category")
ser.shift(1)
Traceback (most recent call last):

  File "<ipython-input-15-1a7536b0af06>", line 1, in <module>
    ser.shift(1)

  File "/.../pandas/core/generic.py", line 3394, in shift
    new_data = self._data.shift(periods=periods, axis=block_axis)

  File "/.../pandas/core/internals.py", line 2533, in shift
    return self.apply('shift', **kwargs)

  File "/.../pandas/core/internals.py", line 2497, in apply
    applied = getattr(b, f)(**kwargs)

  File "/.../pandas/core/internals.py", line 893, in shift
    new_values, fill_value = com._maybe_upcast(self.values)

  File "/.../pandas/core/common.py", line 1218, in _maybe_upcast
    new_dtype, fill_value = _maybe_promote(dtype, fill_value)

  File "/.../pandas/core/common.py", line 1124, in _maybe_promote
    if issubclass(np.dtype(dtype).type, compat.string_types):

TypeError: data type not understood
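
Until categorical shifting is supported, one possible workaround is to shift the values as plain objects and rebuild the categorical afterwards. This is only a sketch, not something proposed in this thread: it assumes the round trip through `object` dtype is acceptable for your data size, and the names `shifted` and `result` are illustrative.

```python
import pandas as pd

ser = pd.Series(["a", "b", "c", "d"], dtype="category")

# Assumption: converting to object dtype is affordable for this data.
# Shifting an object Series works and fills the vacated slot with NaN.
shifted = ser.astype(object).shift(1)

# Rebuild the categorical with the original categories and ordering,
# so the result has the same category set as the input.
result = pd.Series(
    pd.Categorical(
        shifted,
        categories=ser.cat.categories,
        ordered=ser.cat.ordered,
    ),
    index=ser.index,
)
```

The first element of `result` is missing, and the categories are unchanged from `ser`.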
@shoyer
Member

shoyer commented Feb 5, 2015

This simply hasn't been implemented; the failure is not intentional. Help would be appreciated if you're interested in putting together a PR. The place to get started (I believe) would be to implement the shift method on CategoricalBlock in pandas.core.internals.

@shoyer added the Categorical (Categorical Data Type) and Enhancement labels on Feb 5, 2015
@shoyer added this to the 0.16.0 milestone on Feb 5, 2015
@jreback
Contributor

jreback commented Feb 5, 2015

Here's basically what you would do:

In [2]: s = Series(list('aabbcde'),dtype='category')

In [3]: s
Out[3]: 
0    a
1    a
2    b
3    b
4    c
5    d
6    e
dtype: category
Categories (5, object): [a < b < c < d < e]

In [4]: s.values
Out[4]: 
[a, a, b, b, c, d, e]
Categories (5, object): [a < b < c < d < e]

In [5]: s.values.codes  
Out[5]: array([0, 0, 1, 1, 2, 3, 4], dtype=int8)

In [6]: np.roll(s.values.codes,len(s)-1,axis=0)
Out[6]: array([0, 1, 1, 2, 3, 4, 0], dtype=int8)

In [7]: codes = np.roll(s.values.codes,len(s)-1,axis=0)

In [8]: codes[-1] = -1

In [11]: pd.Categorical(codes,categories=s.values.categories,fastpath=True)
Out[11]: 
[a, b, b, c, d, e, NaN]
Categories (5, object): [a, b, c, d, e]

You would use the Block.shift method (and pass the codes to it for the actual shifting), then wrap the result back into a categorical (there is a method for that too). Should be pretty straightforward.
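
The idea above (shift the integer codes, mark the vacated positions with -1, and wrap the codes back into a categorical) can be sketched outside of pandas internals roughly as follows. This is not the eventual fix: `shift_categorical` is a hypothetical helper name, it uses slice assignment instead of `np.roll`, and it relies on the public `pd.Categorical.from_codes` constructor rather than the `fastpath` argument shown in the session.

```python
import numpy as np
import pandas as pd


def shift_categorical(cat, periods=1):
    """Hypothetical helper: shift a Categorical by moving its codes."""
    codes = cat.codes
    n = len(codes)
    # -1 is the code pandas uses to represent a missing value.
    shifted = np.full(n, -1, dtype=codes.dtype)
    if periods == 0:
        shifted[:] = codes
    elif 0 < periods < n:
        # Move values towards the end; the first `periods` slots stay -1.
        shifted[periods:] = codes[:-periods]
    elif -n < periods < 0:
        # Move values towards the start; the last slots stay -1.
        shifted[:periods] = codes[-periods:]
    # If abs(periods) >= n, every position is left as missing.
    return pd.Categorical.from_codes(
        shifted, categories=cat.categories, ordered=cat.ordered
    )


s = pd.Series(list("aabbcde"), dtype="category")
out = shift_categorical(s.values, -1)  # same direction as the np.roll example
```

Here `out` holds the values a, b, b, c, d, e followed by a missing value, matching the result of the session above, and the category set is carried over unchanged.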
