Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: RangeIndex.astype('category') #41263

Merged
merged 5 commits into from
May 5, 2021

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented May 2, 2021

There's currently inconsistent behaviour when converting RangeIndex to CategoricalIndex using different methods. This fixes that.

>>> ridx = pd.RangeIndex(5)
>>> pd.CategoricalIndex(ridx).categories
RangeIndex(start=0, stop=5, step=1)  # both master and this PR
>>> ridx.astype("category").categories
Int64Index([0, 1, 2, 3, 4], dtype='int64')  # master
RangeIndex(start=0, stop=5, step=1)  # this PR

In general, when supplying a Index subclass to Categorical/CategoricalIndex, the new categories should be of that type (unless the supplied index is a CategoricalIndex itself).

Discovered working on tests for #41153.

@@ -667,19 +667,19 @@ def test_astype_category(self, copy, name, ordered, simple_index):
# standard categories
dtype = CategoricalDtype(ordered=ordered)
result = idx.astype(dtype, copy=copy)
expected = CategoricalIndex(idx.values, name=name, ordered=ordered)
expected = CategoricalIndex(idx, name=name, ordered=ordered)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe have a test explicitly for RangeIndex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests.indexes.ranges.test_range.py::TestRangeIndex inherits from this base class, so it's tested that way. Do you have something else in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tm.assert_foo functions dont raise when we assert an Int64Index equals a RangeIndex. So off the top my head it isn't obvious that this edited test would fail in master. im asking for a test that is super-obviously tied to this bugfix

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Index Related to the Index class or subclasses labels May 2, 2021
@jreback jreback added this to the 1.3 milestone May 2, 2021
@jreback
Copy link
Contributor

jreback commented May 2, 2021

looks fine ex @jbrockmendel comment.

@topper-123 topper-123 force-pushed the RangeIndex.astype_categorical branch from 6ce8940 to 7840030 Compare May 3, 2021 05:30
@topper-123
Copy link
Contributor Author

I`ve updated.

Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pytest.mark.parametrize("name", [None, "foo"])
@pytest.mark.parametrize("ordered", [True, False])
def test_astype_category(self, copy, name, ordered, simple_index):
super().test_astype_category(copy, name, ordered, simple_index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm we don't usually call the super functions as they are already called so this is confusing. i think you can remove. (assume my assertion is true)

# dtype='category' defaults to ordered=False, so only test once
dtypes.append("category")

for dtype in dtypes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you not simply construct the expected result and use assert_index_equal(exact=True)

Copy link
Contributor Author

@topper-123 topper-123 May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's better. It however exposed a bug because assert_index_equal(exact=True) didn't actually work as expected.

I fixed that bug in assert_index_equal and changed these tests to use exact=True.

@jreback jreback merged commit d6cb249 into pandas-dev:master May 5, 2021
@jreback
Copy link
Contributor

jreback commented May 5, 2021

thanks @topper-123 (and also for fixing exact!)

@topper-123 topper-123 deleted the RangeIndex.astype_categorical branch May 5, 2021 13:41
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Index Related to the Index class or subclasses
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants