Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
BUG: Merge on CategoricalIndex fails if left_index=True & right_index=True, but not if on={index} #28189 #28296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Merge on CategoricalIndex fails if left_index=True & right_index=True, but not if on={index} #28189 #28296
Changes from all commits
11cb1f8
95c5cf8
fd1d3f1
8321075
d938b1b
2bbc491
4e8bf1f
be85da6
9915105
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the reset_index required or can the index be included as part of
expected
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it out of the test because of the issues @hugoecarl mentioned on the PR. If the index are included, the test will break once they're fixed, but if you want, it could be included and I would need to change the
expected
df.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "the test will break once they're fixed"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What @guipleite is trying to say is that when you use the left-join, there is a problem with the index values output. The bug is related with the issues @hugoecarl pointed here in the PR. With that been said, if we include index as a part of the expected df, we would have to force the wrong output to prove that the bug pointed on this PR is solved, then when those other issues are corrected, this test would be outdated. Does it make sense to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC you are saying there are multiple bugs and this solves one of them, but the other would ned to be solved separately right? If so is there an open issue for that? Might have missed it in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! This is particularly solving issue #28189 . This other bug that we accidentally found have already been reported in issues #28220 and #28243 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm but don't those issues have to do with missing data on merge? Maybe missing the point but don't see how applicable here. Isn't the expected index still just
Index(Categorical(["1"] * 4), name="idx")
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe I'm missing the point here, so I'll take a step back. We worked on issue #28189 and the problem was that
merged = pd.merge(pdf, agg, how="left", left_index=True, right_index=True)
with groupby and categorical index was returning an error when it should do the same thing aspd.merge(pdf, agg, how="left", on="idx")
which worked. We understood that in a Cython file there was no support ford type int8 and int16, so we added. After that we discovered the for some unknown reason the index values output was int64 where it should be categorical. After a lot of debugging we concluded that we could not find the problem but we decided do make this pull request for the int8 and int16 support. This other issues mentioned show some bugs related to merge function and index output. We thought that maybe this is relatable with this bugs we are having, that were actually hidden by the int8 and int16 non-support.At this point we think we reached our limit and to continue this, help would be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm OK - I think I understand now. So this change doesn't raise an error but
left_index=True
andright_index=True
still would not yield the same output ason="idx"
since the former would drop the index, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly that! We just added the support to int8 and int16 type that uncovered this output bug.