REGR: re-evaluate merge fix of PR #24916 #25001

Closed
jorisvandenbossche opened this issue Jan 29, 2019 · 1 comment · Fixed by #25009
Labels: Blocker (blocking issue or pull request for an upcoming release), Regression (functionality that used to work in a prior pandas version)

jorisvandenbossche commented Jan 29, 2019

See my review comments on PR #24916, which describe a possible regression.

We can have the code discussion there on the diff, but I am opening this issue as a reminder that we need to fix this for 0.25.0.

jorisvandenbossche added the Regression and Blocker labels on Jan 29, 2019
jorisvandenbossche added this to the 0.25.0 milestone on Jan 29, 2019
jorisvandenbossche commented

And to copy the regression case from the PR here:

One specific case I found that seems to be broken by this is using a categorical as the merge key:

In [15]: left = pd.DataFrame({'a': [1, 2, 3], 'key': pd.Categorical(['a', 'a', 'b'], categories=['a', 'b', 'c'])})
    ...: right = pd.DataFrame({'b': [1, 2, 3]}, index=pd.Categorical(['a', 'b', 'c']))

In [16]: left.merge(right, left_on='key', right_index=True, how='right')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-e08f8fc28c75> in <module>
----> 1 left.merge(right,  left_on='key', right_index=True, how='right')

~/scipy/pandas/pandas/core/frame.py in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   6875                      right_on=right_on, left_index=left_index,
   6876                      right_index=right_index, sort=sort, suffixes=suffixes,
-> 6877                      copy=copy, indicator=indicator, validate=validate)
   6878 
   6879     def round(self, decimals=0, *args, **kwargs):

~/scipy/pandas/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     46                          copy=copy, indicator=indicator,
     47                          validate=validate)
---> 48     return op.get_result()
     49 
     50 

~/scipy/pandas/pandas/core/reshape/merge.py in get_result(self)
    544                 self.left, self.right)
    545 
--> 546         join_index, left_indexer, right_indexer = self._get_join_info()
    547 
    548         ldata, rdata = self.left._data, self.right._data

~/scipy/pandas/pandas/core/reshape/merge.py in _get_join_info(self)
    762                     join_index = self.right.index.take(right_indexer)
    763                     left_indexer = np.array([-1] * len(join_index))
--> 764             elif self.left_index:
    765                 if len(self.right) > 0:
    766                     join_index = self.right.index.take(right_indexer)

~/scipy/pandas/pandas/core/reshape/merge.py in _create_join_index(self, index, other_index, indexer, other_indexer, how)
    811 
    812         # ugh, spaghetti re #733
--> 813         if _any(self.left_on) and _any(self.right_on):
    814             for lk, rk in zip(self.left_on, self.right_on):
    815                 if is_lkey(lk):

ValueError: invalid literal for int() with base 10: 'c'

The above is failing on master now, but works on 0.23.4 / 0.24.0.
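
For reference, the case above could be pinned down as a standalone check along these lines. This is only a sketch: the function name `merge_on_categorical_key` is made up for illustration, and the asserted values are what I would expect from right-join semantics, not output copied from a 0.23.4 / 0.24.0 session.

```python
import pandas as pd


def merge_on_categorical_key():
    """Re-run the reported case: categorical column joined to a categorical index."""
    left = pd.DataFrame({
        "a": [1, 2, 3],
        "key": pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"]),
    })
    right = pd.DataFrame({"b": [1, 2, 3]}, index=pd.Categorical(["a", "b", "c"]))
    # This call raised ValueError on master at the time of the report.
    return left.merge(right, left_on="key", right_index=True, how="right")


result = merge_on_categorical_key()

# Assumed right-join semantics: every row of `right` is kept, so 'a' matches
# two rows of `left`, 'b' matches one, and 'c' has no match in `left`.
assert len(result) == 4
assert result["b"].tolist() == [1, 1, 2, 3]
assert result["a"].isna().sum() == 1
```

The check deliberately says nothing about the resulting index, since what the index should be for a `left_on` / `right_index=True` right join is part of what needs to be re-evaluated here.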
