Skip to content

BUG: merge on multiple columns with mixed colums/index failing #11354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jorisvandenbossche opened this issue Oct 17, 2015 · 1 comment · Fixed by #29522
Closed

BUG: merge on multiple columns with mixed colums/index failing #11354

jorisvandenbossche opened this issue Oct 17, 2015 · 1 comment · Fixed by #29522
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@jorisvandenbossche
Copy link
Member

From the mailing list, considering this example:

In [56]: s = pd.Series(range(6), pd.MultiIndex.from_product([['A', 'B'], [1,2,3]], names=['lev1', 'lev2']), name='Amount')

In [57]: s
Out[57]:
lev1  lev2
A     1       0
      2       1
      3       2
B     1       3
      2       4
      3       5
Name: Amount, dtype: int64

In [58]: df = pd.DataFrame({'lev1':list('AAABBB'), 'lev2':[1,2,3,1,2,3], 'col':0})

In [59]: df
Out[59]:
   col lev1  lev2
0    0    A     1
1    0    A     2
2    0    A     3
3    0    B     1
4    0    B     2
5    0    B     3

So there are overlapping columns, but as the index in the series and as columns in the dataframe.
To merge both, these work:

In [63]: pd.concat([df.set_index(['lev1', 'lev2']), s], axis=1)
Out[63]:
           col  Amount
lev1 lev2
A    1       0       0
     2       0       1
     3       0       2
B    1       0       3
     2       0       4
     3       0       5

In [64]: pd.merge(df, s.reset_index(), on=['lev1', 'lev2'])
Out[64]:
   col lev1  lev2  Amount
0    0    A     1       0
1    0    A     2       1
2    0    A     3       2
3    0    B     1       3
4    0    B     2       4
5    0    B     3       5

But I would expect this also to work:

In [69]: df.merge(s, left_on=['lev1', 'lev2'], right_index=True)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-69-1b78b4267e47> in <module>()
----> 1 df.merge(s, left_on=['lev1', 'lev2'], right_index=True)

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in merge(self, right, how, o
n, left_on, right_on, left_index, right_index, sort, suffixes, copy)
   4069                      left_on=left_on, right_on=right_on,
   4070                      left_index=left_index, right_index=right_index, sor
t=sort,
-> 4071                      suffixes=suffixes, copy=copy)
   4072
   4073     #-------------------------------------------------------------------

---

C:\Anaconda\lib\site-packages\pandas\tools\merge.pyc in merge(left, right, how,
on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
     36                          right_index=right_index, sort=sort, suffixes=su
ffixes,
     37                          copy=copy)
---> 38     return op.get_result()
     39 if __debug__:
     40     merge.__doc__ = _merge_doc % '\nleft : DataFrame'

C:\Anaconda\lib\site-packages\pandas\tools\merge.pyc in get_result(self)
    184
    185     def get_result(self):
--> 186         join_index, left_indexer, right_indexer = self._get_join_info()
    187
    188         ldata, rdata = self.left._data, self.right._data

C:\Anaconda\lib\site-packages\pandas\tools\merge.pyc in _get_join_info(self)
    254     def _get_join_info(self):
    255         left_ax = self.left._data.axes[self.axis]
--> 256         right_ax = self.right._data.axes[self.axis]
    257         if self.left_index and self.right_index:
    258             join_index, left_indexer, right_indexer = \

IndexError: list index out of range
@jorisvandenbossche jorisvandenbossche added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 17, 2015
@mroeschke
Copy link
Member

Looks to work on master now. Could use a test.

In [24]: df.merge(s, left_on=['lev1', 'lev2'], right_index=True)
Out[24]:
  lev1  lev2  col  Amount
0    A     1    0       0
1    A     2    0       1
2    A     3    0       2
3    B     1    0       3
4    B     2    0       4
5    B     3    0       5

In [27]: pd.__version__
Out[27]: '0.26.0.dev0+555.gf7d162b18'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 13, 2019
@jreback jreback added this to the 1.0 milestone Nov 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants