BUG: concat gives incorrect result when MultiIndex values are all NA #47802

Closed
@RobinFiveWords

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

idx1 = pd.MultiIndex.from_arrays([[1.0],
                                  [2.0]],
                                 names=['a', 'b'])
ser1 = pd.Series([1], index=idx1, name='count1')

idx2 = pd.MultiIndex.from_arrays([[pd.NA],
                                  [pd.NA]],
                                 names=['a', 'b'])
ser2 = pd.Series([1], index=idx2, name='count2')

print(pd.concat((ser1, ser2), axis=1))
print()
print(pd.concat((ser2, ser1), axis=1))
print()
print(pd.__version__)

#          count1  count2
# a   b                  
# 1.0 2.0     1.0       1
# NaN NaN     NaN       1
# 
#          count2  count1
# a   b                  
# NaN NaN       1     NaN
# 1.0 2.0       1     1.0
# 
# 1.4.2

Issue Description

concat does not join correctly when every level of a MultiIndex entry is NA. In version 1.4.2 (later confirmed in 1.4.3), concat "over-matches" these all-NA rows to other rows: in the output above, count2 is 1 on the (1.0, 2.0) row, where it should be NaN.
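The over-matching can be flagged programmatically. The following is a minimal sketch, assuming the two single-row Series from the example above: a correct outer join populates exactly one column per row, so any row with more than one populated column was over-matched. The helper name overmatched_rows is hypothetical (not a pandas API); it relies only on DataFrame.notna and boolean-mask indexing.

```python
import pandas as pd


def overmatched_rows(result: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `result` in which more than one column is populated.

    Hypothetical helper for this report only: with the two single-row Series
    from the example, a correct outer join leaves one populated column per
    row, so every row returned here was matched to the wrong index entry.
    """
    populated = result.notna().sum(axis=1)
    # Use a plain NumPy mask so no index alignment on NA labels is involved.
    return result[(populated > 1).to_numpy()]


if __name__ == "__main__":
    idx1 = pd.MultiIndex.from_arrays([[1.0], [2.0]], names=["a", "b"])
    ser1 = pd.Series([1], index=idx1, name="count1")
    idx2 = pd.MultiIndex.from_arrays([[pd.NA], [pd.NA]], names=["a", "b"])
    ser2 = pd.Series([1], index=idx2, name="count2")

    for pair in ((ser1, ser2), (ser2, ser1)):
        bad = overmatched_rows(pd.concat(pair, axis=1))
        print("over-matched rows:", len(bad))
```

On an affected version this should report one over-matched row for the outputs shown above; on a version with correct behavior it should report none.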

I believe all-NA rows of a MultiIndex have been discussed online before, but I was unable to find that discussion.

Please note:

  • Version 1.3.5 gives a different wrong result for the second concat: it returns only the row with all-NA index values.
  • The wrong result also occurs when the MultiIndex has only one level and its value is NA. I kept two levels in the example because a one-level MultiIndex seemed likely to confuse rather than simplify the issue. With a plain Index that is not a MultiIndex, the result is correct.

Expected Behavior

The correct behavior would result in this:

print(pd.concat((ser1, ser2), axis=1))
#          count1  count2
# a   b                  
# 1.0 2.0     1.0     NaN
# NaN NaN     NaN     1.0
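The expected alignment can also be spelled out without pandas. This is a rough sketch in plain Python, not how concat is implemented: NA is a hypothetical stand-in sentinel for a missing index value, outer_align is an illustrative helper, and None marks an empty cell. An all-NA key matches only another all-NA key, never a populated one.

```python
class _NAType:
    """Hypothetical stand-in for a missing index value (not pandas.NA)."""

    def __repr__(self):
        return "NA"


NA = _NAType()


def outer_align(*named_series):
    """Outer-join (name, {index_tuple: value}) pairs on their index tuples.

    Keys keep first-seen order. A cell is None when that series has no entry
    for the key, so rows are combined only when their full key tuples match.
    """
    keys = list(dict.fromkeys(key for _, data in named_series for key in data))
    return {
        key: {name: data.get(key) for name, data in named_series}
        for key in keys
    }


count1 = ("count1", {(1.0, 2.0): 1})
count2 = ("count2", {(NA, NA): 1})

table = outer_align(count1, count2)
for key, row in table.items():
    print(key, row)
```

Each row ends up with exactly one populated column, which is the shape of the expected output above.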

Installed Versions

This was run in the interactive shell at

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.10.2.final.0
python-bits : 32
OS : Emscripten
OS-release : 1.0
Version : #1
machine : wasm32
byteorder : little
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 62.0.0
IPython : 8.4.0
matplotlib : 3.5.1

Labels: Bug, Needs Triage
