Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([[1.0], [2.0]], names=['a', 'b'])
ser1 = pd.Series([1], index=idx1, name='count1')

idx2 = pd.MultiIndex.from_arrays([[pd.NA], [pd.NA]], names=['a', 'b'])
ser2 = pd.Series([1], index=idx2, name='count2')

print(pd.concat((ser1, ser2), axis=1))
print()
print(pd.concat((ser2, ser1), axis=1))
print()
print(pd.__version__)
#          count1  count2
# a   b
# 1.0 2.0     1.0       1
# NaN NaN     NaN       1
#
#          count2  count1
# a   b
# NaN NaN       1     NaN
# 1.0 2.0       1     1.0
#
# 1.4.2
Issue Description
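concat doesn't correctly align rows where all levels of a MultiIndex are NA. In version 1.4.2 (later confirmed in v1.4.3), concat "over-matches" these all-NA rows to other rows: in both outputs above, the count2 value from ser2's all-NA row also appears on the (1.0, 2.0) row, which exists only in ser1.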
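For reference, a row of a MultiIndex is "all-NA" when the value of every level at that position is missing. A minimal sketch of a helper that flags such rows; the name `all_na_rows` is hypothetical and not part of pandas:

```python
import pandas as pd


def all_na_rows(mi: pd.MultiIndex) -> pd.Series:
    """Return a boolean Series that is True where every level of ``mi`` is NA.

    Hypothetical helper for illustration; not a pandas API.
    """
    # to_frame(index=False) gives one column per level; isna() treats both
    # np.nan and pd.NA as missing, and all(axis=1) requires every level to be NA.
    return mi.to_frame(index=False).isna().all(axis=1)


idx1 = pd.MultiIndex.from_arrays([[1.0], [2.0]], names=["a", "b"])
idx2 = pd.MultiIndex.from_arrays([[pd.NA], [pd.NA]], names=["a", "b"])
# A row with only some levels missing is not all-NA.
idx3 = pd.MultiIndex.from_arrays([[1.0, pd.NA], [pd.NA, pd.NA]], names=["a", "b"])

print(all_na_rows(idx1).tolist())  # [False]
print(all_na_rows(idx2).tolist())  # [True]
print(all_na_rows(idx3).tolist())  # [False, True]
```

Only idx2's row (and the second row of idx3) is the kind of row that concat mismatches in the example above.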
I believe there has been previous discussion of all-NA rows in a MultiIndex, but I was unable to find it.
Please note:
- Version 1.3.5 gives a different wrong result for the second concat: it returns only the row with all-NA index values.
- The wrong result also occurs when the MultiIndex has only one level whose value is NA, but I thought a one-level MultiIndex in the example would confuse rather than simplify the issue. The correct result occurs with a regular Index (not a MultiIndex).
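For comparison, a minimal sketch of the plain-Index case that the note above describes as behaving correctly. It reuses the data from the reproducible example but keys each Series by a single, non-Multi Index; reusing the name `a` for that Index is only for illustration:

```python
import pandas as pd

# Same data as the reproducible example, keyed by a plain (non-Multi) Index.
ser1 = pd.Series([1], index=pd.Index([1.0], name="a"), name="count1")
ser2 = pd.Series([1], index=pd.Index([pd.NA], name="a"), name="count2")

# Outer join on the index: the NA label should get its own row
# instead of being matched to the 1.0 label.
result = pd.concat((ser1, ser2), axis=1)
print(result)
```

With correct alignment, each column has exactly one missing value: count1 on the NA row and count2 on the 1.0 row.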
Expected Behavior
The correct behavior would result in this:
print(pd.concat((ser1, ser2), axis=1))
#          count1  count2
# a   b
# 1.0 2.0     1.0     NaN
# NaN NaN     NaN     1.0
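The expected result can also be written out as data, which makes it easier to compare against. A minimal sketch that builds the expected frame by hand and defines a small regression check; `concat_keeps_all_na_row_separate` is a hypothetical name, not a pandas API, and it is specific to the (1.0, 2.0) row of this example:

```python
import numpy as np
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([[1.0], [2.0]], names=["a", "b"])
ser1 = pd.Series([1], index=idx1, name="count1")
idx2 = pd.MultiIndex.from_arrays([[pd.NA], [pd.NA]], names=["a", "b"])
ser2 = pd.Series([1], index=idx2, name="count2")

# Expected outer join, built by hand: each input row keeps its own value,
# and the all-NA row is NOT matched to (1.0, 2.0).
expected = pd.DataFrame(
    {"count1": [1.0, np.nan], "count2": [np.nan, 1.0]},
    index=pd.MultiIndex.from_arrays(
        [[1.0, np.nan], [2.0, np.nan]], names=["a", "b"]
    ),
)


def concat_keeps_all_na_row_separate(left: pd.Series, right: pd.Series) -> bool:
    """Hypothetical regression check for this example (not a pandas API).

    ``right`` has no (1.0, 2.0) entry, so after an outer join its column
    must be missing on that row.
    """
    result = pd.concat((left, right), axis=1)
    return bool(pd.isna(result.loc[(1.0, 2.0), right.name]))


print(expected)
```

Going by the 1.4.2 output reported above, `concat_keeps_all_na_row_separate(ser1, ser2)` would return False there, because count2 is 1 on the (1.0, 2.0) row.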
Installed Versions
This was run in the interactive shell at
INSTALLED VERSIONS
commit : 4bfe3d0
python : 3.10.2.final.0
python-bits : 32
OS : Emscripten
OS-release : 1.0
Version : #1
machine : wasm32
byteorder : little
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 62.0.0
IPython : 8.4.0
matplotlib : 3.5.1