I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

idx1 = pd.MultiIndex.from_arrays([[1.0], [2.0]], names=['a', 'b'])
ser1 = pd.Series([1], index=idx1, name='count1')
idx2 = pd.MultiIndex.from_arrays([[pd.NA], [pd.NA]], names=['a', 'b'])
ser2 = pd.Series([1], index=idx2, name='count2')

print(pd.concat((ser1, ser2), axis=1))
print()
print(pd.concat((ser2, ser1), axis=1))
print()
print(pd.__version__)

#          count1  count2
# a   b
# 1.0 2.0     1.0       1
# NaN NaN     NaN       1
#
#          count2  count1
# a   b
# NaN NaN       1     NaN
# 1.0 2.0       1     1.0
#
# 1.4.2
Issue Description
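concat does not join correctly on rows where all levels of a MultiIndex are NA. In version 1.4.2 (later confirmed in 1.4.3), concat "over-matches" these all-NA rows to other rows: in the output above, the count2 value that belongs to the all-NA row also shows up on the (1.0, 2.0) row.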
I feel like there is previous discussion online of all-NA rows of a MultiIndex but I was unable to find it.
Please note:
Version 1.3.5 gives a different wrong result for the second concat, only returning the row with all-NA index values.
The wrong result also occurs when the MultiIndex has only one level and its value is NA. I used two levels here because I thought a one-level MultiIndex in the example would confuse rather than simplify the issue. The correct result occurs with an Index that is not a MultiIndex.
Expected Behavior
The correct behavior would result in this:
print(pd.concat((ser1, ser2), axis=1))
#          count1  count2
# a   b
# 1.0 2.0     1.0     NaN
# NaN NaN     NaN     1.0
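For anyone who wants to check a given pandas build against the expected result, here is a rough sketch of a comparison helper. The names `expected_frame` and `same_rows` are made up for this example and are not part of pandas. It compares frames as sets of rows after flattening the index, so it does not depend on row order or on whether a column came back as int, float, or object.

```python
import numpy as np
import pandas as pd


def expected_frame():
    """Hand-built copy of the expected output for concat((ser1, ser2), axis=1)."""
    index = pd.MultiIndex.from_arrays(
        [[1.0, np.nan], [2.0, np.nan]], names=["a", "b"]
    )
    return pd.DataFrame(
        {"count1": [1.0, np.nan], "count2": [np.nan, 1.0]}, index=index
    )


def same_rows(left, right):
    """Return True if both frames hold the same rows, ignoring order and dtypes."""

    def canon(df):
        # Move the index levels into ordinary columns so NA keys are plain values.
        flat = df.reset_index()[["a", "b", "count1", "count2"]]
        # Normalise every kind of missing value to NaN, then compare as floats.
        flat = flat.where(flat.notna(), np.nan).astype("float64")
        flat = flat.sort_values(list(flat.columns), na_position="last")
        return flat.to_numpy()

    lhs, rhs = canon(left), canon(right)
    return bool(lhs.shape == rhs.shape and np.allclose(lhs, rhs, equal_nan=True))
```

With `ser1` and `ser2` from the reproducible example, `same_rows(pd.concat((ser1, ser2), axis=1), expected_frame())` should be True on a build without the bug. The column names are hard-coded to match this report.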
Installed Versions
This was run in the interactive shell at
INSTALLED VERSIONS
commit : 4bfe3d0
python : 3.10.2.final.0
python-bits : 32
OS : Emscripten
OS-release : 1.0
Version : #1
machine : wasm32
byteorder : little
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
A workaround is to replace the NAs in the MultiIndex with some unique value, do the concat, and then put the NAs back in.
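A minimal sketch of that workaround, in case it helps someone. The helper names (`mask_na`, `unmask_na`, `concat_keeping_na`) and the sentinel value are my own choices for illustration, not pandas API, and the sentinel is assumed never to appear among the real index values.

```python
import numpy as np
import pandas as pd

SENTINEL = -1.0  # hypothetical placeholder; must not occur among the real index values


def mask_na(obj, sentinel=SENTINEL):
    """Return a copy of obj whose MultiIndex has missing values replaced by sentinel."""
    frame = obj.index.to_frame(index=False)
    # Keep present values; substitute the sentinel wherever a level value is missing.
    frame = frame.where(frame.notna(), sentinel)
    out = obj.copy()
    out.index = pd.MultiIndex.from_frame(frame)
    return out


def unmask_na(obj, sentinel=SENTINEL):
    """Return a copy of obj whose MultiIndex has sentinel turned back into NaN."""
    frame = obj.index.to_frame(index=False)
    frame = frame.where(frame != sentinel, np.nan)
    out = obj.copy()
    out.index = pd.MultiIndex.from_frame(frame)
    return out


def concat_keeping_na(series, sentinel=SENTINEL):
    """Concatenate along axis=1 with NA index values masked during the join."""
    masked = [mask_na(s, sentinel) for s in series]
    return unmask_na(pd.concat(masked, axis=1), sentinel)


idx1 = pd.MultiIndex.from_arrays([[1.0], [2.0]], names=["a", "b"])
ser1 = pd.Series([1], index=idx1, name="count1")
idx2 = pd.MultiIndex.from_arrays([[pd.NA], [pd.NA]], names=["a", "b"])
ser2 = pd.Series([1], index=idx2, name="count2")

joined = concat_keeping_na([ser1, ser2])
```

Because the join only ever sees non-missing keys, the all-NA row cannot be matched to anything else. The missing values come back as NaN rather than pd.NA, which is also how they are displayed in the outputs above.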
From skimming concat.py, I get the impression that the join keys are being transcribed into a new list (I'm using "list" informally here). Perhaps something about this step causes an all-NA key to satisfy the join criteria when it should be treated as different. Also, if something about how the keys are determined or how the join is done changed between 1.3 and 1.4, that could explain why the two versions give different wrong results.
INSTALLED VERSIONS
pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 62.0.0
IPython : 8.4.0
matplotlib : 3.5.1