Skip to content

two dataframes outer join on null values #7473

Closed
@socheon

Description

@socheon

In pandas, unlike SQL, the rows seemed to be joining on null values. Is this a bug?
related SO: http://stackoverflow.com/questions/23940181/pandas-merging-with-missing-values/23940686#23940686

Code snippet

import pandas as pd 
import numpy as np

df1 = pd.DataFrame(
    [[1, None],
    [2, 'y']],
    columns = ['A', 'B']
)
print df1

df2 = pd.DataFrame(
    [['y', 'Y'],
    [None, 'None1'],
    [None, 'None2']],
    columns = ['B', 'C']
)
print df2

print df1.merge(df2, on='B', how='outer')

Output

   A     B
0  1  None
1  2     y

      B      C
0     y      Y
1  None  None1
2  None  None2

   A     B      C
0  1  None  None1
1  1  None  None2
2  2     y      Y

You can see row 0 in df1 unexpectedly joins to both rows in df2.

I would expect the correct answer to be

   A       B      C
0  1      None  None
1  2       y      Y
2 None None None1
3 None None None2

Metadata

Metadata

Assignees

No one assigned

    Labels

    DocsMissing-datanp.nan, pd.NaT, pd.NA, dropna, isnull, interpolateReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions