Closed
Description
In pandas, unlike SQL, the rows seemed to be joining on null values. Is this a bug?
related SO: http://stackoverflow.com/questions/23940181/pandas-merging-with-missing-values/23940686#23940686
Code snippet
import pandas as pd
import numpy as np
df1 = pd.DataFrame(
[[1, None],
[2, 'y']],
columns = ['A', 'B']
)
print df1
df2 = pd.DataFrame(
[['y', 'Y'],
[None, 'None1'],
[None, 'None2']],
columns = ['B', 'C']
)
print df2
print df1.merge(df2, on='B', how='outer')
Output
A B
0 1 None
1 2 y
B C
0 y Y
1 None None1
2 None None2
A B C
0 1 None None1
1 1 None None2
2 2 y Y
You can see row 0 in df1 unexpectedly joins to both rows in df2.
I would expect the correct answer to be
A B C
0 1 None None
1 2 y Y
2 None None None1
3 None None None2