Inconsistent results for groupby on multiple columns with NaN value

When grouping a DataFrame on multiple columns, any row with a NaN value in one of the key columns is excluded from the result. This behavior is already documented in an open issue: https://github.com/pandas-dev/pandas/issues/3729.

However, the `.groups` attribute **includes** these NaN rows, which is inconsistent with what iterating over the groupby returns.

See this simple example:

```python
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', np.nan, '6'], 'c': [7, 8, 9]})

df.groupby(['a', 'b']).groups
Out[75]: 
{('1', '4'): Int64Index([0], dtype='int64'),
 ('2', nan): Int64Index([1], dtype='int64'),  # excluded when iterating over the groups
 ('3', '6'): Int64Index([2], dtype='int64')}


{k: g.index for k, g in df.groupby(['a', 'b'])}
Out[76]: 
{('1', '4'): Int64Index([0], dtype='int64'),
 ('3', '6'): Int64Index([2], dtype='int64')}

```
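To check whether a given pandas version shows this discrepancy, one option is to compare the row labels listed in `.groups` with the row labels actually yielded by iteration. This is only a rough sketch: the variable names `rows_in_groups`, `rows_iterated` and `skipped` are made up for illustration, and the content of `skipped` may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', np.nan, '6'], 'c': [7, 8, 9]})
gb = df.groupby(['a', 'b'])

# Row labels that appear in the .groups mapping.
rows_in_groups = set()
for idx in gb.groups.values():
    rows_in_groups.update(idx.tolist())

# Row labels that are actually yielded when iterating over the groups.
rows_iterated = set()
for _, group in gb:
    rows_iterated.update(group.index.tolist())

# Rows listed in .groups but never yielded by iteration.
# Non-empty here means the inconsistency described above is present.
skipped = rows_in_groups - rows_iterated
print(sorted(skipped))
```

Comparing row labels rather than group keys avoids having to compare tuples that contain NaN, which does not compare equal to itself.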

Inconsistent results for groupby on multiple columns with NaN value #17445