BUG: Crosstab with margins=True ignoring dropna=True
closes #12577
closes #12614
OXPHOS authored and jreback committed Mar 16, 2016
1 parent f71537a commit f7faee0
Showing 3 changed files with 68 additions and 0 deletions.
37 changes: 37 additions & 0 deletions doc/source/whatsnew/v0.18.1.txt
@@ -90,6 +90,43 @@ Bug Fixes

- Bug in ``Period`` and ``PeriodIndex`` creation raises ``KeyError`` if ``freq="Minute"`` is specified. Note that "Minute" freq is deprecated in v0.17.0, and recommended to use ``freq="T"`` instead (:issue:`11854`)










- Bug in ``value_counts`` when ``normalize=True`` and ``dropna=True`` where nulls still contributed to the normalized count (:issue:`12558`)












- Bug in ``CategoricalIndex.get_loc`` returns different result from regular ``Index`` (:issue:`12531`)
















- Bug in ``pivot_table`` when ``margins=True`` and ``dropna=True`` where nulls still contributed to margin count (:issue:`12577`)
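A plain-Python model of the inconsistency this entry describes, using the data from the first case of the new test further down. This is a sketch, not pandas internals: `is_null` is a hypothetical helper, and `collections.Counter` stands in for the group counting that `pivot_table` performs. The body counts only complete (a, b) pairs, as `dropna=True` requests; column totals taken over all pairs also count the row where `a` is NaN, while totals taken over the complete pairs agree with the body.

```python
import math
from collections import Counter


def is_null(v):
    # Hypothetical helper: treat None and float NaN as missing values.
    return v is None or (isinstance(v, float) and math.isnan(v))


# Data from the first case in test_margin_ignore_dropna_bug.
a = [1, 2, 2, 2, 2, float("nan")]
b = [3, 3, 4, 4, 4, 4]

pairs = list(zip(a, b))
complete = [(x, y) for x, y in pairs if not (is_null(x) or is_null(y))]

# With dropna=True the table body only counts complete (a, b) pairs.
body = Counter(complete)

# Column totals (the 'All' row) taken over every pair still count the a=NaN row...
col_totals_unfiltered = Counter(y for _, y in pairs)
# ...while totals taken over the complete pairs match the body.
col_totals_filtered = Counter(y for _, y in complete)

print(dict(body))
print(dict(col_totals_unfiltered), dict(col_totals_filtered))
```

The filtered totals, {3: 2, 4: 3} with a grand total of 5, are the 'All' row that the test expects.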
2 changes: 2 additions & 0 deletions pandas/tools/pivot.py
@@ -149,6 +149,8 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
        table = table.fillna(value=fill_value, downcast='infer')

    if margins:
        if dropna:
            data = data[data.notnull().all(axis=1)]
        table = _add_margins(table, data, values, rows=index,
                             cols=columns, aggfunc=aggfunc,
                             margins_name=margins_name)
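The two added lines drop every row of `data` that contains a null in any column before `_add_margins` runs, so such rows can no longer contribute to the margins. A minimal plain-Python analogue of the mask `data.notnull().all(axis=1)`, assuming a frame represented as a dict of equal-length lists; `is_null` and `complete_rows` are hypothetical names, not pandas API:

```python
import math


def is_null(v):
    # Hypothetical helper: None and float NaN count as null.
    return v is None or (isinstance(v, float) and math.isnan(v))


def complete_rows(columns):
    """Return the row positions where every column is non-null.

    Mirrors data[data.notnull().all(axis=1)]: notnull() marks each cell,
    and all(axis=1) keeps a row only if every cell in it passed.
    Assumes at least one column, all of equal length.
    """
    n_rows = len(next(iter(columns.values())))
    return [i for i in range(n_rows)
            if all(not is_null(col[i]) for col in columns.values())]


nan = float("nan")
# Data from the second case in test_margin_ignore_dropna_bug.
data = {"a": [1, nan, nan, nan, 2, nan],
        "b": [3, nan, 4, 4, 4, 4]}
kept = complete_rows(data)
print(kept)
```

Only positions 0 and 4 survive, which is why the second test case expects a table holding just two counts.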
29 changes: 29 additions & 0 deletions pandas/tools/tests/test_pivot.py
@@ -936,6 +936,35 @@ def test_crosstab_no_overlap(self):

        tm.assert_frame_equal(actual, expected)

    def test_margin_ignore_dropna_bug(self):
        # GH 12577
        # pivot_table counted nulls into the margin ('All')
        # when margins=True and dropna=True

        df = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan],
                           'b': [3, 3, 4, 4, 4, 4]})
        actual = pd.crosstab(df.a, df.b, margins=True, dropna=True)
        expected = pd.DataFrame([[1, 0, 1], [1, 3, 4], [2, 3, 5]])
        expected.index = Index([1.0, 2.0, 'All'], name='a')
        expected.columns = Index([3, 4, 'All'], name='b')
        tm.assert_frame_equal(actual, expected)

        df = DataFrame({'a': [1, np.nan, np.nan, np.nan, 2, np.nan],
                        'b': [3, np.nan, 4, 4, 4, 4]})
        actual = pd.crosstab(df.a, df.b, margins=True, dropna=True)
        expected = pd.DataFrame([[1, 0, 1], [0, 1, 1], [1, 1, 2]])
        expected.index = Index([1.0, 2.0, 'All'], name='a')
        expected.columns = Index([3.0, 4.0, 'All'], name='b')
        tm.assert_frame_equal(actual, expected)

        df = DataFrame({'a': [1, np.nan, np.nan, np.nan, np.nan, 2],
                        'b': [3, 3, 4, 4, 4, 4]})
        actual = pd.crosstab(df.a, df.b, margins=True, dropna=True)
        expected = pd.DataFrame([[1, 0, 1], [0, 1, 1], [1, 1, 2]])
        expected.index = Index([1.0, 2.0, 'All'], name='a')
        expected.columns = Index([3, 4, 'All'], name='b')
        tm.assert_frame_equal(actual, expected)

if __name__ == '__main__':
    import nose
    nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],

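All three expected frames in the test share one invariant: each margin equals the sum of the body cells it summarises, and counting nulls into the margins is what broke it. A small checker, assuming the table is given as a list of rows whose last row and last column are the 'All' margins; `margins_consistent` is a hypothetical helper, not part of the pandas test suite:

```python
def margins_consistent(table):
    """Check a crosstab-with-margins matrix whose last row and column are 'All'.

    Row margins must equal row sums of the body, column margins must equal
    column sums, and the corner cell must equal the total of the body.
    """
    body = [row[:-1] for row in table[:-1]]
    row_margins = [row[-1] for row in table[:-1]]
    col_margins = table[-1][:-1]
    grand = table[-1][-1]
    rows_ok = all(sum(r) == m for r, m in zip(body, row_margins))
    # zip(*body) transposes the body so each tuple is one column.
    cols_ok = all(sum(c) == m for c, m in zip(zip(*body), col_margins))
    return rows_ok and cols_ok and sum(map(sum, body)) == grand


# Expected matrices from the three cases in test_margin_ignore_dropna_bug.
expected_tables = [
    [[1, 0, 1], [1, 3, 4], [2, 3, 5]],
    [[1, 0, 1], [0, 1, 1], [1, 1, 2]],
    [[1, 0, 1], [0, 1, 1], [1, 1, 2]],
]
print([margins_consistent(t) for t in expected_tables])
```

A hypothetical table whose 'All' row also counted the a=NaN row of the first case, for example [2, 4, 6], fails the check because the body column for b=4 sums to 3.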