BUG: Crosstab margins ignoring dropna #12614

OXPHOS · 2016-03-14T07:12:22Z

To fix bug #12577 : Crosstab margins ignoring dropna

To fix bug pandas-dev#12577: Crosstab margins ignoring dropna

jreback · 2016-03-14T12:56:18Z

tests!

OXPHOS · 2016-03-14T15:03:14Z

I did test several groups of data on my Mac and they worked out well.. Can I have some hints where could go wrong? Thanks!

jreback · 2016-03-14T15:22:07Z

you need to add tests to the codebase. pandas/tools/tests/test_pivot.py. see the docs

nickeubank · 2016-03-14T17:40:59Z

@OXPHOS Thanks for the PR!

Contributions to pandas require not only that code be modified, but also that you add tests that ensure the proper behavior. This helps ensure that when other people change code in the future, if they do something that breaks the code you've now written, when they run the test suite (which will include your tests) they'll catch the error.

The section of the dogs that talks about it is here. http://pandas.pydata.org/pandas-docs/stable/contributing.html#test-driven-development-code-writing

As a basic test, you can basically write out the case from the issue report. You should probably put around line 938 in pandas/tools/tests/test_pivot.py

if dropna, pass truncated data to _add_margin

OXPHOS · 2016-03-14T18:33:21Z

Hi @nickeubank , thanks for your instruction and thanks @jreback !

I modified the codes and added tests.
For modification: truncated data pass to _add_margin if dropna == true in pivot.pivot_table()
For tests: added 3 tests in test_pivot.py. Yet I am not sure about the test style (i.e. class name).

I tried to add another test case:
df = DataFrame({'a':[1, np.nan, np.nan, np.nan, np.nan, 2], 'b':[np.nan, 3, 4, 4, 4, np.nan]}) print (crosstab(df.a,df.b, margins=True))

All intersected entries are NaN. However, this case leads to a general fail (even without my modification) and to be honest I don't how what the expected result should be like.

Error Information:

raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate

I assume we need try/except somewhere else upstream (but I am rushing for lunch..)

jreback · 2016-03-14T18:51:58Z

pandas/tools/pivot.py

-                             margins_name=margins_name)
+        if dropna:
+            data_dropna = data[data.notnull().all(axis = 1)]
+            table = _add_margins(table, data_dropna, values, rows=index,


you don't need to repeat the if, just do:

if dropna: data = data[......]

You mean like below? I felt unsafe changing data values but if you're okay with it I'll update the codes.

if dropna: data = data[data.notnull().all(axis = 1)] table = _add_margins(table, data, values, rows=index, cols=columns, aggfunc=aggfunc, margins_name=margins_name)

how are you changing data values?

I think she means over-writing data. That's fine if all tests clear!

Got it. Thanks!

jreback · 2016-03-14T18:52:51Z

pls add a whatsnew entry

nickeubank · 2016-03-14T19:14:02Z

@OXPHOS I think that the "general fail" you note where no objects intersect is actually behaving fine -- it's throwing an informative exception, and I can't think of another way it could be handled.

Here are docs for adding a "whatsnew" entry -- basically, this is just a notification file that gives people a headsup about changes. I think this will be in version 0.18.1 (@jreback sound right?)

jreback · 2016-03-14T19:45:47Z

yep

simplified codes in pivot_table() updated what’s new entry updated comments in test_pivot.py

OXPHOS · 2016-03-15T02:09:27Z

... I am so sorry I ran flake8 as the first step for the tests but didn't go back to it right before commit or after modification. I know it took a long time to run through the test line. I'll do better this time.

OXPHOS · 2016-03-16T00:40:14Z

Wow I passed! Thanks @nickeubank and @jreback guys for your help! I saw a conflict with base branch now. Anything else I should do? Thanks!

jreback · 2016-03-16T00:51:21Z

pandas/tools/tests/test_pivot.py

+
+        df = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan],
+                           'b': [3, 3, 4, 4, 4, 4]})
+        actual = pd.crosstab(df.a, df.b, margins=True)


explictly add dropna=True for these. tests

then add a test for dropna=False (same set of tests)
and you will find it raises. the level is None fails.

In [10]: pd.crosstab(df.a,df.b,margins=True,dropna=False) KeyError: 'Level None not found'

jreback · 2016-03-16T00:51:40Z

looks good. see my comment above. Have to handle another case here.

OXPHOS · 2016-03-16T05:18:48Z

Okay so here's what I found:

This is an independent issue. Apparently no previous tests have conditions margins=True and dropna=False at the same time.
For now, I added if to table.columns.names level in _add_margins and avoided the error. I am not sure whether we need the same condition in table.rows.names level.
But the problem raised from the if not dropna: table = table.reindex_axis(m, axis=1) in pivot_table, which erased all column names. I have been confused by the column names for a while. I was not sure whether the column names would show up on the side. Then it turns out that the show-up of column names completely depends on dropna, which is kinda weird. Apparently this is a problem for a while and I think it's worth fixing. However, I tried to play with table = table.reindex_axis(m, axis=1) but failed. I haven't come up with a good idea to solve the problem so far.

jreback · 2016-03-16T13:12:30Z

@OXPHOS ok let's do this.

going to merge your current PR.

Pls open an issue for the other case with a simple repro example (and cross reference to this PR).

OXPHOS · 2016-03-16T15:14:04Z

Will open soon.

Modified _generate_marginal_results

9c0d99b

To fix bug pandas-dev#12577: Crosstab margins ignoring dropna

OXPHOS changed the title ~~Modified _generate_marginal_results~~ Modified pivot._generate_marginal_results Mar 14, 2016

jreback added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 14, 2016

change in pivot_table-if margin

7c28949

if dropna, pass truncated data to _add_margin

OXPHOS changed the title ~~Modified pivot._generate_marginal_results~~ BUG: Crosstab margins ignoring dropna Mar 14, 2016

jreback reviewed Mar 14, 2016
View reviewed changes

Documentaion

0039da5

simplified codes in pivot_table() updated what’s new entry updated comments in test_pivot.py

Fix stylistic errors

76fbacf

jreback added this to the 0.18.1 milestone Mar 16, 2016

jreback reviewed Mar 16, 2016
View reviewed changes

OXPHOS added 2 commits March 16, 2016 00:55

Fixed margin=true, dropna=false error

869a253

Update what's new

d3ed27a

jreback closed this in f7faee0 Mar 16, 2016

jreback mentioned this pull request Mar 16, 2016

BUG: crosstab with margins=True and dropna=False #12642

Closed

This was referenced Mar 16, 2016

BUG: Crosstab/Pivot_table columns.names change when toggling dropna #12645

Closed

BUG: crosstab with margins=True and dropna=False #12643

Closed

dsaxton mentioned this pull request Apr 15, 2020

BUG: margin=True is affected by NaN in other columns in pivot_table #33466

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Crosstab margins ignoring dropna #12614

BUG: Crosstab margins ignoring dropna #12614

OXPHOS commented Mar 14, 2016

jreback commented Mar 14, 2016

OXPHOS commented Mar 14, 2016

jreback commented Mar 14, 2016

nickeubank commented Mar 14, 2016

OXPHOS commented Mar 14, 2016

jreback Mar 14, 2016

OXPHOS Mar 14, 2016

jreback Mar 14, 2016

nickeubank Mar 14, 2016

OXPHOS Mar 14, 2016

jreback commented Mar 14, 2016

nickeubank commented Mar 14, 2016

jreback commented Mar 14, 2016

OXPHOS commented Mar 15, 2016

OXPHOS commented Mar 16, 2016

jreback Mar 16, 2016

jreback commented Mar 16, 2016

OXPHOS commented Mar 16, 2016

jreback commented Mar 16, 2016

OXPHOS commented Mar 16, 2016

BUG: Crosstab margins ignoring dropna #12614

BUG: Crosstab margins ignoring dropna #12614

Conversation

OXPHOS commented Mar 14, 2016

jreback commented Mar 14, 2016

OXPHOS commented Mar 14, 2016

jreback commented Mar 14, 2016

nickeubank commented Mar 14, 2016

OXPHOS commented Mar 14, 2016

jreback Mar 14, 2016

Choose a reason for hiding this comment

OXPHOS Mar 14, 2016

Choose a reason for hiding this comment

jreback Mar 14, 2016

Choose a reason for hiding this comment

nickeubank Mar 14, 2016

Choose a reason for hiding this comment

OXPHOS Mar 14, 2016

Choose a reason for hiding this comment

jreback commented Mar 14, 2016

nickeubank commented Mar 14, 2016

jreback commented Mar 14, 2016

OXPHOS commented Mar 15, 2016

OXPHOS commented Mar 16, 2016

jreback Mar 16, 2016

Choose a reason for hiding this comment

jreback commented Mar 16, 2016

OXPHOS commented Mar 16, 2016

jreback commented Mar 16, 2016

OXPHOS commented Mar 16, 2016