BUG: Crosstab margins ignoring dropna #12614
To fix bug pandas-dev#12577: Crosstab margins ignoring dropna
tests! |
I did test several groups of data on my Mac and they worked out well. Can I have some hints about where it could go wrong? Thanks! |
you need to add tests to the codebase. |
@OXPHOS Thanks for the PR! Contributions to pandas require not only that code be modified, but also that you add tests that ensure the proper behavior. This helps ensure that when other people change code in the future, if they do something that breaks the code you've now written, when they run the test suite (which will include your tests) they'll catch the error. The section of the docs that talks about it is here: http://pandas.pydata.org/pandas-docs/stable/contributing.html#test-driven-development-code-writing As a basic test, you can simply write out the case from the issue report. You should probably put it around line 938 in |
if dropna, pass truncated data to _add_margin
Hi @nickeubank, thanks for your instruction and thanks @jreback! I modified the code and added tests. I tried to add another test case: all intersected entries are NaN. However, this case leads to a general failure (even without my modification) and, to be honest, I don't know what the expected result should be. Error Information:
I assume we need try/except somewhere else upstream (but I am rushing for lunch..) |
margins_name=margins_name)
if dropna:
    data_dropna = data[data.notnull().all(axis = 1)]
    table = _add_margins(table, data_dropna, values, rows=index,
you don't need to repeat the if, just do:
if dropna:
data = data[......]
You mean like below? I felt unsafe changing data values but if you're okay with it I'll update the code.
if dropna:
    data = data[data.notnull().all(axis = 1)]
table = _add_margins(table, data, values, rows=index, cols=columns,
                     aggfunc=aggfunc, margins_name=margins_name)
how are you changing data values?
I think she means overwriting data. That's fine if all tests clear!
Got it. Thanks!
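For readers following the thread, here is a minimal sketch of the row filter being discussed. It is not the pandas source: `margin_input` is a hypothetical helper written only for illustration, and the sample frame is borrowed from the test case in this PR. It shows that rebinding the local name `data` to the filtered frame leaves the caller's frame untouched, which is the concern raised above.

```python
import numpy as np
import pandas as pd


def margin_input(data, dropna=True):
    """Hypothetical helper: return the rows that would feed the margins."""
    if dropna:
        # notnull() yields a boolean frame; all(axis=1) is True only for
        # rows in which every column is non-null.
        data = data[data.notnull().all(axis=1)]
    return data


frame = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan],
                      'b': [3, 3, 4, 4, 4, 4]})

filtered = margin_input(frame, dropna=True)

# Boolean indexing builds a new frame, so the caller's frame keeps all rows.
print(len(frame), len(filtered))

# The mask is another spelling of DataFrame.dropna() with its defaults.
assert filtered.equals(frame.dropna())
```

Overwriting the local name is safe here because only the binding inside the function changes; the original object is never modified in place.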
pls add a whatsnew entry |
@OXPHOS I think that the "general fail" you note where no objects intersect is actually behaving fine: it's throwing an informative exception, and I can't think of another way it could be handled. Here are docs for adding a "whatsnew" entry. Basically, this is just a notification file that gives people a heads-up about changes. I think this will be in version 0.18.1 (@jreback sound right?) |
yep |
simplified codes in pivot_table()
updated what’s new entry
updated comments in test_pivot.py
... I am so sorry I ran flake8 as the first step for the tests but didn't go back to it right before commit or after modification. I know it took a long time to run through the test line. I'll do better this time. |
Wow I passed! Thanks @nickeubank and @jreback guys for your help! I saw a conflict with base branch now. Anything else I should do? Thanks! |
df = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan],
                   'b': [3, 3, 4, 4, 4, 4]})
actual = pd.crosstab(df.a, df.b, margins=True)
explicitly add dropna=True for these tests. Then add a test for dropna=False (same set of tests), and you will find it raises: the level is None, so it fails.
In [10]: pd.crosstab(df.a,df.b,margins=True,dropna=False)
KeyError: 'Level None not found'
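A sketch of what an explicit `dropna=True` check could look like, assuming a pandas version that already includes this fix. It is not the test that was added to test_pivot.py; the assertions are derived from the input data rather than copied from the PR. The `dropna=False` case is deliberately left out, since the thread defers it to a separate issue.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan],
                   'b': [3, 3, 4, 4, 4, 4]})

actual = pd.crosstab(df.a, df.b, margins=True, dropna=True)

# With dropna=True the row whose 'a' is NaN should not be counted anywhere,
# so the grand total should equal the number of complete rows.
n_complete = len(df.dropna())
assert actual.loc['All', 'All'] == n_complete

# Each margin should equal the sum of the cells it summarizes.
body = actual.drop(index='All', columns='All')
col_margin = actual.loc['All'].drop('All')
row_margin = actual['All'].drop('All')
assert body.sum(axis=0).tolist() == col_margin.tolist()
assert body.sum(axis=1).tolist() == row_margin.tolist()
```

Checking the margins against the table body, rather than against hard-coded numbers, targets the behavior reported in the issue: a margin that counts rows the body excluded.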
looks good. see my comment above. Have to handle another case here. |
Okay so here's what I found:
|
@OXPHOS ok let's do this. going to merge your current PR. Pls open an issue for the other case with a simple repro example (and cross reference to this PR). |
Will open soon. |