Get dummies #4458
I think this comes from:
Which is confusing, as
|
the failing build is here: https://travis-ci.org/hayd/pandas/builds/9834728 Workaround by just setting the columns manually (which is kinda annoying, should create separate issue). |
The other option is to have a more consistent |
@jreback speaking of assert_frame_equal did you see this weird bug? Also, can we merge this? |
@hayd what the I would add an example of |
otherwise looks ok |
Do you mean in reshaping.rst (it's not a new function btw just untested (!) and undocumented... is in wes' book :) ) |
but didn't you add the NA handling? (if yes, then need an example in whatsnew) (yes maybe an example/better example in reshape too!) |
ah, I put it in the release.rst (should it also be somewhere else too?) |
exp_just_na = DataFrame({nan: {0: 1.0}})
# hack (NaN handling in assert_index_equal)
exp_just_na.columns = res_just_na.columns
assert_frame_equal(res_just_na, exp_just_na)
@jreback the weird assert_frame_equal bug is here (if you remove the hack, this fails, and I can't repro it outside of this)
ahh...I see, nan in indices is very odd (but somewhat supported); prob assert_frame_equal just does .equals on the indices, which I think fails when it has nan...let me look
hmm..that's not it...let me look further
I don't know if you read my comment above: #4458 (comment) (I blame numpy)
@hayd I actually think this is a more general issue; your hack ok for now....
funny thing is I cannot repro this, e.g. Index(['a','b',np.nan]).equals(Index(['a','b',np.nan])) is True! while in your example, the same is False!
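For reference, the standalone check mentioned above can be written out as a small script. This is only a sketch of the comparison quoted in the comment, assuming the usual `np`/`pd` import aliases; it does not reproduce the failing case from the test suite.

```python
import numpy as np
import pandas as pd

# Two separately constructed object indexes that both end in NaN.
left = pd.Index(['a', 'b', np.nan])
right = pd.Index(['a', 'b', np.nan])

# Index.equals treats NaN entries in matching positions as equal,
# so this comparison succeeds outside of the failing test.
same = left.equals(right)
print(same)
```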
I know! It's really weird... it's the np.testing.assert_array_equal which is failing (and it's supposed to ignore nan!). The good thing is, with get_dummies in master we can now repro this. :)
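A quick sketch of what "supposed to ignore nan" means for `np.testing.assert_array_equal`: for float arrays, NaNs in matching positions are treated as equal. Whether the same holds for object-dtype arrays (which is what an index holding strings and NaN contains) is left as an open question here, so the second check only records the outcome instead of assuming it. The variable names are made up for illustration.

```python
import numpy as np

# Float arrays: NaN in the same position counts as equal, so this does not raise.
a = np.array([1.0, np.nan])
b = np.array([1.0, np.nan])
np.testing.assert_array_equal(a, b)

# Object arrays: record whether the assertion passes rather than assuming a result.
obj_a = np.array(['a', np.nan], dtype=object)
obj_b = np.array(['a', np.nan], dtype=object)
try:
    np.testing.assert_array_equal(obj_a, obj_b)
    object_case_passed = True
except AssertionError:
    object_case_passed = False

print("object-dtype comparison passed:", object_case_passed)
```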
@hayd ahh...I just thought a mention of this new 'feature' should be in whatsnew (with an example)....seems like a nice feature |
ENH add dummy_na argument to get_dummies TST add tests for get_dummies
added to what's new.. think i will leave doc writing for another day/pr. |
wow that build took a while: https://travis-ci.org/hayd/pandas/builds/10644820 whoop, get_dummies my favourite pandas function (now in the docs!) |
@hayd awesome! |
@jreback crap, this upset travis somehow (which is weird cos the last few commits have been green, and this was before I merged... in fact I linked to it above!) https://travis-ci.org/pydata/pandas/jobs/10646008 |
res_na = get_dummies(s, dummy_na=True)
exp_na = DataFrame({nan: {0: 0.0, 1: 0.0, 2: 1.0},
                    'a': {0: 1.0, 1: 0.0, 2: 0.0},
                    'b': {0: 0.0, 1: 1.0, 2: 0.0}}).iloc[:, [1, 2, 0]]
ha! I was just looking at that test before I saw it failed and thinking "hmmm does that work in python 3" - doh!
obviously should be using exp_na.reindex_axis(['a', 'b', np.nan], 1)
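One way to avoid depending on how the dict keys happen to be ordered is to build the expected frame with an explicit column list. This is a sketch, not the fix that was pushed: the Series `s` is assumed to be `['a', 'b', nan]`, and only the values are compared so that the NaN column label does not get in the way.

```python
import numpy as np
import pandas as pd

# Assumed input; it matches the row pattern of the expected frame in the test.
s = pd.Series(['a', 'b', np.nan])
res_na = pd.get_dummies(s, dummy_na=True)

# Column order is stated explicitly instead of relying on dict key ordering
# followed by a positional .iloc shuffle.
exp_na = pd.DataFrame(
    [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]],
    columns=['a', 'b', np.nan],
)

# Compare values as floats so the dtype of the dummy columns does not matter.
values_match = np.array_equal(
    res_na.to_numpy(dtype=float), exp_na.to_numpy(dtype=float)
)
print(values_match)
```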
pushed fix to master
fixes #4446, #4444
Added new functionality dummy_na (thoughts?). It's slightly different to a possible dropna argument, which I haven't included (which can be achieved using pd.get_dummies(s.dropna())).
Example:
Note: atm there is a (strange) test Failure with above example, not quite sure what's going on:
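As a rough illustration of how `dummy_na=True` differs from calling `get_dummies` on `s.dropna()` (a sketch only, not the example from the description; the Series `s` is assumed):

```python
import numpy as np
import pandas as pd

# Assumed input with one missing value.
s = pd.Series(['a', 'b', np.nan])

# Default: no NaN column; the NaN row is all zeros.
without_na = pd.get_dummies(s)

# dummy_na=True: an extra NaN column marks the missing row.
with_na = pd.get_dummies(s, dummy_na=True)

# Dropping NaN first removes the row entirely instead of encoding it.
dropped = pd.get_dummies(s.dropna())

print(without_na.shape, with_na.shape, dropped.shape)
```

The first two results keep all three rows and differ only in the extra NaN column, while the `dropna` version has one row fewer.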