Add normalization to crosstab #12578
fbc15c5 to
0f835b7
Compare
|
Thanks for the PR :) As pointed out in #12569, I prefer adding new |
|
@sinhrks Yes, combining an (Or rather, I added two exceptions -- one for passing Since I can't think of an actual function you could pass to |
|
@nickeubank, no you simply allow string args to what @sinhrks suggests prob makes the most sense
|
pandas/tools/pivot.py
Outdated
in any event this is just: table/x.sum(1). we almost never actually want to use .apply
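A minimal sketch of the point above, comparing a vectorized row normalization with the `.apply` version. The frame `df` and its columns `a` and `b` are made up for illustration and are not from the PR. The sketch uses `DataFrame.div(..., axis=0)` rather than a bare `/`, because dividing a frame by a Series with `/` aligns the Series on the columns, which is not what row normalization needs.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"a": ["x", "x", "y", "y", "y"],
                   "b": ["p", "q", "p", "p", "q"]})
table = pd.crosstab(df["a"], df["b"])

# Vectorized: divide each row by its row total (align the totals on the index).
row_norm = table.div(table.sum(axis=1), axis=0)

# Same result via .apply, which calls a Python function once per row.
row_norm_apply = table.apply(lambda row: row / row.sum(), axis=1)

# Both approaches agree, and every row sums to 1.
assert np.allclose(row_norm.to_numpy(), row_norm_apply.to_numpy())
assert np.allclose(row_norm.sum(axis=1), 1.0)
```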
|
@sinhrks little better? |
pandas/tools/pivot.py
Outdated
add a versionadded tag
It may be better to describe that normalization is performed over the entire level (not within sub-levels) in the MultiIndex case.
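A small sketch of what this means in practice, assuming the `normalize` argument behaves as it does in released pandas. The data and column names are hypothetical. With MultiIndex columns and `normalize='index'`, each row sums to 1 across all columns, not separately within each top-level group.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"a": ["x", "x", "y", "y", "y", "x"],
                   "b": ["p", "q", "p", "p", "q", "p"],
                   "c": ["m", "n", "m", "n", "n", "m"]})

# Columns form a MultiIndex with levels (b, c).
result = pd.crosstab(df["a"], [df["b"], df["c"]], normalize="index")

# Each row sums to 1 over the whole set of (b, c) columns ...
assert np.allclose(result.sum(axis=1), 1.0)

# ... so the columns under a single top-level value (b == "p")
# do not sum to 1 on their own.
assert not np.allclose(result["p"].sum(axis=1), 1.0)
```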
|
I think it is nice if |
|
I am slightly -1 on adding this functionality to
|
|
Thanks @jorisvandenbossche -- I think that's well said. Perhaps we should solicit a few more opinions to see if we can move towards consensus on this? |
|
@jorisvandenbossche soln seems reasonable. you might want to enable |
|
That would be my preference |
+1 |
|
My opinion is based on the understanding that
I don't have strong opposition to @jorisvandenbossche's option. One concern is that the API gets more complex than required. It's less likely that users will normalize values other than count and sum. |
That's what I said above as well.
Indeed, in general I am very reluctant to add new keyword arguments. But in this case, I think it makes it more complex to explain what
That's indeed true. |
|
Sounds like we're agreed I think my main two thoughts are:
Regarding complexity, I think of crosstab as a tool that may be used to generate analysis outputs to potentially put in papers (that's my interest at least). Given that, I think making it as flexible and powerful as possible is highly desirable, even at the cost of an extra keyword. |
|
so we are talking about doing these ops: This kind of feels like a post-processing step rather than a result of a single operation. |
|
You can indeed rather simply do this after |
|
yeah, just thinking: if we have a kw (which is fine), then we should explain what it is actually doing in the docstring, as it is not completely obvious. Another option is maybe to actually have a
|
|
Yes, it's post-processing. Indeed, that's how it's implemented. The one complication with a stand-alone normalize is that it can't be easily designed to deal with the |
|
@nickeubank well one could argue the So that's really another abstraction and just being shoved into the current one. |
e7e8c19 to
300ed92
Compare
c70f569 to
39718b8
Compare
pandas/tools/tests/test_pivot.py
Outdated
something like this; just loop over a list, as the tests are clearer
for arg in ['index', True, 'columns']:
    result = ....
    tm.assert_frame.....
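One way the suggested loop could be fleshed out, as a sketch rather than the test that was actually merged. The data, the `expected` mapping, and the use of the public `pandas.testing` module under the alias `tm` are assumptions made for illustration. Each expected frame is built by post-processing the raw counts, then compared with the output of the `normalize` keyword.

```python
import pandas as pd
import pandas.testing as tm  # public helpers; alias mirrors the sketch above

# Hypothetical data, only for illustration.
df = pd.DataFrame({"a": ["x", "x", "y", "y", "y"],
                   "b": ["p", "q", "p", "p", "q"]})
counts = pd.crosstab(df["a"], df["b"])

# Expected results computed by hand from the raw counts.
expected = {
    "index": counts.div(counts.sum(axis=1), axis=0),  # rows sum to 1
    "columns": counts / counts.sum(axis=0),           # columns sum to 1
    True: counts / counts.to_numpy().sum(),           # whole table sums to 1
}

for arg, exp in expected.items():
    result = pd.crosstab(df["a"], df["b"], normalize=arg)
    tm.assert_frame_equal(result, exp)
```

Keeping the arguments and expectations in one mapping means a new `normalize` option only needs one extra entry.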
this too
|
looks pretty good. |
doc/source/whatsnew/v0.18.1.txt
Outdated
why don't you add a mini-example here (same one)
@jreback. Thanks. I'm going to be traveling away from my computer for a
little over two weeks. I'm not sure what the timing is for 0.18.1. I'm
happy to make changes when I get back, but won't be able to do anything
till then.
On Mon, Apr 4, 2016 at 10:59 AM Jeff Reback notifications@github.com
wrote:
In doc/source/whatsnew/v0.18.1.txt #12578 (comment):
@@ -18,8 +18,7 @@ Highlights include:
 New features
-
-
+- ``pd.crosstab()`` has gained ``normalize`` argument for normalizing frequency tables (:issue:`12569`). Examples in updated docs :ref:`here <reshaping.crosstabulations>`.
why don't you add a mini-example here (same one)
|
@nickeubank when you are back, pls rebase. This looked pretty good. |
39718b8 to
5d27469
Compare
|
@jreback rebased! |
5d27469 to
d9764ec
Compare
|
@jreback tweaked |
d9764ec to
69df06c
Compare
doc/source/reshaping.rst
Outdated
Very small detail, but I think there is one space too many at the beginning of this line
69df06c to
f2474c3
Compare
|
@jorisvandenbossche all integrated |
pandas/tools/pivot.py
Outdated
Never used, correct?
f2474c3 to
c4b5847
Compare
|
@sinhrks ok! updated |
c4b5847 to
e5015f8
Compare
|
thanks @nickeubank |
Closes #12569
Note does NOT address #12577