-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Fix for DataFrame.hist() with by- and weights-keyword #11028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for DataFrame.hist() with by- and weights-keyword #11028
Conversation
will make this work: import pandas as pd d = {'one' : ['A', 'A', 'B', 'C'], 'two' : [4., 3., 2., 1.], 'three' : [10., 8., 5., 7.]} df = pd.DataFrame(d) df.hist('two', by='one', weights='three', bins=range(0, 10))
ax.hist(group.dropna().values, bins=bins, **kwargs) | ||
def plot_group(group, ax, weights=None): | ||
if weights is not None: | ||
weights=weights.dropna().values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this will fail if the missing values in the weights
column don't align perfectly with the missing values in the group
column. It might be cleaner to refactor this to drop rows missing either group
or weight
earlier on, so that plot_group only has to deal with valid observations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this because on the next line, group drops na values as well?
ax.hist(group.dropna().values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, issues when na is different on weights and group arrays, did not think of this, will think a little bit more
Just gave a quick look through here. This is a good idea that we should support. We'll need tests for this. Put them in
The plotting stuff is being refactored a bunch currently, so I've tagged this for the 0.18. You might want to hold off on making more changes, but in the meantime you can write tests for this. |
FYI the checking is quite similar to how weights are checked for |
Thanks jreback, I will try my best to add some tests. I will also check what can be done with synching dropna with the weighs. I was thinking in the lines of would this be ok? |
Or more logical to drop all rows that are NA in group or in weights? |
I'd say your lest method. Something like |
@TomAugspurger can you review |
Sorry I have not had time to continue with the tests, my limited git knowledge made it tough interacting with the repo. I planned to look how my patch worked with the new release |
Added a new pull request here: #11441 for the new version, sorry for not using git correctly :( |
Superseded by #11441 |
closes #9540
for example:
does not seem to break anything, but this is my first meddling in the pandas library, so a review would be nice