Fix for DataFrame.hist() with by- and weights-keyword #11028

Twizzledrizzle · 2015-09-08T20:34:19Z

For example:

import pandas as pd
d = {'one' : ['A', 'A', 'B', 'C'],
     'two' : [4., 3., 2., 1.],
     'three' : [10., 8., 5., 7.]}     
df = pd.DataFrame(d)
df.hist('two', by='one', weights='three', bins=range(0, 10))

This does not seem to break anything, but it is my first time meddling with the pandas library, so a review would be nice.

This commit makes the df.hist('two', by='one', weights='three', bins=range(0, 10)) call above work.

weights=weights was wrong; use weights=None
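For reference, the weighted per-group histogram the example asks for can be computed outside of hist. This is a minimal sketch that assumes NumPy's np.histogram, which accepts a weights array with the same shape as the data. It computes bin counts instead of drawing anything. If you want to plot each group, matplotlib's Axes.hist accepts the same weights= keyword.

```python
import numpy as np
import pandas as pd

d = {'one': ['A', 'A', 'B', 'C'],
     'two': [4., 3., 2., 1.],
     'three': [10., 8., 5., 7.]}
df = pd.DataFrame(d)
bins = list(range(0, 10))  # 10 edges -> 9 bins: [0, 1), [1, 2), ..., [8, 9]

# One weighted histogram per value of 'one': data from 'two', weights from 'three'.
weighted_counts = {}
for key, group in df.groupby('one'):
    counts, edges = np.histogram(group['two'].values, bins=bins,
                                 weights=group['three'].values)
    weighted_counts[key] = counts

for key, counts in weighted_counts.items():
    print(key, counts)
```

Each group's values and weights come from the same rows, so their shapes always match. The by-keyword path in hist has to preserve this property.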

TomAugspurger · 2015-09-09T12:43:41Z

pandas/tools/plotting.py

-        ax.hist(group.dropna().values, bins=bins, **kwargs)
+    def plot_group(group, ax, weights=None):
+        if weights is not None:
+            weights=weights.dropna().values


I suspect this will fail if the missing values in the weights column don't align perfectly with the missing values in the group column. It might be cleaner to refactor this to drop rows missing either group or weight earlier on, so that plot_group only has to deal with valid observations.
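Here is a concrete version of this concern, using made-up data. If the values and weights are NaN at different positions, calling dropna on each one separately produces arrays of different lengths. The remaining pairs also no longer line up. A joint mask keeps only the rows where both are present.

```python
import numpy as np
import pandas as pd

values = pd.Series([4.0, np.nan, 2.0, 1.0])
weights = pd.Series([10.0, 8.0, np.nan, np.nan])

# Dropping NaN independently: 3 values vs 2 weights, so the shapes no longer match.
independent_values = values.dropna().values
independent_weights = weights.dropna().values

# Joint mask: keep a row only if both the value and its weight are present.
mask = values.notna() & weights.notna()
clean_values = values[mask].values
clean_weights = weights[mask].values
print(clean_values, clean_weights)
```

Passing the two independently dropped arrays to a histogram routine is exactly the situation that produces "weights should have the same shape" errors.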

I added this because, on the next line, group drops NA values as well?
ax.hist(group.dropna().values

Ah, there are issues when the NA positions differ between the weights and group arrays. I did not think of this; I will think about it a bit more.

TomAugspurger · 2015-09-09T12:49:13Z

Just gave a quick look through here. This is a good idea that we should support.

We'll need tests for this. Put them in pandas/tests/test_graphics.py. Make sure to cover cases where:

- by is None or a column
- weights is an array of weights or a string column
- values has missing data and / or weights has missing data

The plotting stuff is being refactored a bunch currently, so I've tagged this for 0.18. You might want to hold off on making more changes, but in the meantime you can write tests for this.

jreback · 2015-09-10T11:12:41Z

FYI, the checking is quite similar to how weights are checked in DataFrame.sample, so we would want to make this a common function (it could be a private method on DataFrame).

Twizzledrizzle · 2015-09-11T20:59:11Z

Thanks jreback, I will try my best to add some tests. I will also check what can be done about syncing dropna with the weights.

I was thinking along the lines of:
drop rows in weights that are NA in group
drop rows that are NA in group
fill NA in weights with zero, so they do not count toward anything

Would this be OK?
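Here is a minimal sketch of those three steps on made-up data. As in plot_group, "group" means the Series of values being histogrammed. np.histogram stands in for the plotting call.

```python
import numpy as np
import pandas as pd

# "group" is the Series being histogrammed; weights is aligned with it.
group = pd.Series([4.0, np.nan, 2.0, 1.0])
weights = pd.Series([10.0, 8.0, np.nan, 7.0])

# Steps 1 and 2: drop the rows where group is NA, from both Series.
keep = group.notna()
group_kept = group[keep]
weights_kept = weights[keep]

# Step 3: a missing weight becomes 0, so that value adds nothing to its bin.
weights_kept = weights_kept.fillna(0.0)

counts, edges = np.histogram(group_kept.values, bins=[0, 1, 2, 3, 4, 5],
                             weights=weights_kept.values)
print(counts)
```

A zero weight adds nothing to a bin, so this gives the same counts as dropping those rows outright. There is one subtle difference. When bins is an integer, np.histogram derives the bin range from the data, so zero-weighted values can still stretch that range.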

Twizzledrizzle · 2015-09-11T21:00:40Z

Or would it be more logical to drop all rows that are NA in either group or weights?

TomAugspurger · 2015-09-11T21:02:37Z

I'd say your last method. Something like data.dropna(subset=['group', 'weight']).
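Here is a minimal sketch of that suggestion, applied to the columns from the original example with made-up missing values. 'two' stands in for the plotted column and 'three' for the weights column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': ['A', 'A', 'B', 'C'],
                   'two': [4.0, np.nan, 2.0, 1.0],
                   'three': [10.0, 8.0, np.nan, 7.0]})

# Drop every row missing either the plotted column or the weights column.
clean = df.dropna(subset=['two', 'three'])

# After the joint drop, each group's values and weights have matching lengths.
per_group = {key: (g['two'].values, g['three'].values)
             for key, g in clean.groupby('one')}
for key, (vals, wts) in per_group.items():
    assert len(vals) == len(wts)
    print(key, vals, wts)
```

Note that group B disappears entirely here, because its only row has a missing weight. The plotting code may need to decide whether to draw an empty axis for such a group.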

jreback · 2015-10-25T15:24:38Z

@TomAugspurger can you review?

Twizzledrizzle · 2015-10-25T16:54:25Z

Sorry, I have not had time to continue with the tests; my limited git knowledge made it tough to interact with the repo. I planned to look at how my patch works with the new release.

Twizzledrizzle · 2015-10-27T14:27:41Z

Added a new pull request for the new version here: #11441

Sorry for not using git correctly :(

TomAugspurger · 2015-10-27T20:25:50Z

Superseded by #11441

Fix for DataFrame.hist() with by- and weights-keyword

18e2f67


Twizzledrizzle mentioned this pull request Sep 8, 2015

Using 'by' and 'weights' together with DataFrame.hist() results in ValueError: weights should have the same shape as x #9540

Open

Twizzledrizzle added 2 commits September 9, 2015 08:49

typo :(

32d85f7

weights=weights wrong, weights=None

fix for scatterplot not having the 'weights' implemented

1c0df89

TomAugspurger reviewed Sep 9, 2015

TomAugspurger added this to the 0.18.0 milestone Sep 9, 2015

jreback added the Visualization plotting label Sep 10, 2015

Twizzledrizzle mentioned this pull request Oct 27, 2015

Fix for DataFrame.hist() with by- and weights-keyword #11441

Closed

TomAugspurger closed this Oct 27, 2015
