Skip to content

Fix for DataFrame.hist() with by- and weights-keyword #11441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

Twizzledrizzle
Copy link

will make the following work

import numpy as np
import pandas as pd
d = {'one' : ['A', 'A', 'B', 'B', 'C'],
     'two' : [4., 3., 2., 1., np.nan],
     'three' : [10., 8., np.nan, 5., 7.]}     
df = pd.DataFrame(d)
df.hist('two', by='one', weights='three', bins=range(0, 10))
# or
df.hist('two', by=df.one.values, weights='three', bins=range(0, 10))

will make the following work

import pandas as pd
d = {'one' : ['A', 'A', 'B', 'C'],
     'two' : [4., 3., 2., 1.],
     'three' : [10., 8., 5., 7.]}
df = pd.DataFrame(d)
df.hist('two', by='one', weights='three', bins=range(0, 10))
@Twizzledrizzle
Copy link
Author

Because of my poor git skills, this is a continuation for
#11028

upgraded to work with the newest pandas release

I am at loss writing a test for this. @jreback I could not find the place you recommended. The place where weights are tested right now seems to be test_generic.py?

However I am unsure of how to construct a test the best way, I have never written one, so pointers are very welcome!

@Twizzledrizzle
Copy link
Author

Note, I added the column 'weighst' to the supplied DataFrame if an array was supplied instead of the column name... Can this be done in a nicer way? I need it in the groupby logic.

if a column is named 'weights' it will be overwritten... :(

@TomAugspurger
Copy link
Contributor

@Twizzledrizzle no worries about the git snafu, we've all been there. People are usually in the chat if you need help walking through git.

I think Jeff was talking about the logic in DataFrame.sample in pandas/core/frame.py. I haven't had a chance to look at your code to see if it's similar enough to refactor into a common function.

For the tests you can look to see how similar tests are done in pandas/tests/test_graphics.py. Basically you call the function your testing and assert that something about the result equals an expected value that you hardcode in the test.

I'll take a look later, but it won't be until next week sometime.

…nymore

Also made the logic better when using weights without group by by aligning NaN's, and finding out if weights is supplied by column or by array.
@Twizzledrizzle
Copy link
Author

Thanks @TomAugspurger I will take a look.

A lot of commits right now, finding things while working through the travis errors... I hope the server does not crash!

@TomAugspurger
Copy link
Contributor

Yeah, we'll come up with a better way to solve that weights column.

FYI you can run the tests locally, will be quicker than waiting on Travis. Make sure you have nose installed (pip install nose or conda install nose) and then run the tests with nosetests test_graphics from the pandas/tests directory.

Run specific tests with nosetests test_graphics:TestClass.test_function (I think)

@Twizzledrizzle
Copy link
Author

I have python 2.7 and every tests seems to work fine. Why could python 3.4 fail on

nosetests test_data:TestGoogle.test_get_multi1

Does not seem to have anything to do with my pull request?

xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False,
sharey=False, figsize=None, layout=None, bins=10, **kwds):
def hist_frame(data, column=None, weights=None, by=None, grid=True,
xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't change the order of parameters, just add at th eend

@jreback
Copy link
Contributor

jreback commented Oct 28, 2015

need tests

@jreback jreback added the Visualization plotting label Oct 28, 2015
@jreback
Copy link
Contributor

jreback commented Oct 28, 2015

@Twizzledrizzle ignore that goggle multi failure....its an unreliable test on travis

Uses dropna(subset=...) to delete where nan's over the columns supplied
Also doing this in the beginning so we do not have to duplicate this logic
@Twizzledrizzle
Copy link
Author

Made the code more readable now I think, I hope it looks better.

Regarding tests, are we talking tests like this? In test_graphics_others.py? (Looks like some other tests are done one the hist functionality there).

ax = _check_plot_works(df.hist, column='two', by='one', weights='three', bins=range(0, 10))

@Twizzledrizzle
Copy link
Author

Oh also, as a side, the new function df.plot.hist() will miss a lot of this functionality.

df.plot.hist() seem to have problems using the by- keyword. Also missing column keyword. I tried working with the code, but it was more complex than the one I have been editing. I could not get my head around making the by-keyword work.

Should I open a discussion around this perhaps in a separate thread?

subset_cols_drop_nan = []
if weights is not None:
if isinstance(weights, np.ndarray):
# weights supplied as an array instead of a part of the dataframe
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this will not work if weights is a >1 dimensional ndarray... Need to think on this

@jreback
Copy link
Contributor

jreback commented Dec 6, 2015

@TomAugspurger status of this?

@Twizzledrizzle
Copy link
Author

I am awaiting with interest what will happen with

#8018

before putting to much time in this pull request.

@jreback
Copy link
Contributor

jreback commented Jan 11, 2016

@TomAugspurger status of this

@jreback
Copy link
Contributor

jreback commented Mar 12, 2016

@TomAugspurger status?

@TomAugspurger
Copy link
Contributor

I think waiting on #8018, which will conflict with the case where by is not None here.

@jreback
Copy link
Contributor

jreback commented Sep 9, 2016

status?

@jreback
Copy link
Contributor

jreback commented Dec 21, 2016

@TomAugspurger can this be independent #8018 ?

@jreback
Copy link
Contributor

jreback commented Mar 20, 2017

@TomAugspurger ?

@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

closing as stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants