Skip to content

BUG: value_counts / shift not in groupby dispatch whitelist (0.12-dev) #5480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gdraps opened this issue Nov 10, 2013 · 19 comments · Fixed by #5604
Closed

BUG: value_counts / shift not in groupby dispatch whitelist (0.12-dev) #5480

gdraps opened this issue Nov 10, 2013 · 19 comments · Fixed by #5604
Labels
API Design Error Reporting Incorrect or improved errors from pandas Groupby
Milestone

Comments

@gdraps
Copy link
Contributor

gdraps commented Nov 10, 2013

Copying comment from #4887 to a new issue for discussion purposes:

While testing pandas-master, hit a method not in the new groupby dispatch whitelist: value_counts (on a SeriesGroupBy object). Another possible addition: shift, as used in this SO answer.

Tried to generate a list of blacklisted methods for DataFrame and Series, see below -- needs further filtering, but may reveal useful blocked methods. any thoughts on the remaining methods?

In [18]: sorted([x for x in set(dir(pd.DataFrame)) - pd.core.groupby._apply_whitelist - set(dir(pd.core.groupby.GroupBy)) if not x.startswith(('_', 'T', 'to', 'from', 'as'))])
Out[18]: ['abs', 'add', 'add_prefix', 'add_suffix', 'align', 'all', 'any', 'append', 'applymap', 'at', 'at_time', 'axes', 'between_time', 'bfill', 'blocks', 'bool', 'clip', 'clip_lower', 'clip_upper', 'columns', 'combine', 'combineAdd', 'combineMult', 'combine_first', 'compound', 'consolidate', 'convert_objects', 'copy', 'corr', 'corrwith', 'cov', 'delevel', 'diff', 'div', 'divide', 'dot', 'drop', 'drop_duplicates', 'dropna', 'dtypes', 'duplicated', 'empty', 'eq', 'eval', 'ffill', 'filter', 'first_valid_index', 'floordiv', 'ftypes', 'ge', 'get', 'get_dtype_counts', 'get_ftype_counts', 'get_value', 'get_values', 'groupby', 'gt', 'iat', 'icol', 'idxmax', 'idxmin', 'iget_value', 'iloc', 'index', 'info', 'insert', 'interpolate', 'irow', 'isin', 'isnull', 'iteritems', 'iterkv', 'iterrows', 'itertuples', 'ix', 'join', 'keys', 'kurt', 'kurtosis', 'last_valid_index', 'le', 'load', 'loc', 'lookup', 'lt', 'mad', 'mask', 'merge', 'mod', 'mode', 'mul', 'multiply', 'ndim', 'ne', 'notnull', 'pct_change', 'pivot', 'pivot_table', 'pop', 'pow', 'product', 'query', 'radd', 'rdiv', 'reindex', 'reindex_axis', 'reindex_like', 'rename', 'rename_axis', 'reorder_levels', 'replace', 'reset_index', 'rfloordiv', 'rmod', 'rmul', 'rpow', 'rsub', 'rtruediv', 'save', 'select', 'set_index', 'set_value', 'shape', 'shift', 'skew', 'sort', 'sort_index', 'sortlevel', 'squeeze', 'stack', 'sub', 'subtract', 'swapaxes', 'swaplevel', 'take', 'transpose', 'truediv', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unstack', 'update', 'values', 'where', 'xs']

In [19]: sorted([x for x in set(dir(pd.Series)) - pd.core.groupby._apply_whitelist - set(dir(pd.core.groupby.GroupBy)) if not x.startswith(('_', 'T', 'to', 'from', 'as'))]) 
Out[19]: ['abs', 'add', 'add_prefix', 'add_suffix', 'align', 'all', 'any', 'append', 'argmax', 'argmin', 'argsort', 'at', 'at_time', 'autocorr', 'axes', 'base', 'between', 'between_time', 'bfill', 'blocks', 'bool', 'clip', 'clip_lower', 'clip_upper', 'combine', 'combine_first', 'compound', 'consolidate', 'convert_objects', 'copy', 'corr', 'cov', 'data', 'diff', 'div', 'divide', 'dot', 'drop', 'drop_duplicates', 'dropna', 'duplicated', 'empty', 'eq', 'ffill', 'filter', 'first_valid_index', 'flags', 'floordiv', 'ftype', 'ge', 'get', 'get_dtype_counts', 'get_ftype_counts', 'get_value', 'get_values', 'groupby', 'gt', 'iat', 'idxmax', 'idxmin', 'iget', 'iget_value', 'iloc', 'imag', 'index', 'interpolate', 'irow', 'is_time_series', 'isin', 'isnull', 'item', 'iteritems', 'iterkv', 'ix', 'keys', 'kurt', 'kurtosis', 'last_valid_index', 'le', 'load', 'loc', 'lt', 'mad', 'map', 'mask', 'mod', 'mode', 'mul', 'multiply', 'ndim', 'ne', 'nonzero', 'notnull', 'nunique', 'order', 'pct_change', 'pop', 'pow', 'product', 'ptp', 'put', 'radd', 'ravel', 'rdiv', 'real', 'reindex', 'reindex_axis', 'reindex_like', 'rename', 'rename_axis', 'reorder_levels', 'repeat', 'replace', 'reset_index', 'reshape', 'rfloordiv', 'rmod', 'rmul', 'round', 'rpow', 'rsub', 'rtruediv', 'save', 'select', 'set_value', 'shape', 'shift', 'skew', 'sort', 'sort_index', 'sortlevel', 'squeeze', 'str', 'strides', 'sub', 'subtract', 'swapaxes', 'swaplevel', 'take', 'transpose', 'truediv', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unique', 'unstack', 'update', 'valid', 'value_counts', 'values', 'view', 'weekday', 'where', 'xs']
@jtratner
Copy link
Contributor

This seems like a good idea - would be nice to get a pull-request with tests that exercise the methods that we're readding.

@jreback
Copy link
Contributor

jreback commented Nov 10, 2013

you can of course use apply and use whatever methods u want

this magic of operating on the groupby is only for convience (and possibly cythonized methods)

@gdraps
Copy link
Contributor Author

gdraps commented Nov 10, 2013

That's fair and the exception is clear about using apply instead. Just wanted to file this as it broke existing code without a deprecation warning.

@jtratner
Copy link
Contributor

@jreback wouldn't be too hard to set up a DeprecationWarning for this...

   def _make_wrapper(self, name):
        if name not in _apply_whitelist:
            is_callable = callable(getattr(self.obj, name, None))
            kind = ' callable ' if is_callable else ' '
            warn("access to{0}attribute {1!r} of {2!r} objects is deprecated"
                 " and will be removed in a future version, try using the"
                 "'apply' method".format(kind, name, type(self).__name__))

@jtratner
Copy link
Contributor

@jreback I think we need to do the deprecation first, otherwise people are going to get confused. Are you okay with that?

@jreback
Copy link
Contributor

jreback commented Nov 11, 2013

sure.. (though i would add shift/value_counts to whitelist as well).

only thing with deprecation - i think it needs to be dynamic to avoid having the tab completion (the dir find those methods)

@jtratner
Copy link
Contributor

We could keep the whitelist for dir completions and then use the warning if
something isn't in the whitelist, right?

@jreback
Copy link
Contributor

jreback commented Nov 11, 2013

irc their is some kind of message their now.?

@jtratner
Copy link
Contributor

It's an error though right?

On Mon, Nov 11, 2013 at 5:19 PM, jreback notifications@github.com wrote:

irc their is some kind of message their now.?


Reply to this email directly or view it on GitHubhttps://github.com//issues/5480#issuecomment-28245457
.

@jreback
Copy link
Contributor

jreback commented Nov 11, 2013

i am confused now

that error message is already there

i would just add shift and value_counts. not big on other methoods as they should be in the inner apply anyhow (their may be some more exception)....but I think its beter to err on the side of a limited whitelist

@jreback
Copy link
Contributor

jreback commented Nov 11, 2013

its an AttributeError which is correct, not a Deprecation.....

its more explicit to use apply

@jtratner
Copy link
Contributor

I think @gdraps brought this up not only for value_counts/shift, but also because there are probably a few people who relied on some part of this and will be surprised their existing code doesn't work (even if it was buggy/wrong - @gdraps please correct me if I've mischaracterized). We generally deprecate everything before removing it, seems like that's relevant here too.

@jreback
Copy link
Contributor

jreback commented Nov 11, 2013

ok with the deprecation

but u then may need a blacklist that raises for sure - eg to_csv and such

@jtratner
Copy link
Contributor

Okay, @gdraps, do you have time to put this together?

@gdraps
Copy link
Contributor Author

gdraps commented Nov 14, 2013

I took a stab at reviewing the full list of methods and would propose the following be added to the whitelist:

'value_counts',
'mad', 'median',
'any', 'all',
'irow', 'take',
'shift', 'tshift',
'ffill', 'bfill',
'pct_change', 'skew',

As far as the blacklist goes, am thinking all to_* methods:

'to_clipboard', 'to_csv', 'to_dense', 'to_dict', 'to_excel', 'to_frame',
'to_gbq', 'to_hdf', 'to_html', 'to_json', 'to_latex', 'to_msgpack',
'to_panel', 'to_period', 'to_pickle', 'to_records', 'to_sparse', 'to_sql',
'to_stata', 'to_string', 'to_timestamp', 'to_wide',

When's the target for 0.13? I can put a PR later this week.

@jtratner
Copy link
Contributor

Target is any day now. I'll put up a quick PR to enable the ones you listed.

@jreback
Copy link
Contributor

jreback commented Nov 14, 2013

median isalready a defined method

@jreback
Copy link
Contributor

jreback commented Nov 14, 2013

what i mean is tht the whitelist should not include methods which are explicity defined on a groupby (as they are already allowed)

@jtratner
Copy link
Contributor

Yep, that's what I understood, I took off median.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Error Reporting Incorrect or improved errors from pandas Groupby
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants