-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN/API refactor drop and filter and deprecate select #6599
Conversation
Also, failing tests is to convert to string, basically before doing regex I do index.astype(str), if you have non-ascii this fails. |
@@ -1731,44 +1710,101 @@ def _reindex_axis(self, new_index, fill_method, axis, copy): | |||
else: | |||
return self._constructor(new_data).__finalize__(self) | |||
|
|||
def filter(self, items=None, like=None, regex=None, axis=None): | |||
def filter(self, labels=None, axis=None, inplace=False, regex=True, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may want to intercept previous arguments in kwargs and show a deprecation message? (and then just adjust the args and return)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, that's better than what I was doing (and hadn't set up the warning yet).
Also should regex be match or contains... I'm confused about difference between like/regex. (If it's contains can wrap in |
I think if not specified should be like startswith, e.g. most often you want give me all the columns which start with so |
@jreback should there be a way to distinguish between match and contains? Previously regex arg acted as contains and like arg as match (which is startswith). Was thinking of using regex arg, could hack it to take 'match' or 'contains' (match by default)... :s or maybe regex=None use match, regex=True use contains, regex=False don't use regex ? |
yes that makes sense
|
@jseabold what do you think of this so far? (I think we should put this on the mailng-list when Andy is ready).... |
atm I'm just False-ing numbers (rather than string-ifying them.. I tried to do that but got ascii errors, so not sure it's worth it, easy to do though). It doesn't raise now (it use string methods, although unfortunately these NaN for numbers so have to fill in after). Will check the perf. I can live with exact string args, gives more flexibility later. |
I'm pretty much on board AFAICT. I haven't followed too closely. I'd expect |
@jseabold you can always provide a regex if you want to (or alternately can have |
Well then I'm a bit confused about what exactly this is doing. |
@jreback perf is an issue, for two reasons:
@jseabold This PR makes the API the same for filter and drop, and do the same thing under the hood. You can pass drop a regex, a boolean function (to apply to index) or a list of labels to drop. |
hmmmm, unfortunately regex taking string for match/contains screws up backwards compat. Another arg? alternative soln is to use select over filter (and dep filter instead), but not the best reason for doing so... |
@hayd no I think you should always uses a compiled regex (or isin/== if you can)...that's the issue I think you can just compute the indexer and do an iloc it should work, just construct the tuple say
|
@hayd what do you mean by:
|
Previously regex was an kwarg for filter. In my code I was doing:
if regex is allowed to be a string (e.g. 'match') I can't do that, and of course this will break old code if they did happen to be filtering by the regex 'match' (unlikely!) |
@hayd I believe this should be true
(and I think that that drop should be implemented in terms of filter) |
@jreback yup, that's exactly what the Was thinking about this, and not sure regex should be name of flag to decide how regex is applied, as you may want to do contains without using regex. If that makes sense... |
hmm regex is what replace uses |
replace doesn't offer a way to tweak the way regex works (i.e. match vs contains), you either use regex or you don't. IMO regex should be bool (a la everywhere else) and have another argument to distinguish between contains vs match vs whatever ?
|
I'm still just not sure I understand the issue here. It seems to me (and tell me to shut up, if you get my point already, or this is impossible for backwards compat reasons) that regex should everywhere be search. match is just a way to avoid writing '^'. But since you can just write '^', you shouldn't bend over backwards to provide some weird combinations of keywords that do match and some that do search. AFAICT, you can do everything and more with search (via re.MULTILINE flag) than you can do with match. I always thought it was a mistake (and accident of history) that match was targeted by the old regex functions. Make a one-line note in the docs "if you want to do
|
*old regex functions are those vectorized .str methods that Wes added in an afternoon. |
Yeah, I think contains (aka search?) should be the default, you can always prepend ^ if you so wish, I'll add it as notes in the docstring (and for legacy like kwarg will still do that). regex being bool makes it easier. |
I guess the issue is: that will make it a bigger API change... fixed by making regex=False the default. |
Added a bunch of comments :-) Sorry to sound a bit harsh (and sorry that this comes maybe a bit late, but I thougth Andy had to update it so didn't look at it yet), but I don't think this is ready to merge. If we break backwards compatibility in these functions to clean up the API, we have one shot: we should do it once, and do it good. And I have the impression we should take a bit more care here. My main concerns:
Other todo's:
There was also said above to maybe put this on mailing list once we had a proposal for the API. @jreback I know you want to get this in as we want to release 0.14, but I think this is important enough to not do this in a hurry just to get it in. And maybe my concerns are easily answered, and not much should be adapted. And @hayd, maybe not that clear from all my comments, but really thanks for you work on this! |
ok, I think let's defer this to 0.14.1 (it should be back-compat so no biggie to putting in a minor release). as @jorisvandenbossche brings up some good points...and don't want to throw this in w/o some more discussion / consideration |
@hayd are we pushing this to 0.15 as its an API change? can you update what the state-of-the-art with this is |
I agree with @jorisvandenbossche on this, I had thought I had this a lot cleaner. I will fix up with the earlier suggestions (it's actually not that many, I just need to slog thorough it) and we can merge it soon after 0.14.1. |
@hayd let's revist! |
revisiting this evening, rebase + cleanup. |
@hayd can we resurrect this? (and put the API changes in the top section for comment) |
@hayd ???? |
1 similar comment
@hayd ???? |
closing as stale. But this would be a much needed refactor. |
closes #4818
closes #6189
xref #8530
xref #8594