
[Feature request] Add a "filter" method to Index and Series #27439

Closed
jolespin opened this issue Jul 17, 2019 · 4 comments

Comments

@jolespin

Code Sample, a copy-pastable example if possible

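import pandas as pd

# Function: keep only the elements for which func(element) is True
def _filter(self, func):
    return self[list(map(func, self))]

# Set the attribute (monkey-patches pd.Index with a new "filter" method)
setattr(pd.Index, "filter", _filter)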
index = pd.Index(["iris_1", "notiris_1", "iris_2", "other_thing"])

# New way
func = lambda x:x.startswith("iris")
print("New way:", index.filter(func))

# Old way
print("Old way:", index[index.map(func)])

# New way: Index(['iris_1', 'iris_2'], dtype='object')
# Old way: Index(['iris_1', 'iris_2'], dtype='object')

# Set the attribute (note: this overrides the existing, label-based Series.filter)
setattr(pd.Series, "filter", _filter)
series = pd.Series(list("imagine_more_complex_stuff_here_instead"))

# New way
func = lambda x:x in ["o", "_"]
print("New way:", series.filter(func))

# Old way
print("Old way:", series[series.map(func)])

# New way: 7     _
# 9     o
# 12    _
# 14    o
# 20    _
# 26    _
# 31    _
# dtype: object
# Old way: 7     _
# 9     o
# 12    _
# 14    o
# 20    _
# 26    _
# 31    _
# dtype: object

Problem description

I always find myself writing really verbose code to filter my pandas indices or series. I love the map method and would like to extend this concept to include filter.
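A minimal sketch of the same idea without monkey-patching pandas classes, assuming a hypothetical standalone helper named filter_values (not a pandas API). It calls func on each element to build a boolean mask and indexes with it, so the same helper works for both an Index and a Series:

```python
import pandas as pd

def filter_values(obj, func):
    """Hypothetical helper, not part of pandas: return the elements of an
    Index or Series for which func(element) is truthy."""
    mask = [bool(func(x)) for x in obj]  # one boolean per element
    return obj[mask]                      # boolean-mask indexing

index = pd.Index(["iris_1", "notiris_1", "iris_2", "other_thing"])
iris_only = filter_values(index, lambda x: x.startswith("iris"))
print(iris_only)

series = pd.Series(list("imagine_more_complex_stuff_here_instead"))
chars = filter_values(series, lambda x: x in ["o", "_"])
print(chars)
```

Keeping it as a plain function avoids clobbering the existing Series.filter, at the cost of not reading as a method call.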

@jreback
Contributor

jreback commented Jul 17, 2019

.loc can already accept a callable

how is this different?
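For reference, a sketch of the callable form of .loc: the callable is called once with the whole Series and must return something .loc accepts, such as a boolean mask. Vectorising the membership test with Series.isin is an illustrative choice here, not something stated in the comment above:

```python
import pandas as pd

series = pd.Series(list("imagine_more_complex_stuff_here_instead"))

# The callable receives the entire Series `s`, so it returns a boolean
# Series (aligned with `series`) rather than a single True/False.
result = series.loc[lambda s: s.isin(["o", "_"])]
print(result)
```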

@jolespin
Author

Am I using the callable feature incorrectly?

series.loc[func]
# ---------------------------------------------------------------------------
# ValueError                                Traceback (most recent call last)
# <ipython-input-2-da6f788021a1> in <module>
# ----> 1 series.loc[func]

# ~/anaconda/envs/µ_env/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
#    1497             axis = self.axis or 0
#    1498 
# -> 1499             maybe_callable = com.apply_if_callable(key, self.obj)
#    1500             return self._getitem_axis(maybe_callable, axis=axis)
#    1501 

# ~/anaconda/envs/µ_env/lib/python3.6/site-packages/pandas/core/common.py in apply_if_callable(maybe_callable, obj, **kwargs)
#     327 
#     328     if callable(maybe_callable):
# --> 329         return maybe_callable(obj, **kwargs)
#     330 
#     331     return maybe_callable

# <ipython-input-1-308c816fe0c4> in <lambda>(x)
#       8 
#       9 # New way
# ---> 10 func = lambda x:x in ["o", "_"]
#      11 print("New way:", series.filter(func))
#      12 

# ~/anaconda/envs/µ_env/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
#    1476         raise ValueError("The truth value of a {0} is ambiguous. "
#    1477                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
# -> 1478                          .format(self.__class__.__name__))
#    1479 
#    1480     __bool__ = __nonzero__

# ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
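The error arises because .loc calls func once with the entire Series rather than once per element, so x in ["o", "_"] compares the whole Series with each list item and then calls bool() on the resulting Series. A sketch, assuming current pandas behaviour, that reproduces the error and then wraps the element-wise predicate with Series.map so the callable returns a boolean mask:

```python
import pandas as pd

series = pd.Series(list("imagine_more_complex_stuff_here_instead"))
func = lambda x: x in ["o", "_"]  # element-wise predicate from the issue

# Passing func directly: .loc calls func(series); `series in [...]` compares
# the Series with each list item and needs bool() of that Series -> ValueError.
raised = False
try:
    series.loc[func]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Wrapping it: the callable still receives the whole Series, but Series.map
# applies func element by element and yields a boolean mask for .loc.
result = series.loc[lambda s: s.map(func)]
print(result)
```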

@simonjayhawkins
Member

Am I using the callable feature incorrectly?

from https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html

A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)


I always find myself writing really verbose code to filter my pandas indices or series.

To use something like lambda x: x in ["o", "_"] to operate row-wise, you could inline the expression in a list comprehension instead of writing series[series.map(lambda x: x in ["o", "_"])]. That makes it easier to read, even if it is still verbose.

>>> series[[x in ["o", "_"] for x in series]]
7     _
9     o
12    _
14    o
20    _
26    _
31    _
dtype: object

I love the map method and would like to extend this concept to include filter.

Consistency would be ideal, but unfortunately Series already has a .filter method, https://pandas.pydata.org/docs/reference/api/pandas.Series.filter.html (which operates on labels), so this would need a more detailed proposal of the API.
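To illustrate the name clash, a short sketch of the existing Series.filter, which selects by index labels through items, like, or regex rather than by values. The labels and values below are made up for the example. Monkey-patching pd.Series.filter as in the original code sample would replace this behaviour:

```python
import pandas as pd

# Made-up data: Series.filter inspects the index labels, never the values.
s = pd.Series([10, 20, 30], index=["iris_1", "notiris_1", "iris_2"])

by_regex = s.filter(regex="^iris")      # labels matching the regex
by_like = s.filter(like="iris")         # labels containing "iris" as a substring
by_items = s.filter(items=["iris_2"])   # labels listed explicitly
print(by_regex, by_like, by_items, sep="\n\n")
```

Note that like="iris" also keeps "notiris_1", since it is a plain substring test on the label.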

See also #26642 and linked issues for discussion of the current Series.filter functionality.

@mroeschke
Member

Thanks for the suggestion, but with Series.filter already existing and the notion of pandas potentially moving away from filter in #26642, it doesn't seem likely that this will be implemented. Closing, but it would be good to continue the discussion in #26642.

5 participants