-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New accessor API #26710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@pandas.Series.extend(namespace='emoji')
def is_monkey(data):
"""
This would also create `Series().emoji.is_monkey`
"""
return data.isin(['🙈', '🙉', '🙊']) The second option (replicated above) seems like a logical one IMO. No overhead of OOP. |
To be clear, what I'm proposing is:
|
That's fair, though I think we should encourage functional over OOP.
Right |
Agree, as far as the class doesn't add value we should encourage using a function, but there will be cases where a class is useful, for example: @pandas.Series.extend
class Emoji:
def __init__(self, data):
self.data = data
def is_monkey(self):
return self.data.isin(['🙈', '🙉', '🙊'])
def is_cat(self):
return '😺' < self.data < '😾'
def is_animal(self):
return self.is_monkey() | self.is_cat() |
Hmmm...that's a good point. Not sure right now how we could compose in the functional version, though that would be quite useful. |
what is the reason for this? is there some notion that things are 'hard' to extend? is that actually a bad thing? these are generally only for other libraries and NOT for users.
better to actually have these projects use an official api. if they want to do something ad-hoc that is up to them. |
Two nits to pick:
|
Good points @jbrockmendel, I was a bit unsure about @jreback I used those libraries as example, but they are not the point. I think it's about code readability, of third-party libraries, pandas itself, and users of pandas. Adding methods to For pandas, an example where this could be useful is For third-party packages and users code, I agree that they should use an official API. If they do, we can warn them when they overwrite an attribute, we can keep track of registered methods... We do all that for accessors, but without providing a way to register methods directly, they use Not sure what's the drawback here. I see a lot of potential on better code organisation of pandas, more modularity, and more scalability. And may be we'd give up in something by implementing this, but I don't see what. |
I don't love encouraging users to monkey-patch methods directly onto I like the class method, though. |
Agreed with Stephan. If people want to monkey patch directly, they're
welcome to just do that without our help :)
…On Fri, Jun 7, 2019 at 4:42 PM Stephan Hoyer ***@***.***> wrote:
I don't love encouraging users to monkey-patch methods directly onto
pandas.Series. I guess the argument is that people do it anyways, but
that feels like an anti-pattern to me.
I like the class method, though. pandas.Series.extension could be a good
name.
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
<#26710?email_source=notifications&email_token=AAKAOISEBOHVB3KHS6U7PILPZLI47A5CNFSM4HVZETP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXHCFPI#issuecomment-500048573>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIUQI3HE4FYLRP3PZ53PZLI47ANCNFSM4HVZETPQ>
.
|
These examples look like more of a case of not knowing about pandas' accessor API. They already use a prefix for their special methods, so they might as well use a namespace:
|
I like to see the question as the same as Python with the standard library. Python was design "batteries included" with lots of modules. But also with an standard and easy way for developers to implement an ecosystem of modules around it. That while not included with Python and not maintained by the Python core devs, work exactly the same as the ones in the standard library. Once a module is installed, the difference between a module of the standard library and a third-party module is minimal. And I guess we are all happy and all benefited from this design. In opposition, pandas is designed as a single piece, with an increasing integration with the ecosystem, but still with a clear distinction on what we provide, and third-party packages. To me, conceptually,
Personally, I think the modular design of Python or Django (which also follows the same model) worked really well for them. And I think the steps in the extension arrays, to create a single interface, no matter if it's the core numpy, the other we provide, or third parties also are simplifying things for us. I see this as moving in the same direction for
I think the proposal here is a good first step to move in this direction, and I don't see any drawback. |
@datapythonista your points are pretty general, not objectionable but orthogonal to the issues at hand how does your proposal I am also -1 on patching directly to the main namespace as this very very confusing how does a shorter accessor api actually help here? it is library accessible a crucial difference |
For me the key issue of this proposal is being able to register methods directly in I think being able to register methods directly does advance the current state significantly. For example, I could register I understand your point about patching the main pandas classes, but |
Something I have been thinking about, not the same but certainly related (quickly going to put it here before I am away for the weekend): that the data type can decide which methods are availabe on a Series. Like we now have the |
@datapythonista I think this merits its own discussion. Framing the issue in terms of decoupling will make it appealing. |
Thanks @jbrockmendel that makes sense. I thought this would be non-controversial besides naming things, and once implemented would allow to have the discussion over a simple prototype PR, which would make things less abstract. Will see what I can do, but I really thing a more modular code base for pandas would be extremely beneficial, so will open the discussion again once I can present my ideas in a more clear way. |
Currently, to extend pandas
Series
,DataFrame
andIndex
with user-defined methods, we use accessors in the next way:While this works well, I think there are two problems with this approach:
pandas.api.extensions.register_series_accessor
is too long and lives inpandas.api
, separate of functionality most users know.Series().is_monkey
instead ofSeries().emoji.is_monkey
)I think all the projects extending pandas I've seen, simply "inject" the methods (except the ones implemented by pandas maintainers). For example:
What I propose is to have a easier/simpler API for the user. To be specific, this is the syntax I'd like when extending
Series
...This would make things much easier for the user, because:
pandas.Series.extend
is much easier to rememberSeries
... can be createdCC: @pandas-dev/pandas-core
The text was updated successfully, but these errors were encountered: