Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decorators for registering custom accessors in xarray #806

Merged
merged 2 commits into from
May 13, 2016

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Mar 28, 2016

Fixes #706

New (experimental) decorators xarray.register_dataset_accessor and xarray.register_dataarray_accessor for registering custom xarray extensions without subclassing. They are described in the new documentation page on internals.

CC @rafa-guedes @rabernat @fmaussion @khaeru @ajdawson -- as people who might use such an interface, it would be great to get some feedback about how well this would work for you.

@rabernat
Copy link
Contributor

👍 very cool

@dopplershift
Copy link
Contributor

👍 Very nice.

@khaeru
Copy link

khaeru commented Mar 28, 2016

Of the two different projects I'm working (sporadically) on that both subclass Dataset, it seems like one (pyGDX) should more properly be a backend, while the other could work as an accessor. This code looks good!

Just to be clear—xarray_obj is passed to the __init__() method of an accessor. Will this happen before, or after Dataset.__init__()/DataArray.__init__() is invoked?

============

.. autosummary::
:toctree: generate/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be generated ?

@fmaussion
Copy link
Member

Yes, this looks very good, thanks!

@khaeru : I had the same question and just tested it:

import xarray as xr
import numpy as np

@xr.register_dataset_accessor('geo')
class GeoAccessor(object):
    def __init__(self, xarray_obj):
        print('In constructor')
        self._obj = xarray_obj

    @property
    def center(self):
        # return the geographic center point of this dataset
        lon = self._obj.latitude
        lat = self._obj.longitude
        return (float(lon.mean()), float(lat.mean()))

    def plot(self):
        print('plot data on a map')  # e.g., using Cartopy

# Nothing is printed
ds = xr.Dataset({'longitude': np.linspace(0, 10),
                  'latitude': np.linspace(0, 20)})

# out: In constructor
out = ds.geo.center

# out: In constructor
#      plot data on a map
ds.geo.plot()

So the constructor is called only if the property is accessed, which is good. the drawback is that a new object is instantiated at each access.

@fmaussion
Copy link
Member

Oh, I realize that this doesn't answer your question though 😳

@shoyer
Copy link
Member Author

shoyer commented Mar 28, 2016

Just to be clear—xarray_obj is passed to the init() method of an accessor. Will this happen before, or after Dataset.init()/DataArray.init() is invoked?

This will happen after Dataset.init/DataArray.init. For cases where you want to do custom initialization, the suggestion (which I should add) is to simply write your own function to use in place of xarray.open_dataset.

It's probably worth noting in the documentation that this is equivalent to just adding a property, e.g.,

class Dataset:
    ...
    @property
    def geo(self)
        return GeoAccessor(self)

@fmaussion
Copy link
Member

Thanks for the clarification. Would it make sense to make a @lazy_property out of it in oder to avoid multiple builds?

@khaeru
Copy link

khaeru commented Mar 28, 2016

@fmaussion that's still helpful, thanks.

For cases where you want to do custom initialization, the suggestion (which I should add) is to simply write your own function to use in place of xarray.open_dataset.

Now that I think of it, it should also be possible to use some other in logic in __init__()—or even as a kludge store something like xarray_obj.attrs['_geoaccessor_state']—to determine whether the object is already, or needs to be, "initialized" (whatever that happens to mean for each accessor).

For instance, if the accessor creates and uses certain variables in a Dataset, it could check for their presence, and skip any initialization code if they already exist.

@rafa-guedes
Copy link
Contributor

👍 nice one

Fixes GH706

New (experimental) decorators `xarray.register_dataset_accessor` and
`xarray.register_dataarray_accessor` for registering custom xarray extensions
without subclassing. They are described in the new documentation page on
`internals`.
@shoyer shoyer force-pushed the register-accessor branch 2 times, most recently from c52efb0 to 3489fd0 Compare May 10, 2016 21:46
@shoyer
Copy link
Member Author

shoyer commented May 10, 2016

Finally got around to updating this.

I added caching of these properties per @fmaussion's suggestion. Caching is a little error prone (I wrote a custom descriptor), but I think test cases should cover it pretty well.

@shoyer shoyer force-pushed the register-accessor branch from 3489fd0 to 2396aaa Compare May 10, 2016 21:47
@jhamman
Copy link
Member

jhamman commented May 11, 2016

This looks good to me. Anyone else have any final comments?


The intent here is that libraries that extend xarray could add such an accessor
to implement subclass specific functionality rather than using actual subclasses
rather patching in a large number of domain specific methods.
Copy link
Member

@fmaussion fmaussion May 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't something wrong in this sentence?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@fmaussion
Copy link
Member

Apart for the doc typo I mentioned, this looks very good, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants