Supporting xarray.apply_ufunc #67

TomNicholas · 2022-07-18T22:23:13Z

A very large proportion of xarray's code goes through xarray.core.computation.apply_ufunc, which is also exposed publicly for advanced users to wrap their own ufuncs.

Internally, if a dask array is present, we call dask.array.apply_gufunc - perhaps cubed should expose a similar function in the same way that it exposes map_blocks etc.?

xref pydata/xarray#6807
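For readers unfamiliar with the gufunc signature convention that dask.array.apply_gufunc takes, here is a minimal NumPy-only sketch of what such a signature means. It uses np.vectorize as a stand-in, not dask or cubed, and value_range is a made-up example function.

```python
import numpy as np

def value_range(v):
    # Operates on one 1-D "core" slice and returns a scalar.
    return v.max() - v.min()

# "(i)->()" reads: consume one core dimension i, produce no core dimensions.
# Any leading dimensions are loop dimensions that the wrapper iterates over.
range_along_last_axis = np.vectorize(value_range, signature="(i)->()")

x = np.array([[1, 5, 2], [10, 10, 4]])
result = range_along_last_axis(x)  # one value per row, shape (2,)
```

A chunked-array version of this has the extra job of making sure each core dimension sits inside a single chunk before the function is applied.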

TomNicholas · 2022-07-18T22:24:50Z

Another question I had is about the fact that xarray.apply_ufunc allows users to execute arbitrary code inside their custom ufunc (we have one in a group project that calls out to some fortran!). Does that break the assumptions behind the whole memory predictability thing?

tomwhite · 2022-07-19T09:47:00Z

> perhaps cubed should expose a similar function in the same way that it exposes map_blocks

Yes, I think it would be possible to write an apply_gufunc in cubed. It's not part of the array API, so it would live somewhere else in the namespace (probably at the top level, like map_blocks).

> Another question I had is about the fact that xarray.apply_ufunc allows users to execute arbitrary code inside their custom ufunc (we have one in a group project that calls out to some fortran!). Does that break the assumptions behind the whole memory predictability thing?

To some extent it does, but there are things we could do.

We could ask users to provide an estimate of the extra memory required, as map_direct already does (although that function is not publicly exposed at the moment).

I've created some notebooks that help judge if the memory estimates are working in practice, see https://github.com/tomwhite/cubed/blob/main/examples/lithops-add-random-local.ipynb for example. So there's some tooling like this we could provide to make it easier to provide and check memory estimates.
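As a rough illustration of checking a user-supplied estimate against what a function actually uses, here is a minimal sketch using only the standard library's tracemalloc. This is not the tooling in the notebook above; measure_peak_bytes, allocate_scratch and estimated_extra_mem are hypothetical names. Note that tracemalloc only sees allocations made through Python's allocators, so memory allocated inside external code such as Fortran would not be counted.

```python
import tracemalloc

def measure_peak_bytes(func, *args, **kwargs):
    """Run func and return (result, peak bytes traced while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def allocate_scratch(nbytes):
    # Stand-in for a user function that needs temporary working memory.
    scratch = bytearray(nbytes)
    return len(scratch)

estimated_extra_mem = 12_000_000  # hypothetical user-supplied estimate, in bytes
_, peak = measure_peak_bytes(allocate_scratch, 10_000_000)
estimate_is_safe = peak <= estimated_extra_mem
```

For functions that call out to compiled code, measuring the peak memory of the whole process would be a more reliable check than this.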

TomNicholas · 2022-07-19T17:26:08Z

Sounds good.

I would be interested in trying to implement apply_gufunc in cubed so I can learn more about the internals, but I probably won't have any time to work on it for the next 2 weeks, so if you have already added it by then, no worries!

tomwhite · 2022-07-19T20:32:12Z

> I would be interested in trying to implement apply_gufunc in cubed so I can learn more about the internals, but I probably won't have any time to work on it for the next 2 weeks, so if you have already added it by then, no worries!

That would be great!

tomwhite · 2022-07-20T12:02:23Z

A couple of thoughts on implementation.

It should be possible to follow Dask's implementation, at least for the case where a single output array is returned, which can use blockwise. That's probably the thing to get working first.
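To make the single-output idea concrete, here is a toy NumPy-only sketch: core dimensions stay whole within each block, and the function is applied block by block along a loop dimension. It is neither Dask's nor cubed's code. parse_signature, toy_apply_gufunc and rows_per_block are hypothetical, and it handles only one input, one output, and splitting along the leading axis.

```python
import re
import numpy as np

def parse_signature(signature):
    """Split e.g. "(i)->()" into lists of core-dimension tuples."""
    inputs, outputs = signature.replace(" ", "").split("->")

    def core_dims(part):
        return [tuple(d for d in group.split(",") if d)
                for group in re.findall(r"\(([^)]*)\)", part)]

    return core_dims(inputs), core_dims(outputs)

def toy_apply_gufunc(func, signature, x, rows_per_block):
    in_dims, out_dims = parse_signature(signature)
    if len(in_dims) != 1 or len(out_dims) != 1:
        raise NotImplementedError("sketch covers one input and one output")
    if x.ndim <= len(in_dims[0]):
        raise ValueError("need at least one loop dimension to split over")
    # Split only along the leading loop axis, so core dimensions are never cut.
    blocks = [x[i:i + rows_per_block] for i in range(0, x.shape[0], rows_per_block)]
    # Apply func to each block independently, then stitch the results together.
    return np.concatenate([func(b) for b in blocks], axis=0)

x = np.arange(24.0).reshape(6, 4)
row_sums = toy_apply_gufunc(lambda b: b.sum(axis=-1), "(i)->()", x, rows_per_block=4)
```

Because each block is processed independently, the result matches applying the function to the whole array at once, which is the property that lets this case map onto blockwise.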

For the multiple output array case, we could use Zarr structured arrays, like we do already for the implementation of mean. Related: #69
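As a small illustration of packing several outputs into one array, here is a sketch that uses a NumPy structured dtype as a stand-in for a Zarr structured array. The field names total and n are made up for this example and are not taken from cubed's mean implementation.

```python
import numpy as np

# One record per chunk, holding two "outputs" side by side.
partial_dtype = np.dtype([("total", "f8"), ("n", "i8")])

chunks = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]

# Each chunk yields a (sum, count) pair stored in a single structured array.
partials = np.array([(c.sum(), c.size) for c in chunks], dtype=partial_dtype)

# The fields can be read back as separate arrays and combined afterwards.
mean = partials["total"].sum() / partials["n"].sum()
```

The same trick lets a function with several outputs return a single array, with each output recovered later as one field.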

This should live in the core package. It really belongs in core.ops, but that is getting very big, so we should probably break it into smaller modules. But don't worry about that too much - we can reorganise as needed once something is working.

tomwhite · 2023-02-21T09:40:05Z

This has been implemented in #149 and #151, which should be enough to enable support in xarray, so I'm going to close this issue now. The multiple outputs case is being tracked by #152.

tomwhite linked a pull request Sep 12, 2022 that will close this issue

[WIP] apply_gufunc #119

Closed

TomNicholas mentioned this issue Sep 19, 2022

[Fail case] Almost-blockwise weighted arithmetic vorticity calculation pangeo-data/distributed-array-examples#1

Open

tomwhite mentioned this issue Oct 31, 2022

Pangeo TEM example #145

Closed

This was referenced Feb 17, 2023

Add apply_gufunc #149

Merged

Support multiple outputs in apply_gufunc #152

Open

tomwhite closed this as completed Feb 21, 2023
