Implement RaggedArray.to_zarr() and RaggedArray.from_zarr()? #43

Closed
milancurcic opened this issue Oct 19, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@milancurcic

From #40 by @selipot:

The latest version of ocean parcels now outputs in zarr format, see https://github.com/OceanParcels/parcels/releases/tag/v2.4.0. It is a priority to write a new recipe that takes such zarr output (still written as a sparse 2D array) into a RaggedArray. We also should add a functionality to output the RaggedArray to zarr with RaggedArray.to_zarr()
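For readers unfamiliar with the two layouts, here is a minimal sketch of what going from a sparse (NaN-padded) 2D array to a ragged layout means. It uses plain Python lists rather than zarr, xarray, or the actual RaggedArray class, and the names `padded_to_ragged`, `ragged_row`, `values`, and `rowsize` are illustrative assumptions, not clouddrift API. It also assumes NaN appears only as padding, so any genuine NaN observation would be dropped as well.

```python
import math


def padded_to_ragged(padded):
    """Flatten a NaN-padded 2D list (trajectory x observation) into a
    1D list of valid values plus the count of valid values per row.

    Assumption: NaN is used only as padding, never as a real observation.
    """
    values = []
    rowsize = []
    for row in padded:
        valid = [v for v in row if not math.isnan(v)]
        values.extend(valid)
        rowsize.append(len(valid))
    return values, rowsize


def ragged_row(values, rowsize, i):
    """Recover row i from the flat values using the per-row counts."""
    start = sum(rowsize[:i])
    return values[start:start + rowsize[i]]


nan = float("nan")
# Three trajectories of lengths 3, 1, and 2, padded to a common width of 3.
padded = [
    [1.0, 2.0, 3.0],
    [4.0, nan, nan],
    [5.0, 6.0, nan],
]

values, rowsize = padded_to_ragged(padded)
print(values)   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(rowsize)  # [3, 1, 2]
print(ragged_row(values, rowsize, 2))  # [5.0, 6.0]
```

The ragged form stores 6 values and 3 counts instead of 9 padded cells; the saving grows as trajectory lengths become more uneven.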

@milancurcic added the enhancement label on Oct 19, 2022
@milancurcic

Currently RaggedArray serves as an intermediate data structure between dataset-specific formats and awkward.Array and xarray.Dataset structures.

xarray.Dataset already provides a to_zarr() method, just like to_netcdf(). One question is whether RaggedArray should provide wrappers, such that RaggedArray.to_netcdf() (already implemented) wraps xarray.Dataset.to_netcdf() and RaggedArray.to_zarr() (considered in this issue) wraps xarray.Dataset.to_zarr(). On first thought, I don't think so. Our users won't be working on the RaggedArray instance directly (or will they?), but on either the awkward.Array or the xarray.Dataset. Plus, a wrapper of ours would obscure the additional functionality that the wrapped methods provide. Implementing a wrapper may seem like a convenience for the user, but it's one more thing that we need to teach them. And you wouldn't want to go from a Zarr dataset to a RaggedArray only to go back to an xarray.Dataset. There may be a use case if somebody wants to read an awkward array from a Zarr dataset (zarr-developers/zarr-specs#62).

So, as I understand it, our current state is:

From Zarr data to RaggedArray via NetCDF file

from clouddrift.dataformat import RaggedArray
import xarray as xr
...
xr.open_zarr(path_to_zarr).to_netcdf(path_to_nc)
ra = RaggedArray.from_netcdf(path_to_nc)

We could introduce a class method RaggedArray.from_zarr() so that the intermediate step of writing to NetCDF is not necessary.

But, again, I think this would only be useful if somebody wants to work with an awkward array instead of an xarray Dataset.

From RaggedArray to Zarr data

This we already have:

from clouddrift.dataformat import RaggedArray
import xarray as xr
ra = RaggedArray.from_files(...)
ra.to_xarray().to_zarr(path_to_zarr)

As mentioned above, I'm not in favor of maintaining our own shallow wrappers for .to_fileformat(), but rather of providing good documentation (recipes) on how to do those conversions.
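To make the concern about shallow wrappers concrete, here is a small sketch using stand-in classes. `FakeDataset` and `ShallowWrapper` are hypothetical and are neither xarray.Dataset nor RaggedArray; the `mode` keyword and its default are assumed purely for illustration. The point is only that a wrapper with a fixed signature hides whatever options the wrapped method accepts, whereas calling the wrapped object directly keeps them available.

```python
class FakeDataset:
    """Stand-in for a dataset object (hypothetical, NOT xarray.Dataset)."""

    def to_zarr(self, store, mode="w-"):
        # A real writer would persist data; here we only record the call.
        return {"store": store, "mode": mode}


class ShallowWrapper:
    """Stand-in for a container that wraps a dataset (hypothetical)."""

    def __init__(self, ds):
        self._ds = ds

    def to_xarray(self):
        # Hand back the wrapped dataset so callers can use its full API.
        return self._ds

    def to_zarr(self, store):
        # Fixed signature: callers cannot pass `mode` or any other option.
        return self._ds.to_zarr(store)


wrapper = ShallowWrapper(FakeDataset())

# Through the shallow wrapper, only the default behaviour is reachable.
via_wrapper = wrapper.to_zarr("out.zarr")

# Going through the wrapped object keeps every option available.
direct = wrapper.to_xarray().to_zarr("out.zarr", mode="a")

# Passing the option to the wrapper fails, since it was never forwarded.
try:
    wrapper.to_zarr("out.zarr", mode="a")
    option_rejected = False
except TypeError:
    option_rejected = True

print(via_wrapper)      # {'store': 'out.zarr', 'mode': 'w-'}
print(direct)           # {'store': 'out.zarr', 'mode': 'a'}
print(option_rejected)  # True
```

A wrapper could forward `**kwargs` to avoid this, but then it adds nothing beyond the one-line `to_xarray().to_zarr(...)` call, which is the argument for documenting the recipe instead.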

@milancurcic

After a discussion about this with @selipot on 10/20, we are not yet sure whether RaggedArray should provide two-way adapters for file formats, or whether reading from and writing to files should be deferred to the data structure libraries (i.e. awkward and xarray).

@milancurcic

Once we have #44, we'll be able to bypass the NetCDF file step for going from Zarr to RaggedArray:

ra = RaggedArray.from_xarray(xr.open_zarr(path_to_zarr))

@philippemiron closed this as not planned on Feb 9, 2023