Loading data from ManifestArrays without saving references to disk first #124
Thinking about this more, once zarr-python …
def to_zarr_array(self: ManifestArray) -> zarr.Array:
    ...
This opens up some interesting possibilities. Currently when you call …
The result would be that a user could actually treat a "virtual" xarray Dataset as a normal xarray Dataset, because if they tried to … Then you could open any data format that virtualizarr understands via … I still need to think through some of the details, but this could potentially be a neat alternative approach to pydata/xarray#9281, and not actually require any upstream changes to xarray! cc @d-v-b
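To make the idea of loading straight from a manifest concrete, here is a minimal sketch that avoids writing any reference file. It assumes a virtualizarr-style manifest recording a path, byte offset, and length for each chunk. It also assumes uncompressed little-endian int32 chunks in a local file. `ChunkEntry`, `read_chunk`, and `load_1d` are hypothetical names for illustration, not virtualizarr's or zarr-python's API.

```python
import os
import struct
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkEntry:
    """Hypothetical manifest entry: where one chunk's bytes live."""
    path: str
    offset: int
    length: int


def read_chunk(entry: ChunkEntry) -> bytes:
    """Read exactly the referenced byte range; no reference file is written."""
    with open(entry.path, "rb") as f:
        f.seek(entry.offset)
        data = f.read(entry.length)
    if len(data) != entry.length:
        raise IOError(f"short read: wanted {entry.length} bytes from {entry.path}")
    return data


def load_1d(manifest: dict, n_chunks: int) -> list:
    """Assemble a 1-D array from chunks keyed "0", "1", ... in key order."""
    values = []
    for i in range(n_chunks):
        raw = read_chunk(manifest[str(i)])
        # Assumption: each chunk is a flat run of '<i4' values with no codec applied.
        values.extend(v for (v,) in struct.iter_unpack("<i", raw))
    return values


# Toy "legacy" file: an 8-byte header followed by six int32 values.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER!!" + struct.pack("<6i", 10, 11, 12, 13, 14, 15))

manifest = {
    "0": ChunkEntry(tmp.name, offset=8, length=12),   # values 10..12
    "1": ChunkEntry(tmp.name, offset=20, length=12),  # values 13..15
}
values = load_1d(manifest, n_chunks=2)
os.remove(tmp.name)
print(values)
```

A real `to_zarr_array` would hand this byte-range reading to a zarr store and its codecs rather than decoding by hand. The point of the sketch is only that the manifest already holds everything needed to fetch the data.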
(One subtlety I'm not sure about here would be around indexes. I think you would probably want a solution for loading indexes as laid out in #18, and then have the indexes understand how they can be loaded.)
Another subtlety to consider: when should the CF decoding happen? You would then have effectively done …
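One way to frame the CF question is that decoding is a pure function of the raw (packed) values plus the encoding attributes. It can therefore run at load time, after the bytes are read, as long as the virtual variable keeps those attributes. The sketch below illustrates CF mask-and-scale decoding under that assumption. `cf_decode` is a hypothetical helper, not xarray's implementation; in xarray itself this step is handled by `xr.decode_cf` and its coders.

```python
import math


def cf_decode(raw: list, attrs: dict) -> list:
    """Apply CF mask-and-scale decoding to packed values.

    _FillValue is compared against the packed values first; the remaining
    values are unpacked as value * scale_factor + add_offset.
    """
    fill = attrs.get("_FillValue")
    scale = attrs.get("scale_factor", 1.0)
    offset = attrs.get("add_offset", 0.0)
    decoded = []
    for v in raw:
        if fill is not None and v == fill:
            decoded.append(math.nan)  # masked: missing value
        else:
            decoded.append(v * scale + offset)
    return decoded


attrs = {"_FillValue": -9999, "scale_factor": 0.01, "add_offset": 273.15}
decoded = cf_decode([0, 150, -9999], attrs)
print(decoded)
```

Deferring this step until load means the virtual dataset can still be serialized with the original encoding intact.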
I am working on a feature in virtualizarr to read DMR++ (dmrpp) metadata files and create a virtual xr.Dataset containing ManifestArrays that can then be virtualized. This is the current workflow: …

However, the chunk manifest, encoding, attrs, etc. are already in mds, so is it possible to read data directly from this dataset? My understanding is that once the "chunk manifest" ZEP is approved and the zarr-python reader in xarray is updated, this should be possible. The xarray reader for kerchunk can accept either a file or the reference JSON object directly from kerchunk's SingleHdf5ToZarr and MultiZarrToZarr. So, similarly, can we extract the refs from mds and pass them to xr.open_dataset() directly?

There probably still needs to be a function that extracts the refs so that xarray can build a new Dataset object with all the indexes, cf_time handling, and open_dataset checks. Even if reading directly from the ManifestArray dataset is possible, I'm not sure how the new dataset object (with numpy arrays and indexes) would be kept separate from the original dataset.
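Extracting the refs from `mds` could amount to an in-memory conversion into a kerchunk-style version-1 reference dict. That dict holds Zarr v2 metadata as JSON strings plus `[url, offset, length]` entries per chunk. The sketch below assumes a single uncompressed array and a manifest shaped like `{"0": {"path": ..., "offset": ..., "length": ...}}`. `manifest_to_kerchunk_refs` and the `s3://bucket/file.nc` URL are hypothetical. Because it builds a new dict, the original dataset is left untouched.

```python
import json


def manifest_to_kerchunk_refs(name, shape, chunks, dtype, dims, manifest, attrs=None):
    """Build an in-memory kerchunk-style (version 1) reference dict for one Zarr v2 array."""
    zarray = {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(chunks),
        "dtype": dtype,
        "compressor": None,  # assumption: chunks are stored uncompressed
        "filters": None,
        "fill_value": None,
        "order": "C",
    }
    zattrs = dict(attrs or {})
    zattrs["_ARRAY_DIMENSIONS"] = list(dims)  # xarray reads dimension names from here
    refs = {
        ".zgroup": json.dumps({"zarr_format": 2}),
        f"{name}/.zarray": json.dumps(zarray),
        f"{name}/.zattrs": json.dumps(zattrs),
    }
    for key, entry in manifest.items():
        # Each chunk reference is [url, byte offset, byte length].
        refs[f"{name}/{key}"] = [entry["path"], entry["offset"], entry["length"]]
    return {"version": 1, "refs": refs}


manifest = {
    "0": {"path": "s3://bucket/file.nc", "offset": 8, "length": 12},
    "1": {"path": "s3://bucket/file.nc", "offset": 20, "length": 12},
}
refs = manifest_to_kerchunk_refs(
    "temp", shape=(6,), chunks=(3,), dtype="<i4", dims=("x",), manifest=manifest
)
print(json.dumps(refs, indent=2))
```

With fsspec installed, the kerchunk documentation shows dicts like this being opened via `xr.open_dataset("reference://", engine="zarr", backend_kwargs={"consolidated": False, "storage_options": {"fo": refs, "remote_protocol": "s3"}})`. That call is not exercised here, and the exact options may differ across zarr-python versions.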