Adding support for Zarr datasets #70
Comments
Thanks for opening this, @charlesbluca! We definitely want to get this functionality into intake-stac. Some specific responses and other thoughts below.
First, how is intake-STAC currently structured? We open the entire catalog (or collection), but each asset is mapped to an intake driver (many of them specified in the intake-xarray library). Your example asset has: intake-stac/intake_stac/catalog.py, Line 36 in 3b0b181
I think this functionality will need to be coordinated with @martindurant's open PR in intake-xarray to use xr.open_dataset(engine='zarr').
Not sure this needs to be in the metadata, but this would be specified as a global fsspec/s3fs setting in your code.
In your current cataloging efforts, is there any Collection or Item that points to a simple asset with a .zarr 'href'? I think we'd want to come up with a solution that works for both.
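One way to cover both cases is to normalize the asset into a store URL plus a "consolidated" flag before handing it to a driver. This is a minimal sketch that assumes two things: the zarr-consolidated-metadata role used in this issue, and the Zarr v2 convention of storing consolidated metadata under a .zmetadata key at the store root. resolve_zarr_asset is a hypothetical helper, not part of intake-stac, and the bucket URLs are placeholders:

```python
from typing import Optional, Sequence, Tuple

# Role name used in this issue; an assumption, not part of the core STAC spec.
CONSOLIDATED_ROLE = "zarr-consolidated-metadata"
# Zarr v2 stores consolidated metadata under this key at the store root.
CONSOLIDATED_KEY = ".zmetadata"


def resolve_zarr_asset(
    href: str, roles: Sequence[str] = ()
) -> Tuple[str, Optional[bool]]:
    """Hypothetical helper: return (store_url, consolidated) for a Zarr asset.

    consolidated is True when the asset links to the consolidated metadata
    file, and None when it links to the store itself (let the opener decide).
    """
    url = href.rstrip("/")
    suffix = "/" + CONSOLIDATED_KEY
    if url.endswith(suffix):
        # The asset links to <store>/.zmetadata; the opener wants <store>.
        return url[: -len(suffix)], True
    if CONSOLIDATED_ROLE in roles:
        # The role says "consolidated", but the href already names the store.
        return url, True
    if url.endswith(".zarr"):
        # Plain .zarr href: consolidation is unknown, so defer to the opener.
        return url, None
    raise ValueError(f"not a recognizable Zarr asset: {href!r}")


print(resolve_zarr_asset("s3://bucket/data.zarr/.zmetadata", ["zarr-consolidated-metadata"]))
# ('s3://bucket/data.zarr', True)
print(resolve_zarr_asset("gs://bucket/data.zarr/"))
# ('gs://bucket/data.zarr', None)
```

The returned pair could then be passed to something like xr.open_zarr(store, consolidated=...). In recent xarray versions, consolidated=None tries consolidated metadata first and falls back to the per-array metadata, which suits the bare .zarr case.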
My understanding is that any STAC Item needs a bbox and a datetime property, but a Collection (your example) does not. I'm not up to speed on the STAC Zarr discussion, so hopefully @rabernat or @matthewhanson can clarify, but the widest possible ranges seem reasonable to me.
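For reference, a Collection carries its ranges in an extent object rather than in Item-level bbox/datetime properties. The following sketch shows the "widest possible ranges" idea. It assumes a single global WGS 84 bbox and a temporal interval left open with null. widest_extent is a hypothetical helper, the start date below is a placeholder, and the exact rules for open intervals are worth checking against the STAC version being targeted:

```python
import json
from typing import Optional


def widest_extent(start: Optional[str] = None, end: Optional[str] = None) -> dict:
    """Hypothetical helper: a Collection extent covering the whole globe.

    start and end are RFC 3339 timestamps; None (JSON null) leaves that end
    of the temporal interval open.
    """
    return {
        # A single bbox as [west, south, east, north] in WGS 84 degrees.
        "spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]},
        # A single [start, end] pair.
        "temporal": {"interval": [[start, end]]},
    }


print(json.dumps(widest_extent(start="1950-01-01T00:00:00Z"), indent=2))
```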
Thanks for these clarifications! The current efforts don't have any such assets. I'm interested in the arguments provided within the type for some of the drivers; for example: intake-stac/intake_stac/catalog.py, Line 25 in 3b0b181
Are things like
As for the extent, I based the current use of the widest possible ranges on some notebooks @rabernat put together, so I think I'll go ahead with that method unless any problems come up later on. I'll certainly check out the open PR to intake-xarray.
Yes. I'm just thinking of how we might get a PR started for this with a couple of simple test files: one with consolidated metadata and another without. Note also that if the STAC Asset omits 'type', we currently make a guess based on the URL suffix: intake-stac/intake_stac/catalog.py, Lines 446 to 450 in 3b0b181
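Here is a rough sketch of suffix-based guessing extended to Zarr, where an href may point either at the .zarr store or at its .zmetadata file. The suffix table and the guess_driver name are hypothetical and not copied from intake_stac/catalog.py. The driver names are the intake-xarray ones as I understand them:

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical suffix -> intake driver table, for illustration only.
SUFFIX_DRIVERS = {
    ".nc": "netcdf",
    ".tif": "rasterio",
    ".tiff": "rasterio",
    ".zarr": "zarr",
}


def guess_driver(href: str) -> Optional[str]:
    """Guess an intake driver from the URL suffix when an asset has no 'type'."""
    path = urlparse(href).path.rstrip("/")
    # A link to <store>.zarr/.zmetadata should still resolve to the zarr driver.
    if path.endswith("/.zmetadata"):
        path = path[: -len("/.zmetadata")]
    # URL paths always use "/", so posixpath is the right splitter here.
    _, ext = posixpath.splitext(path)
    return SUFFIX_DRIVERS.get(ext.lower())


for href in ["s3://bucket/data.zarr/.zmetadata", "gs://bucket/data.zarr/", "https://example.com/air.nc"]:
    print(href, "->", guess_driver(href))
```

Stripping the trailing /.zmetadata first means both of the proposed test stores (consolidated and not) land on the same driver, leaving the consolidated question to the opener.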
This seems like an interesting idea! I don't know much about media types; it seems there is a lengthy discussion on allowable formatting for COG mimetypes in radiantearth/stac-spec#251 (and linked issues). Would be keen to hear @andersy005's and @wildintellect's thoughts here as well!
If you do end up using the content type to discriminate, please link with intake/intake#494, which is a similar idea, but not very fleshed out.
Thanks for sharing the discussion on mimetypes going on within the STAC spec; it would probably be good to get some perspective from the general STAC maintainers on what they consider allowable for specifying consolidated versus non-consolidated Zarr (or whether it even needs to be done, if it can be guessed from the URL). @martindurant, do you think it would be worthwhile to make progress on the mimetype handling within Intake and use that to handle Intake-STAC's initialization of a StacEntry? They both seem to be accomplishing similar tasks.
Maybe? It wasn't too clear how to continue there, since MIME is not a very good spec, but where we do have a good spec, as is being discussed here, it would make sense to use a registry for dispatch. It doesn't make much difference whether this registry lives in intake or here until a broader set of types is added.
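To make the registry idea concrete, here is a sketch that dispatches on a media type plus its parameters. With it, the COG type "image/tiff; application=geotiff; profile=cloud-optimized" can map to a different driver than plain "image/tiff". DriverRegistry is hypothetical, not an existing intake or intake-stac API. "application/vnd+zarr" is only an assumed media type for Zarr, and the plain-TIFF driver choice is illustrative:

```python
from typing import Dict, FrozenSet, Tuple

# (base type, set of (param, value) pairs)
MediaKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def parse_media_type(value: str) -> MediaKey:
    """Split 'type/subtype; k=v; ...' into an order-insensitive key.

    Everything is lowercased for simplicity, which is looser than MIME itself.
    """
    base, *params = [part.strip() for part in value.split(";")]
    pairs = set()
    for param in params:
        if not param:
            continue
        key, _, val = param.partition("=")
        pairs.add((key.strip().lower(), val.strip().strip('"').lower()))
    return base.lower(), frozenset(pairs)


class DriverRegistry:
    """Hypothetical media-type -> driver registry (not an existing intake API)."""

    def __init__(self) -> None:
        self._drivers: Dict[MediaKey, str] = {}

    def register(self, media_type: str, driver: str) -> None:
        self._drivers[parse_media_type(media_type)] = driver

    def lookup(self, media_type: str) -> str:
        base, params = parse_media_type(media_type)
        # Prefer an exact type+parameters match, then fall back to the bare type.
        for key in ((base, params), (base, frozenset())):
            if key in self._drivers:
                return self._drivers[key]
        raise KeyError(media_type)


registry = DriverRegistry()
registry.register("image/tiff; application=geotiff; profile=cloud-optimized", "rasterio")
registry.register("image/tiff", "xarray_image")  # illustrative choice only
registry.register("application/vnd+zarr", "zarr")  # assumed Zarr media type
print(registry.lookup("image/tiff; profile=cloud-optimized; application=geotiff"))
```

Parameters are compared as an unordered set. A lookup falls back to the bare type when no entry matches the full type with its parameters, so unknown parameters degrade gracefully instead of failing outright.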
As part of Pangeo's general integration of STAC, we currently have a STAC Catalog roughly mirroring Pangeo's Intake catalogs, as well as support for rendering Zarr metadata with STAC Browser. Another major step forward with this integration would be adding support to load Zarr datasets through Intake-STAC.
What steps need to be taken to make something like this happen? At the moment, Zarr datasets are represented in STAC as Collections with a single asset: a link to the consolidated metadata file of the Zarr dataset, with a role of zarr-consolidated-metadata; an example of this here:
Some random obstacles that come to mind: