Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load HDF-Files with odc.stac.load #114

Open
muelj12 opened this issue May 8, 2023 · 10 comments
Open

Load HDF-Files with odc.stac.load #114

muelj12 opened this issue May 8, 2023 · 10 comments

Comments

@muelj12
Copy link

muelj12 commented May 8, 2023

Hello together,

i currently try to load HDF files from a not published STAC-Collection (so i can't show you the collection) with odc.stac.load and receive a error message:

MODIS_13Q1_061_sampel = stac_load( stac_items)

`ValueError: not enough values to unpack (expected at least 1, got 0)`

So my question is: can STAC items that are available as HDF files be loaded with odc.stac.load at all? If yes, is there an example?

@Kirill888
Copy link
Member

@muelj12 please share full stack trace and version of odc-stac and pystac libraries in your environment.

If you can, please share stac item json as well.

This is probably a dupe of #107, but there could also be issues with odc-stac not seeing those assets as data, hard to say anything without input data and error traces.

@muelj12
Copy link
Author

muelj12 commented May 9, 2023

@Kirill888
Here are the versions of the libraries in my environment:

  • odc-stac==0.3.5
  • pystac==1.7.2
  • pystac-client==0.6.1

And here the full stack trace:

MODIS_13Q1_061_sampel = stac_load(stac_items)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-80d9d806a4a1> in <module>
----> 1 MODIS_13Q1_061_sampel = stac_load(stac_items)
      2 
      3 print ("Data is extracted on the server and downloaded to local storage.")

~/.local/lib/python3.8/site-packages/odc/stac/_load.py in load(items, bands, groupby, resampling, dtype, chunks, pool, crs, resolution, anchor, geobox, bbox, lon, lat, x, y, like, geopolygon, progress, fail_on_error, stac_cfg, patch_url, preserve_original_order, **kw)
    472 
    473     items = list(items)
--> 474     _parsed = list(parse_items(items, cfg=stac_cfg))
    475 
    476     gbox = output_geobox(

~/.local/lib/python3.8/site-packages/odc/stac/_mdtools.py in parse_items(items, cfg)
    679             proc_cache[collection_id] = proc
    680 
--> 681         proc.update(item)
    682         yield parse_item(item, proc.md)
    683 

~/.local/lib/python3.8/site-packages/odc/stac/_mdtools.py in update(self, item)
    536         # pylint: disable=too-many-locals,too-many-branches
    537         if self.md is None:
--> 538             self._bootstrap(item)
    539             return
    540 

~/.local/lib/python3.8/site-packages/odc/stac/_mdtools.py in _bootstrap(self, item)
    523             _, band2grid = compute_eo3_grids(data_bands)
    524         else:
--> 525             band2grid = band2grid_from_gsd(data_bands)
    526 
    527         self.md = RasterCollectionMetadata(

~/.local/lib/python3.8/site-packages/odc/stac/_mdtools.py in band2grid_from_gsd(assets)
    353     # Default grid is one with largest number of bands
    354     # .. and lowest gsd when there is a tie
--> 355     (_, default_gsd), *_ = sorted((-len(bands), gsd) for gsd, bands in grids.items())
    356     band2grid = {}
    357     for gsd, bands in grids.items():

ValueError: not enough values to unpack (expected at least 1, got 0)

I also tried this call Modis_sampel = stac_load(stac_items_modis, bands = "hdf") but received the same error message.
I also saw that u do not include .hdf files here odc-stac/odc/stac/_mdtools.py
RASTER_FILE_EXTENSIONS = {"tif", "tiff", "jpeg", "jpg", "jp2", "img"}

I know it is a special use case, but we are facing storage problems and don't want to create COG-tiffs from the hdf file right now.
And this is the json file:

{
  "id": "MOD13Q1.A2022145.h35v10.061.2022168103104",
  "bbox": [
    172.479315908761,
    -19.180676222654,
    -179.856407114504,
    -9.97534112170732
  ],
  "type": "Feature",
  "links": [
    {
      "rel": "collection",
      "type": "application/json",
      "href": "-MyURL-/api/collections/modis-13q1-061"
    },
    {
      "rel": "parent",
      "type": "application/json",
      "href": "-MyURL-/api/collections/modis-13q1-061"
    },
    {
      "rel": "root",
      "type": "application/json",
      "href": "-MyURL-/api/"
    },
    {
      "rel": "self",
      "type": "application/geo+json",
      "href": "-MyURL-/api/collections/modis-13q1-061/items/MOD13Q1.A2022145.h35v10.061.2022168103104"
    }
  ],
  "assets": {
    "hdf": {
      "href": "-MyFilepath-/MOD13Q1.A2022145.h35v10.061.2022168103104.hdf",
      "type": "application/x-hdf",
      "roles": [
        "data"
      ],
      "title": "Source data containing all bands"
    },
    "metadata": {
      "href": "-MyFilepath-/MOD13Q1.A2022145.h35v10.061.2022168103104.hdf.xml",
      "type": "application/xml",
      "roles": [
        "metadata"
      ],
      "title": "Federal Geographic Data Committee (FGDC) Metadata"
    }
  },
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": [
      [
        [
          [
            180,
            -19.18040533808951
          ],
          [
            180,
            -10.007120826311045
          ],
          [
            179.999502554976,
            -9.97534112170732
          ],
          [
            172.622770159185,
            -9.98364248250805
          ],
          [
            172.479315908761,
            -19.1662177463598
          ],
          [
            180,
            -19.18040533808951
          ]
        ]
      ],
      [
        [
          [
            -180,
            -10.007120826311045
          ],
          [
            -180,
            -19.18040533808951
          ],
          [
            -179.856407114504,
            -19.180676222654
          ],
          [
            -180,
            -10.007120826311045
          ]
        ]
      ]
    ]
  },
  "collection": "modis-13q1-061",
  "properties": {
    "created": "2023-05-03T02:02:50.232600Z",
    "updated": "2022-06-17T09:38:16.263000Z",
    "datetime": null,
    "platform": "terra",
    "instruments": [
      "modis"
    ],
    "end_datetime": "2022-06-09T23:59:59Z",
    "modis:tile-id": "51035010",
    "start_datetime": "2022-05-25T00:00:00Z",
    "modis:vertical-tile": 10,
    "modis:horizontal-tile": 35
  },
  "stac_extensions": [],
  "stac_version": "1.0.0"
}

@Kirill888
Copy link
Member

Kirill888 commented May 10, 2023

This change is not in any release yet:

dc9f5d5

It would fix this exact error, but it then fails later on as it doesn't detect any asset as a data band.

EDIT: nope.. separate issue.

Kirill888 added a commit that referenced this issue May 10, 2023
- Add pass-list of non-image media types
- Extend extension list
Kirill888 added a commit that referenced this issue May 10, 2023
- Add pass-list of non-image media types
- Extend extension list
@Kirill888
Copy link
Member

@muelj12 #116 fixes detection of raster assets for hdf like sources. Would you be able to try this code and report next place this breaks?

I suspect that your asset definition will need to be expanded to include subdataset information, not sure how this is meant to be done in STAC. And most likely data loading code will need to be tweaked to construct appropriate uri for GDAL when input has subdatasets.

@muelj12
Copy link
Author

muelj12 commented May 10, 2023

@Kirill888 how can i install a odc.stac version with the fixed version?

@Kirill888
Copy link
Member

@muelj12 This should work:

pip install git+https://github.com/opendatacube/odc-stac

@muelj12
Copy link
Author

muelj12 commented May 10, 2023

@Kirill888
MODIS_13Q1_061_sampel = stac_load(stac_items)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<timed exec> in <module>

~/.local/lib/python3.8/site-packages/odc/stac/_load.py in load(items, bands, groupby, resampling, dtype, chunks, pool, crs, resolution, anchor, geobox, bbox, lon, lat, x, y, like, geopolygon, progress, fail_on_error, stac_cfg, patch_url, preserve_original_order, **kw)
    492 
    493     if gbox is None:
--> 494         raise ValueError("Failed to auto-guess CRS/resolution.")
    495 
    496     if chunks is not None:

ValueError: Failed to auto-guess CRS/resolution.

Same with Modis_sampel = stac_load(stac_items_modis, crs="EPSG:3857")

@Kirill888
Copy link
Member

Kirill888 commented May 10, 2023

Just add crs=4326, resolution=1 or whatever resolution/crs makes sense to stac_load(..).

When STAC is missing projection extension we can not guess appropriate resolution without looking at the raster data file first.

@muelj12
Copy link
Author

muelj12 commented May 10, 2023

Modis_sampel = stac_load(stac_items_modis, crs="EPSG:3857", resolution=250)

Aborting load due to failure while reading:-MyPath-/MYD10A1.A2022172.h18v03.061.2022174044428.hdf:1
---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

rasterio/_base.pyx in rasterio._base.open_dataset()

rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()

CPLE_OpenFailedError: '-MyPath-/MYD10A1.A2022172.h18v03.061.2022174044428.hdf' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
<timed exec> in <module>

~/.local/lib/python3.8/site-packages/odc/stac/_load.py in load(items, bands, groupby, resampling, dtype, chunks, pool, crs, resolution, anchor, geobox, bbox, lon, lat, x, y, like, geopolygon, progress, fail_on_error, stac_cfg, patch_url, preserve_original_order, **kw)
    608         _work = progress(SizedIterable(_work, total_tasks))
    609 
--> 610     for _ in _work:
    611         pass
    612 

~/.local/lib/python3.8/site-packages/odc/stac/_utils.py in pmap(func, inputs, pool)
     36     """
     37     if pool is None:
---> 38         yield from map(func, inputs)
     39         return
     40 

~/.local/lib/python3.8/site-packages/odc/stac/_load.py in _do_one(task)
    599         ]
    600         with rio_env(**_rio_env):
--> 601             _ = _fill_2d_slice(srcs, task.dst_gbox, task.cfg, dst_slice)
    602         t, y, x = task.idx_tyx
    603         return (task.band, t, y, x)

~/.local/lib/python3.8/site-packages/odc/stac/_load.py in _fill_2d_slice(srcs, dst_gbox, cfg, dst)
    696 
    697     src, *rest = srcs
--> 698     _roi, pix = rio_read(src, cfg, dst_gbox, dst=dst)
    699 
    700     for src in rest:

~/.local/lib/python3.8/site-packages/odc/stac/_reader.py in rio_read(src, cfg, dst_geobox, dst)
    192                 src.band,
    193             )
--> 194             raise e
    195 
    196     # Failed to read, but asked to continue

~/.local/lib/python3.8/site-packages/odc/stac/_reader.py in rio_read(src, cfg, dst_geobox, dst)
    184 
    185     try:
--> 186         return _rio_read(src, cfg, dst_geobox, dst)
    187     except rasterio.errors.RasterioIOError as e:
    188         if cfg.fail_on_error:

~/.local/lib/python3.8/site-packages/odc/stac/_reader.py in _rio_read(src, cfg, dst_geobox, dst)
    217     ttol = 0.9 if cfg.nearest else 0.05
    218 
--> 219     with rasterio.open(src.uri, "r", sharing=False) as rdr:
    220         assert isinstance(rdr, rasterio.DatasetReader)
    221         ovr_idx: Optional[int] = None

~/.local/lib/python3.8/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    449 
    450         with env_ctor(session=session):
--> 451             return f(*args, **kwds)
    452 
    453     return wrapper

~/.local/lib/python3.8/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    302 
    303         if mode == "r":
--> 304             dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
    305         elif mode == "r+":
    306             dataset = get_writer_for_path(path, driver=driver)(

rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

RasterioIOError: '-MyPath-/MYD10A1.A2022172.h18v03.061.2022174044428.hdf' not recognized as a supported file format.

@Kirill888
Copy link
Member

odc-stac relies solely on rasterio for reading raster data, which in turn uses GDAL under the hood. As such odc-stac can only read data formats supported by you installation of rasterio/GDAL. Depending on how you installed rasterio it might or might not support hdf inputs.

It is also possible that rasterio in your environment supports hdf inputs, but requires special syntax for specifying subdataset that needs to be open. When odc-stac was being developed focus was mostly on TIFF assets as this format is used in the majority of public datasets exposed via STAC. TIFF images, especially COG variant, have the most robust software support for access over HTTP and a well-defined and supported geo-referencing.

Supporting hdf/netcdf/zarr inputs would be nice, but I'm certain that this will require some extra development work on the odc-stac side, and possibly even on STAC extension level. I suspect that this won't be exactly straightforward.

Have you tried reading your data with stackstac? Can rioxarray load those hdf files?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants