Maybe support hdf4 #494

Merged: 14 commits into fsspec:main, Sep 12, 2024

@martindurant martindurant commented Aug 19, 2024

Working:

  • find all file tags
  • several informational tags
  • follow LINKED lists
  • find array CHUNK table references
  • follow COMPressed tags
  • build hierarchy
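
The first steps in the list above (finding all file tags, then following references between them) rest on HDF4's data descriptor (DD) list. A minimal sketch of that scan follows, assuming the layout given in the HDF4 specification: a 4-byte magic number, then chained DD blocks, each holding a 2-byte count and a 4-byte next-block offset followed by 12-byte big-endian (tag, ref, offset, length) records. The name `iter_data_descriptors` and the synthetic buffer are illustrative only and are not taken from this PR's code.

```python
import struct

HDF4_MAGIC = b"\x0e\x03\x13\x01"  # first four bytes of an HDF4 file
NULL_TAG = 1  # assumed value of DFTAG_NULL, which marks an unused DD slot


def iter_data_descriptors(buf):
    """Yield (tag, ref, offset, length) for each used data descriptor in buf."""
    if buf[:4] != HDF4_MAGIC:
        raise ValueError("not an HDF4 file")
    block = 4  # the first DD block immediately follows the magic number
    while block:
        # Block header: number of DDs (uint16), offset of next block (uint32).
        n_dd, next_block = struct.unpack_from(">HI", buf, block)
        for i in range(n_dd):
            tag, ref, offset, length = struct.unpack_from(
                ">HHII", buf, block + 6 + 12 * i
            )
            if tag != NULL_TAG:
                yield tag, ref, offset, length
        block = next_block  # a next-block offset of 0 ends the chain


def make_block(dds, next_block):
    """Build one synthetic DD block (helper for the demo below)."""
    out = struct.pack(">HI", len(dds), next_block)
    for dd in dds:
        out += struct.pack(">HHII", *dd)
    return out


# Synthetic two-block file: one real entry, one empty slot, then a second block.
# Tag 1965 is believed to be the Vgroup ("VG") tag; 614 mirrors the ref seen in this PR.
block1 = make_block([(30, 1, 100, 92), (NULL_TAG, 0, 0xFFFFFFFF, 0xFFFFFFFF)], 34)
block2 = make_block([(1965, 614, 200, 50)], 0)
buf = HDF4_MAGIC + block1 + block2
dds = list(iter_data_descriptors(buf))
```

A real reader would then dispatch on each tag to interpret the bytes at `offset`, which is where the LINKED, CHUNK and COMPressed handling listed above would come in.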

martindurant commented Aug 22, 2024

This now has some interesting features:

In [324]: tags, root = kerchunk.hdf.HDF4ToZarr("/Users/mdurant/data/MOD14.hdf4").translate()

In [326]: root
Out[326]: {('VG', 614)}

In [327]: tags[('VG', 614)]
Out[327]:
{'number_of_scan_lines': 2030,
 'pixels_per_scan_line': 1354,
 'number_of_active_fires': 0,
 'cmg_cells_night': 6390,
 'cmg_values': 8,
 'fire mask': {'number_of_scan_lines': 2030,
  'pixels_per_scan_line': 1354,

The two variables here are actually "fire mask" and "algorithm QA", held in a single group, which were not shown by rasterio/xarray. The additional coordinates reported by rasterio/xarray, "band" and "spatial_ref", are not in the data. I suppose rasterio wants everything to be geo-like and makes these up; it emits a warning to say it is doing this:

NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
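
To make the nested group output above easier to inspect, here is a rough sketch that walks such a dictionary and separates scalar attributes from child nodes. It assumes, purely for illustration, that any dict-valued entry is a child (a variable or sub-group) and everything else is an attribute; the real structure returned by `translate()` is truncated in the output above and may differ. The function `walk` and the sample contents of the child nodes are hypothetical.

```python
def walk(node, path=""):
    """Yield (path, attributes) for a nested dict, treating dict values as children."""
    attrs = {k: v for k, v in node.items() if not isinstance(v, dict)}
    yield path or "/", attrs
    for name, child in node.items():
        if isinstance(child, dict):
            yield from walk(child, f"{path}/{name}")


# Hypothetical sample shaped like the group shown above; the children's
# contents are placeholders, not real values from the file.
group = {
    "number_of_scan_lines": 2030,
    "pixels_per_scan_line": 1354,
    "fire mask": {"number_of_scan_lines": 2030},
    "algorithm QA": {},
}

nodes = dict(walk(group))
for node_path, node_attrs in nodes.items():
    print(node_path, sorted(node_attrs))
```

With the sample above, this lists the root group followed by one entry per variable, which is the grouping the comment describes.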

@martindurant

Latest:

In [1]: import xarray as xr
In [2]: import kerchunk.hdf
In [3]: out = kerchunk.hdf.HDF4ToZarr("/Users/mdurant/data/MOD14.hdf4").translate()
In [4]: ds = xr.open_dataset(out, engine="kerchunk")
In [5]: ds
Out[5]:
<xarray.Dataset>
Dimensions:        (CMG_night_x: 6390, CMG_night_y: 8, 0: 0,
                    algorithm QA_x: 2030, algorithm QA_y: 1354,
                    fire mask_x: 2030, fire mask_y: 1354)
Dimensions without coordinates: CMG_night_x, CMG_night_y, 0, algorithm QA_x,
                                algorithm QA_y, fire mask_x, fire mask_y
Data variables: (12/30)
    CMG_night      (CMG_night_x, CMG_night_y) float32 ...
    FP_AdjCloud    (0) float32 ...
    FP_AdjWater    (0) float32 ...
    FP_CMG_col     (0) float32 ...
    FP_CMG_row     (0) float32 ...
    FP_MAD_DT      (0) float32 ...
    ...             ...
    FP_line        (0) float32 ...
    FP_longitude   (0) float32 ...
    FP_power       (0) float32 ...
    FP_sample      (0) float32 ...
    algorithm QA   (algorithm QA_x, algorithm QA_y) float64 ...
    fire mask      (fire mask_x, fire mask_y) float32 ...
Attributes: (12/91)
    ADDITIONALATTRIBUTENAME:           identifier_product_doi_authority
    ANCILLARYINPUTPOINTER:             MOD03.A2024226.2345.061.2024227053247.hdf
    ANCILLARYINPUTTYPE:                Geolocation
    ASSOCIATEDINSTRUMENTSHORTNAME:     MODIS
    ASSOCIATEDPLATFORMSHORTNAME:       Terra
    ASSOCIATEDSENSORSHORTNAME:         MODIS
    ...                                ...
    cmg_values:                        8
    identifier_product_doi:            10.5067/MODIS/MOD14.061
    identifier_product_doi_authority:  https://doi.org
    number_of_active_fires:            0
    number_of_scan_lines:              2030
    pixels_per_scan_line:              1354

@martindurant martindurant marked this pull request as ready for review September 12, 2024 19:21
@martindurant martindurant merged commit 10a9248 into fsspec:main Sep 12, 2024
5 checks passed