[Draft] Non-kerchunk backend for HDF5/netcdf4 files. #87
base: main
Conversation
This is looking great so far @sharkinsspatial !
kerchunk backend's specialized encoding translation logic
This part I would really like to either factor out, or at least really understand what it's doing. See #68
virtualizarr/readers/hdf.py
Outdated
@@ -0,0 +1,206 @@
from typing import List, Mapping, Optional

import fsspec
Does one need fsspec if reading a local file? Is there any other way to read from S3 without fsspec at all?
Not with a filesystem-like API. You would have to use boto3 or aiobotocore directly.
This is one of the great virtues of fsspec and is not to be under-valued.
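To make the tradeoff concrete, here is a small sketch (bucket and key are made up) of reading the first bytes of a remote HDF5 file with fsspec's filesystem-like API versus going through boto3 directly:

import fsspec

# fsspec: the same file-like API works for local paths and s3:// URLs (via s3fs).
with fsspec.open("s3://example-bucket/example.nc", mode="rb", anon=True) as f:
    signature = f.read(8)  # HDF5 files start with an 8-byte signature

# Without fsspec: drop down to boto3 and manage byte ranges yourself.
import boto3

s3 = boto3.client("s3")
resp = s3.get_object(Bucket="example-bucket", Key="example.nc", Range="bytes=0-7")
signature = resp["Body"].read()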
virtualizarr/readers/hdf.py
Outdated
def virtual_vars_from_hdf(
    path: str,
    drop_variables: Optional[List[str]] = None,
) -> Mapping[str, xr.Variable]:
I like this as a way to interface with the code in open_virtual_dataset.
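As a rough sketch of the kind of wiring being suggested (the import path matches the PR's virtualizarr/readers/hdf.py, but the keyword arguments and the wrapping into an xr.Dataset here are assumptions, not the PR's actual code):

from typing import List, Optional

import xarray as xr

from virtualizarr.readers.hdf import virtual_vars_from_hdf


def open_virtual_dataset(
    path: str,
    drop_variables: Optional[List[str]] = None,
) -> xr.Dataset:
    # Build virtual (manifest-backed) variables from the HDF5 file and wrap
    # them in an xarray Dataset without loading any chunk data.
    virtual_vars = virtual_vars_from_hdf(path=path, drop_variables=drop_variables)
    return xr.Dataset(virtual_vars)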
This looks cool @sharkinsspatial! My opinion is that it doesn't make sense to just forklift the kerchunk code into virtualizarr. What I would love to see is an extremely tight, strictly typed, unit-tested total refactor of the parsing logic. I think you're headed down the right path, but I encourage you to push as far as you can in that direction.
@rabernat Fully agree with your take above 👆 👍. I'm trying to work through this incrementally whenever I can find some spare time. In the spirit of thorough test coverage 🎊, looking through your issue pydata/xarray#7388 and the corresponding PR, I'm not sure what the proper incantation of variable encoding configuration is to use.
if mapping["scale_factor"] != 1 or mapping["add_offset"] != 0:
    float_dtype = _choose_float_dtype(dtype=dataset.dtype, mapping=mapping)
    target_dtype = np.dtype(float_dtype)
    codec = FixedScaleOffset(
Are you able to make this test parametrization pass with this PR? It's currently xfailed because open_virtual_dataset doesn't know how to handle scale factor encoding.
I might be misunderstanding, but none of the hdf reader code will be called for loadable_variables, and this block would only be entered for a loaded variable. Is that correct?
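For context on the encoding under discussion, here is a small standalone illustration (values are made up) of CF-style scale_factor/add_offset packing expressed with numcodecs' FixedScaleOffset, the codec constructed in the snippet above:

import numpy as np
from numcodecs import FixedScaleOffset

# CF convention: unpacked = packed * scale_factor + add_offset
scale_factor = 0.01
add_offset = 273.15

codec = FixedScaleOffset(
    offset=add_offset,
    scale=1 / scale_factor,  # numcodecs multiplies by `scale` when encoding
    dtype="f8",
    astype="i2",
)

temps = np.array([273.15, 274.0, 280.55])
packed = codec.encode(temps)      # int16, roughly [0, 85, 740]
unpacked = codec.decode(packed)   # float64 again, within quantization error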
virtualizarr/readers/hdf.py
Outdated
shape = tuple(math.ceil(a / b) for a, b in zip(dataset.shape, dataset.chunks))
paths = np.empty(shape, dtype=np.dtypes.StringDType)  # type: ignore
offsets = np.empty(shape, dtype=np.int32)
After #177, these arrays will need to be uint64 instead of int32.
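A sketch of what that allocation might look like after the change (the shape and the lengths array are illustrative; np.dtypes.StringDType requires numpy >= 2.0):

import numpy as np

shape = (3, 4)  # e.g. ceil(dim / chunk) per axis, as in the snippet above
paths = np.empty(shape, dtype=np.dtypes.StringDType())
offsets = np.empty(shape, dtype=np.uint64)
lengths = np.empty(shape, dtype=np.uint64)  # chunk byte lengths presumably get the same dtype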
manifest = _dataset_chunk_manifest(path, dataset)
if manifest:
    chunks = dataset.chunks if dataset.chunks else dataset.shape
    codecs = codecs_from_dataset(dataset)
- Given the ZarrV3 spec on codecs being non-empty: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#id12
- This comment ("use compressor, filters, post_compressor for Array v3 create", zarr-python#1944 (comment)) mapping filters and compressor to v3 concepts
- And empirically, we observed that on GOES data this builds a list of zlib and FixedScaleOffset

Leaving compressor=None causes ambiguity for roundtripping v3 metadata (ZArray -> disk -> ZArray), because we can't determine whether it's a list of two filters or a list of one filter and one compressor. zlib is a compression codec and FixedScaleOffset is not, but should they both be treated as filters?
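A minimal illustration of that ambiguity (codec parameters are made up): the same two-stage pipeline can be written two ways in v2-style metadata, and after a round trip there is nothing to tell them apart.

from numcodecs import FixedScaleOffset, Zlib

# Everything expressed as "filters", compressor left empty:
option_a = {
    "filters": [FixedScaleOffset(offset=0, scale=100, dtype="f4", astype="i2"), Zlib(level=4)],
    "compressor": None,
}

# Last codec promoted to "compressor":
option_b = {
    "filters": [FixedScaleOffset(offset=0, scale=100, dtype="f4", astype="i2")],
    "compressor": Zlib(level=4),
}

# Round-tripping option_a through disk cannot recover whether the trailing Zlib
# was meant as a second filter or as the compressor.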
@ghidalgo3 My rationale for describing the full codec chain in the filters property was that internally HDF5 does not distinguish compressors from filters; the entire encoding chain is represented as filters. Since we don't need to worry about v2 interoperability, I think we can just focus on aligning with v3's API (which still seems to be in a state of flux). I think I prefer the approach proposed in zarr-developers/zarr-python#1944 (comment), but I don't know where that leaves me in the interim until a final decision gets made on the v3 API path 🤔. For v3 compatibility we'll also need to track zarr-developers/numcodecs#524 so we use numcodecs which are compatible with the new v3 codec specification. TL;DR: I think we might be in flux for some time while upstream v3 decisions get made.
@ghidalgo3 I also want to address the question from your PR #193. IIUC, different v3 implementations will support a codec registry (zarr-developers/zarr-python#1588) to make codec support fully extensible. Codec discovery and registration have always been a thorny problem (this is a big issue in the HDF space), but I'm hopeful that this approach will be flexible.
@TomAugspurger I'm trying to merge …
@sharkinsspatial this is a behavior change in …

Something like this seems to fix the failing tests:

diff --git a/virtualizarr/zarr.py b/virtualizarr/zarr.py
index 824892c..87bb453 100644
--- a/virtualizarr/zarr.py
+++ b/virtualizarr/zarr.py
@@ -106,8 +106,15 @@ class ZArray:
     def to_kerchunk_json(self) -> str:
         zarray_dict = self.dict()
-        if zarray_dict["fill_value"] is np.nan:
+
+        fill_value = zarray_dict["fill_value"]
+
+        if fill_value is np.nan:
             zarray_dict["fill_value"] = None
+
+        elif isinstance(fill_value, (np.number, np.ndarray)):
+            zarray_dict["fill_value"] = fill_value.item()
+
         return ujson.dumps(zarray_dict)

# ZArray.dict seems to shadow "dict", so we need the type ignore

I'm not sure what behavior we want
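For context, the .item() call in the diff converts a numpy scalar into a native Python value so it can be serialized to JSON; a tiny illustration (the fill value is made up):

import numpy as np

fill = np.float32(-9999.0)
type(fill)         # numpy.float32 -- a numpy scalar
type(fill.item())  # float -- a plain Python scalar that JSON encoders handle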
The ZArray class was supposed to be a way to standardize the metadata, allowing the rest of the package to not worry about any differences that kerchunk throws at us. The .dict() method we just got for free via inheriting from the pydantic base model. I think we should migrate the interface of ZArray towards providing a unified representation of the metadata, and also try to move its API closer to that of zarr-python's ZMetaData class, because really we want to be using that instead. (Not sure if that actually answers your question)
This is a rudimentary initial implementation for #78. The core code is ported directly from kerchunk's hdf backend. I have not ported the bulk of the kerchunk backend's specialized encoding translation logic but I'll try to do so incrementally so that we can build complete test coverage for the many edge cases it currently covers.