Skip to content

opendap / dap4 support for pydap backend #10182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

Mikejmnez
Copy link
Contributor

@Mikejmnez Mikejmnez commented Mar 28, 2025

There have been some (revitalizing) upgrades over the last year to pydap (with some more to come), including authentication, pydap data model (dap2 vs dap4). This Draft PR aims to

  • DOC: Update info on how to authenticate (handled by requests, which recovers credentials )
  • DOC: updates links to new (regularly updated) pydap documentation.
  • DOC: How to define opendap protocol via URLs.
  • DOC: Some info on caching sessions to avoid repeatedly downloading same metadata/data from server. At least point to pydap documentation. (Performance upgrade)
  • Removes slashes in variable names (part of dap4 protocol, which allows for hierarchical variables in dataset)
  • Support for DataTree.
  • Improves testing of backend.
  • removes (largely unused) output_grid option. It is now set to False as default in pydap.
  • Add deprecation warning of output_grid as an input parameter. It will always be False as default (and there is no need for it to be True, as it only adds unnecessary logic).

This PR enables, for example (check test opendap dap4 url)

URL = "dap4://test.opendap.org/opendap/dap4/SimpleGroup.nc4.h5"
ds = xr.open_datatree(URL, engine='pydap')

Screenshot 2025-04-14 at 2 23 17 PM

@github-actions github-actions bot added topic-backends topic-indexing topic-documentation topic-plotting topic-performance topic-zarr Related to zarr storage library CI Continuous Integration tools topic-rolling Automation Github bots, testing workflows, release automation io labels Apr 12, 2025
@Mikejmnez
Copy link
Contributor Author

Apologies for all the added labels / contributors. Didn't have much time to work on this pr this week, and so I had to rebase and merge the main branch to this one. Changed files are correct though.

@Mikejmnez Mikejmnez marked this pull request as ready for review April 14, 2025 15:38
@Mikejmnez
Copy link
Contributor Author

Ready for review :)

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Mikejmnez

The test_cmp_local_file failure is real.

@@ -49,36 +51,37 @@ def __getitem__(self, key):
)

def _getitem(self, key):
# pull the data from the array attribute if possible, to avoid
# downloading coordinate data twice
array = getattr(self.array, "array", self.array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need this any more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set the output_grid=False by default on pydap, so that should not be necessary. But I'll double check (All the tests passed on the xarray-tests environment)

pytest -v xarray/tests/test_backends.py --run-network-tests

assert "chunks" in actual
assert actual["chunks"] == ("auto" if has_zarr_v3 else None)

var = xr.Variable("x", [1, 2], encoding={"chunks": (1,)})
actual = backends.zarr.extract_zarr_variable_encoding(var, zarr_format=3)
actual = backends.zarr.extract_zarr_variable_encoding(var)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is needed, let's undo this change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know where this is from... I merged main branch into this PR and did a git pull --rebase origin pydap4 . Totally happy to change that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to just merge, no rebase is necessary if that makes it easier

@@ -2628,10 +2627,12 @@ def test_hidden_zarr_keys(self) -> None:
for var in expected.variables.keys():
assert self.DIMENSION_KEY not in expected[var].attrs

if has_zarr_v3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad merge

@@ -3343,67 +3344,6 @@ def test_cache_members(self) -> None:
observed_keys_2 = sorted(zstore_mut.array_keys())
assert observed_keys_2 == sorted(array_keys + [new_key])

@requires_dask
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad merge

@Mikejmnez
Copy link
Contributor Author

Mikejmnez commented Apr 16, 2025

It looks like the failing for ubuntu-latest py3.10 min-all-deps has pydap pinned to 3.4 , which is at least 1 year old (and would explain the errors). The latest conda version is 3.5.5.

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Automation Github bots, testing workflows, release automation CI Continuous Integration tools dependencies Pull requests that update a dependency file io topic-backends topic-documentation topic-indexing topic-performance topic-plotting topic-rolling topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants