Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modified: xarray/backends/api.py #817

Closed
wants to merge 3 commits into from
Closed

Conversation

swnesbitt
Copy link

This pull request addresses issue #816, allowing the user to open remote openDAP files that are gzipped using the netCDF4 openDAP-enabled backend. Thanks to @dopplershift and @shoyer for the quick guidance!

if is_remote_uri(filename_or_obj):
if engine !='netcdf4':
raise ValueError('can only read remote gzipped netCDF files '
"with engine='netcdf4'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is actually not quite right -- pydap would also work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't deal with the mess, but added pydap to the possibilities.

@shoyer
Copy link
Member

shoyer commented Apr 6, 2016

I think this is probably correct, but the heuristics here are starting to get convoluted enough that I worry about test coverage. Is there any way we can test this? Maybe try to pull the gzip logic into a helper function (an extended variant of _get_default_engine) that we could test?

@swnesbitt
Copy link
Author

Certainly can take a crack at this. This is working for my application, but would be good to clean up the code to simplify the cases as you suggest.

Regarding testing, it would be good to have a test module that includes grabbing files for local access and accessing openDAP datasets (from UNIDATA?) to ensure that the installed backends are all working as expected.

@shoyer
Copy link
Member

shoyer commented Apr 6, 2016

We do have existing tests for backends: https://github.com/pydata/xarray/blob/master/xarray/test/test_backends.py

This includes a test accessing an OpenDAP dataset from the OpenDAP test server (via pydap, at the end). But in my experience, there servers are somewhat unreliable (maybe available 90%), so we don't require that test to pass for the build to pass. Also, even in the best case scenario network access is slow. So it would be nice to modularize this enough that our logic is testable without actually using opendap.

@jhamman
Copy link
Member

jhamman commented Apr 6, 2016

Regarding testing, it would be good to have a test module that includes grabbing files for local access and accessing openDAP datasets (from UNIDATA?) to ensure that the installed backends are all working as expected.

I'd be a little hesitant to enforce that we get access to opendap (or other remote datasets) on travis. I've seen this come back to bite us in the past. We can allow tests to fail on travis and just print a warning via pytest if it is something we really want to see tested in that way.

@shoyer
Copy link
Member

shoyer commented Apr 6, 2016

Currently the way we handle this is that the only test that accessing
remote resources is the pydap test, which only runs in one build (not
required to pass).

@swnesbitt
Copy link
Author

OK - understood. Incidentally, I didn't include pydap because it doesn't seem to be python 3.* compatible. I tested it on 2.7 for my application and it seems to work.

@jhamman
Copy link
Member

jhamman commented May 11, 2016

@shoyer - are we okay with the final logic here?

@swnesbitt - can we get an entry in the what's new docs?

if engine is not None and engine != 'scipy':
raise ValueError('can only read gzipped netCDF files with '
"default engine, engine='scipy'")
# if the string ends with .gz, then gunzip and open as netcdf file
if sys.version_info[:2] < (2, 7):
raise ValueError('reading a gzipped netCDF not '
'supported on Python 2.6')
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This try/except clause needs to only wrap the call to ScipyDataStore -- otherwise it's catching too many potentially wrong conditions.

@shoyer
Copy link
Member

shoyer commented May 11, 2016

I am reluctant to merge this without having any way to test the logic. Without automated tests this issue is likely to recur.

That said, I suppose we could leave the refactoring for a TODO. Let's add a note on that and also one minimal test to verify that we raise an error if you try to use engine='netcdf4' with a path to local gzipped file.

@tsaoyu
Copy link

tsaoyu commented May 11, 2018

How is this issue progressed so far? I am running into the same problem now and I might able to offer some help in testing.

@jhamman
Copy link
Member

jhamman commented May 11, 2018

@tsaoyu - I don't think anyone has worked on developing a test case for this feature. I assume @swnesbitt would appreciate help there.

@dopplershift
Copy link
Contributor

Regarding the testing issue, another option is to use something like vcrpy to record and playback http responses for opendap requests. I've had good luck with that for Siphon.

@tsaoyu
Copy link

tsaoyu commented May 14, 2018

I am trying to understand the logic by looking through the comments/discussion on this commit.
After two years of development, there are quite some changes on the API itself e.g. the logic on the gzipped file handling seems already moved from api.py
to scipy_.py. So I think it would be better to start a new commit from the latest version.

To bring the concept on the same table, there are 3 backends that can be used to handle gzipped netCDF file namely scipy, netcdf4 and pydap. scipy backend only support netCDF3 so far and that is the only implemented method to handle gzipped netCDF3 file. netcdf4 and pydap should able to deal with netCDF4 file given the file is first opened by gzip.open (as it was implement in scipy_.py).

So what we need to do is to make both netCDF4_.py and pydap_.py aware of the file is end with .gz and first open it using gzip.open. Following tests also need to be done to verify the implementation of the logic. I have looked at vcrpy and it seems a nice idea to speed up tests with HTTP stuff involved. I must confess I only use xarray with local file so it may takes quite a while to understand the issue.

@shoyer
Copy link
Member

shoyer commented May 14, 2018

The only way we could make reading a gzipped netCDF4 file is to load the entire file into memory. That's why we didn't support this before. It's also less relevant for netCDF4, because netCDF4 supports in-file compression directly.

With netCDF3, we can use scipy's netcdf reader, which supports Python file objects. But netCDF4-Python does not support Python file objects.

This issue is concerned about supporting paths ending with .gz in remote URLs, which are not local files but rather files exposed via the OpenDAP protocol over a network. If the DAP server can server a file with a .gz extension then xarray should be OK with it, too.

@jhamman
Copy link
Member

jhamman commented Jan 31, 2023

Closing this as stale and out of date with our current backends. @swnesbitt (or others) - feel free to open a new PR if you feel there is more to do here.

@jhamman jhamman closed this Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants