Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xarray 0.18.0 raises ValueError, not FileNotFoundError, when opening a non-existent file #5329

Open
pont-us opened this issue May 18, 2021 · 8 comments · May be fixed by #5351
Open

xarray 0.18.0 raises ValueError, not FileNotFoundError, when opening a non-existent file #5329

pont-us opened this issue May 18, 2021 · 8 comments · May be fixed by #5351

Comments

@pont-us
Copy link
Contributor

pont-us commented May 18, 2021

What happened:

In a Python environment with xarray 0.18.0 and python-netcdf4 installed, I called xarray.open_dataset("nonexistent"). (The file "nonexistent" does not exist.) xarray threw a ValueError: cannot guess the engine, try passing one explicitly.

What you expected to happen:

I expected a FileNotFoundError error to be thrown, as in xarray 0.17.0.

Minimal Complete Verifiable Example:

import xarray as xr
xr.open_dataset("nonexistent")

Anything else we need to know?:

This is presumably related to Issue #5295, but is not fixed by PR #5296: ValueError is also thrown with the currently latest commit in master (9165c26).

This change in behaviour produced a hard-to-diagnose bug deep in xcube, where we were catching the FileNotFound exception to deal gracefully with a missing file, but the new ValueError was of course not caught -- and the error message did not make the cause obvious. Catching ValueError is a workaround, but not a great solution since it may also be thrown for files which do exist but don't have a recognizable data format. I suspect that other codebases may be similarly affected.

xarray 0.17.0 was capable of throwing a ValueError for a non-existent file, but only in the (rare?) case that neither netCDF4-python nor scipy was installed.

Environment:

Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.8.0-53-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.8.0

xarray: 0.18.0
pandas: 1.2.4
numpy: 1.20.2
scipy: None
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.1.1
conda: None
pytest: None
IPython: None
sphinx: None

@mathause
Copy link
Collaborator

Thanks for the report - I added that to the list of todos for the 0.18.1 patch release.

@aurghs aurghs self-assigned this May 18, 2021
@aurghs
Copy link
Collaborator

aurghs commented May 18, 2021

I think that It's not a bug: filename_or_obj in open_dataset can be a file, file-like, bytes, URL. The accepted inputs depend on the engine. So it doesn't make sense to raise a FileNotFoundError if the engine is not defined by the user or not automatically detected by xarray.

@aurghs aurghs removed their assignment May 18, 2021
@pont-us
Copy link
Contributor Author

pont-us commented May 18, 2021

@aurghs Point taken, but in that case the xarray.open_dataset documentation probably needs updating: currently it says ‘Strings and Path objects are interpreted as a path to a netCDF file or an OpenDAP URL and opened with python-netCDF4, unless the filename ends with .gz…’. In particular I find the behaviour with Path confusing: this type is unambiguously a filesystem path, but still results in a ValueError rather than FileNotFoundError. And in Python 3, str and bytes are distinct (though I suppose a str might still be a URL rather than a file).

In principle, though, I'd be OK with any selection of potential exceptions, as long as they're documented. The current practical problem for us (the xcube developers) is that the behaviour change from 0.17 to 0.18 means that we now have to audit our existing codebases for potential uncaught ValueErrors.

@shoyer
Copy link
Member

shoyer commented May 19, 2021

Ouch! I agree this is annoying. And yes, the open_dataset docs definitely need updating.

The general rule is that ValueError is too generic of an exception for it to be appropriate to catch in user code. So indeed, if you need to catch that error something has gone wrong :).

Given this use case, perhaps we should have a special exception type here, something like xarray.NoMatchingEngineError.

@shoyer
Copy link
Member

shoyer commented May 19, 2021

Actually, thinking about this a little more, you should get NoMatchingEngineError if you try to open a text file with Xarray.

But if you pass in a local filesystem path and try to open an entirely non-existent file, then FileNotFoundError would be appropriate.

In theory, one could write an open_dataset backend that interprets a local path in an arbitrary way, but that is not the intent of open_dataset, which is intended to handle things like files/directories, other URIs and file contents. If you want to interpret strings very differently you should write a new function for opening.

@alexamici
Copy link
Collaborator

@shoyer and @pont-us my take is that engine=None is mostly and feature to help new users experiment with xarray. The logic for autodetecting the engine has always been extremely fragile and the result depends on what packages are installed on the system.

I think it is OK to add a dedicated xarray.EngineGuessingError, but I'd not try anything more complex and in general invite users to pass engine explicitly.

Examples of strings that identify a dataset when the right backend is installed, but are not paths and the FileNotFoundError would sound confusing:

  • /vsizip/my.zip/my.tif with rioxarray
  • https://myhost/myfile.nc with netcdf4 and others

@pont-us
Copy link
Contributor Author

pont-us commented May 19, 2021

I think it is OK to add a dedicated xarray.EngineGuessingError, but I'd not try anything more complex and in general invite users to pass engine explicitly.

I'm reluctantly inclined to agree. (But "reluctantly" only because this means that we may have a lot of xcube code to check and update now.) If the engine is explicitly specified, the question of which exceptions to expect becomes a lot easier to handle, since it's now just between the caller and the engine.

Probably the open_dataset documentation should actively discourage reliance on engine guessing for anything but interactive use, and point out that callers need to check the documentation for the selected engine to determine which exceptions (other than EngineGuessingError) might be thrown.

@ryan-williams
Copy link

I just accidentally tried to open myfile.h5.h5 (I meant to open myfile.h5) and was confused about the ValueError.

  • Should .h5 be added to the list of extensions that map to the NetCDF engine?
  • Could the error message for the ValueError (or any of the alternate exception classes discussed above) include text suggesting that an incorrect file path (or other URL?) could be the culprit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants