
RuntimeError: NetCDF: Invalid dimension ID or name #8072

Closed
4 tasks done
jerabaul29 opened this issue Aug 15, 2023 · 7 comments
Labels: bug, needs triage (Issue that has not been reviewed by xarray team member), upstream issue

Comments


jerabaul29 commented Aug 15, 2023

What happened?

I got a long traceback ending in RuntimeError: NetCDF: Invalid dimension ID or name when trying to open a .nc file with xarray. I did a bit of digging around, but could not find a workaround (the issues with similar problems I found were old and supposedly solved).

What did you expect to happen?

No response

Minimal Complete Verifiable Example

See the jupyter notebook: https://github.com/jerabaul29/public_bug_reports/blob/main/xarray/2023_08_15/illustrate_issue_xr.ipynb .

Copying the commands here (note that they are meant to be run in a notebook, hence the `!` shell escapes):

import os
import xarray as xr
xr.show_versions()
# get the file
!wget https://arcticdata.io/metacat/d1/mn/v2/object/urn%3Auuid%3Ad5f179a3-76a8-4e4f-b45f-8e8d85960ba6
# rename the file
!mv urn\:uuid\:d5f179a3-76a8-4e4f-b45f-8e8d85960ba6 bar_BSO3_a_rcm7_2008_09.nc
# the file is not xarray-friendly, modify and rename
!ncrename -v six_con,six_con_dim bar_BSO3_a_rcm7_2008_09.nc bar_BSO3_a_rcm7_2008_09_xr.nc
xr.open_dataset("./bar_BSO3_a_rcm7_2008_09_xr.nc")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/file_manager.py:211, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    210 try:
--> 211     file = self._cache[self._key]
    212 except KeyError:

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/lru_cache.py:56, in LRUCache.__getitem__(self, key)
     55 with self._lock:
---> 56     value = self._cache[key]
     57     self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/home/jrmet/Desktop/Git/public_bug_reports/xarray/2023_08_15/bar_BSO3_a_rcm7_2008_09_xr.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), 'b771184f-a474-4093-b5f1-784542ef9ce6']

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Cell In[6], line 1
----> 1 xr.open_dataset("./bar_BSO3_a_rcm7_2008_09_xr.nc")

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/api.py:570, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    558 decoders = _resolve_decoders_kwargs(
    559     decode_cf,
    560     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    566     decode_coords=decode_coords,
    567 )
    569 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 570 backend_ds = backend.open_dataset(
    571     filename_or_obj,
    572     drop_variables=drop_variables,
    573     **decoders,
    574     **kwargs,
    575 )
    576 ds = _dataset_from_backend_dataset(
    577     backend_ds,
    578     filename_or_obj,
   (...)
    588     **kwargs,
    589 )
    590 return ds

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:602, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
    581 def open_dataset(  # type: ignore[override]  # allow LSP violation, not supporting **kwargs
    582     self,
    583     filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
   (...)
    599     autoclose=False,
    600 ) -> Dataset:
    601     filename_or_obj = _normalize_path(filename_or_obj)
--> 602     store = NetCDF4DataStore.open(
    603         filename_or_obj,
    604         mode=mode,
    605         format=format,
    606         group=group,
    607         clobber=clobber,
    608         diskless=diskless,
    609         persist=persist,
    610         lock=lock,
    611         autoclose=autoclose,
    612     )
    614     store_entrypoint = StoreBackendEntrypoint()
    615     with close_on_error(store):

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:400, in NetCDF4DataStore.open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    394 kwargs = dict(
    395     clobber=clobber, diskless=diskless, persist=persist, format=format
    396 )
    397 manager = CachingFileManager(
    398     netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    399 )
--> 400 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:347, in NetCDF4DataStore.__init__(self, manager, group, mode, lock, autoclose)
    345 self._group = group
    346 self._mode = mode
--> 347 self.format = self.ds.data_model
    348 self._filename = self.ds.filepath()
    349 self.is_remote = is_remote_uri(self._filename)

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:409, in NetCDF4DataStore.ds(self)
    407 @property
    408 def ds(self):
--> 409     return self._acquire()

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/netCDF4_.py:403, in NetCDF4DataStore._acquire(self, needs_lock)
    402 def _acquire(self, needs_lock=True):
--> 403     with self._manager.acquire_context(needs_lock) as root:
    404         ds = _nc4_require_group(root, self._group, self._mode)
    405     return ds

File ~/miniconda3/envs/pytmd/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
    135 del self.args, self.kwds, self.func
    136 try:
--> 137     return next(self.gen)
    138 except StopIteration:
    139     raise RuntimeError("generator didn't yield") from None

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/file_manager.py:199, in CachingFileManager.acquire_context(self, needs_lock)
    196 @contextlib.contextmanager
    197 def acquire_context(self, needs_lock=True):
    198     """Context manager for acquiring a file."""
--> 199     file, cached = self._acquire_with_cache_info(needs_lock)
    200     try:
    201         yield file

File ~/miniconda3/envs/pytmd/lib/python3.11/site-packages/xarray/backends/file_manager.py:217, in CachingFileManager._acquire_with_cache_info(self, needs_lock)
    215     kwargs = kwargs.copy()
    216     kwargs["mode"] = self._mode
--> 217 file = self._opener(*self._args, **kwargs)
    218 if self._mode == "w":
    219     # ensure file doesn't get overridden when opened again
    220     self._mode = "a"

File src/netCDF4/_netCDF4.pyx:2485, in netCDF4._netCDF4.Dataset.__init__()

File src/netCDF4/_netCDF4.pyx:1863, in netCDF4._netCDF4._get_dims()

File src/netCDF4/_netCDF4.pyx:2029, in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Invalid dimension ID or name


Anything else we need to know?

No response

Environment

<details>

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-78-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2

xarray: 2023.7.0
pandas: 2.0.3
numpy: 1.25.2
scipy: 1.11.1
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: 3.9.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.2
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: None

</details>
@jerabaul29 added the bug and needs triage (Issue that has not been reviewed by xarray team member) labels on Aug 15, 2023

keewis commented Aug 16, 2023

interestingly enough, I can open the file using the netcdf4 engine even without the ncrename call (when did we relax the "cannot open files with self-referential coordinates" constraint? Should we close #2368?).

In any case, ncrename might have created a corrupted file: according to ncdump -h, the all_con dimension appears to have been replaced by another nchar dimension of size 4 (while originally all_con had a size of 67).

@jerabaul29 (Contributor, Author) commented:

Yes, I am aware that the file can be opened with netCDF4 without the ncrename call :). I think xarray is just being careful / doing some additional checks and refusing to "take chances", but the initial file looked fine to me when I inspected it. I would like to fully switch my workflows from netCDF4 to xarray, hence this report, hoping it can help :).

Ok, I did not investigate the health of the file after the ncrename call. Do you think my invocation is valid (and if so, should I open an issue with ncrename)? If not, any pointers? :)

I am also wondering whether xarray could be made less "careful" about what it accepts to open, so that I could open the file without the error that comes up when skipping the ncrename step.

(for reference: see the updated notebook at the link above:

xr.open_dataset("./bar_BSO3_a_rcm7_2008_09.nc")

produces:

---------------------------------------------------------------------------
MissingDimensionsError                    Traceback (most recent call last)
<ipython-input-8-6a9791186661> in <module>
----> 1 xr.open_dataset("./bar_BSO3_a_rcm7_2008_09.nc")

~/.local/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    539 
    540     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 541     backend_ds = backend.open_dataset(
    542         filename_or_obj,
    543         drop_variables=drop_variables,

~/.local/lib/python3.8/site-packages/xarray/backends/netCDF4_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
    590         store_entrypoint = StoreBackendEntrypoint()
    591         with close_on_error(store):
--> 592             ds = store_entrypoint.open_dataset(
    593                 store,
    594                 mask_and_scale=mask_and_scale,

~/.local/lib/python3.8/site-packages/xarray/backends/store.py in open_dataset(self, store, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
     45         )
     46 
---> 47         ds = Dataset(vars, attrs=attrs)
     48         ds = ds.set_coords(coord_names.intersection(vars))
     49         ds.set_close(store.close)

~/.local/lib/python3.8/site-packages/xarray/core/dataset.py in __init__(self, data_vars, coords, attrs)
    611             coords = coords.variables
    612 
--> 613         variables, coord_names, dims, indexes, _ = merge_data_and_coords(
    614             data_vars, coords, compat="broadcast_equals"
    615         )

~/.local/lib/python3.8/site-packages/xarray/core/merge.py in merge_data_and_coords(data_vars, coords, compat, join)
    573     objects = [data_vars, coords]
    574     explicit_coords = coords.keys()
--> 575     return merge_core(
    576         objects,
    577         compat,

~/.local/lib/python3.8/site-packages/xarray/core/merge.py in merge_core(objects, compat, join, combine_attrs, priority_arg, explicit_coords, indexes, fill_value)
    753         coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
    754     )
--> 755     collected = collect_variables_and_indexes(aligned, indexes=indexes)
    756     prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
    757     variables, out_indexes = merge_collected(

~/.local/lib/python3.8/site-packages/xarray/core/merge.py in collect_variables_and_indexes(list_of_mappings, indexes)
    363                 append_all(coords_, indexes_)
    364 
--> 365             variable = as_variable(variable, name=name)
    366             if name in indexes:
    367                 append(name, variable, indexes[name])

~/.local/lib/python3.8/site-packages/xarray/core/variable.py in as_variable(obj, name)
    160         # convert the Variable into an Index
    161         if obj.ndim != 1:
--> 162             raise MissingDimensionsError(
    163                 f"{name!r} has more than 1-dimension and the same name as one of its "
    164                 f"dimensions {obj.dims!r}. xarray disallows such variables because they "

MissingDimensionsError: 'six_con' has more than 1-dimension and the same name as one of its dimensions ('nchar', 'six_con'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.

)


keewis commented Aug 16, 2023

it appears we recently relaxed this: see #7989 (I totally missed that PR, hence my confusion). That means that after we release the next version you will be able to open the file as-is.

To rename the dimension, I think you would need to use ncrename -d six_con,six_con_dim old_file new_file (I never really used NCO before). However, that also creates a corrupted netcdf file for me, so I'm not sure what to recommend. Even

import netCDF4 as nc

with nc.Dataset("bar_BSO3_a_rcm7_2008_09.nc", "r+") as ds:
    ds.renameDimension("six_con", "six_con_dim")
    ds.renameDimension("all_con", "all_con_dim")

does not appear to work, so I'm not sure what is wrong with that.

@jerabaul29 (Contributor, Author) commented:

Ok, so if I understand correctly, the issue with the initial dataset (without ncrename) should disappear, which would be enough to more or less close this issue, but there is a separate issue with ncrename, right? I think this is where that tool is developed:

https://github.com/nco/nco/blob/master/src/nco/ncrename.c

Do you think I should report this issue on their repo at https://github.com/nco/nco for them to look into? :)


keewis commented Aug 18, 2023

I don't know a lot about nco, but given that netcdf-python also seems to be affected, this might be a bug in netcdf-c. In any case, opening an issue on the nco repository should help figure out what exactly is going wrong.

@jerabaul29 (Contributor, Author) commented:

So nco confirmed this is a bug in the netcdf library, and offered a workaround. It looks like the maintainers of the netcdf library do not have the resources to fix the issue at the moment.

Then I guess that this answer from nco, plus the fact that a recent pull request fixes this on the xarray side, means that we can close this issue, right? Or am I forgetting something? :)

@dcherian added the bug, upstream issue, and needs triage (Issue that has not been reviewed by xarray team member) labels and removed the previous bug and needs triage labels on Aug 20, 2023
@dcherian (Contributor) commented:

Thanks for following up on the NCO repository, @jerabaul29.

the fact that a recent pull request fixes this, means that we can close this issue, right? Or am I forgetting something? :)

Yes.
