open_mfdataset: Not a valid ID #5276

minhhg · 2021-05-07T05:34:02Z

I have about 601 NetCDF4 files saved using xarray, and we use open_mfdataset to access them. The main code calls the function below many times.
The first few calls work fine, but after a while it throws the following error: "RuntimeError: NetCDF: Not a valid ID"

from copy import deepcopy

import xarray as xr


def func(xpath, spec):
    doc = deepcopy(spec)
    # Open all files lazily, concatenated along 'maturity'
    with xr.open_mfdataset(xpath + "/*.nc", concat_dim='maturity') as data:
        var_name = list(data.data_vars)[0]
        ar = data[var_name]
        maturity = spec['maturity']
        ann = ar.cumsum(dim='maturity')
        ann = ann - 1
        ar1 = ann.sel(maturity=maturity)
        doc['data'] = ar1.load().values  # the error is raised here
    return doc

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 5.4.0-1047-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.11.0
pandas: 0.24.1
numpy: 1.15.4
scipy: 1.2.0
netCDF4: 1.4.2
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 1.1.1
distributed: 1.25.3
matplotlib: 3.0.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.7.3
pip: 19.0.1
conda: None
pytest: 4.2.0
IPython: 7.1.1
sphinx: 1.8.4

This error also happens with xarray version 0.10.9

Error trace:

2021-05-05 09:28:19,911, DEBUG 7621, sim_io.py:483 - load_unique_document(), xpath=/home/ubuntu/runs/20210331_001/nominal_dfs/uk
2021-05-05 09:28:42,774, ERROR 7621, run_gov_ret.py:33 -             <module>(), Unknown error=NetCDF: Not a valid ID
Traceback (most recent call last):
  File "/home/ubuntu/dev/py36/python/ev/model/api3/run_gov_ret.py", line 31, in <module>
    res = govRet()
  File "/home/ubuntu/dev/py36/python/ev/model/api3/returns.py", line 56, in __call__
    decompose=self.decompose))
  File "/home/ubuntu/dev/py36/python/ev/model/returns/returnsGenerator.py", line 70, in calc_returns
    dfs_data = self.mongo_dfs.get_data(mats=[1,mat,mat-1])
  File "/home/ubuntu/dev/py36/python/ev/model/api3/dfs.py", line 262, in get_data
    record = self.mdb.load_unique_document(self.dfs_collection_name, spec)
  File "/home/ubuntu/dev/py36/python/ev/model/api3/sim_io.py", line 1109, in load_unique_document
    return self.collections[collection].load_unique_document(query, *args, **kwargs)
  File "/home/ubuntu/dev/py36/python/ev/model/api3/sim_io.py", line 501, in load_unique_document
    doc['data'] = ar1.load().values
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/dataarray.py", line 631, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/dataset.py", line 494, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/base.py", line 398, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/threaded.py", line 76, in get
    pack_exception=pack_exception, **kwargs)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/local.py", line 459, in get_async
    raise_exception(exc, tb)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/compatibility.py", line 112, in reraise
    raise exc
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/local.py", line 230, in execute_task
    result = _execute_task(task, data)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/array/core.py", line 82, in getter
    c = np.asarray(c)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/numpy/core/numeric.py", line 501, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 602, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/numpy/core/numeric.py", line 501, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 508, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 64, in __getitem__
    self._getitem)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 776, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 76, in _getitem
    array = getitem(original_array, key)
  File "netCDF4/_netCDF4.pyx", line 4095, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 3798, in netCDF4._netCDF4.Variable.shape.__get__
  File "netCDF4/_netCDF4.pyx", line 3746, in netCDF4._netCDF4.Variable._getdims
  File "netCDF4/_netCDF4.pyx", line 1754, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Not a valid ID


TomNicholas · 2021-05-07T16:32:08Z

Hi there - are you able to narrow down the problem a little bit for us? Can you get the same error by opening any one file individually with xarray.open_dataset()? What if you use the netCDF4 library to access a variable directly without xarray? Failing that can you give us a file/files for which this fails, so that we can reproduce the problem locally? Also, have you tried using a more recent version of netCDF4? The most recent version on PyPI seems to be 1.5.6, but you're using 1.4.2.
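The per-file check suggested above could be sketched roughly as follows. This is only an illustrative helper, not part of xarray: the name find_unreadable and its opener parameter are assumptions introduced here. By default it opens each file with xarray.open_dataset and forces a read with load(), so a file that opens lazily but fails on access is still reported.

```python
def find_unreadable(paths, opener=None):
    """Return a list of (path, error message) for files that fail to open or load.

    find_unreadable and the opener parameter are hypothetical names used for
    this sketch; opener must return a context manager with a .load() method.
    """
    if opener is None:
        # Imported lazily so xarray is only required when the default is used.
        import xarray as xr
        opener = xr.open_dataset

    failures = []
    for path in paths:
        try:
            with opener(path) as ds:
                ds.load()  # force an actual read, not just a lazy open
        except Exception as err:
            failures.append((path, "{}: {}".format(type(err).__name__, err)))
    return failures
```

Assuming the files live in the folder passed as xpath, a call such as find_unreadable(sorted(glob.glob(xpath + "/*.nc"))) would return an empty list if every file can be read on its own.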

minhhg · 2021-05-10T15:41:56Z

I was able to write a small script that reproduces the error.

import numpy as np
import xarray as xr
from copy import deepcopy
import random
import os
import logging
log = logging.getLogger(__name__)


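def generate_data(xpath, m=10000, n=900, k=601):
    """
    Write k NetCDF files to xpath, one per maturity value.

    :param xpath: existing folder that will hold the files
    :param m: length of dim1
    :param n: length of dim2
    :param k: number of files (maturities 0..k-1)
    :return: None
    """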
    coord1=dict(dim1=range(m), dim2=range(n))
    for mat in range(k):
        a = np.random.rand(m,n)
        file_name = "{:04d}.nc".format(mat)
        full_file_name = os.path.join(xpath, file_name)
        coord1['maturity'] = [mat]
        a = np.reshape(a,[m,n,1])
        ar = xr.DataArray(a, dims=['dim1', 'dim2', 'maturity'], coords=coord1)
        ar.to_netcdf(full_file_name)

def func(xpath, spec):
    log.debug("xpath=%s, spec=%s", xpath, spec)

    doc = deepcopy(spec)

    with xr.open_mfdataset(xpath + "/*.nc", concat_dim='maturity') as data:
        var_name = list(data.data_vars)[0]
        ar = data[var_name]
        maturity = spec['maturity']
        ann = ar.cumsum(dim='maturity')
        ann = ann - 1
        ar1 = ann.sel(maturity=maturity)
        doc['data'] = ar1.load().values
    return doc


def repeat_access_data_generate_error(xpath, k=601, l=1000):
    """
    Dummy to generate error
    :return:
    """
    spec={}
    dum=0.0
    for _l1 in range(l):
        try:
            spec['maturity'] = np.random.randint(0,k,1)[0]
            doc = func(xpath, spec)
            #do dummy operation with doc
            dum =  (dum + np.sum(doc['data']))/2.0
        except Exception as err:
            log.error("spec=%s, err=%s", spec, err, exc_info=True)

    return dum


if __name__ == "__main__":
    FORMAT = '%(asctime)-15s, %(levelname)s %(process)d, %(filename)s:%(lineno)s - %(funcName)20s(), %(message)s'
    log_file_name = "errors.log"
    logging.basicConfig(filename=log_file_name, filemode='a+', level=logging.DEBUG, format=FORMAT)
    xpath="/home/ubuntu/runs/xrdata/" # folder to hold files
    # change values of m,n,k to re-produce the error
    generate_data(xpath,m=1000)
    repeat_access_data_generate_error(xpath)

minhhg · 2021-05-10T15:50:37Z

In reply to TomNicholas's questions above:

As mentioned, the error only appears after func has been called successfully a few times; the first few calls work as expected.

"Can you get the same error by opening any one file individually with xarray.open_dataset()?"
Yes, I can. But the whole point is to use open_mfdataset.
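Since the failure only shows up after some number of successful calls, one way to pin down when it starts is a small harness that stops at the first exception. The sketch below is a hypothetical helper (calls_until_failure is not from the thread or from any library); it reports how many calls succeeded and the traceback of the first failing call.

```python
import traceback


def calls_until_failure(call, max_calls=1000):
    """Invoke call(i) for i = 0, 1, ... until it raises or max_calls is reached.

    Returns (number of successful calls, traceback text of the first failure),
    with None as the second element if every call succeeded.
    """
    for i in range(max_calls):
        try:
            call(i)
        except Exception:
            return i, traceback.format_exc()
    return max_calls, None
```

With the reproduction script above, it could be driven as n_ok, tb = calls_until_failure(lambda i: func(xpath, {"maturity": i % 601})). Comparing n_ok across runs would show whether the failure point is stable or random.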

max-sixty · 2022-04-09T15:49:50Z

We didn't do a good job answering this (thank you @TomNicholas for responding), but likely the moment has passed, so I'll close. Please reopen with a MCVE if there's still an issue.

max-sixty closed this as completed Apr 9, 2022

tomvothecoder mentioned this issue Oct 30, 2023: [Bug]: RuntimeError: NetCDF: Not a valid ID appears randomly when running operations on the same datasets repeatedly xCDAT/xcdat#561 (closed)
