
open_mfdataset: Not a valid ID #5276

Closed
minhhg opened this issue May 7, 2021 · 4 comments

@minhhg

minhhg commented May 7, 2021

I have about 601 NetCDF4 files saved with xarray, and we use open_mfdataset to access them. The main code calls the function below many times.
The first few calls work fine, but after a while it throws the following error: "RuntimeError: NetCDF: Not a valid ID"

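import xarray as xr
from copy import deepcopy


def func(xpath, spec):
    doc = deepcopy(spec)
    # Stack every file along 'maturity' (xarray 0.11 API; newer releases also need combine='nested')
    with xr.open_mfdataset(xpath + "/*.nc", concat_dim='maturity') as data:
        var_name = list(data.data_vars)[0]   # the first (only) data variable
        ar = data[var_name]
        maturity = spec['maturity']
        ann = ar.cumsum(dim='maturity')      # cumulative sum over maturities
        ann = ann - 1
        ar1 = ann.sel(maturity=maturity)     # pick the requested maturity
        doc['data'] = ar1.load().values      # the actual file reads happen here
    return doc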
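
To separate the arithmetic in func from the file handling, the same steps can be run on a small in-memory DataArray. This is a sketch, not the reporter's code: cumulative_at_maturity is a hypothetical helper and the array sizes are arbitrary. If the arithmetic alone runs cleanly, that points toward how the files are opened and read, which is also where the traceback below ends (netCDF4's Variable.__getitem__).

```python
import numpy as np
import xarray as xr


def cumulative_at_maturity(ar, maturity):
    """Same arithmetic as func(): cumulative sum along 'maturity', minus 1,
    then select one maturity label. No file I/O is involved."""
    ann = ar.cumsum(dim="maturity") - 1
    return ann.sel(maturity=maturity).values


# Small in-memory stand-in for the stacked files (sizes are arbitrary).
ar = xr.DataArray(
    np.ones((2, 3, 4)),
    dims=["dim1", "dim2", "maturity"],
    coords={"dim1": np.arange(2), "dim2": np.arange(3), "maturity": np.arange(4)},
)
# Maturities 0, 1, 2 are summed (3 ones), then 1 is subtracted.
result = cumulative_at_maturity(ar, 2)
```

Every element of result is 2.0 and its shape is (2, 3), i.e. one dim1 x dim2 slice, which is what func stores in doc['data'].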

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 5.4.0-1047-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.11.0
pandas: 0.24.1
numpy: 1.15.4
scipy: 1.2.0
netCDF4: 1.4.2
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 1.1.1
distributed: 1.25.3
matplotlib: 3.0.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.7.3
pip: 19.0.1
conda: None
pytest: 4.2.0
IPython: 7.1.1
sphinx: 1.8.4

This error also happens with xarray 0.10.9.

Error trace:

2021-05-05 09:28:19,911, DEBUG 7621, sim_io.py:483 - load_unique_document(), xpath=/home/ubuntu/runs/20210331_001/nominal_dfs/uk
2021-05-05 09:28:42,774, ERROR 7621, run_gov_ret.py:33 -             <module>(), Unknown error=NetCDF: Not a valid ID
Traceback (most recent call last):
  File "/home/ubuntu/dev/py36/python/ev/model/api3/run_gov_ret.py", line 31, in <module>
    res = govRet()
  File "/home/ubuntu/dev/py36/python/ev/model/api3/returns.py", line 56, in __call__
    decompose=self.decompose))
  File "/home/ubuntu/dev/py36/python/ev/model/returns/returnsGenerator.py", line 70, in calc_returns
    dfs_data = self.mongo_dfs.get_data(mats=[1,mat,mat-1])
  File "/home/ubuntu/dev/py36/python/ev/model/api3/dfs.py", line 262, in get_data
    record = self.mdb.load_unique_document(self.dfs_collection_name, spec)
  File "/home/ubuntu/dev/py36/python/ev/model/api3/sim_io.py", line 1109, in load_unique_document
    return self.collections[collection].load_unique_document(query, *args, **kwargs)
  File "/home/ubuntu/dev/py36/python/ev/model/api3/sim_io.py", line 501, in load_unique_document
    doc['data'] = ar1.load().values
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/dataarray.py", line 631, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/dataset.py", line 494, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/base.py", line 398, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/threaded.py", line 76, in get
    pack_exception=pack_exception, **kwargs)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/local.py", line 459, in get_async
    raise_exception(exc, tb)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/compatibility.py", line 112, in reraise
    raise exc
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/local.py", line 230, in execute_task
    result = _execute_task(task, data)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/dask/array/core.py", line 82, in getter
    c = np.asarray(c)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/numpy/core/numeric.py", line 501, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 602, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/numpy/core/numeric.py", line 501, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 508, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 64, in __getitem__
    self._getitem)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/core/indexing.py", line 776, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "/home/ubuntu/miniconda3/envs/egan/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 76, in _getitem
    array = getitem(original_array, key)
  File "netCDF4/_netCDF4.pyx", line 4095, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4/_netCDF4.pyx", line 3798, in netCDF4._netCDF4.Variable.shape.__get__
  File "netCDF4/_netCDF4.pyx", line 3746, in netCDF4._netCDF4.Variable._getdims
  File "netCDF4/_netCDF4.pyx", line 1754, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Not a valid ID
@TomNicholas
Member

Hi there - are you able to narrow down the problem a little bit for us? Can you get the same error by opening any one file individually with xarray.open_dataset()? What if you use the netCDF4 library to access a variable directly without xarray? Failing that, can you give us a file (or files) for which this fails, so that we can reproduce the problem locally? Also, have you tried using a more recent version of netCDF4? The most recent version on PyPI seems to be 1.5.6, but you're using 1.4.2.
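
The first check suggested here (open each file on its own with xarray.open_dataset()) can be scripted roughly as follows. This is a sketch, not code from the thread: check_files_individually and its open_fn parameter are hypothetical names, and open_fn exists only so the loop can be exercised without real netCDF files.

```python
import glob
import os

import xarray as xr


def check_files_individually(xpath, open_fn=xr.open_dataset):
    """Open each *.nc file in xpath on its own and load its first data variable.

    Returns a list of (path, error message) pairs for the files that fail.
    open_fn is a hypothetical hook defaulting to xr.open_dataset.
    """
    failures = []
    for path in sorted(glob.glob(os.path.join(xpath, "*.nc"))):
        try:
            # The with-block closes each file before the next one is opened.
            with open_fn(path) as ds:
                var_name = list(ds.data_vars)[0]
                ds[var_name].load()  # force the actual read
        except Exception as err:  # e.g. RuntimeError("NetCDF: Not a valid ID")
            failures.append((path, str(err)))
    return failures
```

Calling check_files_individually(xpath) on the folder of files returns an empty list if every file opens and reads cleanly on its own; any entries name the files worth sending to the maintainers.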

@minhhg
Author

minhhg commented May 10, 2021

I was able to write a small script that reproduces the error.

import numpy as np
import xarray as xr
from copy import deepcopy
import random
import os
import logging
log = logging.getLogger(__name__)


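def generate_data(xpath, m=10000, n=900, k=601):
    """
    Write k NetCDF files to xpath, each holding an m x n x 1 array of random floats.

    :param xpath: folder to write the files into (created if missing)
    :param m: length of dim1
    :param n: length of dim2
    :param k: number of files, one per maturity value 0..k-1
    :return: None
    """
    os.makedirs(xpath, exist_ok=True)
    coord1 = dict(dim1=range(m), dim2=range(n))
    for mat in range(k):
        a = np.random.rand(m, n)
        file_name = "{:04d}.nc".format(mat)
        full_file_name = os.path.join(xpath, file_name)
        coord1['maturity'] = [mat]
        a = np.reshape(a, [m, n, 1])
        ar = xr.DataArray(a, dims=['dim1', 'dim2', 'maturity'], coords=coord1)
        ar.to_netcdf(full_file_name)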

def func(xpath, spec):
    log.debug("xpath=%s, spec=%s", xpath, spec)

    doc = deepcopy(spec)

    with xr.open_mfdataset(xpath + "/*.nc", concat_dim='maturity') as data:
        var_name = list(data.data_vars)[0]
        ar = data[var_name]
        maturity = spec['maturity']
        ann = ar.cumsum(dim='maturity')
        ann = ann - 1
        ar1 = ann.sel(maturity=maturity)
        doc['data'] = ar1.load().values
    return doc


def repeat_access_data_generate_error(xpath, k=601, l=1000):
    """
    Call func() l times, each time with a random maturity in [0, k), and log
    any exception instead of stopping.
    :return: a dummy value accumulated from the loaded data
    """
    spec = {}
    dum = 0.0
    for _l1 in range(l):
        try:
            spec['maturity'] = np.random.randint(0, k, 1)[0]
            doc = func(xpath, spec)
            # do a dummy operation with doc
            dum = (dum + np.sum(doc['data'])) / 2.0
        except Exception as err:
            log.error("spec=%s, err=%s", spec, err, exc_info=True)

    return dum


if __name__ == "__main__":
    FORMAT = '%(asctime)-15s, %(levelname)s %(process)d, %(filename)s:%(lineno)s - %(funcName)20s(), %(message)s'
    log_file_name = "errors.log"
    logging.basicConfig(filename=log_file_name, filemode='a+', level=logging.DEBUG, format=FORMAT)
    xpath="/home/ubuntu/runs/xrdata/" # folder to hold files
    # change values of m,n,k to re-produce the error
    generate_data(xpath,m=1000)
    repeat_access_data_generate_error(xpath)
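
Since the comment in the script suggests changing m, n and k, a rough size estimate helps pick values. This sketch assumes float64 data (what np.random.rand returns) and ignores coordinate variables and netCDF4/HDF5 overhead. bytes_read_for_selection additionally assumes, without having been checked in this thread, that dask only reads files 0..i when func selects maturity i after the cumulative sum. All three function names are hypothetical.

```python
FLOAT64_BYTES = 8  # np.random.rand returns float64 values


def bytes_per_file(m, n):
    """Approximate payload of one generated file: an m x n x 1 float64 array."""
    return m * n * FLOAT64_BYTES


def total_bytes(m, n, k):
    """Approximate payload of all k files written by generate_data."""
    return k * bytes_per_file(m, n)


def bytes_read_for_selection(m, n, maturity_index):
    """Rough data volume one func() call touches, assuming the cumulative sum
    only needs the files for maturities 0..maturity_index."""
    return (maturity_index + 1) * bytes_per_file(m, n)
```

With the values in the __main__ block (m=1000, n=900, k=601) this gives 7.2 MB per file and about 4.3 GB in total; with generate_data's default m=10000 it is 72 MB per file and about 43.3 GB. Under the stated assumption, a single func call may read hundreds of those files.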

@minhhg
Author

minhhg commented May 10, 2021

As mentioned above, the error only happens after func has been called successfully a few times; the first few calls work as expected.
"Can you get the same error by opening any one file individually with xarray.open_dataset()?"
Yes, I can. But the whole point is to use open_mfdataset.
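
A loop like the following is one way to record after how many calls a single file starts failing, matching the "works at first, then fails" pattern described here. It is a sketch: first_failing_call and its open_fn hook are hypothetical names, and it catches only RuntimeError because that is the type the traceback above reports for "NetCDF: Not a valid ID".

```python
import xarray as xr


def first_failing_call(path, n_calls=100, open_fn=xr.open_dataset):
    """Open one file n_calls times, loading its first data variable each time.

    Returns the 1-based index of the first call that raises RuntimeError,
    or None if every call succeeds. open_fn is a hypothetical hook that
    defaults to xr.open_dataset.
    """
    for i in range(1, n_calls + 1):
        try:
            with open_fn(path) as ds:
                var_name = list(ds.data_vars)[0]
                ds[var_name].load()  # force the actual read
        except RuntimeError:
            # netCDF4 raises "NetCDF: Not a valid ID" as a RuntimeError
            return i
    return None
```

Reporting the returned index together with the file that was used gives the maintainers a concrete, repeatable number instead of "after a while".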

@max-sixty
Collaborator

We didn't do a good job answering this (thank you @TomNicholas for responding), but likely the moment has passed, so I'll close. Please reopen with an MCVE if there's still an issue.
