open_mfdataset: Not a valid ID #5276
Comments
Hi there - are you able to narrow down the problem a little bit for us? Can you get the same error by opening any one file individually with |
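A per-file check along the lines suggested above could be sketched as follows. This is only an illustration: the helper name `find_unreadable_files` and its `opener` parameter are hypothetical, and the default of `xr.open_dataset` is an assumption about which function was meant.

```python
import glob
import os


def find_unreadable_files(folder, opener=None, pattern="*.nc"):
    """Open every matching file on its own and collect the ones that fail.

    `opener` is any callable returning a context manager with a `.load()`
    method; it defaults to `xarray.open_dataset` (an assumption).
    """
    if opener is None:
        import xarray as xr  # imported lazily so the helper can be tested without it
        opener = xr.open_dataset

    failures = {}
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        try:
            with opener(path) as ds:
                ds.load()  # force an actual read, not just a lazy open
        except Exception as err:
            failures[path] = repr(err)
    return failures
```

An empty result would suggest that no single file is corrupt and that the problem lies in how the files are opened together or repeatedly.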
I was able to write a small script that reproduces the error.

import numpy as np
import xarray as xr
from copy import deepcopy
import random
import os
import logging

log = logging.getLogger(__name__)


def generate_data(xpath, m=10000, n=900, k=601):
    """
    Write k NetCDF files, each holding one (m, n, 1) random array.

    :param m: length of dim1
    :param n: length of dim2
    :param k: number of files (one per maturity value)
    :return: None
    """
    coord1 = dict(dim1=range(m), dim2=range(n))
    for mat in range(k):
        a = np.random.rand(m, n)
        file_name = "{:04d}.nc".format(mat)
        full_file_name = os.path.join(xpath, file_name)
        coord1['maturity'] = [mat]
        a = np.reshape(a, [m, n, 1])
        ar = xr.DataArray(a, dims=['dim1', 'dim2', 'maturity'], coords=coord1)
        ar.to_netcdf(full_file_name)


def func(xpath, spec):
    log.debug("xpath=%s, spec=%s", xpath, spec)
    doc = deepcopy(spec)
    # All files are re-opened on every call
    with xr.open_mfdataset(xpath + "/*.nc", concat_dim='maturity') as data:
        var_name = list(data.data_vars)[0]
        ar = data[var_name]
        maturity = spec['maturity']
        ann = ar.cumsum(dim='maturity')
        ann = ann - 1
        ar1 = ann.sel(maturity=maturity)
        doc['data'] = ar1.load().values
    return doc


def repeat_access_data_generate_error(xpath, k=601, l=1000):
    """
    Dummy to generate error
    :return:
    """
    spec = {}
    dum = 0.0
    for _l1 in range(l):
        try:
            spec['maturity'] = np.random.randint(0, k, 1)[0]
            doc = func(xpath, spec)
            # do dummy operation with doc
            dum = (dum + np.sum(doc['data'])) / 2.0
        except Exception as err:
            log.error("spec=%s, err=%s", spec, err, exc_info=True)
    return dum


if __name__ == "__main__":
    FORMAT = '%(asctime)-15s, %(levelname)s %(process)d, %(filename)s:%(lineno)s - %(funcName)20s(), %(message)s'
    log_file_name = "errors.log"
    logging.basicConfig(filename=log_file_name, filemode='a+', level=logging.DEBUG, format=FORMAT)
    xpath = "/home/ubuntu/runs/xrdata/"  # folder to hold files
    # change values of m,n,k to re-produce the error
    generate_data(xpath, m=1000)
    repeat_access_data_generate_error(xpath)
As mentioned above, the error only happens after func has been called successfully a few times, i.e., the first few calls work as expected.
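Since the failure shows up only after many open/close cycles, one thing that might be worth trying is opening the files once and reusing the handle for every request. The sketch below is untested against the actual error and does not diagnose its cause; `run_with_single_open`, `compute`, and the `opener` parameter are hypothetical names introduced for illustration.

```python
def run_with_single_open(pattern, specs, compute, opener=None, **open_kwargs):
    """Open the multi-file dataset once and apply `compute` to each spec.

    `opener` defaults to `xarray.open_mfdataset`; `compute(data, spec)` is a
    user-supplied callable that must return fully loaded values, because the
    dataset is closed when this function returns.
    """
    if opener is None:
        import xarray as xr  # imported lazily so the helper can be tested without it
        opener = xr.open_mfdataset

    results = []
    with opener(pattern, **open_kwargs) as data:
        for spec in specs:
            results.append(compute(data, spec))
    return results
```

With the reproduction script's logic, `compute` could take the first data variable, apply `cumsum(dim='maturity')`, subtract 1, select `spec['maturity']`, and return `.load().values`, with `concat_dim='maturity'` passed through `open_kwargs`.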
We didn't do a good job answering this (thank you @TomNicholas for responding), but likely the moment has passed, so I'll close. Please reopen with an MCVE if there's still an issue.
I have about 601 NETCDF4 files saved using xarray. We use open_mfdataset to access these files, and the main code calls this function many times.
The first few calls work fine, but after a while it throws the following error: "RuntimeError: NetCDF: Not a valid ID"
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 5.4.0-1047-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.11.0
pandas: 0.24.1
numpy: 1.15.4
scipy: 1.2.0
netCDF4: 1.4.2
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 1.1.1
distributed: 1.25.3
matplotlib: 3.0.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.7.3
pip: 19.0.1
conda: None
pytest: 4.2.0
IPython: 7.1.1
sphinx: 1.8.4
Error trace: