ECMWF Forecast generated Grib data cannot be converted to NetCDF #232

Open
whatnick opened this issue May 16, 2021 · 4 comments

@whatnick

I am using cfgrib with ecCodes v2.21.0; see below.

cfgrib selfcheck
Found: ecCodes v2.21.0.
Your system is ready.

I get the following error when converting the ECMWF forecast data I have to NetCDF:

cfgrib to_netcdf data/A1D05011200050812001 
Traceback (most recent call last):
  File "/home/whatnick/.local/bin/cfgrib", line 8, in <module>
    sys.exit(cfgrib_cli())
  File "/usr/lib/python3/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/__main__.py", line 61, in to_netcdf
    ds = xr.open_dataset(inpaths[0], engine=engine)  # type: ignore
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/api.py", line 517, in open_dataset
    filename_or_obj, lock=lock, **backend_kwargs
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/cfgrib_.py", line 43, in __init__
    self.ds = cfgrib.open_file(filename, **backend_kwargs)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 713, in open_file
    index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 668, in build_dataset_components
    attributes = build_dataset_attributes(index, filter_by_keys, encoding)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 592, in build_dataset_attributes
    attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS, filter_by_keys)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 273, in enforce_unique_attributes
    raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
cfgrib.dataset.DatasetBuildError: multiple values for key 'edition'
@iainrussell
Member

This is an interesting one! It seems potentially useful to store the original GRIB edition in the NetCDF metadata, and as a global attribute it needs to be unique, so I see the problem. On the other hand, it's not essential to have it, so I wonder if there's a way for a user to remove a key from GLOBAL_ATTRIBUTES_KEYS if this is the only thing that's preventing the NetCDF from being generated?
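
A minimal sketch of that idea (my addition, not from the thread): GLOBAL_ATTRIBUTES_KEYS is an internal module-level list in cfgrib.dataset rather than a public API, so mutating it is a fragile workaround that may break across cfgrib versions.

import cfgrib.dataset
import xarray as xr

# Drop 'edition' from the keys cfgrib requires to be unique across the file,
# so a mixed-edition file no longer raises DatasetBuildError for this key.
if "edition" in cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS:
    cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS.remove("edition")

ds = xr.open_dataset("data/A1D05011200050812001", engine="cfgrib")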

@whatnick
Author

I have made some progress after a bunch of searching and reading the docs. First I copied the data to new GRIB files, split by typeOfLevel:

grib_copy data/A1S05031800050618001 data_[typeOfLevel].grb

I verified there was only one file in the collection, then listed its contents:

grib_ls data_surface.grb
data_surface.grb
edition      centre       typeOfLevel  level        dataDate     stepRange    dataType     shortName    packingType  gridType     
1            ecmf         surface      0            20210503     72           fc           i10fg        grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           cp           grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           100u         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           100v         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           lsp          grid_simple  regular_ll  
2            ecmf         surface      0            20210503     72           fc           ptype        grid_simple  regular_ll  
2            ecmf         surface      0            20210503     72           fc           tprate       grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           hwbt0        grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           hcct         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           crr          grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           lsrr         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     69-72        fc           mxtpr3       grid_simple  regular_ll  
1            ecmf         surface      0            20210503     69-72        fc           mntpr3       grid_simple  regular_ll  
13 of 13 messages in data_surface.grb

13 of 13 total messages in 1 files

I am mostly interested in CRR and LSRR for my use cases.
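
Since the grib_ls output shows a mix of edition 1 and edition 2 messages, another workaround (my sketch, not confirmed in this thread) is to pass filter_by_keys so cfgrib builds the dataset from a single edition; the edition-1 subset includes both crr and lsrr.

import xarray as xr

# Restrict cfgrib to the GRIB edition-1 messages; a second call with
# {"edition": 2} would pick up the remaining fields (ptype, tprate).
ds = xr.open_dataset(
    "data_surface.grb",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"edition": 1}},
)
print(ds.data_vars)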

@b8raoult
Contributor

Created a pull request

@din14970

din14970 commented Dec 1, 2021

I have a probably related issue with data coming from the archived forecasts:

>>> ds = xarray.open_dataset(filename, backend_kwargs={'errors': 'ignore', 'filter_by_keys': {"typeOfLevel": "surface"}})

---------------------------------------------------------------------------
DatasetBuildError                         Traceback (most recent call last)
/tmp/ipykernel_9992/3594020638.py in <module>
      1 filename = "test_file_atmosphere_all.grib"
----> 2 ds = xarray.open_dataset(filename, backend_kwargs={'errors': 'ignore', 'filter_by_keys': {"typeOfLevel": "surface"}})

~/miniconda3/envs/env/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    495 
    496     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 497     backend_ds = backend.open_dataset(
    498         filename_or_obj,
    499         drop_variables=drop_variables,

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/xarray_plugin.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, lock, indexpath, filter_by_keys, read_keys, encode_cf, squeeze, time_dims, errors, extra_coords)
     98     ) -> xr.Dataset:
     99 
--> 100         store = CfGribDataStore(
    101             filename_or_obj,
    102             indexpath=indexpath,

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/xarray_plugin.py in __init__(self, filename, lock, **backend_kwargs)
     38             lock = ECCODES_LOCK
     39         self.lock = xr.backends.locks.ensure_lock(lock)  # type: ignore
---> 40         self.ds = dataset.open_file(filename, **backend_kwargs)
     41 
     42     def open_store_variable(self, var: dataset.Variable,) -> xr.Variable:

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in open_file(path, grib_errors, indexpath, filter_by_keys, read_keys, time_dims, extra_coords, **kwargs)
    717     index = open_fileindex(path, grib_errors, indexpath, index_keys, filter_by_keys=filter_by_keys)
    718     return Dataset(
--> 719         *build_dataset_components(
    720             index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
    721         )

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
    673         "encode_cf": encode_cf,
    674     }
--> 675     attributes = build_dataset_attributes(index, filter_by_keys, encoding)
    676     return dimensions, variables, attributes, encoding
    677 

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in build_dataset_attributes(index, filter_by_keys, encoding)
    597 def build_dataset_attributes(index, filter_by_keys, encoding):
    598     # type: (messages.FileIndex, T.Dict[str, T.Any], T.Dict[str, T.Any]) -> T.Dict[str, T.Any]
--> 599     attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS, filter_by_keys)
    600     attributes["Conventions"] = "CF-1.7"
    601     if "GRIB_centreDescription" in attributes:

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in enforce_unique_attributes(index, attributes_keys, filter_by_keys)
    271                 fbk.update(filter_by_keys)
    272                 fbks.append(fbk)
--> 273             raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
    274         if values and values[0] not in ("undef", "unknown"):
    275             attributes["GRIB_" + key] = values[0]

DatasetBuildError: multiple values for key 'edition'

The file contains a bunch of data variables. If I request those variables individually and write each to its own file, there is no problem opening any of the files. I need to extract the data arrays, convert them to dataframes, and write them somewhere else; making a single request for all variables seems more efficient, though. Is there any quick fix for this issue?
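
One possible quick fix (an assumption on my part, not verified against this exact file): cfgrib.open_datasets splits a heterogeneous GRIB file into a list of internally consistent datasets instead of raising on conflicting keys, so a single request for all variables can still be opened in one call.

import cfgrib

# Each element is an xarray.Dataset built from a mutually compatible subset
# of the GRIB messages, so mixed editions no longer collide on 'edition'.
datasets = cfgrib.open_datasets("test_file_atmosphere_all.grib")
for ds in datasets:
    print(list(ds.data_vars))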

Versions

python: 3.8.6
cfgrib: 0.9.9.1
xarray: 0.19.0
eccodes: 2.23.0
eccodes (python): 1.3.3
