Add support for native ERA5 data in GRIB format (#2178)
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
Co-authored-by: Bettina Gier <gier@uni-bremen.de>
4 people authored Dec 6, 2024
1 parent 4c36a0c commit 65c7b28
Showing 19 changed files with 1,296 additions and 197 deletions.
2 changes: 1 addition & 1 deletion doc/quickstart/configure.rst
@@ -974,7 +974,7 @@ infrastructure. The following example illustrates the concept.
 .. _extra-facets-example-1:

 .. code-block:: yaml
-   :caption: Extra facet example file `native6-era5.yml`
+   :caption: Extra facet example file `native6-era5-example.yml`

    ERA5:
      Amon:
109 changes: 101 additions & 8 deletions doc/quickstart/find_data.rst
@@ -107,18 +107,27 @@ The following native reanalysis/observational datasets are supported under the
To use these datasets, put the files containing the data in the directory that
you have :ref:`configured <config_options>` for the ``rootpath`` of the
``native6`` project, in a subdirectory called
``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}``.
``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}`` (assuming you are
using the ``default`` DRS for ``native6``).
Replace the items in curly braces by the values used in the variable/dataset
definition in the :ref:`recipe <recipe_overview>`.
Below is a list of native reanalysis/observational datasets currently
supported.

.. _read_native_era5:
.. _read_native_era5_nc:

ERA5
^^^^
ERA5 (in netCDF format downloaded from the CDS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

ERA5 data can be downloaded from the Copernicus Climate Data Store (CDS) using
the convenient tool `era5cli <https://era5cli.readthedocs.io>`__.
For example for monthly data, place the files in the
``/Tier3/ERA5/version/mon/pr`` subdirectory of your ``rootpath`` that you have
configured for the ``native6`` project (assuming you are using the ``default``
DRS for ``native6``).
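
The directory rule for the ``default`` DRS can be sketched in a few lines of Python. This is only an illustration of how the template quoted above expands. The helper name ``expected_input_dir``, the rootpath ``/data/native6``, and the version string ``v1`` are hypothetical and do not come from ESMValCore.

```python
from pathlib import PurePosixPath

# Directory template for the ``default`` DRS of ``native6``, as quoted in the
# documentation text above.
DEFAULT_DIR_TEMPLATE = "Tier{tier}/{dataset}/{version}/{frequency}/{short_name}"


def expected_input_dir(rootpath: str, facets: dict) -> PurePosixPath:
    """Return the directory where files for ``facets`` would be expected.

    Hypothetical helper for illustration; not part of ESMValCore.
    """
    return PurePosixPath(rootpath) / DEFAULT_DIR_TEMPLATE.format(**facets)


# Hypothetical rootpath and version, chosen only for this example.
facets = {
    "tier": 3,
    "dataset": "ERA5",
    "version": "v1",
    "frequency": "mon",
    "short_name": "pr",
}
print(expected_input_dir("/data/native6", facets))
```

Each placeholder in curly braces is filled from the facets of the dataset entry in the recipe, so changing ``short_name`` or ``frequency`` changes the directory that is searched.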

- Supported variables: ``cl``, ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``, ``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``, ``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``, ``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``)
- Supported variables: ``cl``, ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``,
``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``,
``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``,
``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``).
- Tier: 3

.. note:: According to the description of Evapotranspiration and potential Evapotranspiration on the Copernicus page
@@ -131,6 +140,85 @@ ERA5
of both liquid and solid phases to vapor (from underlying surface and vegetation)."
Therefore, the ERA5 (and ERA5-Land) CMORizer switches the signs of ``evspsbl`` and ``evspsblpot`` to be compatible with the CMOR standard used e.g. by the CMIP models.

.. _read_native_era5_grib:

ERA5 (in GRIB format available on DKRZ's Levante or downloaded from the CDS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

ERA5 data in monthly, daily, and hourly resolution is `available on Levante
<https://docs.dkrz.de/doc/dataservices/finding_and_accessing_data/era_data/index.html#era-data>`__
in its native GRIB format.

.. note::
   ERA5 data in its native GRIB format can also be downloaded from the
   `Copernicus Climate Data Store (CDS)
   <https://cds.climate.copernicus.eu/datasets>`__.
   For example, hourly data on pressure levels is available `here
   <https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels?tab=download>`__.
   Reading self-downloaded ERA5 data in GRIB format is experimental and likely
   requires additional setup from the user like setting up the proper directory
   structure for the input files and/or creating a custom :ref:`DRS
   <config_option_drs>`.

To read these data with ESMValCore, use the :ref:`rootpath
<config_option_rootpath>` ``/pool/data/ERA5`` with :ref:`DRS
<config_option_drs>` ``DKRZ-ERA5-GRIB`` in your configuration, for example:

.. code-block:: yaml

   rootpath:
     ...
     native6:
       /pool/data/ERA5: DKRZ-ERA5-GRIB
     ...

The `naming conventions
<https://docs.dkrz.de/doc/dataservices/finding_and_accessing_data/era_data/index.html#file-and-directory-names>`__
for input directories and files for native ERA5 data in GRIB format on Levante
are

* input directories: ``{family}/{level}/{type}/{tres}/{grib_id}``
* input files: ``{family}{level}{typeid}_{tres}_*_{grib_id}.grb``

All of these facets have reasonable defaults preconfigured in the corresponding
:ref:`extra facets<extra_facets>` file, which is available here:
:download:`native6-era5.yml
</../esmvalcore/config/extra_facets/native6-era5.yml>`.
If necessary, these facets can be overwritten in the recipe.

Thus, example dataset entries could look like this:

.. code-block:: yaml

   datasets:
     - {project: native6, dataset: ERA5, timerange: '2000/2001',
        short_name: tas, mip: Amon}
     - {project: native6, dataset: ERA5, timerange: '2000/2001',
        short_name: cl, mip: Amon, tres: 1H, frequency: 1hr}
     - {project: native6, dataset: ERA5, timerange: '2000/2001',
        short_name: ta, mip: Amon, type: fc, typeid: '12'}

The native ERA5 output in GRIB format is stored on a `reduced Gaussian grid
<https://confluence.ecmwf.int/display/CKB/ERA5:+data+documentation#ERA5:datadocumentation-SpatialgridSpatialGrid>`__.
By default, these data are regridded to a regular 0.25°x0.25° grid as
`recommended by the ECMWF
<https://confluence.ecmwf.int/display/CKB/ERA5%3A+What+is+the+spatial+reference#heading-Interpolation>`__
using bilinear interpolation.

To disable this, you can use the facet ``automatic_regrid: false`` in the
recipe:

.. code-block:: yaml

   datasets:
     - {project: native6, dataset: ERA5, timerange: '2000/2001',
        short_name: tas, mip: Amon, automatic_regrid: false}

- Supported variables: ``albsn``, ``cl``, ``cli``, ``clt``, ``clw``, ``hur``,
``hus``, ``o3``, ``prw``, ``ps``, ``psl``, ``rainmxrat27``, ``sftlf``,
``snd``, ``snowmxrat27``, ``ta``, ``tas``, ``tdps``, ``toz``, ``ts``, ``ua``,
``uas``, ``va``, ``vas``, ``wap``, ``zg``.
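
To make the Levante naming conventions from this section concrete, the following sketch expands the two templates into a glob pattern. The facet values in ``EXAMPLE_FACETS``, the helper ``levante_pattern``, and the sample file name are assumptions made for illustration only. The defaults actually shipped are in ``native6-era5.yml`` and are not reproduced here.

```python
from fnmatch import fnmatch

# Templates copied from the naming conventions listed in this section.
DIR_TEMPLATE = "{family}/{level}/{type}/{tres}/{grib_id}"
FILE_TEMPLATE = "{family}{level}{typeid}_{tres}_*_{grib_id}.grb"

# Hypothetical facet values; the real defaults come from the extra facets file.
EXAMPLE_FACETS = {
    "family": "E5",
    "level": "sf",
    "type": "an",
    "typeid": "00",
    "tres": "1M",
    "grib_id": "167",
}


def levante_pattern(rootpath: str, facets: dict, **overrides: str) -> str:
    """Build a glob pattern for input files (illustrative helper only).

    ``overrides`` plays the role of facets given explicitly in the recipe,
    which take precedence over the preconfigured defaults.
    """
    merged = {**facets, **overrides}
    directory = DIR_TEMPLATE.format(**merged)
    filename = FILE_TEMPLATE.format(**merged)
    return f"{rootpath}/{directory}/{filename}"


default_pattern = levante_pattern("/pool/data/ERA5", EXAMPLE_FACETS)
forecast_pattern = levante_pattern(
    "/pool/data/ERA5", EXAMPLE_FACETS, type="fc", typeid="12"
)
print(default_pattern)
print(forecast_pattern)

# A made-up file name that would match the file part of the default pattern.
print(fnmatch("E5sf00_1M_2000-01_167.grb", default_pattern.rsplit("/", 1)[1]))
```

The second call mirrors the recipe entry above that sets ``type: fc`` and ``typeid: '12'``: only the overridden facets change, and every other part of the path keeps its default.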

.. _read_native_mswep:

MSWEP
@@ -140,7 +228,10 @@ MSWEP
- Supported frequencies: ``mon``, ``day``, ``3hr``.
- Tier: 3

For example for monthly data, place the files in the ``/Tier3/MSWEP/version/mon/pr`` subdirectory of your ``native6`` project location.
For example for monthly data, place the files in the
``/Tier3/MSWEP/version/mon/pr`` subdirectory of your ``rootpath`` that you have
configured for the ``native6`` project (assuming you are using the ``default``
DRS for ``native6``).

.. note::
   For monthly data (``V220``), the data must be postfixed with the date, i.e. rename ``global_monthly_050deg.nc`` to ``global_monthly_050deg_197901-201710.nc``
@@ -642,6 +733,8 @@ first discuss the ``drs`` parameter: as we've seen in the previous section, the
DRS as a standard is used for both file naming conventions and for directory
structures.

.. _config_option_drs:

Explaining ``drs: CMIP5:`` or ``drs: CMIP6:``
---------------------------------------------
Whereas ESMValCore will by default use the CMOR standard for file naming (please
8 changes: 5 additions & 3 deletions esmvalcore/_provenance.py
@@ -4,6 +4,7 @@
 import logging
 import os
 from functools import total_ordering
+from pathlib import Path

 from netCDF4 import Dataset
 from PIL import Image
@@ -209,9 +210,10 @@ def _initialize_entity(self):
         """Initialize the entity representing the file."""
         if self.attributes is None:
             self.attributes = {}
-            with Dataset(self.filename, "r") as dataset:
-                for attr in dataset.ncattrs():
-                    self.attributes[attr] = dataset.getncattr(attr)
+            if "nc" in Path(self.filename).suffix:
+                with Dataset(self.filename, "r") as dataset:
+                    for attr in dataset.ncattrs():
+                        self.attributes[attr] = dataset.getncattr(attr)

         attributes = {
             "attribute:" + str(k).replace(" ", "_"): str(v)
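
The suffix check introduced in ``_initialize_entity`` can be tried in isolation. The sketch below repeats the same expression on a few made-up file names. The function name ``looks_like_netcdf`` is hypothetical and only wraps the expression from the hunk above.

```python
from pathlib import Path


def looks_like_netcdf(filename: str) -> bool:
    """Apply the same test as the hunk above (hypothetical wrapper)."""
    # ``Path.suffix`` is the last extension including the dot, e.g. ".nc".
    return "nc" in Path(filename).suffix


# Made-up file names for illustration.
for name in ("tas.nc", "tas.nc4", "E5sf00_1M_2000-01_167.grb", "data.tar.gz"):
    print(name, looks_like_netcdf(name))
```

Because this is a substring test on the last suffix, it accepts ``.nc`` and ``.nc4`` and skips GRIB files, so ``netCDF4.Dataset`` is no longer opened on files it cannot read. It would also accept any other suffix that happens to contain ``nc``.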
30 changes: 30 additions & 0 deletions esmvalcore/_recipe/recipe.py
@@ -37,6 +37,7 @@
     PreprocessorFile,
 )
 from esmvalcore.preprocessor._area import _update_shapefile_path
+from esmvalcore.preprocessor._io import GRIB_FORMATS
 from esmvalcore.preprocessor._multimodel import _get_stat_identifier
 from esmvalcore.preprocessor._regrid import (
     _spec_to_latlonvals,
@@ -230,6 +231,34 @@ def _get_default_settings(dataset):
     return settings


+def _add_dataset_specific_settings(dataset: Dataset, settings: dict) -> None:
+    """Add dataset-specific settings."""
+    project = dataset.facets["project"]
+    dataset_name = dataset.facets["dataset"]
+    file_suffixes = [Path(file.name).suffix for file in dataset.files]
+
+    # Automatic regridding for native ERA5 data in GRIB format if regridding
+    # step is not already present (can be disabled with facet
+    # automatic_regrid=False)
+    if all(
+        [
+            project == "native6",
+            dataset_name == "ERA5",
+            any(grib_format in file_suffixes for grib_format in GRIB_FORMATS),
+            "regrid" not in settings,
+            dataset.facets.get("automatic_regrid", True),
+        ]
+    ):
+        # Settings recommended by ECMWF
+        # (https://confluence.ecmwf.int/display/CKB/ERA5%3A+What+is+the+spatial+reference#heading-Interpolation)
+        settings["regrid"] = {"target_grid": "0.25x0.25", "scheme": "linear"}
+        logger.debug(
+            "Automatically regrid native6 ERA5 data in GRIB format with the "
+            "settings %s",
+            settings["regrid"],
+        )
+
+
 def _exclude_dataset(settings, facets, step):
     """Exclude dataset from specific preprocessor step if requested."""
     exclude = {
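
The conditions under which ``_add_dataset_specific_settings`` adds a regridding step can be explored with a standalone sketch that uses plain dictionaries and file names instead of ESMValCore objects. The function ``needs_automatic_regrid`` is hypothetical, and the contents of ``GRIB_FORMATS`` below are an assumption, since the real tuple from ``esmvalcore.preprocessor._io`` is not shown in this diff.

```python
from pathlib import Path

# Assumed stand-in for esmvalcore.preprocessor._io.GRIB_FORMATS; the actual
# value is not part of this diff.
GRIB_FORMATS = (".grib2", ".grib", ".grb2", ".grb", ".gb2", ".gb")

# Regrid settings taken verbatim from the hunk above.
ERA5_REGRID = {"target_grid": "0.25x0.25", "scheme": "linear"}


def needs_automatic_regrid(facets: dict, filenames: list, settings: dict) -> bool:
    """Return True if all five conditions from the hunk above hold."""
    suffixes = [Path(name).suffix for name in filenames]
    return all(
        [
            facets["project"] == "native6",
            facets["dataset"] == "ERA5",
            any(fmt in suffixes for fmt in GRIB_FORMATS),
            "regrid" not in settings,  # an explicit regrid step wins
            facets.get("automatic_regrid", True),  # opt-out facet
        ]
    )


# Made-up facets and file name for illustration.
facets = {"project": "native6", "dataset": "ERA5"}
settings = {}
if needs_automatic_regrid(facets, ["E5sf00_1M_2000-01_167.grb"], settings):
    settings["regrid"] = dict(ERA5_REGRID)
print(settings)
```

A regridding step that is already configured in the preprocessor is never overwritten, and ``automatic_regrid: false`` in the recipe switches the behaviour off for that dataset.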
@@ -546,6 +575,7 @@ def _get_preprocessor_products(
         _apply_preprocessor_profile(settings, profile)
         _update_multi_dataset_settings(dataset.facets, settings)
         _update_preproc_functions(settings, dataset, datasets, missing_vars)
+        _add_dataset_specific_settings(dataset, settings)
         check.preprocessor_supplementaries(dataset, settings)
         input_datasets = _get_input_datasets(dataset)
         missing = _check_input_files(input_datasets)
2 changes: 2 additions & 0 deletions esmvalcore/cmor/_fixes/fix.py
@@ -845,6 +845,8 @@ def _fix_time_bounds(self, cube: Cube, cube_coord: Coord) -> None:
         """Fix time bounds."""
         times = {"time", "time1", "time2", "time3"}
         key = times.intersection(self.vardef.coordinates)
+        if not key:  # cube has time, but CMOR variable does not
+            return
         cmor = self.vardef.coordinates[" ".join(key)]
         if cmor.must_have_bounds == "yes" and not cube_coord.has_bounds():
             cube_coord.bounds = get_time_bounds(cube_coord, self.frequency)
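
The early return added to ``_fix_time_bounds`` guards against an empty set intersection. The following sketch reproduces just that lookup with plain dictionaries. The function ``cmor_time_key`` and the coordinate tables are made up for illustration and are not ESMValCore objects.

```python
TIME_NAMES = {"time", "time1", "time2", "time3"}


def cmor_time_key(vardef_coordinates: dict):
    """Return the name of the CMOR time coordinate, or None if there is none.

    Hypothetical helper mirroring the lookup in the hunk above.
    """
    key = TIME_NAMES.intersection(vardef_coordinates)
    if not key:  # the CMOR variable defines no time coordinate
        return None
    return " ".join(key)


# Made-up coordinate tables: one with a time axis, one time-invariant.
monthly = {"time": object(), "latitude": object(), "longitude": object()}
invariant = {"latitude": object(), "longitude": object()}

print(cmor_time_key(monthly))
print(cmor_time_key(invariant))

# Without the guard, the lookup key would be the empty string.
try:
    invariant[" ".join(TIME_NAMES.intersection(invariant))]
except KeyError as error:
    print("KeyError:", repr(error.args[0]))
```

Joining an empty set gives ``""``, which is not a key of the coordinate table, so the unguarded lookup raises ``KeyError`` whenever a cube carries a time coordinate that the CMOR variable does not define.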