Commit

Handle IPSL-CM6 (the feature won't actually work without #1124)
senesis committed May 31, 2021
1 parent b2d3642 commit 8fc038f
Showing 10 changed files with 628 additions and 83 deletions.
133 changes: 89 additions & 44 deletions doc/develop/fixing_data.rst
@@ -1,33 +1,40 @@
.. _fixing_data:

***********
Dataset fix
***********

Some (model) datasets contain (known) errors that would normally prevent them
from being processed correctly by the ESMValCore. The errors can be in
the metadata describing the dataset and/or in the actual data.
Typical examples of such errors are missing or wrong attributes (e.g.
attribute ''units'' says 1e-9 but data are actually in 1e-6), missing or
mislabeled coordinates (e.g. ''lev'' instead of ''plev'' or missing
******************************************
Adapting to data sources
******************************************

The baseline case for ESMValTool input data is fully CMOR-compliant
data that is read using the Iris load function. ESMValTool also allows for
some departures from compliance (see
:ref:`cmor_check_strictness`). Beyond that situation, some datasets
(either model or observations) contain (known) errors that would
normally prevent them from being processed. The issues can be in the
metadata describing the dataset and/or in the actual data. Typical
examples of such errors are missing or wrong attributes (e.g.
attribute ''units'' says 1e-9 but data are actually in 1e-6), missing
or mislabeled coordinates (e.g. ''lev'' instead of ''plev'' or missing
coordinate bounds like ''lat_bnds'') or problems with the actual data
(e.g. cloud liquid water only instead of sum of liquid + ice as specified by the CMIP data request).
(e.g. cloud liquid water only instead of sum of liquid + ice as
specified by the CMIP data request).

The ESMValCore can apply on the fly fixes to datasets that have
known errors that can be fixed automatically.

.. note::
**CMORization as a fix**.
Support for many observational and reanalysis datasets is implemented through
:ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`.
However, it is also possible to add support for a dataset that is not part of
a CMIP data request by implementing fixes for it.
This is particularly useful for large datasets, where keeping a copy of both
the original and CMORized dataset is not feasible.
See `Natively supported non-CMIP datasets`_ for a list of currently supported
datasets.
As an extreme case, some other data sources are simply not NetCDF
files and must go through a different data load function.

The ESMValCore can apply on-the-fly fixes to such datasets when the
issues can be fixed automatically. This is implemented for a set
of `Natively supported non-CMIP datasets`_. The following sections
provide details on how to design such fixes.

.. note::

   **CMORizer scripts**. Support for many observational and reanalysis
   datasets is also possible through a priori reformatting by
   :ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`,
   which are most relevant for datasets of small volume.

.. _fix_structure:

Fix structure
=============

@@ -326,30 +333,68 @@ strictness to the highest:
Natively supported non-CMIP datasets
====================================
Fixed datasets are supported through the ``native6`` project.
Put the files containing the data in the directory that you have configured
for the ``native6`` project in your :ref:`user configuration file`, in a
subdirectory called ``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}``.
Replace the items in curly braces by the values used in the variable/dataset
definition in the :ref:`recipe <recipe_overview>`.
Below is a list of datasets currently supported.

ERA5
----

- Supported variables: ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``, ``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``, ``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``, ``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``)
- Tier: 3

MSWEP
-----

- Supported variables: ``pr``
- Supported frequencies: ``mon``, ``day``, ``3hr``.
- Tier: 3

For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversion/mon/pr`` subdirectory of your ``native6`` project location.

.. note::
  For monthly data (V220), the data must be postfixed with the date, i.e. rename ``global_monthly_050deg.nc`` to ``global_monthly_050deg_197901-201710.nc``

For more info: http://www.gloh2o.org/

Some fixed datasets and native model formats are supported through
the ``native6`` project.

.. _fixing_native_models:

Native models : IPSL-CM6,...
-----------------------------

The following models are natively supported through the procedure
described above (:ref:`fix_structure`) and at
:ref:`configure_native_models`:

- **IPSL-CM6** : both output formats (i.e. the ``Output`` and the
  ``Analyse / Time series`` formats) are supported, and should be
  configured in recipes as e.g.:

  .. code-block:: yaml

    datasets:
      - {simulation: CM61-LR-hist-03.1950, exp: piControl, freq: Analyse/TS_MO,
         account: p86caub, status: PROD, dataset: IPSL-CM6, project: native6 }
      - {simulation: CM61-LR-hist-03.1950, exp: historical, freq: Output/MO,
         account: p86caub, status: PROD, dataset: IPSL-CM6, project: native6 }

  The ``Output`` format is an example of a case where variables are
  grouped in multi-variable files, whose names cannot be computed
  directly from dataset attributes alone but require a mapping
  file. These multi-variable files must also undergo some data
  selection, which may involve an external process for performance
  reasons.

  The ``config-developer.yml`` section for configuring IPSL-CM6 is
  :ref:`illustrated here <example_IPSL_config>`.

ERA5 and MSWEP datasets
-----------------------

Put the files containing the data in the
directory that you have configured for the ``native6`` project in your
:ref:`user configuration file`, in a subdirectory called
``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}``. Replace
the items in curly braces by the values used in the variable/dataset
definition in the :ref:`recipe <recipe_overview>`. Below is a list of
datasets currently supported:

- **ERA5**

  - Supported variables: ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``, ``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``, ``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``, ``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``)
  - Tier: 3

- **MSWEP**

  - Supported variables: ``pr``
  - Supported frequencies: ``mon``, ``day``, ``3hr``.
  - Tier: 3

  For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversion/mon/pr`` subdirectory of your ``native6`` project location.

  .. note::
    For monthly data (V220), the data must be postfixed with the date, i.e. rename ``global_monthly_050deg.nc`` to ``global_monthly_050deg_197901-201710.nc``

  For more info: http://www.gloh2o.org/
2 changes: 1 addition & 1 deletion doc/develop/index.rst
@@ -10,5 +10,5 @@ features.
:maxdepth: 1

Preprocessor function <preprocessor_function>
Dataset fix <fixing_data>
Adapting to data sources <fixing_data>
Deriving a variable <derivation>
94 changes: 88 additions & 6 deletions doc/quickstart/configure.rst
@@ -103,7 +103,7 @@ with explanations in a commented line above each option:
OBS: ~/obs_inputpath
default: ~/default_inputpath
# Directory structure for input data: [default]/BADC/DKRZ/ETHZ/etc
# Directory structure for input data: [default]/BADC/DKRZ/ETHZ/IPSL/etc
# See config-developer.yml for definitions.
drs:
CMIP5: default
@@ -176,8 +176,10 @@ It will be installed along with ESMValCore and can also be viewed on GitHub:
`esmvalcore/config-developer.yml
<https://github.com/ESMValGroup/ESMValCore/blob/main/esmvalcore/config-developer.yml>`_.
This configuration file describes the file system structure and CMOR tables for several
key projects (CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ,
ETHZ, SMHI, BSC). CMIP data is stored as part of the Earth System Grid
key projects (CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g.
BADC, CP4CDS, DKRZ, ETHZ, SMHI, BSC, IPSL), and for native output data for some
models (IPSL, ... see :ref:`configure_native_models`).
CMIP data is stored as part of the Earth System Grid
Federation (ESGF) and the standards for file naming and paths to files are set
out by CMOR and DRS. For a detailed description of these standards and their
adoption in ESMValCore, we refer the user to :ref:`CMOR-DRS` section where we
@@ -260,9 +262,33 @@ your data please see :ref:`CMOR-DRS`.
Preprocessor output files
-------------------------

The filename to use for preprocessed data is configured in a similar manner
using ``output_file``. Note that the extension ``.nc`` (and if applicable,
a start and end time) will automatically be appended to the filename.
The filename to use for preprocessed data is configured in a similar
manner using ``output_file``, which can be either a single value or a
dictionary of values.

The latter case is useful for projects that gather many different cases
with varying sets of dataset attributes, such as the ``native6`` project:

.. _example_IPSL_config:

.. code-block:: yaml

   native6:
     ...
     input_dir:
       default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
       IPSL: '{account}/{model}/{status}/{exp}/{simulation}/{igcm_dir}/Analyse/{freq}'
     input_file:
       default: '*.nc'
       IPSL: '{simulation}_*_{ipsl_varname}.nc'
     output_file:
       default: '{project}_{dataset}_{type}_{version}_{mip}_{short_name}'
       IPSL: '{account}_{model}_{status}_{exp}_{simulation}_{short_name}'
     ...

Note that the extension ``.nc`` (and if applicable, a start and end
time) will automatically be appended to the filename.
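
The way a dictionary-valued ``output_file`` could be turned into a file name can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ESMValCore implementation: ``resolve_output_file`` is a hypothetical helper, the fallback on the ``default`` entry is an assumption, and the ``model`` value is made up for the example.

```python
from typing import Mapping, Union


def resolve_output_file(
    output_file: Union[str, Mapping[str, str]],
    drs: str,
    attributes: Mapping[str, str],
) -> str:
    """Fill the placeholders of an ``output_file`` template.

    Hypothetical helper: ``output_file`` is either a single template or
    a dictionary of templates keyed by directory structure name.
    """
    if isinstance(output_file, str):
        template = output_file
    else:
        # Assumption: use the ``default`` entry when ``drs`` has no entry.
        template = output_file.get(drs, output_file["default"])
    # Unused attributes are ignored by str.format.
    return template.format(**attributes)


output_file = {
    "default": "{project}_{dataset}_{type}_{version}_{mip}_{short_name}",
    "IPSL": "{account}_{model}_{status}_{exp}_{simulation}_{short_name}",
}
attributes = {
    "account": "p86caub",
    "model": "IPSLCM6",  # illustrative value, not taken from the docs
    "status": "PROD",
    "exp": "piControl",
    "simulation": "CM61-LR-hist-03.1950",
    "short_name": "tas",
}
stem = resolve_output_file(output_file, "IPSL", attributes)
print(stem)
```

The extension ``.nc`` and any time range are left out on purpose, since the text above states that they are appended automatically.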

.. _cmor_table_configuration:

@@ -289,6 +315,62 @@ related to CMOR table settings available:
to get the name of the file containing the ``mip`` table.
Defaults to the value provided in ``cmor_type``.

.. _configure_native_models:

Configuring native models and observation data sets
----------------------------------------------------

ESMValTool can take full advantage of the ability to configure
ESMValCore to handle native model output formats and specific
observation data sets without preliminary reformatting. Such a
configuration involves the following steps:

- allowing ESMValTool to locate the data files:

  - the ``native6`` entry of ``config-developer.yml`` should be
    complemented with sub-entries for ``input_dir``, ``input_file``
    and ``output_file`` that go under a new key representing the
    data organization (such as ``IPSL``); these sub-entries can
    use an arbitrary list of ``{placeholders}``. Example:

    .. code-block:: yaml

       native6:
         cmor_strict: false
         input_dir:
           default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
           IPSL: '{account}/{model}/{status}/{exp}/{simulation}/{dir}/{freq}'
         input_file:
           default: '*.nc'
           IPSL:
             - '{simulation}_*_{ipsl_varname}.nc'
             - '{simulation}_*_{group}.nc'
         output_file:
           default: '{project}_{dataset}_{type}_{version}_{mip}_{short_name}'
           IPSL: '{account}_{model}_{status}_{exp}_{simulation}_{freq}_{short_name}'
         cmor_type: 'CMIP6'
         cmor_default_table_prefix: 'CMIP6_'

  - if necessary, provide a so-called ``mapping file``, which
    associates a given variable short_name used in a recipe, such
    as ``tas``, with a dictionary of placeholder values; these
    values are used at run time, together with the ``input_dir`` and
    ``input_file`` patterns, to compute the actual filename to load
    for that variable. Such files are searched for with the pattern
    ``native6-*.yml`` in two places: in the source code, at
    ``ESMValCore/esmvalcore/_config/variable_details/``, and in user
    space, at ``~/.esmvaltool/variable_details``. See
    :download:`an example of such a file for IPSL-CM6
    <../../esmvalcore/_config/variable_details/native6-ipsl-cm6-mappings.yml>`.
    All such files in these two places are sorted and loaded in
    sequence, first from the code location, then from the
    user-space location.

- ensuring that ESMValTool gets the right metadata and data out of
  your data files: this is described at :ref:`fixing_data`
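
The lookup order for mapping files described above can be sketched as follows. This is a minimal illustration, not the ESMValCore code: ``find_mapping_files`` is a hypothetical helper, the directories are temporary stand-ins for the two documented locations, and every file name except ``native6-ipsl-cm6-mappings.yml`` is made up.

```python
import tempfile
from pathlib import Path
from typing import List


def find_mapping_files(code_dir: Path, user_dir: Path) -> List[Path]:
    """List ``native6-*.yml`` files, code location first, then user space.

    Hypothetical helper: files are sorted within each location, and a
    missing directory is simply skipped.
    """
    files: List[Path] = []
    for directory in (code_dir, user_dir):
        if directory.is_dir():
            files.extend(sorted(directory.glob("native6-*.yml")))
    return files


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for the source-code and user-space locations.
    code_dir = Path(tmp) / "code"
    user_dir = Path(tmp) / "user"
    code_dir.mkdir()
    user_dir.mkdir()
    (code_dir / "native6-ipsl-cm6-mappings.yml").touch()
    (code_dir / "native6-a.yml").touch()    # made-up name
    (user_dir / "native6-local.yml").touch()  # made-up name
    (user_dir / "other.yml").touch()        # ignored: wrong pattern
    names = [path.name for path in find_mapping_files(code_dir, user_dir)]

print(names)
```

How the contents of successive files are combined is not specified in the text above, so the sketch only establishes the loading order.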


.. _config-ref:

References configuration file
68 changes: 39 additions & 29 deletions doc/quickstart/find_data.rst
@@ -1,7 +1,7 @@
.. _findingdata:

************
Finding data
Input data
************

Overview
@@ -15,10 +15,13 @@ the right data. We will detail below the data finding and retrieval process and
the input the user needs to specify, giving examples on how to use the data
finding routine under different scenarios.

Data types
==========

.. _CMOR-DRS:

CMIP data - CMOR Data Reference Syntax (DRS) and the ESGF
=========================================================
CMIP data
---------------------------------------------------------
CMIP data is widely available via the Earth System Grid Federation
(`ESGF <https://esgf.llnl.gov/>`_) and is accessible to users either
via download from the ESGF portal or through the ESGF data nodes hosted
@@ -45,6 +48,39 @@ From the ESMValTool user perspective the number of data input parameters is
optimized to allow for ease of use. We detail this procedure in the next
section.

Native model data
---------------------------------------------------------
Native model data can be supported using the basic
:ref:`ESMValCore fix procedure <fixing_data>`; this is already implemented
for some models, :ref:`as described here <fixing_native_models>`.

Observational data
---------------------------------------------------------
Part of the observational data is retrieved in the same manner as CMIP data, for example
using the ``OBS`` root path set to:

.. code-block:: yaml

  OBS: /gws/nopw/j04/esmeval/obsdata-v2

and the dataset:

.. code-block:: yaml

  - {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3}

in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
CMOR-DRS_ are used again and the file will be automatically found:

.. code-block::

  /gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc

Since observational data are organized in Tiers depending on their level of
public availability, the ``default`` directory must be structured accordingly
with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
``drs: default``.
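
How the root path, the tier and the dataset attributes combine into the path shown above can be sketched as follows. This is a simplified illustration with hypothetical helper names; the ``mip`` (``Amon``) and ``short_name`` (``ta``) values are assumed to come from the variable definition in the recipe, and the trailing ``_*.nc`` glob is a simplification of the real file name pattern.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def obs_directory(rootpath: str, tier: int, dataset: str) -> PurePosixPath:
    """Directory holding the files of one observational dataset."""
    return PurePosixPath(rootpath) / f"Tier{tier}" / dataset


def obs_file_pattern(project, dataset, type_, version, mip, short_name):
    """Glob pattern for the files of one variable (simplified sketch)."""
    return f"{project}_{dataset}_{type_}_{version}_{mip}_{short_name}_*.nc"


directory = obs_directory("/gws/nopw/j04/esmeval/obsdata-v2", 3, "ERA-Interim")
pattern = obs_file_pattern("OBS", "ERA-Interim", "reanaly", 1, "Amon", "ta")

filename = "OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc"
print(directory / filename, fnmatch(filename, pattern))
```

The ``Tier3`` component appears in the resulting directory even though the directory structure is ``default``, which is the point made in the paragraph above.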

.. _data-retrieval:

Data retrieval
@@ -231,32 +267,6 @@ and finally, using the file naming definition from CMOR-DRS_ find the file:
.. _observations:

Observational data
==================
Observational data is retrieved in the same manner as CMIP data, for example
using the ``OBS`` root path set to:

.. code-block:: yaml

  OBS: /gws/nopw/j04/esmeval/obsdata-v2

and the dataset:

.. code-block:: yaml

  - {dataset: ERA-Interim, project: OBS, type: reanaly, version: 1, start_year: 2014, end_year: 2015, tier: 3}

in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
CMOR-DRS_ are used again and the file will be automatically found:

.. code-block::

  /gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc

Since observational data are organized in Tiers depending on their level of
public availability, the ``default`` directory must be structured accordingly
with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
``drs: default``.

Data loading
============
2 changes: 1 addition & 1 deletion doc/quickstart/index.rst
@@ -6,7 +6,7 @@ Getting started

Installation <install>
Configuration <configure>
Finding data <find_data>
Input data <find_data>
Installed recipes <recipes>
Running <run>
Output <output>