Handle IPSL-CM6 (the feature won't actually work without #1124)

ESMValGroup · May 31, 2021 · 8fc038f · 8fc038f
1 parent b2d3642
commit 8fc038f

Showing 10 changed files with 628 additions and 83 deletions.
diff --git a/doc/develop/fixing_data.rst b/doc/develop/fixing_data.rst
@@ -1,33 +1,40 @@
 .. _fixing_data:
 
-***********
-Dataset fix
-***********
-
-Some (model) datasets contain (known) errors that would normally prevent them
-from being processed correctly by the ESMValCore. The errors can be in
-the metadata describing the dataset and/or in the actual data.
-Typical examples of such errors are missing or wrong attributes (e.g.
-attribute ''units'' says 1e-9 but data are actually in 1e-6), missing or
-mislabeled coordinates (e.g. ''lev'' instead of ''plev'' or missing
+******************************************
+Adapting to data sources
+******************************************
+
+The baseline case for ESMValTool input data is fully CMOR-compliant
+data that is read using the Iris load function. ESMValTool also allows
+for some departures from compliance (see
+:ref:`cmor_check_strictness`). Beyond that situation, some datasets
+(either model or observations) contain (known) errors that would
+normally prevent them from being processed. The issues can be in the
+metadata describing the dataset and/or in the actual data.  Typical
+examples of such errors are missing or wrong attributes (e.g.
+attribute ''units'' says 1e-9 but data are actually in 1e-6), missing
+or mislabeled coordinates (e.g. ''lev'' instead of ''plev'' or missing
 coordinate bounds like ''lat_bnds'') or problems with the actual data
-(e.g. cloud liquid water only instead of sum of liquid + ice as specified by the CMIP data request).
+(e.g. cloud liquid water only instead of sum of liquid + ice as
+specified by the CMIP data request).
 
-The ESMValCore can apply on the fly fixes to datasets that have
-known errors that can be fixed automatically.
-
-.. note::
-  **CMORization as a fix**.
-  Support for many observational and reanalysis datasets is implemented through
-  :ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`.
-  However, it is also possible to add support for a dataset that is not part of
-  a CMIP data request by implementing fixes for it.
-  This is particularly useful for large datasets, where keeping a copy of both
-  the original and CMORized dataset is not feasible.
-  See `Natively supported non-CMIP datasets`_ for a list of currently supported
-  datasets.
+As an extreme case, some other data sources are simply not NetCDF
+files and must go through another data load function.
 
+The ESMValCore can apply on-the-fly fixes to such datasets when
+issues can be fixed automatically.  This is implemented for a set
+of `Natively supported non-CMIP datasets`_.  The following sections
+provide details on how to design such fixes.
 
+.. note::
+
+  **CMORizer scripts**.  Support for many observational and reanalysis
+  datasets is also possible through a priori reformatting by
+  :ref:`CMORizer scripts in the ESMValTool <esmvaltool:new-dataset>`,
+  which are better suited to datasets of small volume.
+
+.. _fix_structure:
+
 Fix structure
 =============
 
@@ -326,30 +333,68 @@ strictness to the highest:
 Natively supported non-CMIP datasets
 ====================================
 
-Fixed datasets are supported through the ``native6`` project.
-Put the files containing the data in the directory that you have configured
-for the ``native6`` project in your :ref:`user configuration file`, in a
-subdirectory called ``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}``.
-Replace the items in curly braces by the values used in the variable/dataset
-definition in the :ref:`recipe <recipe_overview>`.
-Below is a list of datasets currently supported.
+Some fixed datasets and native model formats are supported through
+the ``native6`` project.
 
-ERA5
-----
+.. _fixing_native_models:
 
-- Supported variables: ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``, ``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``, ``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``, ``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``)
-- Tier: 3
+Native models: IPSL-CM6, ...
+----------------------------
 
-MSWEP
------
+The following models are natively supported through the procedure
+described above (:ref:`fix_structure`) and at
+:ref:`configure_native_models`:
 
-- Supported variables: ``pr``
-- Supported frequencies: ``mon``, ``day``, ``3hr``.
-- Tier: 3
+  - **IPSL-CM6**: both output formats (i.e. the ``Output`` and the
+    ``Analyse / Time series`` formats) are supported, and should be
+    configured in recipes as e.g.:
 
-For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversion/mon/pr`` subdirectory of your ``native6`` project location.
+    .. code-block:: yaml
 
-.. note::
-  For monthly data (V220), the data must be postfixed with the date, i.e. rename ``global_monthly_050deg.nc`` to ``global_monthly_050deg_197901-201710.nc``
+      datasets:
+        - {simulation: CM61-LR-hist-03.1950, exp: piControl, freq: Analyse/TS_MO,
+           account: p86caub, status: PROD, dataset: IPSL-CM6, project: native6}
+        - {simulation: CM61-LR-hist-03.1950, exp: historical, freq: Output/MO,
+           account: p86caub, status: PROD, dataset: IPSL-CM6, project: native6}
+
+    The ``Output`` format is an example of a case where variables are
+    grouped in multi-variable files, whose names cannot be computed
+    directly from dataset attributes alone but require a mapping
+    file. These multi-variable files must also undergo some data
+    selection, which may involve an external process for performance
+    reasons.
+
+    The ``config-developer.yaml`` section for configuring IPSL-CM6 is
+    :ref:`illustrated here <example_IPSL_config>`.
+
+
+
+
+ERA5 and MSWEP datasets
+-----------------------
+Put the files containing the data in the
+directory that you have configured for the ``native6`` project in your
+:ref:`user configuration file`, in a subdirectory called
+``Tier{tier}/{dataset}/{version}/{frequency}/{short_name}``.  Replace
+the items in curly braces by the values used in the variable/dataset
+definition in the :ref:`recipe <recipe_overview>`.  Below is a list of
+datasets currently supported:
+
+  - **ERA5**
+
+      - Supported variables: ``clt``, ``evspsbl``, ``evspsblpot``, ``mrro``, ``pr``, ``prsn``, ``ps``, ``psl``, ``ptype``, ``rls``, ``rlds``, ``rsds``, ``rsdt``, ``rss``, ``uas``, ``vas``, ``tas``, ``tasmax``, ``tasmin``, ``tdps``, ``ts``, ``tsn`` (``E1hr``/``Amon``), ``orog`` (``fx``)
+      - Tier: 3
+
+  - **MSWEP**
+
+      - Supported variables: ``pr``
+      - Supported frequencies: ``mon``, ``day``, ``3hr``.
+      - Tier: 3
+
+    For example for monthly data, place the files in the ``/Tier3/MSWEP/latestversion/mon/pr`` subdirectory of your ``native6`` project location.
+
+    .. note::
+
+      For monthly data (V220), the data must be postfixed with the date, i.e. rename ``global_monthly_050deg.nc`` to ``global_monthly_050deg_197901-201710.nc``
 
-For more info: http://www.gloh2o.org/
+    For more info: http://www.gloh2o.org/
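
As a rough illustration of the ``native6`` default layout and the MSWEP date postfix documented in this file, the following Python sketch builds the expected subdirectory and the renamed file name. The helper names ``native6_subdir`` and ``postfix_dates`` are hypothetical and are not part of ESMValCore; only the layout pattern and the example values come from the text above.

```python
from pathlib import PurePosixPath

# Layout pattern quoted from the documentation above.
LAYOUT = 'Tier{tier}/{dataset}/{version}/{frequency}/{short_name}'


def native6_subdir(tier, dataset, version, frequency, short_name):
    """Return the subdirectory (hypothetical helper) for a native6 dataset."""
    return PurePosixPath(LAYOUT.format(
        tier=tier,
        dataset=dataset,
        version=version,
        frequency=frequency,
        short_name=short_name,
    ))


def postfix_dates(filename, start, end):
    """Insert a ``_<start>-<end>`` postfix before the file extension."""
    path = PurePosixPath(filename)
    return f'{path.stem}_{start}-{end}{path.suffix}'


subdir = native6_subdir(3, 'MSWEP', 'latestversion', 'mon', 'pr')
renamed = postfix_dates('global_monthly_050deg.nc', '197901', '201710')
print(subdir / renamed)
```

The resulting relative path would sit below the directory configured for the ``native6`` project in the user configuration file.
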
diff --git a/doc/develop/index.rst b/doc/develop/index.rst
@@ -10,5 +10,5 @@ features.
    :maxdepth: 1
 
     Preprocessor function <preprocessor_function>
-    Dataset fix <fixing_data>
+    Adapting to data sources <fixing_data>
     Deriving a variable <derivation>
diff --git a/doc/quickstart/configure.rst b/doc/quickstart/configure.rst
@@ -103,7 +103,7 @@ with explanations in a commented line above each option:
     OBS: ~/obs_inputpath
     default: ~/default_inputpath
 
-  # Directory structure for input data: [default]/BADC/DKRZ/ETHZ/etc
+  # Directory structure for input data: [default]/BADC/DKRZ/ETHZ/IPSL/etc
   # See config-developer.yml for definitions.
   drs:
     CMIP5: default
@@ -176,8 +176,10 @@ It will be installed along with ESMValCore and can also be viewed on GitHub:
 `esmvalcore/config-developer.yml
 <https://github.com/ESMValGroup/ESMValCore/blob/main/esmvalcore/config-developer.yml>`_.
 This configuration file describes the file system structure and CMOR tables for several
-key projects (CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ,
-ETHZ, SMHI, BSC). CMIP data is stored as part of the Earth System Grid
+key projects (CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g.
+BADC, CP4CDS, DKRZ, ETHZ, SMHI, BSC, IPSL), and for native output data for some
+models (IPSL, ...; see :ref:`configure_native_models`).
+CMIP data is stored as part of the Earth System Grid
 Federation (ESGF) and the standards for file naming and paths to files are set
 out by CMOR and DRS. For a detailed description of these standards and their
 adoption in ESMValCore, we refer the user to :ref:`CMOR-DRS` section where we
@@ -260,9 +262,33 @@ your data please see :ref:`CMOR-DRS`.
 Preprocessor output files
 -------------------------
 
-The filename to use for preprocessed data is configured in a similar manner
-using ``output_file``. Note that the extension ``.nc`` (and if applicable,
-a start and end time) will automatically be appended to the filename.
+The filename to use for preprocessed data is configured in a similar
+manner using ``output_file``, which can be either a single value or a
+dictionary of values.
+
+This latter case is useful for projects that gather widely varying cases
+with different sets of dataset attributes, such as the ``native6`` project:
+
+.. _example_IPSL_config: 
+
+.. code-block:: yaml
+
+  native6:
+    ...
+    input_dir:
+      default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
+      IPSL: '{account}/{model}/{status}/{exp}/{simulation}/{igcm_dir}/Analyse/{freq}'
+    input_file:
+      default: '*.nc'
+      IPSL: '{simulation}_*_{ipsl_varname}.nc'
+    output_file:
+      default: '{project}_{dataset}_{type}_{version}_{mip}_{short_name}'
+      IPSL: '{account}_{model}_{status}_{exp}_{simulation}_{short_name}'
+    ...
+
+
+Note that the extension ``.nc`` (and if applicable, a start and end
+time) will automatically be appended to the filename.
 
 .. _cmor_table_configuration:
 
@@ -289,6 +315,62 @@ related to CMOR table settings available:
   to get the name of the file containing the ``mip`` table.
   Defaults to the value provided in ``cmor_type``.
 
+.. _configure_native_models:
+
+Configuring native models and observation data sets
+----------------------------------------------------
+
+ESMValTool can take full advantage of the ability to configure
+ESMValCore for handling native model output formats and specific
+observation data sets without preliminary reformatting. Such a
+configuration involves the following steps:
+
+  - allowing ESMValTool to locate the data files:
+
+    - the ``native6`` entry of ``config-developer.yml`` should be
+      complemented with sub-entries for ``input_dir``, ``input_file``
+      and ``output_file`` that go under a new key representing the
+      data organization (such as ``IPSL``); these sub-entries can
+      use an arbitrary list of ``{placeholders}``. Example:
+
+      .. code-block:: yaml
+
+        native6:
+          cmor_strict: false
+          input_dir:
+            default: 'Tier{tier}/{dataset}/{latestversion}/{frequency}/{short_name}'
+            IPSL: '{account}/{model}/{status}/{exp}/{simulation}/{dir}/{freq}'
+          input_file:
+            default: '*.nc'
+            IPSL:
+              - '{simulation}_*_{ipsl_varname}.nc'
+              - '{simulation}_*_{group}.nc'
+          output_file:
+            default: '{project}_{dataset}_{type}_{version}_{mip}_{short_name}'
+            IPSL: '{account}_{model}_{status}_{exp}_{simulation}_{freq}_{short_name}'
+          cmor_type: 'CMIP6'
+          cmor_default_table_prefix: 'CMIP6_'
+
+
+    - if necessary, provide a so-called ``mapping file``, which
+      associates a given variable short_name used in a recipe, such
+      as ``tas``, with a dictionary of placeholder values; these
+      values will be used at run time, with the ``input_dir`` and
+      ``input_file`` patterns, to compute the actual filename to load
+      for that variable; such a file is looked for under the pattern
+      ``native6-*.yml`` in two places: in the source code, at
+      ``ESMValCore/esmvalcore/_config/variable_details/``, and in user
+      space, at ``~/.esmvaltool/variable_details``. See
+      :download:`an example of such a file for IPSL-CM6
+      <../../esmvalcore/_config/variable_details/native6-ipsl-cm6-mappings.yml>`.
+      All such files in these two places are sorted and loaded in
+      sequence, first for the code location, second for the
+      user-space location.
+
+  - ensuring that ESMValTool gets the right metadata and data out of
+    your data files: this is described at :ref:`fixing_data`.
+
+
 .. _config-ref:
 
 References configuration file

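The placeholder mechanism documented in ``configure.rst`` above can be sketched in a few lines of Python. This is only an illustration that uses plain ``str.format``; it is not ESMValCore's actual implementation. The ``expand`` helper is hypothetical, and the values in ``mapping_for_tas`` are invented for the example (the real ones would come from a ``native6-*.yml`` mapping file). The patterns and recipe facets are copied from the examples in the diff.

```python
from pathlib import PurePosixPath

# Patterns copied from the ``native6`` / ``IPSL`` example above.
INPUT_DIR = '{account}/{model}/{status}/{exp}/{simulation}/{dir}/{freq}'
INPUT_FILE = '{simulation}_*_{ipsl_varname}.nc'


def expand(pattern, facets):
    """Fill every ``{placeholder}`` in ``pattern`` from ``facets``.

    Hypothetical helper; a missing placeholder raises ``KeyError``.
    """
    return pattern.format(**facets)


# Facets taken from the recipe example in ``fixing_data.rst``.
recipe_facets = {
    'account': 'p86caub',
    'status': 'PROD',
    'exp': 'historical',
    'simulation': 'CM61-LR-hist-03.1950',
    'freq': 'Analyse/TS_MO',
}

# ASSUMPTION: illustrative placeholder values that a mapping file might
# associate with the ``tas`` short_name; they are not taken from the source.
mapping_for_tas = {'model': 'IPSLCM6', 'dir': 'ATM', 'ipsl_varname': 't2m'}

facets = {**recipe_facets, **mapping_for_tas}
directory = PurePosixPath(expand(INPUT_DIR, facets))
glob_pattern = str(directory / expand(INPUT_FILE, facets))
print(glob_pattern)
```

The ``*`` is left untouched by the expansion, so the result is a glob pattern, relative to the configured root path, that can then be matched against the files on disk.
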
diff --git a/doc/quickstart/find_data.rst b/doc/quickstart/find_data.rst
@@ -1,7 +1,7 @@
 .. _findingdata:
 
 ************
-Finding data
+Input data
 ************
 
 Overview
@@ -15,10 +15,13 @@ the right data. We will detail below the data finding and retrieval process and
 the input the user needs to specify, giving examples on how to use the data
 finding routine under different scenarios.
 
+Data types
+==========
+
 .. _CMOR-DRS:
 
-CMIP data - CMOR Data Reference Syntax (DRS) and the ESGF
-=========================================================
+CMIP data
+---------------------------------------------------------
 CMIP data is widely available via the Earth System Grid Federation
 (`ESGF <https://esgf.llnl.gov/>`_) and is accessible to users either
 via download from the ESGF portal or through the ESGF data nodes hosted
@@ -45,6 +48,39 @@ From the ESMValTool user perspective the number of data input parameters is
 optimized to allow for ease of use. We detail this procedure in the next
 section.
 
+Native model data
+---------------------------------------------------------
+Support for native model data is straightforward using the basic
+:ref:`ESMValCore fix procedure <fixing_data>` and is already implemented
+for some models, :ref:`as described here <fixing_native_models>`.
+
+Observational data
+---------------------------------------------------------
+Some observational data is retrieved in the same manner as CMIP data, for example
+using the ``OBS`` root path set to:
+
+  .. code-block:: yaml
+
+    OBS: /gws/nopw/j04/esmeval/obsdata-v2
+
+and the dataset:
+
+  .. code-block:: yaml
+
+    - {dataset: ERA-Interim,  project: OBS,  type: reanaly,  version: 1,  start_year: 2014,  end_year: 2015,  tier: 3}
+
+in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
+CMOR-DRS_ are used again and the file will be automatically found:
+
+.. code-block::
+
+  /gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc
+
+Since observational data are organized in Tiers depending on their level of
+public availability, the ``default`` directory must be structured accordingly
+with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
+``drs: default``.
+
 .. _data-retrieval:
 
 Data retrieval
@@ -231,32 +267,6 @@ and finally, using the file naming definition from CMOR-DRS_ find the file:
 
 .. _observations:
 
-Observational data
-==================
-Observational data is retrieved in the same manner as CMIP data, for example
-using the ``OBS`` root path set to:
-
-  .. code-block:: yaml
-
-    OBS: /gws/nopw/j04/esmeval/obsdata-v2
-
-and the dataset:
-
-  .. code-block:: yaml
-
-    - {dataset: ERA-Interim,  project: OBS,  type: reanaly,  version: 1,  start_year: 2014,  end_year: 2015,  tier: 3}
-
-in ``recipe.yml`` in ``datasets`` or ``additional_datasets``, the rules set in
-CMOR-DRS_ are used again and the file will be automatically found:
-
-.. code-block::
-
-  /gws/nopw/j04/esmeval/obsdata-v2/Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_1_Amon_ta_201401-201412.nc
-
-Since observational data are organized in Tiers depending on their level of
-public availability, the ``default`` directory must be structured accordingly
-with sub-directories ``TierX`` (``Tier1``, ``Tier2`` or ``Tier3``), even when
-``drs: default``.
 
 Data loading
 ============

diff --git a/doc/quickstart/index.rst b/doc/quickstart/index.rst
@@ -6,7 +6,7 @@ Getting started
 
 		Installation <install>
     Configuration <configure>
-    Finding data <find_data>
+    Input data <find_data>
     Installed recipes <recipes>
 		Running <run>
 		Output <output>