Merge pull request #171 from ecmwf/develop

Minor release 0.5.13
ecmwf · Jan 10, 2025 · 6853018 · 6853018
2 parents 84fa08c + 7df24d9
commit 6853018

Showing 42 changed files with 879 additions and 79 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -15,11 +15,15 @@ Keep it human-readable, your future self will thank you!
 - Fix metadata serialization handling of numpy.integer (#140)
 - Fix negative variance for constant variables (#148)
 - Fix cutout slicing of grid dimension (#145)
+- Use cKDTree instead of KDTree
+- Implement 'complement' feature
+- Add ability to patch xarrays (#160)
 
 ### Added
 
 - Call filters from anemoi-transform
-- make test optional when adls is not installed Pull request #110
+- Make test optional when adls is not installed Pull request #110
+- Add wz_to_w, orog_to_z, and sum filters (#149)
 
 ## [0.5.8](https://github.com/ecmwf/anemoi-datasets/compare/0.5.7...0.5.8) - 2024-10-26
 

diff --git a/docs/building/filters.rst b/docs/building/filters.rst
@@ -15,8 +15,11 @@ Filters are used to modify the data or metadata in a dataset.
    :maxdepth: 1
 
    filters/select
+   filters/orog_to_z
    filters/rename
    filters/rotate_winds
+   filters/sum
    filters/unrotate_winds
+   filters/wz_to_w
    filters/noop
    filters/empty
diff --git a/docs/building/filters/orog_to_z.rst b/docs/building/filters/orog_to_z.rst
@@ -0,0 +1,17 @@
+###########
+ orog_to_z
+###########
+
+The ``orog_to_z`` filter converts orography (in meters) to surface
+geopotential (m^2/s^2) using the equation:
+
+.. math::
+
+   z &= g \cdot \textrm{orog}\\
+   g &= 9.80665\ m \cdot s^{-2}
+
+This filter needs to follow a source that provides orography, which is
+replaced by surface geopotential.
+
+.. literalinclude:: yaml/orog_to_z.yaml
+   :language: yaml
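
The conversion documented in ``orog_to_z.rst`` can be illustrated with a minimal Python sketch. It is not the filter's implementation: the function name and the plain-list input are assumptions made for illustration, and only the constant ``g`` comes from the documentation above.

```python
# Standard gravity in m/s^2, the constant quoted in the filter documentation.
G = 9.80665


def orog_to_z(orog_m):
    """Convert orography heights (m) to surface geopotential (m^2/s^2)."""
    return [G * height for height in orog_m]


# Hypothetical orography values at three grid points, in meters.
print(orog_to_z([0.0, 100.0, 1000.0]))
```

A grid point at sea level keeps a geopotential of zero, and every other value is scaled by the same constant.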
diff --git a/docs/building/filters/sum.rst b/docs/building/filters/sum.rst
@@ -0,0 +1,13 @@
+#####
+ sum
+#####
+
+The ``sum`` filter computes the sum over multiple variables. This can be
+useful for computing total precipitation from its components (snow,
+rain) or summing the components of total column integrated water. This
+filter needs to follow a source that provides the list of variables to
+be summed up. These variables are removed by the filter and replaced by
+a single summed variable.
+
+.. literalinclude:: yaml/sum.yaml
+   :language: yaml
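
The behaviour described in ``sum.rst`` (inputs removed, one summed variable added) could be sketched as follows. This is an illustration only: the dictionary-of-lists representation and the variable names ``rain``, ``snow`` and ``precip_total`` are hypothetical, and the real filter operates on fields rather than on Python lists.

```python
def sum_variables(fields, params, output):
    """Replace the variables listed in `params` with their element-wise sum."""
    # Sum the listed variables grid point by grid point.
    summed = [sum(values) for values in zip(*(fields[name] for name in params))]
    # Keep every variable that was not part of the sum.
    result = {name: values for name, values in fields.items() if name not in params}
    result[output] = summed
    return result


# Hypothetical input: two precipitation components and an unrelated variable.
fields = {
    "rain": [1.0, 0.0, 2.5],
    "snow": [0.5, 0.0, 0.0],
    "t2m": [280.0, 281.0, 279.5],
}
result = sum_variables(fields, params=["rain", "snow"], output="precip_total")
print(sorted(result))
```

After the call, ``rain`` and ``snow`` are gone and ``precip_total`` sits alongside the untouched ``t2m``.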
diff --git a/docs/building/filters/wz_to_w.rst b/docs/building/filters/wz_to_w.rst
@@ -0,0 +1,12 @@
+#########
+ wz_to_w
+#########
+
+The ``wz_to_w`` filter converts geometric vertical velocity (provided in
+m/s) to vertical velocity in pressure coordinates (Pa/s). This filter
+needs to follow a source that provides geometric vertical velocity.
+Geometric vertical velocity is removed by the filter and pressure
+vertical velocity is added.
+
+.. literalinclude:: yaml/wz_to_w.yaml
+   :language: yaml
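
The documentation above does not state which formula ``wz_to_w`` applies. One common way to relate the two quantities is the hydrostatic approximation, with air density taken from the ideal gas law. The sketch below assumes that approach, and it also assumes that temperature and pressure are available, which the documentation does not mention. The function signature and the gas constant value are illustrative.

```python
G = 9.80665      # standard gravity, m/s^2
R_DRY = 287.05   # approximate specific gas constant of dry air, J/(kg K)


def wz_to_w(wz, temperature, pressure):
    """Hydrostatic estimate of pressure vertical velocity (Pa/s).

    wz is the geometric vertical velocity in m/s, temperature is in K
    and pressure is in Pa.
    """
    density = pressure / (R_DRY * temperature)  # ideal gas law, kg/m^3
    return -density * G * wz                    # upward motion gives a negative value


# Hypothetical values: 0.1 m/s ascent at 850 hPa and 280 K.
print(wz_to_w(0.1, temperature=280.0, pressure=85000.0))
```

The sign flips because pressure decreases with height, so rising air corresponds to negative pressure vertical velocity.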
diff --git a/docs/building/filters/yaml/orog_to_z.yaml b/docs/building/filters/yaml/orog_to_z.yaml
@@ -0,0 +1,10 @@
+input:
+  pipe:
+    - source: # mars, grib, netcdf, etc.
+      # source attributes here
+      # ...
+      # Must load an orography variable
+
+    - orog_to_z:
+        orog: orog  # Name of orography (input) variable
+        z: z        # Name of z (output) variable
diff --git a/docs/building/filters/yaml/sum.yaml b/docs/building/filters/yaml/sum.yaml
@@ -0,0 +1,13 @@
+input:
+  pipe:
+    - source: # mars, grib, netcdf, etc.
+      # source attributes here
+      # ...
+      # Must load the variables to be summed
+
+    - sum:
+        params:  # List of input variables
+          - variable1
+          - variable2
+          - variable3
+        output: variable_total    # Name of output variable
diff --git a/docs/building/filters/yaml/wz_to_w.yaml b/docs/building/filters/yaml/wz_to_w.yaml
@@ -0,0 +1,10 @@
+input:
+  pipe:
+    - source: # mars, grib, netcdf, etc.
+      # source attributes here
+      # ...
+      # Must load geometric vertical velocity
+
+    - wz_to_w:
+        wz: wz  # Name of geometric vertical velocity (input) variable
+        w: w    # Name of pressure vertical velocity (output) variable
diff --git a/docs/building/introduction.rst b/docs/building/introduction.rst
@@ -10,7 +10,7 @@ file, which is a YAML file that describes sources of meteorological
 fields as well as the operations to perform on them, before they are
 written to a zarr file. The input of the process is a range of dates and
 some options to control the layout of the output. Statistics will be
-computed as the dataset is build, and stored in the metadata, with other
+computed as the dataset is built, and stored in the metadata, with other
 information such as the locations of the grid points, the list of
 variables, etc.
 
@@ -24,35 +24,35 @@ variables, etc.
 
 date
    Throughout this document, the term `date` refers to a date and time,
-   not just a date. A training dataset is covers a continuous range of
+   not just a date. A training dataset covers a continuous range of
    dates with a given frequency. Missing dates are still part of the
-   dataset, but the data are missing and marked as such using NaNs.
-   Dates are always in UTC, and refer to date at which the data is
-   valid. For accumulations and fluxes, that would be the end of the
-   accumulation period.
+   dataset, but missing data are marked as such using NaNs. Dates are
+   always in UTC, and refer to the date at which the data are valid. For
+   accumulations and fluxes, that would be the end of the accumulation
+   period.
 
 variable
-   A `variable` is meteorological parameter, such as temperature, wind,
-   etc. Multilevel parameters are treated as separate variables, one for
-   each level. For example, temperature at 850 hPa and temperature at
-   500 hPa will be treated as two separate variables (`t_850` and
-   `t_500`).
+   A `variable` is a meteorological parameter, such as temperature,
+   wind, etc. Multilevel parameters are treated as separate variables,
+   one for each level. For example, temperature at 850 hPa and
+   temperature at 500 hPa will be treated as two separate variables
+   (`t_850` and `t_500`).
 
 field
-   A `field` is a variable at a given date. It is represented by a array
-   of values at each grid point.
+   A `field` is a variable at a given date. It is represented by an
+   array of values at each grid point.
 
 source
-   The `source` is a software component that given a list of dates and
-   variables will return the corresponding fields. A example of source
+   The `source` is a software component that, given a list of dates and
+   variables, will return the corresponding fields. An example of a source
    is ECMWF's MARS archive, a collection of GRIB or NetCDF files, a
    database, etc. See :ref:`sources` for more information.
 
 filter
    A `filter` is a software component that takes as input the output of
-   a source or the output of another filter can modify the fields and/or
-   their metadata. For example, typical filters are interpolations,
-   renaming of variables, etc. See :ref:`filters` for more information.
+   a source or another filter and can modify the fields and/or their
+   metadata. For example, typical filters are interpolations, renaming
+   of variables, etc. See :ref:`filters` for more information.
 
 ************
  Operations
@@ -62,19 +62,20 @@ In order to build a training dataset, sources and filters are combined
 using the following operations:
 
 join
-   The join is the process of combining several sources data. Each
-   source is expected to provide different variables at the same dates.
+   The join is the process of combining several sources of data. Each
+   source is expected to provide different variables for the same
+   dates.
 
 pipe
    The pipe is the process of transforming fields using filters. The
-   first step of a pipe is typically a source, a join or another pipe.
-   The following steps are filters.
+   first step of a pipe is typically a source, a join, or another pipe.
+   This can subsequently be followed by more filters.
 
 concat
    The concatenation is the process of combining different sets of
-   operation that handle different dates. This is typically used to
-   build a dataset that spans several years, when the several sources
-   are involved, each providing a different period.
+   operations that handle different dates. This is typically used to
+   build a dataset that spans several years, when several sources are
+   involved, each providing data for a different period.
 
 Each operation is considered as a :ref:`source <sources>`, therefore
 operations can be combined to build complex datasets.
@@ -87,7 +88,7 @@ First recipe
 ============
 
 The simplest `recipe` file must contain a ``dates`` section and an
-``input`` section. The latter must contain a `source` In that case, the
+``input`` section. The latter must contain a `source`. In that case, the
 source is ``mars``.
 
 .. literalinclude:: yaml/building1.yaml
@@ -132,15 +133,15 @@ This will build the following dataset:
 Adding some forcing variables
 =============================
 
-When training a data-driven models, some forcing variables may be
+When training a data-driven model, some forcing variables may be
 required such as the solar radiation, the time of day, the day in the
 year, etc.
 
-These are provided by the ``forcings`` source. In that example, we add a
-few of them. The `template` option is used to point to another source,
-in that case the first instance of ``mars``. This source is used to get
-information about the grid points, as some of the forcing variables are
-grid dependent.
+These are provided by the ``forcings`` source. Let us add a few of them
+to the above example. The `template` option is used to point to another
+source, in that case the first instance of ``mars``. This source is used
+to get information about the grid points, as some of the forcing
+variables are grid dependent.
 
 .. literalinclude:: yaml/building3.yaml
    :language: yaml

diff --git a/docs/building/sources/yaml/accumulations1.yaml b/docs/building/sources/yaml/accumulations1.yaml
@@ -1,6 +1,6 @@
 input:
   accumulations:
-    accumulations_period: 6
+    accumulation_period: 6
     class: ea
     param: [tp, cp, sf]
     levtype: sfc
diff --git a/docs/building/sources/yaml/accumulations2.yaml b/docs/building/sources/yaml/accumulations2.yaml
@@ -1,6 +1,6 @@
 input:
   accumulations:
-    accumulations_period: [6, 12]
+    accumulation_period: [6, 12]
     class: od
     param: [tp, cp, sf]
     levtype: sfc
diff --git a/docs/index.rst b/docs/index.rst
@@ -45,6 +45,7 @@ datasets <building-introduction>`.
 -  :doc:`using/subsetting`
 -  :doc:`using/combining`
 -  :doc:`using/selecting`
+-  :doc:`using/ensembles`
 -  :doc:`using/grids`
 -  :doc:`using/zip`
 -  :doc:`using/statistics`
@@ -65,6 +66,7 @@ datasets <building-introduction>`.
    using/subsetting
    using/combining
    using/selecting
+   using/ensembles
    using/grids
    using/zip
    using/statistics

diff --git a/docs/using/code/complement1_.py b/docs/using/code/complement1_.py
@@ -0,0 +1,6 @@
+open_dataset(
+    complement=dataset1,
+    source=dataset2,
+    what="variables",
+    interpolate="nearest",
+)
diff --git a/docs/using/code/complement2_.py b/docs/using/code/complement2_.py
@@ -0,0 +1,12 @@
+open_dataset(
+    cutout=[
+        {
+            "complement": lam_dataset,
+            "source": global_dataset,
+            "interpolate": "nearest",
+        },
+        {
+            "dataset": global_dataset,
+        },
+    ]
+)
diff --git a/docs/using/code/complement3_.py b/docs/using/code/complement3_.py
@@ -0,0 +1,4 @@
+open_dataset(
+    complement=dataset1,
+    source=dataset2,
+)
diff --git a/docs/using/code/number1_.py b/docs/using/code/number1_.py
@@ -0,0 +1,4 @@
+ds = open_dataset(
+    dataset,
+    number=1,
+)
diff --git a/docs/using/code/number2_.py b/docs/using/code/number2_.py
@@ -0,0 +1,4 @@
+ds = open_dataset(
+    dataset,
+    number=[1, 3, 5],
+)
diff --git a/docs/using/combining.rst b/docs/using/combining.rst
@@ -182,3 +182,32 @@ The difference can be seen at the boundary between the two grids:
 To debug the combination, you can pass `plot=True` to the `cutout`
 function (when running from a Notebook), or use `plot="prefix"` to save
 the plots to a series of PNG files in the current directory.
+
+.. _complement:
+
+************
+ complement
+************
+
+This feature interpolates the variables of `dataset2` that are not
+in `dataset1` to the grid of `dataset1`, adds them to the list of
+variables of `dataset1` and returns the result.
+
+.. literalinclude:: code/complement1_.py
+
+Currently ``what`` can only be ``variables`` and can be omitted.
+
+The value for ``interpolate`` can be one of ``none`` (default) or
+``nearest``. In the case of ``none``, the grids of the two datasets must
+match.
+
+This feature was originally designed to be used in conjunction with
+``cutout``, where `dataset1` is the lam, and `dataset2` is the global
+dataset.
+
+.. literalinclude:: code/complement2_.py
+
+Another use case is to simply bring all non-overlapping variables of a
+dataset into another:
+
+.. literalinclude:: code/complement3_.py
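
The semantics of ``complement`` with ``interpolate="nearest"`` can be illustrated with a small, self-contained sketch. It is not the library's code: the dictionary layout, the function names and the brute-force search are assumptions made for illustration. A real implementation would likely use a spatial index and a proper distance on the sphere rather than squared latitude/longitude differences.

```python
def nearest_index(point, grid):
    """Index of the grid point closest to `point` (brute force, squared lat/lon distance)."""
    return min(
        range(len(grid)),
        key=lambda i: (grid[i][0] - point[0]) ** 2 + (grid[i][1] - point[1]) ** 2,
    )


def complement(target, source):
    """Add the variables missing from `target`, sampled from `source` on the target grid."""
    missing = [name for name in source["fields"] if name not in target["fields"]]
    # For each target grid point, find the closest source grid point once.
    indices = [nearest_index(point, source["grid"]) for point in target["grid"]]
    added = {name: [source["fields"][name][i] for i in indices] for name in missing}
    return {**target["fields"], **added}


# Hypothetical datasets: `t` exists in both, `q` only in the source.
dataset1 = {"grid": [(0.0, 0.0), (1.0, 1.0)], "fields": {"t": [1.0, 2.0]}}
dataset2 = {
    "grid": [(0.1, 0.1), (0.9, 1.2), (5.0, 5.0)],
    "fields": {"t": [9.0, 9.0, 9.0], "q": [10.0, 20.0, 30.0]},
}
combined = complement(dataset1, dataset2)
print(combined)
```

Variables already present in `dataset1` are left untouched; only `q` is taken from `dataset2`, sampled at the nearest source points.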
diff --git a/docs/using/ensembles.rst b/docs/using/ensembles.rst
@@ -0,0 +1,27 @@
+.. _selecting-members:
+
+###################
+ Selecting members
+###################
+
+This section describes how to subset data that are part of an ensemble.
+To combine ensembles, see :ref:`ensembles` in the
+:ref:`combining-datasets` section.
+
+.. _number:
+
+If a dataset is an ensemble, you can select one or more specific members
+using the `number` option. You can also use ``numbers`` (which is an
+alias for ``number``), and ``member`` (or ``members``). The difference
+between the two is that ``number`` is **1-based**, while ``member`` is
+**0-based**.
+
+Select a single element:
+
+.. literalinclude:: code/number1_.py
+   :language: python
+
+... or a list:
+
+.. literalinclude:: code/number2_.py
+   :language: python
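
The relationship between the two option families can be sketched in a few lines. This assumes, as the text above states, that ``number`` and ``member`` differ only in their index base. The helper name is hypothetical and is not part of the library.

```python
def numbers_to_members(numbers):
    """Convert 1-based ensemble `number` values to 0-based `member` indices."""
    if isinstance(numbers, int):
        numbers = [numbers]  # accept a single value as well as a list
    return [n - 1 for n in numbers]


print(numbers_to_members(1))
print(numbers_to_members([1, 3, 5]))
```

Under that assumption, ``number=[1, 3, 5]`` and ``member=[0, 2, 4]`` would select the same ensemble members.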
diff --git a/docs/using/selecting.rst b/docs/using/selecting.rst
@@ -67,6 +67,28 @@ You can also rename variables:
 This will be useful when you join datasets and do not want variables
 from one dataset to override the ones from the other.
 
+********
+ number
+********
+
+If a dataset is an ensemble, you can select one or more specific members
+using the `number` option. You can also use ``numbers`` (which is an
+alias for ``number``), and ``member`` (or ``members``). The difference
+between the two is that ``number`` is **1-based**, while ``member`` is
+**0-based**.
+
+Select a single element:
+
+.. literalinclude:: code/number1_.py
+   :language: python
+
+... or a list:
+
+.. literalinclude:: code/number2_.py
+   :language: python
+
+.. _rescale:
+
 *********
  rescale
 *********
@@ -87,7 +109,9 @@ rescale the data.
 .. warning::
 
    When providing units, the library assumes that the mapping between
-   them is a linear transformation. No check is does to ensure this is
+   them is a linear transformation. No check is done to ensure this is
    the case.
 
 .. _cfunits: https://github.com/NCAS-CMS/cfunits
+
+.. _number: