Merge pull request #171 from ecmwf/develop
Minor release 0.5.13
JPXKQX authored Jan 10, 2025
2 parents 84fa08c + 7df24d9 commit 6853018
Showing 42 changed files with 879 additions and 79 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -15,11 +15,15 @@ Keep it human-readable, your future self will thank you!
- Fix metadata serialization handling of numpy.integer (#140)
- Fix negative variance for constant variables (#148)
- Fix cutout slicing of grid dimension (#145)
- Use cKDTree instead of KDTree
- Implement 'complement' feature
- Add ability to patch xarrays (#160)

### Added

- Call filters from anemoi-transform
- make test optional when adls is not installed Pull request #110
- Make test optional when adls is not installed Pull request #110
- Add wz_to_w, orog_to_z, and sum filters (#149)

## [0.5.8](https://github.com/ecmwf/anemoi-datasets/compare/0.5.7...0.5.8) - 2024-10-26

3 changes: 3 additions & 0 deletions docs/building/filters.rst
@@ -15,8 +15,11 @@ Filters are used to modify the data or metadata in a dataset.
:maxdepth: 1

filters/select
filters/orog_to_z
filters/rename
filters/rotate_winds
filters/sum
filters/unrotate_winds
filters/wz_to_w
filters/noop
filters/empty
17 changes: 17 additions & 0 deletions docs/building/filters/orog_to_z.rst
@@ -0,0 +1,17 @@
###########
orog_to_z
###########

The ``orog_to_z`` filter converts orography (in meters) to surface
geopotential (m^2/s^2) using the equation:

.. math::

   z &= g \cdot \textrm{orog}\\
   g &= 9.80665\ m \cdot s^{-2}

This filter needs to follow a source that provides orography, which is
replaced by surface geopotential.

.. literalinclude:: yaml/orog_to_z.yaml
   :language: yaml
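
As a rough illustration of the equation above, the following sketch
applies it to a NumPy array. The function name
``orog_to_geopotential`` and the sample values are assumptions made for
this example; this is not the filter's actual implementation.

```python
import numpy as np

G = 9.80665  # standard gravity, in m s^-2


def orog_to_geopotential(orog_m):
    """Convert orography (m) to surface geopotential (m^2 s^-2)."""
    # Hypothetical helper: multiplies the height field by standard gravity
    return G * np.asarray(orog_m, dtype=float)


# Three grid points: sea level, a hill, and a mountain
z = orog_to_geopotential([0.0, 100.0, 1500.0])
```

The output keeps the shape of the input, so the same call works on a
full grid.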
13 changes: 13 additions & 0 deletions docs/building/filters/sum.rst
@@ -0,0 +1,13 @@
#####
sum
#####

The ``sum`` filter computes the sum over multiple variables. This can be
useful for computing total precipitation from its components (snow,
rain) or summing the components of total column integrated water. This
filter needs to follow a source that provides the list of variables to
be summed up. These variables are removed by the filter and replaced by
a single summed variable.

.. literalinclude:: yaml/sum.yaml
   :language: yaml
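
The behaviour described above (inputs removed, one summed output added)
can be sketched on a plain dictionary of arrays. The helper
``sum_variables`` and the variable names ``rain``, ``snow`` and ``tp``
are hypothetical and chosen only for illustration; the real filter
operates on fields, not dictionaries.

```python
import numpy as np


def sum_variables(fields, params, output):
    """Replace the variables listed in `params` by their sum, stored as `output`."""
    missing = [name for name in params if name not in fields]
    if missing:
        raise KeyError(f"Missing input variables: {missing}")
    # Keep every variable that is not part of the sum
    result = {name: values for name, values in fields.items() if name not in params}
    # Element-wise sum over the listed variables
    result[output] = np.sum(
        [np.asarray(fields[name], dtype=float) for name in params], axis=0
    )
    return result


fields = {
    "rain": np.array([1.0, 0.5]),
    "snow": np.array([0.25, 0.0]),
    "2t": np.array([280.0, 281.0]),
}
out = sum_variables(fields, params=["rain", "snow"], output="tp")
```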
12 changes: 12 additions & 0 deletions docs/building/filters/wz_to_w.rst
@@ -0,0 +1,12 @@
#########
wz_to_w
#########

The ``wz_to_w`` filter converts geometric vertical velocity (provided in
m/s) to vertical velocity in pressure coordinates (Pa/s). This filter
needs to follow a source that provides geometric vertical velocity.
Geometric vertical velocity is removed by the filter and pressure
vertical velocity is added.

.. literalinclude:: yaml/wz_to_w.yaml
   :language: yaml
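
The text above does not state which formula the filter applies. One
common way to relate the two quantities is the hydrostatic
approximation, in which pressure vertical velocity is approximately
``-rho * g * wz`` with the air density ``rho`` taken from the ideal gas
law. The sketch below follows that assumption; the function name
``wz_to_omega``, the use of temperature and pressure as extra inputs,
and the dry-air gas constant are all choices made for this example.

```python
import numpy as np

G = 9.80665     # standard gravity, in m s^-2
R_DRY = 287.05  # specific gas constant of dry air, in J kg^-1 K^-1 (assumed value)


def wz_to_omega(wz, temperature, pressure):
    """Approximate pressure vertical velocity (Pa/s) from geometric velocity (m/s).

    Assumes hydrostatic balance and dry air; `temperature` is in K and
    `pressure` in Pa.
    """
    wz = np.asarray(wz, dtype=float)
    rho = np.asarray(pressure, dtype=float) / (R_DRY * np.asarray(temperature, dtype=float))
    # Upward motion (wz > 0) moves air towards lower pressure, hence the minus sign
    return -rho * G * wz


omega = wz_to_omega(wz=0.1, temperature=280.0, pressure=85000.0)
```

Note the sign convention: rising air has a positive geometric velocity
but a negative pressure velocity.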
10 changes: 10 additions & 0 deletions docs/building/filters/yaml/orog_to_z.yaml
@@ -0,0 +1,10 @@
input:
  pipe:
    - source: # mars, grib, netcdf, etc.
        # source attributes here
        # ...
        # Must load an orography variable

    - orog_to_z:
        orog: orog # Name of orography (input) variable
        z: z # Name of z (output) variable
13 changes: 13 additions & 0 deletions docs/building/filters/yaml/sum.yaml
@@ -0,0 +1,13 @@
input:
  pipe:
    - source: # mars, grib, netcdf, etc.
        # source attributes here
        # ...
        # Must load the variables to be summed

    - sum:
        params: # List of input variables
          - variable1
          - variable2
          - variable3
        output: variable_total # Name of output variable
10 changes: 10 additions & 0 deletions docs/building/filters/yaml/wz_to_w.yaml
@@ -0,0 +1,10 @@
input:
  pipe:
    - source: # mars, grib, netcdf, etc.
        # source attributes here
        # ...
        # Must load geometric vertical velocity

    - wz_to_w:
        wz: wz # Name of geometric vertical velocity (input) variable
        w: w # Name of pressure vertical velocity (output) variable
65 changes: 33 additions & 32 deletions docs/building/introduction.rst
@@ -10,7 +10,7 @@ file, which is a YAML file that describes sources of meteorological
fields as well as the operations to perform on them, before they are
written to a zarr file. The input of the process is a range of dates and
some options to control the layout of the output. Statistics will be
computed as the dataset is build, and stored in the metadata, with other
computed as the dataset is built, and stored in the metadata, with other
information such as the locations of the grid points, the list of
variables, etc.

@@ -24,35 +24,35 @@ variables, etc.

date
Throughout this document, the term `date` refers to a date and time,
not just a date. A training dataset is covers a continuous range of
not just a date. A training dataset covers a continuous range of
dates with a given frequency. Missing dates are still part of the
dataset, but the data are missing and marked as such using NaNs.
Dates are always in UTC, and refer to date at which the data is
valid. For accumulations and fluxes, that would be the end of the
accumulation period.
dataset, but missing data are marked as such using NaNs. Dates are
always in UTC, and refer to the date at which the data is valid. For
accumulations and fluxes, that would be the end of the accumulation
period.

variable
A `variable` is meteorological parameter, such as temperature, wind,
etc. Multilevel parameters are treated as separate variables, one for
each level. For example, temperature at 850 hPa and temperature at
500 hPa will be treated as two separate variables (`t_850` and
`t_500`).
A `variable` is a meteorological parameter, such as temperature,
wind, etc. Multilevel parameters are treated as separate variables,
one for each level. For example, temperature at 850 hPa and
temperature at 500 hPa will be treated as two separate variables
(`t_850` and `t_500`).

field
A `field` is a variable at a given date. It is represented by a array
of values at each grid point.
A `field` is a variable at a given date. It is represented by an
array of values at each grid point.

source
The `source` is a software component that given a list of dates and
variables will return the corresponding fields. A example of source
The `source` is a software component that, given a list of dates and
variables, will return the corresponding fields. An example of a source
is ECMWF's MARS archive, a collection of GRIB or NetCDF files, a
database, etc. See :ref:`sources` for more information.

filter
A `filter` is a software component that takes as input the output of
a source or the output of another filter can modify the fields and/or
their metadata. For example, typical filters are interpolations,
renaming of variables, etc. See :ref:`filters` for more information.
a source or another filter and can modify the fields and/or their
metadata. For example, typical filters are interpolations, renaming
of variables, etc. See :ref:`filters` for more information.

************
Operations
@@ -62,19 +62,20 @@ In order to build a training dataset, sources and filters are combined
using the following operations:

join
The join is the process of combining several sources data. Each
source is expected to provide different variables at the same dates.
The join is the process of combining several sources of data. Each
source is expected to provide different variables for the same
dates.

pipe
The pipe is the process of transforming fields using filters. The
first step of a pipe is typically a source, a join or another pipe.
The following steps are filters.
first step of a pipe is typically a source, a join, or another pipe.
This can subsequently be followed by more filters.

concat
The concatenation is the process of combining different sets of
operation that handle different dates. This is typically used to
build a dataset that spans several years, when the several sources
are involved, each providing a different period.
operations that handle different dates. This is typically used to
build a dataset that spans several years, when several sources are
involved, each providing data for a different period.

Each operation is considered as a :ref:`source <sources>`, therefore
operations can be combined to build complex datasets.
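
The way these operations compose can be pictured with a toy model in
which a source is a function from a list of dates to a dictionary of
fields. All names below (``make_source``, ``rename``, ``join``,
``pipe``, ``concat``) are hypothetical helpers written for this
illustration; they are not the anemoi-datasets API, and real recipes
express the same structure in YAML.

```python
def make_source(variables):
    """Toy source: returns one dummy field per (variable, date)."""
    def source(dates):
        return {(v, d): 0.0 for v in variables for d in dates}
    return source


def rename(mapping):
    """Toy filter: renames variables and leaves the values untouched."""
    def apply(fields):
        return {(mapping.get(v, v), d): x for (v, d), x in fields.items()}
    return apply


def join(*sources):
    """Combine sources providing different variables for the same dates."""
    def joined(dates):
        fields = {}
        for source in sources:
            fields.update(source(dates))
        return fields
    return joined


def pipe(first, *filters):
    """Apply filters, in order, to the output of a first step."""
    def piped(dates):
        fields = first(dates)
        for apply in filters:
            fields = apply(fields)
        return fields
    return piped


def concat(*parts):
    """Combine (date predicate, source) pairs that handle different dates."""
    def concatenated(dates):
        fields = {}
        for accepts, source in parts:
            fields.update(source([d for d in dates if accepts(d)]))
        return fields
    return concatenated


dates = ["2019-12-31T18", "2020-01-01T00"]
recipe = concat(
    # Before 2020: join two sources, then rename one variable
    (lambda d: d < "2020",
     pipe(join(make_source(["2t"]), make_source(["t_850"])), rename({"2t": "t2m"}))),
    # From 2020 onwards: a single source already provides both variables
    (lambda d: d >= "2020", make_source(["t2m", "t_850"])),
)
fields = recipe(dates)
```

Because ``join``, ``pipe`` and ``concat`` each return something with
the same calling convention as a source, they can be nested freely,
which is the property the paragraph above describes.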
@@ -87,7 +88,7 @@ First recipe
============

The simplest `recipe` file must contain a ``dates`` section and an
``input`` section. The latter must contain a `source` In that case, the
``input`` section. The latter must contain a `source`. In that case, the
source is ``mars``.

.. literalinclude:: yaml/building1.yaml
@@ -132,15 +133,15 @@ This will build the following dataset:
Adding some forcing variables
=============================

When training a data-driven models, some forcing variables may be
When training a data-driven model, some forcing variables may be
required such as the solar radiation, the time of day, the day in the
year, etc.

These are provided by the ``forcings`` source. In that example, we add a
few of them. The `template` option is used to point to another source,
in that case the first instance of ``mars``. This source is used to get
information about the grid points, as some of the forcing variables are
grid dependent.
These are provided by the ``forcings`` source. Let us add a few of them
to the above example. The `template` option is used to point to another
source, in that case the first instance of ``mars``. This source is used
to get information about the grid points, as some of the forcing
variables are grid dependent.

.. literalinclude:: yaml/building3.yaml
:language: yaml
2 changes: 1 addition & 1 deletion docs/building/sources/yaml/accumulations1.yaml
@@ -1,6 +1,6 @@
input:
accumulations:
accumulations_period: 6
accumulation_period: 6
class: ea
param: [tp, cp, sf]
levtype: sfc
2 changes: 1 addition & 1 deletion docs/building/sources/yaml/accumulations2.yaml
@@ -1,6 +1,6 @@
input:
accumulations:
accumulations_period: [6, 12]
accumulation_period: [6, 12]
class: od
param: [tp, cp, sf]
levtype: sfc
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -45,6 +45,7 @@ datasets <building-introduction>`.
- :doc:`using/subsetting`
- :doc:`using/combining`
- :doc:`using/selecting`
- :doc:`using/ensembles`
- :doc:`using/grids`
- :doc:`using/zip`
- :doc:`using/statistics`
@@ -65,6 +66,7 @@ datasets <building-introduction>`.
using/subsetting
using/combining
using/selecting
using/ensembles
using/grids
using/zip
using/statistics
6 changes: 6 additions & 0 deletions docs/using/code/complement1_.py
@@ -0,0 +1,6 @@
open_dataset(
    complement=dataset1,
    source=dataset2,
    what="variables",
    interpolate="nearest",
)
12 changes: 12 additions & 0 deletions docs/using/code/complement2_.py
@@ -0,0 +1,12 @@
open_dataset(
    cutout=[
        {
            "complement": lam_dataset,
            "source": global_dataset,
            "interpolate": "nearest",
        },
        {
            "dataset": global_dataset,
        },
    ]
)
4 changes: 4 additions & 0 deletions docs/using/code/complement3_.py
@@ -0,0 +1,4 @@
open_dataset(
    complement=dataset1,
    source=dataset2,
)
4 changes: 4 additions & 0 deletions docs/using/code/number1_.py
@@ -0,0 +1,4 @@
ds = open_dataset(
    dataset,
    number=1,
)
4 changes: 4 additions & 0 deletions docs/using/code/number2_.py
@@ -0,0 +1,4 @@
ds = open_dataset(
    dataset,
    number=[1, 3, 5],
)
29 changes: 29 additions & 0 deletions docs/using/combining.rst
@@ -182,3 +182,32 @@ The difference can be seen at the boundary between the two grids:
To debug the combination, you can pass `plot=True` to the `cutout`
function (when running from a Notebook), or use `plot="prefix"` to save
the plots to a series of PNG files in the current directory.

.. _complement:

************
complement
************

This feature interpolates the variables of `dataset2` that are not in
`dataset1` to the grid of `dataset1`, adds them to the list of
variables of `dataset1`, and returns the result.

.. literalinclude:: code/complement1_.py

Currently ``what`` can only be ``variables`` and can be omitted.

The value for ``interpolate`` can be one of ``none`` (default) or
``nearest``. In the case of ``none``, the grids of the two datasets must
match.

This feature was originally designed to be used in conjunction with
``cutout``, where `dataset1` is the limited-area (LAM) dataset, and
`dataset2` is the global dataset.

.. literalinclude:: code/complement2_.py

Another use case is to simply bring all non-overlapping variables of a
dataset into another:

.. literalinclude:: code/complement3_.py
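
To make the behaviour of ``complement`` concrete, here is a toy sketch
that works on plain dictionaries instead of real datasets. The
dictionary layout (``latlon`` and ``variables`` keys), the function
name ``complement_sketch``, and the brute-force nearest-neighbour
search are assumptions made for this illustration; the library's own
implementation may differ.

```python
import numpy as np


def complement_sketch(target, source, interpolate="none"):
    """Add to `target` the variables of `source` that `target` lacks.

    Both arguments are dicts with a "latlon" array of shape (n, 2) and a
    "variables" dict mapping names to arrays of shape (n,).
    """
    missing = [v for v in source["variables"] if v not in target["variables"]]

    if interpolate == "none":
        # Without interpolation, the two grids must be identical
        if not np.array_equal(target["latlon"], source["latlon"]):
            raise ValueError("Grids must match when interpolate is 'none'")
        index = np.arange(len(target["latlon"]))
    elif interpolate == "nearest":
        # For each target point, pick the closest source point (toy-sized grids only)
        diff = target["latlon"][:, None, :] - source["latlon"][None, :, :]
        index = np.argmin((diff ** 2).sum(axis=-1), axis=1)
    else:
        raise ValueError(f"Unknown interpolation: {interpolate}")

    variables = dict(target["variables"])
    for name in missing:
        variables[name] = source["variables"][name][index]
    return {"latlon": target["latlon"], "variables": variables}


lam = {
    "latlon": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "variables": {"2t": np.array([280.0, 281.0])},
}
glob = {
    "latlon": np.array([[0.1, 0.1], [0.9, 1.2], [5.0, 5.0]]),
    "variables": {"2t": np.array([1.0, 2.0, 3.0]), "tp": np.array([10.0, 20.0, 30.0])},
}
result = complement_sketch(lam, glob, interpolate="nearest")
```

Variables already present in the first dataset (here ``2t``) are left
untouched; only the missing ones (here ``tp``) are brought over.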
27 changes: 27 additions & 0 deletions docs/using/ensembles.rst
@@ -0,0 +1,27 @@
.. _selecting-members:

###################
Selecting members
###################

This section describes how to subset data that are part of an ensemble.
To combine ensembles, see :ref:`ensembles` in the
:ref:`combining-datasets` section.

.. _number:

If a dataset is an ensemble, you can select one or more specific members
using the `number` option. You can also use ``numbers`` (which is an
alias for ``number``), and ``member`` (or ``members``). The difference
between the two is that ``number`` is **1-based**, while ``member`` is
**0-based**.

Select a single element:

.. literalinclude:: code/number1_.py
   :language: python

... or a list:

.. literalinclude:: code/number2_.py
   :language: python
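
The relation between the 1-based ``number`` and the 0-based ``member``
can be sketched as a small conversion helper. The function
``member_indices`` is hypothetical and only illustrates the offset; it
is not part of the library.

```python
def member_indices(number=None, member=None):
    """Return 0-based member indices from a 1-based `number` or 0-based `member`."""
    if (number is None) == (member is None):
        raise ValueError("Provide exactly one of `number` or `member`")

    if number is not None:
        values = [number] if isinstance(number, int) else list(number)
        if any(n < 1 for n in values):
            raise ValueError("`number` is 1-based and must be >= 1")
        # Shift by one to obtain positions along the ensemble dimension
        return [n - 1 for n in values]

    values = [member] if isinstance(member, int) else list(member)
    if any(m < 0 for m in values):
        raise ValueError("`member` is 0-based and must be >= 0")
    return values


from_number = member_indices(number=[1, 3, 5])
from_member = member_indices(member=[0, 2, 4])
```

Under this reading, ``number=[1, 3, 5]`` and ``member=[0, 2, 4]``
select the same ensemble members.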
26 changes: 25 additions & 1 deletion docs/using/selecting.rst
@@ -67,6 +67,28 @@ You can also rename variables:
This will be useful when you join datasets and do not want variables
from one dataset to override the ones from the other.

********
number
********

If a dataset is an ensemble, you can select one or more specific members
using the `number` option. You can also use ``numbers`` (which is an
alias for ``number``), and ``member`` (or ``members``). The difference
between the two is that ``number`` is **1-based**, while ``member`` is
**0-based**.

Select a single element:

.. literalinclude:: code/number1_.py
   :language: python

... or a list:

.. literalinclude:: code/number2_.py
   :language: python

.. _rescale:

*********
rescale
*********
@@ -87,7 +109,9 @@ rescale the data.
.. warning::

When providing units, the library assumes that the mapping between
them is a linear transformation. No check is does to ensure this is
them is a linear transformation. No check is done to ensure this is
the case.

.. _cfunits: https://github.com/NCAS-CMS/cfunits

.. _number: