Merge branch 'master' into mobt_161_wxcode_tree_check
* master:
  Change pandas DataFrame.at to DataFrame.loc (metoppv#1655)
  MOBT-154: Reunification of wx decision trees (metoppv#1639)
  Consolidate scale parameter usage across EMOS and ECC (metoppv#1642)

# Conflicts:
#	improver/wxcode/utilities.py
MoseleyS committed Jan 31, 2022
2 parents 16760ac + 8eb0e74 commit 6196ac9
Showing 16 changed files with 362 additions and 157 deletions.
@@ -56,16 +56,16 @@ A normal (Gaussian) distribution is often represented using the syntax:
where :math:`\mu` is mean and :math:`\sigma^{2}` is the variance. The normal
distribution is a special case, where :math:`\mu` can be interpreted as both
the mean and the location parameter and :math:`\sigma^{2}` can be interpreted
-as both the variance and the scale parameter. For an alternative distribution,
-such as a truncated normal distribution that has been truncated to lie within
-0 and infinity, the distribution can be represented as:
+as both the variance and the square of the scale parameter. For an alternative
+distribution, such as a truncated normal distribution that has been truncated
+to lie within 0 and infinity, the distribution can be represented as:

.. math::
\mathcal{N^0}(\mu,\,\sigma^{2})
In this case, the :math:`\mu` is strictly interpreted as the location parameter
-and :math:`\sigma^{2}` is strictly interpreted as the scale parameter.
+and :math:`\sigma^{2}` is strictly interpreted as the square of the scale parameter.

===============================
What is the location parameter?
@@ -82,6 +82,29 @@ The scale parameter indicates the width in the distribution. If the scale
parameter is large, then the distribution will be broader. If the scale is
smaller, then the distribution will be narrower.

+****************************************************
+Implementation details
+****************************************************
+
+In this implementation, we will choose to define the distributions
+using the scale parameter (as this matches scipy's expectation),
+rather than the square of the scale parameter:
+
+.. math::
+    \mathcal{N}(\mu,\,\sigma)
+
+The full equation when estimating the EMOS coefficients using
+the ensemble mean is therefore:
+
+.. math::
+    \mathcal{N}(a + b\bar{X}, \sqrt{c + dS^{2}})
+
+This matches the equations noted in `Allen et al., 2021`_.
+
+.. _Allen et al., 2021: https://doi.org/10.1002/qj.3983
+
****************************************************
Estimating EMOS coefficients using the ensemble mean
****************************************************
@@ -91,7 +114,7 @@ If the predictor is the ensemble mean, coefficients are estimated as

.. math::
-    \mathcal{N}(a + \bar{X}, c + dS^{2})
+    \mathcal{N}(a + b\bar{X}, \sqrt{c + dS^{2}})
where N is a chosen distribution and values of a, b, c and d are solved in the
format of :math:`\alpha, \beta, \gamma` and :math:`\delta`, see the equations
@@ -121,7 +144,7 @@ If the predictor is the ensemble realizations, coefficients are estimated for

.. math::
-    \mathcal{N}(a + b_1X_1 + ... + b_mX_m, c + dS^{2})
+    \mathcal{N}(a + b_1X_1 + ... + b_mX_m, \sqrt{c + dS^{2}})
where N is a chosen distribution, the values of a, b, c and d relate
to alpha, beta, gamma and delta through the equations above with
@@ -140,14 +163,14 @@ The EMOS coefficients represent adjustments to the ensemble mean and ensemble
variance, in order to generate the location and scale parameters that, for the
chosen distribution, minimise the CRPS. The coefficients can therefore be used
to construct the location parameter, :math:`\mu`, and scale parameter,
-:math:`\sigma^{2}`, for the calibrated forecast from today's ensemble mean, or
+:math:`\sigma`, for the calibrated forecast from today's ensemble mean, or
ensemble realizations, and the ensemble variance.

.. math::
    \mu = a + b\bar{X}
-    \sigma^{2} = c + dS^{2}
+    \sigma = \sqrt{c + dS^{2}}
Note here that this procedure holds whether the distribution is normal, i.e.
where the application of the EMOS coefficients to the raw ensemble mean results
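For readers skimming the diff, the convention above can be reproduced in a few lines of scipy. This is a minimal sketch with invented coefficient values, not improver code, and it shows only the plain normal case:

```python
# Minimal sketch (illustrative values, not improver code): build the
# calibrated distribution N(a + b*X_bar, sqrt(c + d*S^2)).
import numpy as np
from scipy import stats

a, b, c, d = 0.1, 1.0, 0.04, 0.9  # hypothetical EMOS coefficients
ensemble_mean = np.array([280.0, 281.5])  # X_bar, e.g. temperature in K
ensemble_variance = np.array([1.2, 0.8])  # raw ensemble variance S^2

location = a + b * ensemble_mean  # mu, the location parameter
scale = np.sqrt(c + d * ensemble_variance)  # sigma: scipy expects this directly

calibrated = stats.norm(loc=location, scale=scale)
print(calibrated.ppf(0.5))  # the median recovers the location parameter
```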
14 changes: 10 additions & 4 deletions doc/source/extended_documentation/wxcode/build_a_decision_tree.rst
@@ -77,10 +77,16 @@ accessed with this key contains the essentials that make the node function.
condition_combination. Alternatively, if they are being manipulated within
the node (e.g. added together), they must be separated by the desired
operators (e.g. 'diagnostic1', '+', 'diagnostic2').
-- **diagnostic_thresholds** (List(List(float, str)): The diagnostic threshold
-  value and units being used in the test. A threshold [value, units] pair must
-  be provided for each diagnostic field with the same nested list structure; as
-  the basic unit is a list of value and unit, the overall nested structure is
+- **diagnostic_thresholds** (List(List(float, str, Optional(int))): The
+  diagnostic threshold value and units being used in the test. An optional
+  third value provides a period in seconds that is associated with the
+  threshold value. For example, a precipitation accumulation threshold might
+  be given for a 1-hour period (3600 seconds). If instead 3-hour symbols are
+  being produced using 3-hour precipitation accumulations then the threshold
+  value will be scaled up by a factor of 3. Only thresholds with an
+  associated period will be scaled in this way. A threshold [value, units] pair
+  must be provided for each diagnostic field with the same nested list structure;
+  as the basic unit is a list of value and unit, the overall nested structure is
one list deeper.
- **diagnostic_conditions** (as diagnostic_fields): The expected inequality
that has been used to construct the input probability field. This is checked
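To make the documented structure concrete, here is a hypothetical node using a period-qualified threshold; every key value below is invented for illustration rather than taken from a real decision tree:

```python
# A hypothetical decision-tree node (all names and values invented) showing a
# period-qualified threshold: [0.03, "mm", 3600] is defined for a 1-hour
# accumulation, so producing 3-hour symbols (target_period=10800) would scale
# the 0.03 mm value up by a factor of 3 to 0.09 mm.
example_node = {
    "if_true": "heavy_precipitation",
    "if_false": "light_precipitation",
    "probability_thresholds": [0.5],
    "threshold_condition": ">=",
    "condition_combination": "",
    "diagnostic_fields": [
        "probability_of_lwe_thickness_of_precipitation_amount_above_threshold"
    ],
    "diagnostic_thresholds": [[0.03, "mm", 3600]],
    "diagnostic_conditions": ["above"],
}
```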
9 changes: 5 additions & 4 deletions improver/calibration/ensemble_calibration.py
@@ -1581,9 +1581,10 @@ def _calculate_scale_parameter(self) -> ndarray:
)

# Calculating the scale parameter, based on the raw variance S^2,
-# where predicted variance = c + dS^2, where c = (gamma)^2 and
-# d = (delta)^2
-scale_parameter = (
+# where predicted scale parameter (or equivalently standard deviation
+# for a normal distribution) = sqrt(c + dS^2), where c = (gamma)^2 and
+# d = (delta)^2.
+scale_parameter = np.sqrt(
self.coefficients_cubelist.extract_cube("emos_coefficient_gamma").data
* self.coefficients_cubelist.extract_cube("emos_coefficient_gamma").data
+ self.coefficients_cubelist.extract_cube("emos_coefficient_delta").data
@@ -1622,7 +1623,7 @@ def _create_output_cubes(
)
scale_parameter_cube = create_new_diagnostic_cube(
"scale_parameter",
f"({template_cube.units})^2",
template_cube.units,
template_cube,
template_cube.attributes,
data=scale_parameter,
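Stripped of the iris cube handling, the change above amounts to the following numpy computation (a sketch with made-up coefficient values):

```python
# Sketch of the new calculation (made-up values, cube handling omitted):
# scale parameter = sqrt(c + d * S^2) with c = gamma^2 and d = delta^2,
# so the result is a standard deviation in the forecast units, not a variance.
import numpy as np

gamma, delta = 0.2, 0.95  # hypothetical EMOS coefficients
forecast_variance = np.array([1.2, 0.5, 0.0])  # raw ensemble variance S^2

scale_parameter = np.sqrt(gamma * gamma + delta * delta * forecast_variance)
print(scale_parameter)  # zero variance yields a zero scale parameter
```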
14 changes: 12 additions & 2 deletions improver/cli/wxcode.py
@@ -40,6 +40,7 @@ def process(
*cubes: cli.inputcube,
wxtree: cli.inputjson = None,
model_id_attr: str = None,
+target_period: int = None,
check_tree: bool = False,
):
""" Processes cube for Weather symbols.
@@ -54,6 +55,13 @@
Name of attribute recording source models that should be
inherited by the output cube. The source models are expected as
a space-separated string.
+target_period:
+    The period in seconds that the weather symbol being produced should
+    represent. This should correspond with any period diagnostics, e.g.
+    precipitation accumulation, being used as input. This is used to scale
+    any threshold values that are defined with an associated period in
+    the decision tree. It will only be used if the decision tree
+    provided has threshold values defined with an associated period.
check_tree (bool):
If set, the decision tree will be checked to see if it conforms to
the expected format and that all nodes can be reached; the only other
@@ -68,7 +76,7 @@
if check_tree:
from improver.wxcode.utilities import check_tree

-return check_tree(wxtree)
+return check_tree(wxtree, target_period=target_period)

from iris.cube import CubeList

@@ -77,4 +85,6 @@
if not cubes:
raise RuntimeError("Not enough input arguments. See help for more information.")

-return WeatherSymbols(wxtree, model_id_attr=model_id_attr)(CubeList(cubes))
+return WeatherSymbols(
+    wxtree, model_id_attr=model_id_attr, target_period=target_period
+)(CubeList(cubes))
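A sketch of driving the plugin from Python as the CLI body above now does; it assumes the plugin lives at improver.wxcode.weather_symbols, that the decision tree is stored in a JSON file, and that suitable probability cubes are already loaded — the file name and attribute value are illustrative, not prescribed:

```python
# Sketch of calling the plugin as the CLI now does (the file name and the
# model_id_attr value are illustrative; `cubes` is assumed loaded elsewhere).
import json

from iris.cube import CubeList

from improver.wxcode.weather_symbols import WeatherSymbols

with open("wx_tree.json") as f:
    wxtree = json.load(f)

plugin = WeatherSymbols(
    wxtree,
    model_id_attr="mosg__model_configuration",  # example attribute name
    target_period=10800,  # 3-hour symbols from 3-hour accumulations
)
result = plugin(CubeList(cubes))  # `cubes`: probability cubes loaded elsewhere
```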
27 changes: 13 additions & 14 deletions improver/ensemble_copula_coupling/ensemble_copula_coupling.py
@@ -871,10 +871,10 @@ def _location_and_scale_parameters_to_percentiles(
(len(percentiles_as_fractions), location_data.shape[0]), dtype=np.float32
)

-self._rescale_shape_parameters(location_data, np.sqrt(scale_data))
+self._rescale_shape_parameters(location_data, scale_data)

percentile_method = self.distribution(
-*self.shape_parameters, loc=location_data, scale=np.sqrt(scale_data)
+*self.shape_parameters, loc=location_data, scale=scale_data
)

# Loop over percentiles, and use the distribution as the
@@ -885,8 +885,9 @@
result[index, :] = percentile_method.ppf(percentile_list)
# If percent point function (PPF) returns NaNs, fill in
# mean instead of NaN values. NaN will only be generated if the
-# variance is zero. Therefore, if the variance is zero, the mean
-# value is used for all gridpoints with a NaN.
+# scale parameter (standard deviation) is zero. Therefore, if the
+# scale parameter (standard deviation) is zero, the mean value is
+# used for all gridpoints with a NaN.
if np.any(scale_data == 0):
nan_index = np.argwhere(np.isnan(result[index, :]))
result[index, nan_index] = location_data[nan_index]
@@ -1039,11 +1040,9 @@
) -> None:
"""
The location parameter, scale parameters, and threshold values come
-from three different cubes. They should all be in the same base unit,
-with the units of the scale parameter being the squared units of the
-location parameter and threshold values. This is a sanity check to
-ensure the units are as expected, converting units of the location
-parameter and scale parameter if possible.
+from three different cubes. This is a sanity check to ensure the units
+are as expected, converting units of the location parameter and
+scale parameter if possible.
Args:
location_parameter:
@@ -1060,11 +1059,11 @@

try:
location_parameter.convert_units(threshold_units)
-scale_parameter.convert_units(threshold_units ** 2)
+scale_parameter.convert_units(threshold_units)
except ValueError as err:
msg = (
"Error: {} This is likely because the mean "
"variance and template cube threshold units are "
"Error: {} This is likely because the location parameter, "
"scale parameter and template cube threshold units are "
"not equivalent/compatible.".format(err)
)
raise ValueError(msg)
@@ -1107,7 +1106,7 @@ def _location_and_scale_parameters_to_probabilities(
relative_to_threshold = probability_is_above_or_below(probability_cube_template)

self._rescale_shape_parameters(
-location_parameter.data.flatten(), np.sqrt(scale_parameter.data).flatten()
+location_parameter.data.flatten(), scale_parameter.data.flatten()
)

# Loop over thresholds, and use the specified distribution with the
@@ -1118,7 +1117,7 @@
distribution = self.distribution(
*self.shape_parameters,
loc=location_parameter.data.flatten(),
-scale=np.sqrt(scale_parameter.data.flatten()),
+scale=scale_parameter.data.flatten(),
)

probability_method = distribution.cdf
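The net effect of these changes is that scipy now receives the scale parameter unchanged. A standalone sketch with invented values, covering both conversions this module performs:

```python
# Standalone sketch (invented values): scipy's loc/scale now take the location
# and scale parameters directly, with no further square root applied.
import numpy as np
from scipy import stats

location = np.array([280.0, 281.5])  # location parameter, mu
scale = np.array([1.1, 0.9])  # scale parameter, sigma (not sigma^2)

distribution = stats.norm(loc=location, scale=scale)

# Percentiles via the percent point function (inverse CDF)...
percentiles = np.array([[0.25], [0.5], [0.75]])
print(distribution.ppf(percentiles))
# ...and probabilities relative to a threshold via the CDF.
print(distribution.cdf(281.0))
```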
49 changes: 42 additions & 7 deletions improver/wxcode/utilities.py
@@ -31,7 +31,7 @@
"""This module defines the utilities required for wxcode plugin """

from collections import OrderedDict
-from typing import Any, Dict, List
+from typing import Any, Dict, List, Optional

import iris
import numpy as np
@@ -103,22 +103,49 @@
}


-def update_tree_units(tree: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
+def update_tree_thresholds(
+    tree: Dict[str, Dict[str, Any]], target_period: Optional[int] = None
+) -> Dict[str, Dict[str, Any]]:
"""
Replaces value / unit pairs from tree definition with an Iris AuxCoord
-that encodes the same information.
+that encodes the same information. Also scales any threshold values that
+have an associated period (e.g. accumulation in 3600 seconds) by a factor
+to reflect the target period (e.g. a 3-hour, 10800 second, weather symbol).
Args:
tree:
Weather symbols decision tree.
+target_period:
+    The period in seconds that the weather symbol being produced should
+    represent. This should correspond with any period diagnostics, e.g.
+    precipitation accumulation, being used as input. This is used to scale
+    any threshold values that are defined with an associated period in
+    the decision tree.
Returns:
-The tree now containing AuxCoords instead of value / unit pairs.
+The tree now containing AuxCoords instead of value / unit pairs, with
+period diagnostic threshold values scaled appropriately.
+Raises:
+    ValueError: If thresholds are defined with an associated period and no
+        target_period is provided.
"""

    def _make_thresholds_with_units(items):
        if isinstance(items[0], list):
            return [_make_thresholds_with_units(item) for item in items]
-        values, units = items
+        try:
+            values, units, period = items
+        except ValueError:
+            values, units = items
+        else:
+            if not target_period:
+                raise ValueError(
+                    "The decision tree contains thresholds defined for a particular "
+                    "time period, e.g. the accumulation over 3600 seconds. To "
+                    "use such a decision tree a target_period must be provided "
+                    "such that the threshold values can be scaled appropriately."
+                )
+            values = values * target_period / period
        return iris.coords.AuxCoord(values, units=units)

for query in tree.values():
@@ -339,13 +366,21 @@ def _checker(lists):
return _checker(query)


-def check_tree(wxtree: Dict[str, Dict[str, Any]]) -> str:
+def check_tree(
+    wxtree: Dict[str, Dict[str, Any]], target_period: Optional[int] = None
+) -> str:
"""Perform some checks to ensure the provided decision tree is valid.
Args:
wxtree:
Weather symbols decision tree definition, provided as a
dictionary.
+target_period:
+    The period in seconds that the weather symbol being produced should
+    represent. This should correspond with any period diagnostics, e.g.
+    precipitation accumulation, being used as input. This is used to scale
+    any threshold values that are defined with an associated period in
+    the decision tree.
Returns:
A list of problems found in the decision tree, or if none are found, the
@@ -363,7 +398,7 @@ def check_tree(wxtree: Dict[str, Dict[str, Any]]) -> str:
all_targets = np.array(
[(n["if_true"], n["if_false"]) for n in wxtree.values()]
).flatten()
-wxtree = update_tree_units(wxtree)
+wxtree = update_tree_thresholds(wxtree, target_period)
valid_codes = list(WX_DICT.keys())

all_key_words = REQUIRED_KEY_WORDS + OPTIONAL_KEY_WORDS
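Finally, a usage sketch for validating a period-aware tree up front (the JSON file name is illustrative): per the diff above, check_tree forwards target_period to update_tree_thresholds, which raises a ValueError if period-qualified thresholds are present and no target period is supplied.

```python
# Usage sketch: validate a decision tree whose thresholds carry periods
# (the file name is illustrative, not a real fixture).
import json

from improver.wxcode.utilities import check_tree

with open("wx_tree.json") as f:
    wxtree = json.load(f)

# With a target_period, period-qualified thresholds are scaled before the
# checks run; the return value describes any problems found in the tree.
print(check_tree(wxtree, target_period=10800))
```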