Consolidate scale parameter usage across EMOS and ECC #1642

Merged 2 commits on Jan 24, 2022
@@ -56,16 +56,16 @@ A normal (Gaussian) distribution is often represented using the syntax:
where :math:`\mu` is mean and :math:`\sigma^{2}` is the variance. The normal
distribution is a special case, where :math:`\mu` can be interpreted as both
the mean and the location parameter and :math:`\sigma^{2}` can be interpreted
-as both the variance and the scale parameter. For an alternative distribution,
-such as a truncated normal distribution that has been truncated to lie within
-0 and infinity, the distribution can be represented as:
+as both the variance and the square of the scale parameter. For an alternative
+distribution, such as a truncated normal distribution that has been truncated
+to lie within 0 and infinity, the distribution can be represented as:

.. math::

\mathcal{N^0}(\mu,\,\sigma^{2})

In this case, the :math:`\mu` is strictly interpreted as the location parameter
-and :math:`\sigma^{2}` is strictly interpreted as the scale parameter.
+and :math:`\sigma^{2}` is strictly interpreted as the square of the scale parameter.

===============================
What is the location parameter?
@@ -82,6 +82,29 @@ The scale parameter indicates the width in the distribution. If the scale
parameter is large, then the distribution will be broader. If the scale is
smaller, then the distribution will be narrower.

****************************************************
Implementation details
****************************************************

In this implementation, we will choose to define the distributions
using the scale parameter (as this matches scipy's expectation),
rather than the square of the scale parameter:

.. math::

\mathcal{N}(\mu,\,\sigma)

The full equation when estimating the EMOS coefficients using
the ensemble mean is therefore:

.. math::

\mathcal{N}(a + b\bar{X}, \sqrt{c + dS^{2}})

This matches the equations noted in `Allen et al., 2021`_.

.. _Allen et al., 2021: https://doi.org/10.1002/qj.3983
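
The scipy convention this refers to can be checked directly: ``scipy.stats.norm`` takes ``loc`` and ``scale``, where ``scale`` is the standard deviation rather than the variance. A quick standalone check (not part of this changeset):

```python
import numpy as np
from scipy.stats import norm

# scipy's "scale" is the standard deviation: norm(loc=mu, scale=sigma)
# describes the distribution N(mu, sigma^2).
mu, sigma = 2.0, 3.0
dist = norm(loc=mu, scale=sigma)

assert np.isclose(dist.mean(), mu)
assert np.isclose(dist.std(), sigma)       # scale is the standard deviation
assert np.isclose(dist.var(), sigma ** 2)  # the variance is scale squared
```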

****************************************************
Estimating EMOS coefficients using the ensemble mean
****************************************************
@@ -91,7 +114,7 @@ If the predictor is the ensemble mean, coefficients are estimated as

.. math::

-\mathcal{N}(a + \bar{X}, c + dS^{2})
+\mathcal{N}(a + b\bar{X}, \sqrt{c + dS^{2}})

where N is a chosen distribution and values of a, b, c and d are solved in the
format of :math:`\alpha, \beta, \gamma` and :math:`\delta`, see the equations
@@ -121,7 +144,7 @@ If the predictor is the ensemble realizations, coefficients are estimated for

.. math::

-\mathcal{N}(a + b_1X_1 + ... + b_mX_m, c + dS^{2})
+\mathcal{N}(a + b_1X_1 + ... + b_mX_m, \sqrt{c + dS^{2}})

where N is a chosen distribution, the values of a, b, c and d relate
to alpha, beta, gamma and delta through the equations above with
@@ -140,14 +163,14 @@ The EMOS coefficients represent adjustments to the ensemble mean and ensemble
variance, in order to generate the location and scale parameters that, for the
chosen distribution, minimise the CRPS. The coefficients can therefore be used
to construct the location parameter, :math:`\mu`, and scale parameter,
-:math:`\sigma^{2}`, for the calibrated forecast from today's ensemble mean, or
+:math:`\sigma`, for the calibrated forecast from today's ensemble mean, or
ensemble realizations, and the ensemble variance.

.. math::

\mu = a + b\bar{X}

-\sigma^{2} = c + dS^{2}
+\sigma = \sqrt{c + dS^{2}}
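
Applied at a single point, the two relations above can be sketched as follows (the coefficient values are hypothetical stand-ins for trained EMOS coefficients, with :math:`c = \gamma^{2}` and :math:`d = \delta^{2}`):

```python
import numpy as np

# Hypothetical EMOS coefficients: a, b, c, d (c = gamma^2, d = delta^2).
a, b, c, d = 0.5, 1.1, 0.04, 0.9

ensemble_mean = 280.0     # today's ensemble mean, X-bar
ensemble_variance = 1.5   # today's ensemble variance, S^2

location = a + b * ensemble_mean            # mu = a + b * X-bar
scale = np.sqrt(c + d * ensemble_variance)  # sigma = sqrt(c + d * S^2)
```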

Note here that this procedure holds whether the distribution is normal, i.e.
where the application of the EMOS coefficients to the raw ensemble mean results
9 changes: 5 additions & 4 deletions improver/calibration/ensemble_calibration.py
@@ -1581,9 +1581,10 @@ def _calculate_scale_parameter(self) -> ndarray:
)

# Calculating the scale parameter, based on the raw variance S^2,
-# where predicted variance = c + dS^2, where c = (gamma)^2 and
-# d = (delta)^2
-scale_parameter = (
+# where predicted scale parameter (or equivalently standard deviation
+# for a normal distribution) = sqrt(c + dS^2), where c = (gamma)^2 and
+# d = (delta)^2.
+scale_parameter = np.sqrt(
self.coefficients_cubelist.extract_cube("emos_coefficient_gamma").data
* self.coefficients_cubelist.extract_cube("emos_coefficient_gamma").data
+ self.coefficients_cubelist.extract_cube("emos_coefficient_delta").data
@@ -1622,7 +1623,7 @@ def _create_output_cubes(
)
scale_parameter_cube = create_new_diagnostic_cube(
"scale_parameter",
-f"({template_cube.units})^2",
+template_cube.units,
template_cube,
template_cube.attributes,
data=scale_parameter,
27 changes: 13 additions & 14 deletions improver/ensemble_copula_coupling/ensemble_copula_coupling.py
@@ -871,10 +871,10 @@ def _location_and_scale_parameters_to_percentiles(
(len(percentiles_as_fractions), location_data.shape[0]), dtype=np.float32
)

-self._rescale_shape_parameters(location_data, np.sqrt(scale_data))
+self._rescale_shape_parameters(location_data, scale_data)

percentile_method = self.distribution(
-*self.shape_parameters, loc=location_data, scale=np.sqrt(scale_data)
+*self.shape_parameters, loc=location_data, scale=scale_data
)

# Loop over percentiles, and use the distribution as the
@@ -885,8 +885,9 @@ def _location_and_scale_parameters_to_percentiles(
result[index, :] = percentile_method.ppf(percentile_list)
# If percent point function (PPF) returns NaNs, fill in
# mean instead of NaN values. NaN will only be generated if the
-# variance is zero. Therefore, if the variance is zero, the mean
-# value is used for all gridpoints with a NaN.
+# scale parameter (standard deviation) is zero. Therefore, if the
+# scale parameter (standard deviation) is zero, the mean value is
+# used for all gridpoints with a NaN.
if np.any(scale_data == 0):
nan_index = np.argwhere(np.isnan(result[index, :]))
result[index, nan_index] = location_data[nan_index]
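
The NaN-filling logic above can be illustrated in isolation: a zero scale parameter is an invalid distribution for scipy, so the percent point function returns NaN there, and the location parameter is substituted. A minimal sketch with made-up values, not the IMPROVER implementation itself:

```python
import numpy as np
from scipy.stats import norm

location = np.array([280.0, 281.0, 282.0])
scale = np.array([1.2, 0.0, 0.8])  # the middle point has zero spread

# ppf returns NaN where scale == 0 (an invalid distribution for scipy).
percentiles = norm(loc=location, scale=scale).ppf(0.9)

# Fall back to the location parameter wherever ppf produced NaN.
if np.any(scale == 0):
    nan_index = np.isnan(percentiles)
    percentiles[nan_index] = location[nan_index]
```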
@@ -1039,11 +1040,9 @@ def _check_unit_compatibility(
) -> None:
"""
The location parameter, scale parameters, and threshold values come
-from three different cubes. They should all be in the same base unit,
-with the units of the scale parameter being the squared units of the
-location parameter and threshold values. This is a sanity check to
-ensure the units are as expected, converting units of the location
-parameter and scale parameter if possible.
+from three different cubes. This is a sanity check to ensure the units
+are as expected, converting units of the location parameter and
+scale parameter if possible.

Args:
location_parameter:
@@ -1060,11 +1059,11 @@

try:
location_parameter.convert_units(threshold_units)
-scale_parameter.convert_units(threshold_units ** 2)
+scale_parameter.convert_units(threshold_units)
except ValueError as err:
msg = (
-"Error: {} This is likely because the mean "
-"variance and template cube threshold units are "
+"Error: {} This is likely because the location parameter, "
+"scale parameter and template cube threshold units are "
"not equivalent/compatible.".format(err)
)
raise ValueError(msg)
@@ -1107,7 +1106,7 @@ def _location_and_scale_parameters_to_probabilities(
relative_to_threshold = probability_is_above_or_below(probability_cube_template)

self._rescale_shape_parameters(
-location_parameter.data.flatten(), np.sqrt(scale_parameter.data).flatten()
+location_parameter.data.flatten(), scale_parameter.data.flatten()
)

# Loop over thresholds, and use the specified distribution with the
@@ -1118,7 +1117,7 @@
distribution = self.distribution(
*self.shape_parameters,
loc=location_parameter.data.flatten(),
-scale=np.sqrt(scale_parameter.data.flatten()),
+scale=scale_parameter.data.flatten(),
)

probability_method = distribution.cdf
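
With the scale parameter now passed straight through (no ``np.sqrt``), the threshold loop above reduces to evaluating the distribution's CDF at each threshold. A minimal sketch with made-up values:

```python
import numpy as np
from scipy.stats import norm

location = np.array([280.0, 282.0])  # mu at each grid point
scale = np.array([1.5, 2.0])         # sigma at each grid point (no sqrt needed)

distribution = norm(loc=location, scale=scale)

# Probability that each grid point exceeds the threshold.
threshold = 281.0
prob_above = 1.0 - distribution.cdf(threshold)
```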
@@ -169,9 +169,9 @@ def setUp(self):
)
self.expected_scale_param_mean = np.array(
[
-[0.2316, 0.2342, 0.0168],
-[0.0271, 0.0237, 0.0168],
-[0.0634, 0.1151, 0.0116],
+[0.4813, 0.4840, 0.1295],
+[0.1647, 0.1538, 0.1295],
+[0.2517, 0.3393, 0.1076],
],
dtype=np.float32,
)
@@ -188,7 +188,7 @@ )
)

self.expected_scale_param_realizations_sites = np.array(
-[0, 0, 0, 0], dtype=np.float32
+[0.0005, 0.0005, 0.0005, 0.0005], dtype=np.float32
)

self.expected_loc_param_mean_alt = np.array(
@@ -202,9 +202,9 @@

self.expected_scale_param_mean_alt = np.array(
[
-[0.4347, 0.4396, 0.0308],
-[0.0503, 0.0438, 0.0308],
-[0.1184, 0.2157, 0.0211],
+[0.6593, 0.663, 0.1756],
+[0.2242, 0.2093, 0.1756],
+[0.3441, 0.4645, 0.1452],
],
dtype=np.float32,
)
@@ -219,7 +219,7 @@ self.expected_scale_param_mean_cube = set_up_variable_cube(
self.expected_scale_param_mean_cube = set_up_variable_cube(
self.expected_scale_param_mean,
name="scale_parameter",
-units="Kelvin^2",
+units="K",
attributes=MANDATORY_ATTRIBUTE_DEFAULTS,
)
