Task 1455 doc (#1585)

* first stab at converting to sphinx

* removing all slashes

* adding new link to README.rst file

* working on lists

* Made formatting changes

* Finished fcst section

* fixing spelling, bolding and italics issues

* updating web links

* working on formatting

* updating formatting

* formatting

* first attempt to clean up formatting completed.

* adding README to the index file

* fixing warning errors

* Bringing README_TC into sphinx.  Updating section headers

* Adding README_TC

* Made formatting updates to README.rst

* corrected section under wavelet

* small changes

* removing met/data/config/README since it is now in met/docs/Users_Guide

* Added some formatting for headers

* fixing chapters & sections

* Fixed warnings from building

* adding in code blocks

* removing slashes

* changes

* Made changes to formatting

* removing For example code blocks

* major updates

* first pass at document conversion complete.

* cleaning up questions about dashes

* Made some formatting modifications

* Removing README_TC because it is being replaced by README_TC.rst in met/docs/Users_Guide

* Removing the reference to the README_TC file

* Making title capitalization consistent with README

* Added a space in timestring

* changing to 'time string' with a space between the words.

* adding a link to the new README_TC location in met/docs/Users_Guide

* Modified references to README and README_TC

* small formatting changes

* small formatting changes

* fixing tabs

* fixing spacing around number 11

* removing parenthesis around reference dates.

* adding parenthesis back in.

* fixing references

* updating references

* Update appendixC.rst

Removed space from "HAUSDOR FF"

* Update plotting.rst

Changed a couple of references of Plot_Point_Obs to Plot-Point-Obs

* Update point-stat.rst

Added oxford commas

Co-authored-by: Julie.Prestopnik <jpresto@ucar.edu>
lisagoodrich and jprestop authored Dec 1, 2020
1 parent 6c8f4a2 commit d7ecff3
Showing 17 changed files with 97 additions and 91 deletions.
met/docs/Users_Guide/appendixA.rst: 2 changes (1 addition, 1 deletion)
@@ -44,7 +44,7 @@ A. Currently, very few graphics are included. The plotting tools (plot_point_obs

**Q. How do I find the version of the tool I am using?**

-A. Type the name of the tool followed by -version. For example, type “pb2nc -version”.
+A. Type the name of the tool followed by **-version**. For example, type “pb2nc **-version**”.

**Q. What are MET's conventions for latitude, longitude, azimuth and bearing angles?**

met/docs/Users_Guide/appendixC.rst: 27 changes (14 additions, 13 deletions)
@@ -251,14 +251,14 @@ OR measures the ratio of the odds of a forecast of the event being correct to th

.. math:: \text{OR } = \frac{n_{11} \times n_{00}}{n_{10} \times n_{01}} = \frac{(\frac{\text{POD}}{1 - \text{POD}})}{(\frac{\text{POFD}}{1 - \text{POFD}})}.

-OR can range from 0 to :math:`\infty`. A perfect forecast would have a value of OR = infinity. OR is often expressed as the log Odds Ratio or as the Odds Ratio Skill Score (:ref:`Stephenson 2000 <Stephenson-2000>`).
+OR can range from 0 to :math:`\infty`. A perfect forecast would have a value of OR = infinity. OR is often expressed as the log Odds Ratio or as the Odds Ratio Skill Score (:ref:`Stephenson, 2000 <Stephenson-2000>`).

Logarithm of the Odds Ratio (LODDS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Called "LODDS" in CTS output :numref:`table_PS_format_info_CTS`

-LODDS transforms the odds ratio via the logarithm, which tends to normalize the statistic for rare events (:ref:`Stephenson 2000 <Stephenson-2000>`). However, it can take values of :math:`\pm\infty` when any of the contingency table counts is 0. LODDS is defined as :math:`\text{LODDS} = ln(OR)`.
+LODDS transforms the odds ratio via the logarithm, which tends to normalize the statistic for rare events (:ref:`Stephenson, 2000 <Stephenson-2000>`). However, it can take values of :math:`\pm\infty` when any of the contingency table counts is 0. LODDS is defined as :math:`\text{LODDS} = ln(OR)`.

Odds Ratio Skill Score (ORSS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -269,7 +269,7 @@ ORSS is a skill score based on the odds ratio. ORSS is defined as

.. math:: \text{ORSS } = \frac{OR - 1}{OR + 1}.

-ORSS is sometimes also referred to as Yule's Q. (:ref:`Stephenson 2000 <Stephenson-2000>`).
+ORSS is sometimes also referred to as Yule's Q. (:ref:`Stephenson, 2000 <Stephenson-2000>`).

Extreme Dependency Score (EDS)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -280,7 +280,7 @@ The extreme dependency score measures the association between forecast and obser

.. math:: \text{EDS } = \frac{2 ln(\frac{n_{11} + n_{01}}{T})}{ln(\frac{n_{11}}{T})} - 1.

-EDS can range from -1 to 1, with 0 representing no skill. A perfect forecast would have a value of EDS = 1. EDS is independent of bias, so should be presented along with the frequency bias statistic (:ref:`Stephenson et al, 2008 <Stephenson-2008>`).
+EDS can range from -1 to 1, with 0 representing no skill. A perfect forecast would have a value of EDS = 1. EDS is independent of bias, so should be presented along with the frequency bias statistic (:ref:`Stephenson et al., 2008 <Stephenson-2008>`).
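
As a quick numeric illustration of the definitions above (hypothetical counts, not MET output), OR, LODDS, ORSS, and EDS can all be computed directly from the four cells of the 2x2 contingency table:

.. code-block:: python

  import math

  # Hypothetical 2x2 contingency table counts (illustration only):
  # n11 = hits, n10 = false alarms, n01 = misses, n00 = correct negatives
  n11, n10, n01, n00 = 50, 20, 10, 920
  T = n11 + n10 + n01 + n00  # total number of forecast-observation pairs

  odds_ratio = (n11 * n00) / (n10 * n01)       # OR
  lodds = math.log(odds_ratio)                 # LODDS = ln(OR)
  orss = (odds_ratio - 1) / (odds_ratio + 1)   # ORSS, also known as Yule's Q
  eds = 2 * math.log((n11 + n01) / T) / math.log(n11 / T) - 1  # EDS

  print(f"OR={odds_ratio:.1f} LODDS={lodds:.2f} ORSS={orss:.3f} EDS={eds:.3f}")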

Extreme Dependency Index (EDI)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -324,7 +324,7 @@ Bias Adjusted Gilbert Skill Score (GSS)

Called "BAGSS" in CTS output :numref:`table_PS_format_info_CTS`

-BAGSS is based on the GSS, but is corrected as much as possible for forecast bias (:ref:`Brill and Mesinger, 2009<Brill-2009>`).
+BAGSS is based on the GSS, but is corrected as much as possible for forecast bias (:ref:`Brill and Mesinger, 2009 <Brill-2009>`).

Economic Cost Loss Relative Value (ECLV)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -344,7 +344,7 @@ For cost / loss ratio above the base rate, the ECLV is defined as:
MET verification measures for continuous variables
__________________________________________________

-For continuous variables, many verification measures are based on the forecast error (i.e., **f - o**). However, it also is of interest to investigate characteristics of the forecasts, and the observations, as well as their relationship. These concepts are consistent with the general framework for verification outlined by :ref:`Murphy and Winkler (1987) <Murphy-1987>`. The statistics produced by MET for continuous forecasts represent this philosophy of verification, which focuses on a variety of aspects of performance rather than a single measure.
+For continuous variables, many verification measures are based on the forecast error (i.e., **f - o**). However, it also is of interest to investigate characteristics of the forecasts, and the observations, as well as their relationship. These concepts are consistent with the general framework for verification outlined by :ref:`Murphy and Winkler, (1987) <Murphy-1987>`. The statistics produced by MET for continuous forecasts represent this philosophy of verification, which focuses on a variety of aspects of performance rather than a single measure.

The verification measures currently evaluated by the Point-Stat tool are defined and described in the subsections below. In these definitions, **f** represents the forecasts, **o** represents the observation, and **n** is the number of forecast-observation pairs.

@@ -567,7 +567,7 @@ Partial Sums lines (SL1L2, SAL1L2, VL1L2, VAL1L2)

The SL1L2, SAL1L2, VL1L2, and VAL1L2 line types are used to store data summaries (e.g. partial sums) that can later be accumulated into verification statistics. These are divided according to scalar or vector summaries (S or V). The climate anomaly values (A) can be stored in place of the actuals, which is just a re-centering of the values around the climatological average. L1 and L2 refer to the L1 and L2 norms, the distance metrics commonly referred to as the “city block” and “Euclidean” distances. The city block is the absolute value of a distance while the Euclidean distance is the square root of the squared distance.

-The partial sums can be accumulated over individual cases to produce statistics for a longer period without any loss of information because these sums are *sufficient* for resulting statistics such as RMSE, bias, correlation coefficient, and MAE (:ref:`Mood et al, 1974 <Mood-1974>`). Thus, the individual errors need not be stored, all of the information relevant to calculation of statistics are contained in the sums. As an example, the sum of all data points and the sum of all squared data points (or equivalently, the sample mean and sample variance) are *jointly sufficient* for estimates of the Gaussian distribution mean and variance.
+The partial sums can be accumulated over individual cases to produce statistics for a longer period without any loss of information because these sums are *sufficient* for resulting statistics such as RMSE, bias, correlation coefficient, and MAE (:ref:`Mood et al., 1974 <Mood-1974>`). Thus, the individual errors need not be stored, all of the information relevant to calculation of statistics are contained in the sums. As an example, the sum of all data points and the sum of all squared data points (or equivalently, the sample mean and sample variance) are *jointly sufficient* for estimates of the Gaussian distribution mean and variance.

*Minimally sufficient* statistics are those that condense the data most, with no loss of information. Statistics based on L1 and L2 norms allow for good compression of information. Statistics based on other norms, such as order statistics, do not result in good compression of information. For this reason, statistics such as RMSE are often preferred to statistics such as the median absolute deviation. The partial sums are not sufficient for order statistics, such as the median or quartiles.
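
The sufficiency property can be demonstrated in a short sketch (illustrative code, not MET's implementation): accumulate the scalar L2 partial sums once, then recover bias, RMSE, and the correlation coefficient from those sums alone, without revisiting the individual pairs.

.. code-block:: python

  import math

  def accumulate(pairs):
      """Accumulate SL1L2-style partial sums over (forecast, observation) pairs."""
      n = sf = so = sff = soo = sfo = 0.0
      for f, o in pairs:
          n += 1
          sf += f; so += o            # sums
          sff += f * f; soo += o * o  # sums of squares
          sfo += f * o                # sum of cross products
      return n, sf, so, sff, soo, sfo

  def stats_from_sums(n, sf, so, sff, soo, sfo):
      """Recover statistics from the sums alone; no individual errors are stored."""
      bias = (sf - so) / n                         # mean error
      rmse = math.sqrt((sff - 2 * sfo + soo) / n)  # root mean squared error
      cov = sfo / n - (sf / n) * (so / n)
      corr = cov / math.sqrt((sff / n - (sf / n) ** 2) * (soo / n - (so / n) ** 2))
      return bias, rmse, corr

  sums = accumulate([(1.2, 1.0), (0.8, 1.1), (2.0, 1.7)])
  print(stats_from_sums(*sums))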

@@ -655,7 +655,7 @@ Gradient values

Called "TOTAL", "FGBAR", "OGBAR", "MGBAR", "EGBAR", "S1", "S1_OG", and "FGOG_RATIO" in GRAD output :numref:`table_GS_format_info_GRAD`

-These statistics are only computed by the Grid_Stat tool and require vectors. Here :math:`\nabla` is the gradient operator, which in this applications signifies the difference between adjacent grid points in both the grid-x and grid-y directions. TOTAL is the count of grid locations used in the calculations. The remaining measures are defined below:
+These statistics are only computed by the Grid-Stat tool and require vectors. Here :math:`\nabla` is the gradient operator, which in this applications signifies the difference between adjacent grid points in both the grid-x and grid-y directions. TOTAL is the count of grid locations used in the calculations. The remaining measures are defined below:

.. math:: \text{FGBAR} = \text{Mean}|\nabla f| = \frac{1}{n} \sum_{i=1}^n | \nabla f_i|
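
A sketch of this calculation with NumPy (an illustration of the definition above; combining the two directions this way is an assumption here, and MET's actual implementation and masking logic may differ):

.. code-block:: python

  import numpy as np

  def mean_abs_gradient(field):
      """Mean absolute gradient of a 2D field, in the spirit of FGBAR."""
      dx = np.abs(np.diff(field, axis=1))  # adjacent differences, grid-x direction
      dy = np.abs(np.diff(field, axis=0))  # adjacent differences, grid-y direction
      return (dx.sum() + dy.sum()) / (dx.size + dy.size)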

@@ -797,7 +797,7 @@ Calibration

Called "CALIBRATION" in PJC output :numref:`table_PS_format_info_PJC`

-Calibration is the conditional probability of an event given each probability forecast category (i.e. each row in the **nx2** contingency table). This set of measures is paired with refinement in the calibration-refinement factorization discussed in :ref:`Wilks (2011) <Wilks-2011>`. A well-calibrated forecast will have calibration values that are near the forecast probability. For example, a 50% probability of precipitation should ideally have a calibration value of 0.5. If the calibration value is higher, then the probability has been underestimated, and vice versa.
+Calibration is the conditional probability of an event given each probability forecast category (i.e. each row in the **nx2** contingency table). This set of measures is paired with refinement in the calibration-refinement factorization discussed in :ref:`Wilks, (2011) <Wilks-2011>`. A well-calibrated forecast will have calibration values that are near the forecast probability. For example, a 50% probability of precipitation should ideally have a calibration value of 0.5. If the calibration value is higher, then the probability has been underestimated, and vice versa.

.. math:: \text{Calibration}(i) = \frac{n_{i1}}{n_{1.}} = \text{probability}(o_1|p_i)
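
For example, using hypothetical counts (illustration only, not MET output), each row of the **nx2** table yields its calibration value as the conditional event frequency for that probability bin:

.. code-block:: python

  # Hypothetical nx2 contingency table: one row per forecast probability bin,
  # columns are (event observed, event not observed) counts.
  table = [(5, 95), (20, 80), (45, 55), (80, 20)]
  bin_probs = (0.05, 0.25, 0.45, 0.75)

  for (ni1, ni0), p in zip(table, bin_probs):
      calibration = ni1 / (ni1 + ni0)  # conditional event probability for this bin
      print(f"forecast p={p:.2f}  calibration={calibration:.2f}")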

@@ -879,7 +879,7 @@ CRPS

Called "CRPS" in ECNT output :numref:`table_ES_header_info_es_out_ECNT`

-The continuous ranked probability score (CRPS) is the integral, over all possible thresholds, of the Brier scores (:ref:`Gneiting et al, 2004 <Gneiting-2004>`). In MET, the CRPS calculation uses a normal distribution fit to the ensemble forecasts. In many cases, use of other distributions would be better.
+The continuous ranked probability score (CRPS) is the integral, over all possible thresholds, of the Brier scores (:ref:`Gneiting et al., 2004 <Gneiting-2004>`). In MET, the CRPS calculation uses a normal distribution fit to the ensemble forecasts. In many cases, use of other distributions would be better.

WARNING: The normal distribution is probably a good fit for temperature and pressure, and possibly a not horrible fit for winds. However, the normal approximation will not work on most precipitation forecasts and may fail for many other atmospheric variables.
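
For a normal predictive distribution the integral has a well-known closed form; the sketch below (illustrative code assuming SciPy is available, not MET's implementation) evaluates it for a single observation:

.. code-block:: python

  import numpy as np
  from scipy.stats import norm

  def crps_normal(mu, sigma, y):
      """Closed-form CRPS of a normal distribution N(mu, sigma) against observation y."""
      z = (y - mu) / sigma
      return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

  # Example: normal fit with mean 20.0 and spread 2.0, observed value 23.0
  print(crps_normal(20.0, 2.0, 23.0))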

@@ -907,7 +907,7 @@ IGN

Called "IGN" in ECNT output :numref:`table_ES_header_info_es_out_ECNT`

-The ignorance score (IGN) is the negative logarithm of a predictive probability density function (:ref:`Gneiting et al, 2004 <Gneiting-2004>`). In MET, the IGN is calculated based on a normal approximation to the forecast distribution (i.e. a normal pdf is fit to the forecast values). This approximation may not be valid, especially for discontinuous forecasts like precipitation, and also for very skewed forecasts. For a single normal distribution **N** with parameters :math:`\mu \text{ and } \sigma`, the ignorance score is
+The ignorance score (IGN) is the negative logarithm of a predictive probability density function (:ref:`Gneiting et al., 2004 <Gneiting-2004>`). In MET, the IGN is calculated based on a normal approximation to the forecast distribution (i.e. a normal pdf is fit to the forecast values). This approximation may not be valid, especially for discontinuous forecasts like precipitation, and also for very skewed forecasts. For a single normal distribution **N** with parameters :math:`\mu \text{ and } \sigma`, the ignorance score is

.. math:: \text{ign} (N( \mu, \sigma),y) = \frac{1}{2} \ln (2 \pi \sigma^2 ) + \frac{(y - \mu)^2}{\sigma^2}.
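
A direct transcription of this formula into code (an illustrative helper, not MET's implementation):

.. code-block:: python

  import math

  def ignorance_score(mu, sigma, y):
      """Ignorance score of N(mu, sigma) against observation y, per the formula above."""
      return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / sigma ** 2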

@@ -975,7 +975,7 @@ The traditional contingency table statistics computed by the Grid-Stat neighborhood

All of these measures are defined in :numref:`categorical variables`.

-In addition to these standard statistics, the neighborhood analysis provides additional continuous measures, the Fractions Brier Score and the Fractions Skill Score. For reference, the Asymptotic Fractions Skill Score and Uniform Fractions Skill Score are also calculated. These measures are defined here, but are explained in much greater detail in :ref:`Ebert (2008) <Ebert-2008>` and :ref:`Roberts and Lean 2008 <Roberts-2008>`. Roberts and Lean (2008) also present an application of the methodology.
+In addition to these standard statistics, the neighborhood analysis provides additional continuous measures, the Fractions Brier Score and the Fractions Skill Score. For reference, the Asymptotic Fractions Skill Score and Uniform Fractions Skill Score are also calculated. These measures are defined here, but are explained in much greater detail in :ref:`Ebert (2008) <Ebert-2008>` and :ref:`Roberts and Lean (2008) <Roberts-2008>`. :ref:`Roberts and Lean (2008) <Roberts-2008>` also present an application of the methodology.
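
As a hedged sketch of those two measures, following the definitions in Roberts and Lean (2008) rather than MET's own code: the Fractions Brier Score is the mean squared difference between the forecast and observed fractional coverages, and the Fractions Skill Score normalizes it by its largest attainable value.

.. code-block:: python

  import numpy as np

  def fbs_fss(pf, po):
      """FBS and FSS from neighborhood fractional coverages pf (forecast)
      and po (observed), per Roberts and Lean (2008); illustrative only."""
      pf, po = np.asarray(pf, dtype=float), np.asarray(po, dtype=float)
      fbs = np.mean((pf - po) ** 2)
      worst = np.mean(pf ** 2) + np.mean(po ** 2)  # FBS of a no-overlap forecast
      return fbs, 1.0 - fbs / worst

  print(fbs_fss([0.0, 0.25, 0.5, 0.75], [0.0, 0.5, 0.5, 0.5]))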

Fractions Brier Score
~~~~~~~~~~~~~~~~~~~~~
@@ -1047,7 +1047,8 @@ The results of the distance map verification approaches that are included in the
Baddeley's :math:`\Delta` Metric and Hausdorff Distance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Called “BADDELEY” and “HAUSDORFF” in the DMAP output :numref:`table_GS_format_info_DMAP`
+Called “BADDELEY” and “HAUSDORFF” in the DMAP
+output :numref:`table_GS_format_info_DMAP`

The Baddeley's :math:`\Delta` Metric is given by

met/docs/Users_Guide/appendixF.rst: 6 changes (3 additions, 3 deletions)
@@ -163,15 +163,15 @@ On the command line for any of the MET tools which will be obtaining its data fr

___________________

-Listed below is an example of running the **plot_data_plane** tool to call a Python script for data that is included with the MET release tarball. Assuming the MET executables are in your path, this example may be run from the top-level MET source code directory.
+Listed below is an example of running the Plot-Data-Plane tool to call a Python script for data that is included with the MET release tarball. Assuming the MET executables are in your path, this example may be run from the top-level MET source code directory.

.. code-block:: none

  plot_data_plane PYTHON_NUMPY fcst.ps \
  'name="scripts/python/read_ascii_numpy.py data/python/fcst.txt FCST";' \
  -title "Python enabled plot_data_plane"
-The first argument for the **plot_data_plane** tool is the gridded data file to be read. When calling a NumPy Python script, set this to the constant string PYTHON_NUMPY. The second argument is the name of the output PostScript file to be written. The third argument is a string describing the data to be plotted. When calling a Python script, set **name** to the Python script to be run along with command line arguments. Lastly, the **-title** option is used to add a title to the plot. Note that any print statements included in the Python script will be printed to the screen. The above example results in the following log messages.
+The first argument for the Plot-Data-Plane tool is the gridded data file to be read. When calling a NumPy Python script, set this to the constant string PYTHON_NUMPY. The second argument is the name of the output PostScript file to be written. The third argument is a string describing the data to be plotted. When calling a Python script, set **name** to the Python script to be run along with command line arguments. Lastly, the **-title** option is used to add a title to the plot. Note that any print statements included in the Python script will be printed to the screen. The above example results in the following log messages.

.. code-block:: none
@@ -191,7 +191,7 @@ The first argument for the **plot_data_plane** tool is the gridded data file to

The second option was added to support the use of Python embedding in tools which read multiple input files. Option 1 reads a single field of data from a single source, whereas tools like Ensemble-Stat, Series-Analysis, and MTD read data from multiple input files. While option 2 can be used in any of the MET tools, it is required for Python embedding in Ensemble-Stat, Series-Analysis, and MTD.

-On the command line for any of the MET tools, specify the path to the input gridded data file(s) as the usage statement for the tool indicates. Do **not** substitute in PYTHON_NUMPY or PYTHON_XARRAY on the command line. In the config file dictionary set the **file_type** entry to either PYTHON_NUMPY or PYTHON_XARRAY to activate the Python embedding logic. Then, in the **name** entry of the config file dictionaries for the forecast or observation data, list the Python script to be run followed by any command line arguments for that script. However, in the Python command, replace the name of the input gridded data file with the constant string MET_PYTHON_INPUT_ARG. When looping over multiple input files, the MET tools will replace that constant **MET_PYTHON_INPUT_ARG** with the path to the file currently being processed. The example **plot_data_plane** command listed below yields the same result as the example shown above, but using the option 2 logic instead.
+On the command line for any of the MET tools, specify the path to the input gridded data file(s) as the usage statement for the tool indicates. Do **not** substitute in PYTHON_NUMPY or PYTHON_XARRAY on the command line. In the config file dictionary set the **file_type** entry to either PYTHON_NUMPY or PYTHON_XARRAY to activate the Python embedding logic. Then, in the **name** entry of the config file dictionaries for the forecast or observation data, list the Python script to be run followed by any command line arguments for that script. However, in the Python command, replace the name of the input gridded data file with the constant string MET_PYTHON_INPUT_ARG. When looping over multiple input files, the MET tools will replace that constant **MET_PYTHON_INPUT_ARG** with the path to the file currently being processed. The example plot_data_plane command listed below yields the same result as the example shown above, but using the option 2 logic instead.
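
Such an option 2 invocation might look like the following sketch (the file path, script name, and arguments are illustrative, carried over from the option 1 example above; the points of interest are MET_PYTHON_INPUT_ARG standing in for the input file inside the Python command and **file_type** set in the config string):

.. code-block:: none

  plot_data_plane data/python/fcst.txt fcst.ps \
  'name="scripts/python/read_ascii_numpy.py MET_PYTHON_INPUT_ARG FCST"; file_type=PYTHON_NUMPY;' \
  -title "Python enabled plot_data_plane"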

The Ensemble-Stat, Series-Analysis, and MTD tools support the use of file lists on the command line, as do some other MET tools. Typically, the ASCII file list contains a list of files which actually exist on your machine and should be read as input. For Python embedding, these tools loop over the ASCII file list entries, set MET_PYTHON_INPUT_ARG to that string, and execute the Python script. This only allows a single command line argument to be passed to the Python script. However multiple arguments may be concatenated together using some delimiter, and the Python script can be defined to parse arguments using that delimiter. When file lists are constructed in this way, the entries will likely not be files which actually exist on your machine. In this case, users should place the constant string "file_list" on the first line of their ASCII file lists. This will ensure that the MET tools will parse the file list properly.
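
For example, such a file list might look like the following sketch, with the constant string on the first line and delimiter-concatenated argument strings (the underscore delimiter here is an arbitrary, illustrative choice) on the lines that follow:

.. code-block:: none

  file_list
  20201201_00_arg1_arg2
  20201201_06_arg1_arg2
  20201201_12_arg1_arg2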
