diff --git a/docs/getting_started/static/output_13_0.png b/docs/getting_started/static/output_13_0.png new file mode 100644 index 00000000..d24cee8a Binary files /dev/null and b/docs/getting_started/static/output_13_0.png differ diff --git a/docs/getting_started/static/output_16_0.png b/docs/getting_started/static/output_16_0.png new file mode 100644 index 00000000..ed5e5182 Binary files /dev/null and b/docs/getting_started/static/output_16_0.png differ diff --git a/docs/getting_started/static/output_19_0.png b/docs/getting_started/static/output_19_0.png new file mode 100644 index 00000000..701b300c Binary files /dev/null and b/docs/getting_started/static/output_19_0.png differ diff --git a/docs/getting_started/static/output_22_0.png b/docs/getting_started/static/output_22_0.png new file mode 100644 index 00000000..0b2ac5fb Binary files /dev/null and b/docs/getting_started/static/output_22_0.png differ diff --git a/docs/getting_started/static/output_27_0.png b/docs/getting_started/static/output_27_0.png new file mode 100644 index 00000000..02ef235b Binary files /dev/null and b/docs/getting_started/static/output_27_0.png differ diff --git a/docs/getting_started/static/output_30_0.png b/docs/getting_started/static/output_30_0.png new file mode 100644 index 00000000..6e1a5f8d Binary files /dev/null and b/docs/getting_started/static/output_30_0.png differ diff --git a/docs/getting_started/static/output_33_0.png b/docs/getting_started/static/output_33_0.png new file mode 100644 index 00000000..39228816 Binary files /dev/null and b/docs/getting_started/static/output_33_0.png differ diff --git a/docs/getting_started/static/output_36_0.png b/docs/getting_started/static/output_36_0.png new file mode 100644 index 00000000..678bb654 Binary files /dev/null and b/docs/getting_started/static/output_36_0.png differ diff --git a/docs/getting_started/static/output_6_0.png b/docs/getting_started/static/output_6_0.png new file mode 100644 index 00000000..4e6ec6db Binary files /dev/null and b/docs/getting_started/static/output_6_0.png differ diff --git a/docs/getting_started/static/output_9_0.png b/docs/getting_started/static/output_9_0.png new file mode 100644 index 00000000..6eee5a50 Binary files /dev/null and b/docs/getting_started/static/output_9_0.png differ diff --git a/docs/getting_started/theory b/docs/getting_started/theory new file mode 100644 index 00000000..e69de29b diff --git a/docs/getting_started/theory.rst b/docs/getting_started/theory.rst new file mode 100644 index 00000000..6141a80d --- /dev/null +++ b/docs/getting_started/theory.rst @@ -0,0 +1,1095 @@ +Theory of CSEP Tests +==================== + +This page describes the theory of each of the forecast tests +included in pyCSEP along with working code examples. You will find +information on the goals of each test, the theory behind the tests, how +the tests are applied in practice, and how forecasts are ‘scored’ given +the test results. Also, we include the code required to run in each test +and a description of how to interpret the test results. + +.. 
code:: ipython3 + + import csep + from csep.core import ( + regions, + catalog_evaluations, + poisson_evaluations as poisson + ) + from csep.utils import ( + datasets, + time_utils, + comcat, + plots, + readers + ) + + # Filters matplotlib warnings + import warnings + warnings.filterwarnings('ignore') + +Grid-based Forecast Tests +------------------------- + +These tests are designed for grid-based forecasts (e.g., Schorlemmer et +al., 2007), where expected rates are provided in discrete Poisson +space-magnitude cells covering the region of interest. The region +:math:`\boldsymbol{R}` is then the product of the spatial rate +:math:`\boldsymbol{S}` and the binned magnitude rate +:math:`\boldsymbol{M}`, + +.. math:: \boldsymbol{R} = \boldsymbol{M} \times \boldsymbol{S}. + +A forecast :math:`\boldsymbol{\Lambda}` can be fully specified as the +expected number of events (or rate) in each space-magnitude bin +(:math:`m_i, s_j`) covering the region :math:`\boldsymbol{R}` and +therefore can be written as + +.. math:: \boldsymbol{\Lambda} = \{ \lambda_{m_i, s_j}| m_i \in \boldsymbol{M}, s_j \in \boldsymbol{S} \}, + +where :math:`\lambda_{m_i, s_j}` is the expected rate of events in +magnitude bin :math:`m_i` and spatial bin :math:`s_j`. The observed +catalogue of events :math:`\boldsymbol{\Omega}` we use to evaluate the +forecast is similarly discretised into the same space-magnitude bins, +and can be described as + +.. math:: \boldsymbol{\Omega} = \{ \omega_{m_i, s_j}| m_i \in \boldsymbol{M}, s_j \in \boldsymbol{S} \}, + +where :math:`\omega_{m_i, s_j}` is the observed number of +events in spatial cell :math:`s_j` and magnitude bin :math:`m_i`. The +magnitude bins are specified in the forecast: typically these are in 0.1 +increments and this is the case in the examples we use here. These +examples use the Helmstetter et al (2007) smoothed seismicity forecast +(including aftershocks), testing over a 5 year period between 2010 and +2015. + +.. code:: ipython3 + + # Set up experiment parameters + start_date = time_utils.strptime_to_utc_datetime('2010-01-01 00:00:00.0') + end_date = time_utils.strptime_to_utc_datetime('2015-01-01 00:00:00.0') + + # Loads from the PyCSEP package + helmstetter = csep.load_gridded_forecast( + datasets.helmstetter_aftershock_fname, + start_date=start_date, + end_date=end_date, + name='helmstetter_aftershock' + ) + + # Set up evaluation catalog + catalog = csep.query_comcat(helmstetter.start_time, helmstetter.end_time, + min_magnitude=helmstetter.min_magnitude) + + # Filter evaluation catalog + catalog = catalog.filter_spatial(helmstetter.region) + + # Add seed for reproducibility in simulations + seed = 123456 + + # Number of simulations for Poisson consistency tests + nsim = 100000 + + +.. parsed-literal:: + + Fetched ComCat catalog in 5.9399449825286865 seconds. + + Downloaded catalog from ComCat with following parameters + Start Date: 2010-01-10 00:27:39.320000+00:00 + End Date: 2014-08-24 10:20:44.070000+00:00 + Min Latitude: 31.9788333 and Max Latitude: 41.1431667 + Min Longitude: -125.3308333 and Max Longitude: -115.0481667 + Min Magnitude: 4.96 + Found 24 events in the ComCat catalog. + + +Consistency tests +~~~~~~~~~~~~~~~~~ + +The consistency tests evaluate the consistency of a forecast against +observed earthquakes. These tests were developed across a range of +experiments and publications (Schorlemmer et al, 2007; Zechar et al +2010; Werner et al, 2011a). 
The consistency tests are based on the
likelihood of observing the catalogue (actual recorded events) given the
forecast. Since the space-magnitude bins are assumed to be independent,
the joint-likelihood of observing the events in each individual bin
given the specified forecast can be written as

.. math:: Pr(\omega_1 | \lambda_1) Pr(\omega_2 | \lambda_2)...Pr(\omega_n | \lambda_n) = \prod_{m_i , s_j \in \boldsymbol{R}} f_{m_i, s_j}(\omega(m_i, s_j)),

where :math:`f_{m_i, s_j}` specifies the probability distribution in
each space-magnitude bin. We prefer to use the joint log-likelihood in
order to sum log-likelihoods rather than multiply the likelihoods. The
joint log-likelihood can be written as:

.. math:: L(\boldsymbol{\Omega} | \boldsymbol{\Lambda}) = \sum_{m_i , s_j \in \boldsymbol{R}} \log\big(f_{m_i, s_j}(\omega(m_i, s_j))\big).

The log-likelihood of the observations, :math:`\boldsymbol{\Omega}`, given
the forecast :math:`\boldsymbol{\Lambda}` is therefore the sum over all
space-magnitude bins of the log probabilities in the individual cells of
the forecast. Grid-based forecasts are specified by the expected number of
events in each discrete space-magnitude bin. From the maximum entropy
principle, we assign a Poisson distribution in each bin. In this case,
the probability of an event occurring is independent of the time since
the last event, and events occur at a rate :math:`\lambda`. The
Poissonian joint log-likelihood can be written as

.. math:: L(\boldsymbol{\Omega} | \boldsymbol{\Lambda}) = \sum_{m_i , s_j \in \boldsymbol{R}} -\lambda(m_i, s_j) + \omega(m_i, s_j)\log(\lambda(m_i, s_j)) - \log(\omega(m_i, s_j)!),

where :math:`\lambda(m_i, s_j)` and :math:`\omega(m_i, s_j)` are the
expected counts from the forecast and observed counts in cell
:math:`m_i, s_j` respectively. We can calculate the likelihood directly
given the forecast and discretised observations.

Forecast uncertainty

A simulation-based approach is used to account for uncertainty in the
forecast. We simulate realizations of catalogs that are consistent with
the forecast to obtain distributions of scores. In the pyCSEP package,
as in the original CSEP tests, simulation is carried out using the
cumulative probability density of the forecast obtained by ordering the
rates in each bin. We shall call :math:`F_{m_is_j}` the cumulative
probability density in cell :math:`(m_i, s_j)`. The simulation approach
then works as follows:

- For each event to be simulated, draw a random number :math:`z` from a
  uniform distribution between 0 and 1
- Assign this event to a space-magnitude bin through the inverse
  cumulative density distribution at this point,
  :math:`F^{-1}_{m_i, s_j}(z)`
- Iterate over all simulated events to generate a catalog containing
  :math:`N_{sim}` events consistent with the forecast

For each of these tests, we can plot the distribution of likelihoods
computed from these simulated catalogs relative to the observations
using the ``plots.plot_poisson_consistency_test`` function. We also
calculate a quantile score that diagnoses how consistent a particular
forecast is with the observations. The number of simulations can be
supplied to the Poisson consistency test functions using the
``num_simulations`` argument: for best results we suggest 100,000
simulations to ensure convergence.

Scoring the tests

Through simulation (as described above), we obtain a set of simulated
catalogs :math:`\{\hat{\boldsymbol{\Omega}}\}`. Each catalogue can be
written as

.. math:: \hat{\boldsymbol{\Omega}}_x =\{ \hat{\omega}_x(m_i, s_j)|(m_i, s_j) \in \boldsymbol{R}\},

where :math:`\hat{\omega}_x(m_i, s_j)` is the number of
simulated earthquakes in cell :math:`(m_i, s_j)` of (simulated) catalog
:math:`x` that is consistent with the forecast :math:`\Lambda`. We then
compute the joint log-likelihood for each simulated catalogue,
:math:`\hat{L}_x = L(\hat{\Omega}_x|\Lambda)`. The joint log-likelihood
for each simulated catalogue given the forecast gives us a set of
log-likelihoods :math:`\{\hat{\boldsymbol{L}}\}` that represents the
range of log-likelihoods consistent with the forecast. We then compare
our simulated log-likelihoods with the observed log-likelihood
:math:`L_{obs} = L(\boldsymbol{\Omega}|\boldsymbol{\Lambda})` using a
quantile score.

The quantile score is defined by the fraction of simulated joint
log-likelihoods less than or equal to the observed likelihood:

.. math:: \gamma = \frac{ |\{ \hat{L}_x | \hat{L}_x \le L_{obs}\} |}{|\{ \hat{\boldsymbol{L}} \}|}

Whether a forecast can be said to pass an evaluation depends on the
significance level chosen for the testing process. The quantile score
explicitly tells us something about the significance of the result: the
observation is consistent with the forecast with :math:`100(1-\gamma)\%`
confidence (Zechar, 2011). Low :math:`\gamma` values demonstrate that
the observed likelihood score is lower than the scores of most of the
simulated catalogs. The consistency tests, excluding the N-test, are
considered to be one-sided tests: values which are too small are ruled
inconsistent with the forecast, but very large values may not
necessarily be inconsistent with the forecast, and additional testing
should be used to further clarify this (Schorlemmer et al, 2007).

Different CSEP experiments have used different sensitivity values.
Schorlemmer et al (2010b) consider :math:`\gamma < 0.05` while the
implementation in the Italian CSEP testing experiment uses
:math:`\gamma < 0.01` (Taroni et al, 2018). However, the consistency
tests are most useful as diagnostic tools, where the quantile score
assesses the level of consistency between the observations and the
forecast. Temporal variations in seismicity make it difficult to
formally reject a model from a consistency test over a single
evaluation period.

Likelihood-test (L-test)
^^^^^^^^^^^^^^^^^^^^^^^^

Aim: Evaluate the likelihood of observed events given the provided
forecast - this includes the rate, spatial distribution and magnitude
components of the forecast.

Method: The L-test is one of the original forecast tests described in
Schorlemmer et al, 2007. The likelihood of the observation given the
model is described by a Poisson likelihood function in each cell, and
the total joint likelihood is described by the product over all bins, or
equivalently the sum of the log-likelihoods (see above, or Zechar 2011
for more details).

Note: The likelihood scores are dominated by the rate-component of the
forecast. This causes issues in scoring forecasts where the expected
number of events is different from the observed number of events. We
suggest using the N-test and CL-test (both described below) to
independently score the rate component and the space-magnitude
component of the forecast. This behavior can be observed by comparing
the CL-test and N-test results with the L-test results in this notebook.
Since the
forecast overpredicts the rate of events during this testing period, the
L-test provides a passing score even though the space-magnitude and rate
components perform poorly during this evaluation period.

pyCSEP implementation

pyCSEP uses the forecast and catalog and returns the test distribution,
observed statistic and quantile score, which can be accessed from the
``likelihood_test_result`` object. We can pass this directly to the
plotting function, specifying that the test should be one-sided.

.. code:: ipython3

    likelihood_test_result = poisson.likelihood_test(
        helmstetter,
        catalog,
        seed=seed,
        num_simulations=nsim
    )
    ax = plots.plot_poisson_consistency_test(
        likelihood_test_result,
        one_sided_lower=True,
        plot_args={'title': r'$\mathcal{L}-\mathrm{test}$', 'xlabel': 'Log-likelihood'}
    )

.. image:: output_6_0.png

pyCSEP plots the resulting :math:`95\%` range of likelihoods returned by
the simulation with the black bar by default. The observed likelihood
score is shown by a green square where the forecast passes the test and
a red circle where the observed likelihood is outside the likelihood
distribution.

CL-test
^^^^^^^

Aim: The original likelihood test described above gives a result that
combines the spatial, magnitude and number components of a forecast. The
conditional likelihood or CL-test was developed to test the spatial and
magnitude performance of a forecast without the influence of the number
of events (Werner et al. 2011a, 2011b). By conditioning the test
distribution on the observed number of events we eliminate the
dependency on the forecasted number of events, as described above.

Method: The CL-test is computed in the same way as the L-test, but with
the number of events normalised to the observed catalog :math:`N_{obs}`
during the simulation stage. The quantile score is then calculated
similarly such that

.. math:: \gamma_{CL} = \frac{ |\{ \hat{CL}_x | \hat{CL}_x \le CL_{obs}\} |}{|\{ \hat{\boldsymbol{CL}} \}|}.

Implementation in pyCSEP

.. code:: ipython3

    cond_likelihood_test_result = poisson.conditional_likelihood_test(
        helmstetter,
        catalog,
        seed=seed,
        num_simulations=nsim
    )
    ax = plots.plot_poisson_consistency_test(
        cond_likelihood_test_result,
        one_sided_lower=True,
        plot_args={'title': r'$CL-\mathrm{test}$', 'xlabel': 'conditional log-likelihood'}
    )

.. image:: output_9_0.png

Again, the :math:`95\%` confidence range of likelihoods is shown by the
black bar, and the symbol reflects the observed conditional-likelihood
score. In this case, the observed conditional-likelihood is shown with
the red circle, which falls outside the range of likelihoods simulated
from the forecast. To understand why the L- and CL-tests give different
results, consider the results of the N-test and S-test in the following
sections.

N-test
^^^^^^

Aim: The number or N-test is the most conceptually simple test of a
forecast: to test whether the number of observed events is consistent
with that of the forecast.

Method: The original N-test was introduced by Schorlemmer et al (2007)
and modified by Zechar et al (2010). The observed number of events is
given by

.. math:: N_{obs} = \sum_{m_i, s_j \in R} \omega(m_i, s_j).

Using the simulations described above, the expected number of events is
calculated by summing the simulated number of events over all grid cells

.. math:: \hat{N_x} = \sum_{m_i, s_j \in R} \hat{\omega}_x(m_i, s_j),

where :math:`\hat{\omega}_x(m_i, s_j)` is the simulated number of events
in catalog :math:`x` in spatial cell :math:`s_j` and magnitude cell
:math:`m_i`, generating a set of simulated event counts
:math:`\{ \hat{N} \}`. We can then calculate the probability of
i) observing at least :math:`N_{obs}` events and ii) observing at most
:math:`N_{obs}` events. These probabilities can be written as:

.. math:: \delta_1 = \frac{ |\{ \hat{N_x} | \hat{N_x} \ge N_{obs}\} |}{|\{ \hat{N} \}|}

and

.. math:: \delta_2 = \frac{ |\{ \hat{N_x} | \hat{N_x} \le N_{obs}\} |}{|\{ \hat{N} \}|}

If a forecast is Poisson, the number of events in the forecast follows a
Poisson distribution with expectation
:math:`N_{fore} = \sum_{m_i, s_j \in R} \lambda(m_i, s_j)`. The
cumulative distribution is then a Poisson cumulative distribution:

.. math:: F(x|N_{fore}) = \exp(-N_{fore}) \sum^{x}_{i=0} \frac{(N_{fore})^i}{i!}

which can be used directly without the need for simulations. The N-test
quantile scores are then

.. math:: \delta_1 = 1 - F((N_{obs}-1)|N_{fore}),

and

.. math:: \delta_2 = F(N_{obs}|N_{fore}).

The original N-test considered only :math:`\delta_2` and its complement
:math:`1-\delta_2`, which effectively tested the probability of at most
:math:`N_{obs}` events and more than :math:`N_{obs}` events. Very small
or very large values (< 0.025 or > 0.975 respectively) were considered
to be inconsistent with the forecast in Schorlemmer et al (2010).
However, the approach above aims to test something subtly different,
that is, at least :math:`N_{obs}` events and at most :math:`N_{obs}`
events. Zechar et al (2010a) recommend testing both :math:`\delta_1` and
:math:`\delta_2` with an effective significance of half the required
significance level, so for a required significance level of 0.05, a
forecast is consistent if both :math:`\delta_1` and :math:`\delta_2` are
greater than 0.025. A very small :math:`\delta_1` suggests the forecast
rate is too low, while a very small :math:`\delta_2` suggests a rate
which is too high to be consistent with observations.

Implementation in pyCSEP

pyCSEP uses the Zechar et al (2010) version of the N-test and the
cumulative Poisson approach to estimate the range of expected events
from the forecasts, so does not implement a simulation in this case. The
upper and lower bounds for the test are determined from the cumulative
Poisson distribution. ``number_test_result.quantile`` will return both
:math:`\delta_1` and :math:`\delta_2` values.

.. code:: ipython3

    number_test_result = poisson.number_test(helmstetter, catalog)
    ax = plots.plot_poisson_consistency_test(
        number_test_result,
        plot_args={'xlabel':'Number of events'}
    )

.. image:: output_13_0.png

In this case, the black bar shows the :math:`95\%` interval for the
number of events in the forecast. The actual observed number of events
is shown by the green box: the forecast generally expects more events
than are observed in practice, but the observed number falls just within
the lower limits of what is expected, so the forecast (just!) passes the
N-test.
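
As a rough illustration of the cumulative-Poisson calculation underlying
the N-test, the sketch below computes :math:`\delta_1` and
:math:`\delta_2` directly with ``scipy``. The forecast rate and observed
count are made-up values for illustration only, not numbers taken from
the experiment above; ``poisson.number_test`` performs this calculation
for you.

.. code:: python

    from scipy.stats import poisson as poisson_dist

    # Illustrative values only (not taken from the Helmstetter experiment)
    n_fore = 30.0  # total forecast rate: sum of expected counts over all bins
    n_obs = 24     # observed number of events

    # Probability of observing at least n_obs events
    delta_1 = 1.0 - poisson_dist.cdf(n_obs - 1, n_fore)
    # Probability of observing at most n_obs events
    delta_2 = poisson_dist.cdf(n_obs, n_fore)

    # With a 0.05 significance level, the forecast passes the N-test if both
    # quantiles exceed 0.025 (half the required significance level)
    passes_n_test = (delta_1 > 0.025) and (delta_2 > 0.025)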

M-test
^^^^^^

Aim: Establish consistency (or lack thereof) of observed event
magnitudes with forecast magnitudes.

Method: The M-test is first described in Zechar et al. (2010) and aims
to isolate the magnitude component of a forecast. To do this, we sum
over the spatial bins and normalise so that the total number of forecast
events matches the observed number.

.. math:: \boldsymbol{\Omega}^m = \{\omega^{m}(m_i)| m_i \in \boldsymbol{M}\},

where

.. math:: \omega^m(m_i) = \sum_{s_j \in \boldsymbol{S}} \omega(m_i, s_j),

and

.. math:: \boldsymbol{\Lambda}^m = \{ \lambda^m(m_i)| m_i \in \boldsymbol{M} \},

where

.. math:: \lambda^m(m_i) = \frac{N_{obs}}{N_{fore}}\sum_{s_j \in \boldsymbol{S}} \lambda(m_i, s_j).

Then we compute the joint log-likelihood as we did for the L-test:

.. math:: M = L(\boldsymbol{\Omega}^m | \boldsymbol{\Lambda}^m)

We then wish to compare this with the distribution of simulated
log-likelihoods, this time keeping the number of events fixed to
:math:`N_{obs}`. Then for each simulated catalogue,
:math:`\hat{M}_x = L(\hat{\boldsymbol{\Omega}}^m | \boldsymbol{\Lambda}^m)`.

Quantile score: The final test statistic is again the fraction of
simulated log-likelihoods less than or equal to the observed
log-likelihood:

.. math:: \kappa = \frac{ |\{ \hat{M_x} | \hat{M_x} \le M\} |}{|\{ \hat{M} \}|}

and the observed magnitudes are inconsistent with the forecast if
:math:`\kappa` is less than the significance level.

pyCSEP implementation

.. code:: ipython3

    mag_test_result = poisson.magnitude_test(
        helmstetter,
        catalog,
        seed=seed,
        num_simulations=nsim
    )
    ax = plots.plot_poisson_consistency_test(
        mag_test_result,
        one_sided_lower=True,
        plot_args={'xlabel':'Normalized likelihood'}
    )

.. image:: output_16_0.png

In this example, the forecast passes the M-test, demonstrating that the
magnitude distribution in the forecast is consistent with observed
events. This is shown by the green square marking the joint
log-likelihood for the observed events.

S-test
^^^^^^

Aim: The spatial or S-test aims to establish consistency (or lack
thereof) of observed event locations with a forecast. It is originally
defined in Zechar et al (2010).

Method: Similar to the M-test, but in this case we sum over all
magnitude bins.

.. math:: \boldsymbol{\Omega}^s = \{\omega^s(s_j)| s_j \in \boldsymbol{S}\},

where

.. math:: \omega^s(s_j) = \sum_{m_i \in \boldsymbol{M}} \omega(m_i, s_j),

and

.. math:: \boldsymbol{\Lambda}^s = \{ \lambda^s(s_j)| s_j \in \boldsymbol{S} \},

where

.. math:: \lambda^s(s_j) = \frac{N_{obs}}{N_{fore}}\sum_{m_i \in \boldsymbol{M}} \lambda(m_i, s_j).

Then we compute the joint log-likelihood as we did for the L-test or the
M-test:

.. math:: S = L(\boldsymbol{\Omega}^s | \boldsymbol{\Lambda}^s)

We then wish to compare this with the distribution of simulated
log-likelihoods, this time keeping the number of events fixed to
:math:`N_{obs}`. Then for each simulated catalogue,
:math:`\hat{S}_x = L(\hat{\boldsymbol{\Omega}}^s | \boldsymbol{\Lambda}^s)`.

The final test statistic is again the fraction of simulated
log-likelihoods less than or equal to the observed log-likelihood:

.. math:: \zeta = \frac{ |\{ \hat{S_x} | \hat{S_x} \le S\} |}{|\{ \hat{S} \}|}

and again the distinction between a forecast passing or failing the test
depends on our significance level.

pyCSEP implementation

The S-test is again a one-sided test, so we specify this when plotting
the result.
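
To make the normalisation step concrete, here is a minimal sketch of how
the magnitude and spatial rates could be aggregated and rescaled by
:math:`N_{obs}/N_{fore}`. The rate array below is a small made-up grid
rather than the Helmstetter forecast; pyCSEP carries this step out
internally in the ``poisson.magnitude_test`` and ``poisson.spatial_test``
functions used in this notebook.

.. code:: python

    import numpy as np

    # Hypothetical forecast rates on a (magnitude bins x spatial cells) grid
    rates = np.array([[0.020, 0.010, 0.005],
                      [0.008, 0.004, 0.002]])
    n_obs = 24  # illustrative observed event count

    n_fore = rates.sum()
    # M-test rates: sum over spatial cells, rescaled to the observed total
    lambda_m = (n_obs / n_fore) * rates.sum(axis=1)
    # S-test rates: sum over magnitude bins, rescaled to the observed total
    lambda_s = (n_obs / n_fore) * rates.sum(axis=0)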

.. code:: ipython3

    spatial_test_result = poisson.spatial_test(
        helmstetter,
        catalog,
        seed=seed,
        num_simulations=nsim
    )
    ax = plots.plot_poisson_consistency_test(
        spatial_test_result,
        one_sided_lower=True,
        plot_args={'xlabel':'normalized spatial likelihood'}
    )

.. image:: output_19_0.png

The Helmstetter model fails the S-test as the observed spatial
likelihood falls in the tail of the simulated likelihood distribution.
Again, this is shown by a coloured symbol which highlights whether the
forecast model passes or fails the test.

Forecast comparison tests
~~~~~~~~~~~~~~~~~~~~~~~~~

The consistency tests above check whether a forecast is consistent with
observations, but do not provide a straightforward way to compare two
different forecasts. Several suggested approaches focus on the
information gain of one forecast relative to another (Harte and
Vere-Jones 2005, Imoto and Hurukawa, 2006, Imoto and Rhoades, 2010,
Rhoades et al 2011). The T-test and W-test implementations for
earthquake forecast comparison are first described in Rhoades et
al. (2011).

The information gain per earthquake (IGPE) of model A compared to model
B is defined by :math:`I_{N}(A, B) = R/N`, where :math:`R` is the
rate-corrected log-likelihood ratio of models A and B given by

.. math:: R = \sum_{i=1}^{N}\big(\log\lambda_A(k_i) - \log \lambda_B(k_i)\big) - \big(\hat{N}_A - \hat{N}_B\big)

If we set :math:`X_i=\log\lambda_A(k_i)` and
:math:`Y_i=\log\lambda_B(k_i)` then we can define the information gain
per earthquake (IGPE) as

.. math:: I_N(A, B) = \frac{1}{N}\sum^N_{i=1}\big(X_i - Y_i\big) - \frac{\hat{N}_A - \hat{N}_B}{N}

If :math:`I_N(A, B)` differs significantly from 0, the model with the
lower likelihood can be rejected in favour of the other.

t-test

If :math:`X_i - Y_i` are independent and come from the same normal
population with mean :math:`\mu` then we can use the classic paired
t-test to evaluate the null hypothesis that
:math:`\mu = (\hat{N}_A - \hat{N}_B)/N` against the alternative
hypothesis :math:`\mu \ne (\hat{N}_A - \hat{N}_B)/N`. To implement this,
we let :math:`s^2` denote the sample variance of :math:`(X_i - Y_i)`,
such that

.. math:: s^2 = \frac{1}{N-1}\sum^N_{i=1}\big(X_i - Y_i\big)^2 - \frac{1}{N^2 - N}\bigg(\sum^N_{i=1}\big(X_i - Y_i\big)\bigg)^2

Under the null hypothesis
:math:`T = I_N(A, B)\big/\big(s/\sqrt{N}\big)` has a t-distribution with
:math:`N-1` degrees of freedom and the null hypothesis can be rejected
if :math:`|T|` exceeds a critical value of the :math:`t_{N-1}`
distribution. The confidence intervals for
:math:`\mu - (\hat{N}_A - \hat{N}_B)/N` can then be constructed with the
form :math:`I_N(A,B) \pm ts/\sqrt{N}`, where :math:`t` is the
appropriate quantile of the :math:`t_{N-1}` distribution.

W-test

An alternative to the t-test is the Wilcoxon signed-rank test or W-test.
This is a non-parametric alternative to the t-test which can be used if
we do not feel the assumption of normally distributed differences in
:math:`X_i - Y_i` is valid. This assumption might be particularly poor
when we have small sample sizes. The W-test instead depends on the
(weaker) assumption that :math:`X_i - Y_i` is symmetric and tests
whether the median of :math:`X_i - Y_i` is equal to
:math:`(\hat{N}_A - \hat{N}_B)/N`. The W-test is less powerful than the
t-test for normally distributed differences and cannot reject the null
hypothesis (with :math:`95\%` confidence) for very small sample sizes
(:math:`N \leq 5`).

The t-test becomes more accurate as :math:`N \rightarrow \infty` due to
the central limit theorem and therefore the t-test is considered
dependable for large :math:`N`. Where :math:`N` is small, a model might
only be considered more informative if both the t- and W-test results
agree.
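
Before turning to the pyCSEP functions, the following sketch shows how
the IGPE and the paired-t statistic defined above could be computed from
per-event log-likelihood values. The arrays below are synthetic
stand-ins for :math:`X_i`, :math:`Y_i`, :math:`\hat{N}_A` and
:math:`\hat{N}_B`, not output from the models compared in this notebook.

.. code:: python

    import numpy as np
    from scipy.stats import t as t_dist

    # Synthetic per-event log rate densities for two hypothetical models A and B
    rng = np.random.default_rng(123)
    x = rng.normal(-4.0, 0.5, size=24)      # X_i = log lambda_A at event i
    y = x + rng.normal(-0.1, 0.3, size=24)  # Y_i = log lambda_B at event i
    n_hat_a, n_hat_b = 26.0, 22.0           # expected event counts of A and B
    n = x.size

    # Information gain per earthquake
    igpe = np.mean(x - y) - (n_hat_a - n_hat_b) / n

    # Paired t statistic with N-1 degrees of freedom and its two-sided p-value
    s = np.std(x - y, ddof=1)               # sample standard deviation of X_i - Y_i
    t_stat = igpe / (s / np.sqrt(n))
    p_value = 2.0 * t_dist.sf(abs(t_stat), df=n - 1)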

Implementation in pyCSEP

The t-test and W-test are implemented in pyCSEP as below.

.. code:: ipython3

    helmstetter_ms = csep.load_gridded_forecast(
        datasets.helmstetter_mainshock_fname,
        name="Helmstetter Mainshock"
    )

    t_test = poisson.paired_t_test(helmstetter, helmstetter_ms, catalog)
    w_test = poisson.w_test(helmstetter, helmstetter_ms, catalog)
    comp_args = {'title': 'Paired T-test Result',
                 'ylabel': 'Information gain',
                 'xlabel': '',
                 'xticklabels_rotation': 0,
                 'figsize': (6,4)}

    ax = plots.plot_comparison_test([t_test], [w_test], plot_args=comp_args)

.. image:: output_22_0.png

The first argument to the ``paired_t_test`` function is taken as model A
and the second as our baseline model, or model B. When plotting the
result, the horizontal dashed line indicates the performance of model B
and the vertical bar shows the confidence bounds on the information gain
:math:`I_N(A, B)` of model A relative to model B. In this case, the
model with aftershocks performs statistically worse than the benchmark
model. We note that this comparison is used for demonstration purposes
only.

Catalog-based forecast tests
----------------------------

Catalog-based forecast tests evaluate forecasts using simulated outputs
in the form of synthetic earthquake catalogs, removing the need for the
Poisson approximation and simulation procedure used with grid-based
forecasts. We know that observed seismicity is overdispersed with
respect to a Poissonian model due to spatio-temporal clustering, and
overdispersed models are more likely to be rejected by the original
Poisson-based CSEP tests (Werner et al, 2011a). This modification of the
testing framework allows for a broader range of forecast models. The
distribution of realizations is then compared with observations, as in
the grid-based case. These tests were developed by Savran et al (2020),
who applied them to test forecasts following the 2019 Ridgecrest
earthquake in Southern California.

In the following text, we show how catalog-based forecasts are defined.
Again we begin by defining a region :math:`\boldsymbol{R}` as a function
of some magnitude range :math:`\boldsymbol{M}`, spatial domain
:math:`\boldsymbol{S}` and time period :math:`\boldsymbol{T}`,

.. math:: \boldsymbol{R} = \boldsymbol{M} \times \boldsymbol{S} \times \boldsymbol{T}.

An earthquake :math:`e` can be described by a magnitude :math:`m_i` at
some location :math:`s_j` and time :math:`t_k`. A catalog is simply a
collection of earthquakes, thus the observed catalog can be written as

.. math:: \Omega = \{ e_n | n= 1...N_{obs}; e_n \in \boldsymbol{R} \},

and a forecast is then specified as a collection of synthetic catalogs
containing events :math:`\hat{e}_{nj}` in domain :math:`\boldsymbol{R}`,
as

.. math:: \boldsymbol{\Lambda} = \{ \Lambda_j | j = 1...J \}, \qquad \Lambda_j = \{\hat{e}_{nj} | n = 1... N_j; \hat{e}_{nj} \in \boldsymbol{R} \}.

That is, a forecast consists of :math:`J` simulated catalogs, each
containing :math:`N_j` events described in time, space and magnitude,
such that :math:`\hat{e}_{nj}` describes the :math:`n`\ th synthetic
event in the :math:`j`\ th synthetic catalog :math:`\Lambda_j`.

When using simulated forecasts in pyCSEP, we must first explicitly
define the forecast region by specifying the spatial domain and
magnitude range as below. In effect, these are filters applied to the
forecast and observations to retain only the events in
:math:`\boldsymbol{R}`. The examples in this section are catalog-based
forecast simulations for the Landers earthquake and aftershock sequence
generated using UCERF3-ETAS (Field et al, 2017).

.. code:: ipython3

    # Define the start and end times of the forecasts
    start_time = time_utils.strptime_to_utc_datetime("1992-06-28 11:57:35.0")
    end_time = time_utils.strptime_to_utc_datetime("1992-07-28 11:57:35.0")

    # Magnitude bins properties
    min_mw = 4.95
    max_mw = 8.95
    dmw = 0.1

    # Create space and magnitude regions.
    magnitudes = regions.magnitude_bins(min_mw, max_mw, dmw)
    region = regions.california_relm_region()
    space_magnitude_region = regions.create_space_magnitude_region(
        region,
        magnitudes
    )

    # Load forecast
    forecast = csep.load_catalog_forecast(
        datasets.ucerf3_ascii_format_landers_fname,
        start_time = start_time,
        end_time = end_time,
        region = space_magnitude_region,
        apply_filters = True
    )

    # Compute expected rates
    forecast.filters = [
        f'origin_time >= {forecast.start_epoch}',
        f'origin_time < {forecast.end_epoch}'
    ]
    _ = forecast.get_expected_rates(verbose=False)

    # Obtain Comcat catalog and filter to region
    comcat_catalog = csep.query_comcat(
        start_time,
        end_time,
        min_magnitude=forecast.min_magnitude
    )

    # Filter observed catalog using the same region as the forecast
    comcat_catalog = comcat_catalog.filter_spatial(forecast.region)

.. parsed-literal::

    Fetched ComCat catalog in 0.31937098503112793 seconds.

    Downloaded catalog from ComCat with following parameters
    Start Date: 1992-06-28 12:00:45+00:00
    End Date: 1992-07-24 18:14:36.250000+00:00
    Min Latitude: 33.901 and Max Latitude: 36.705
    Min Longitude: -118.067 and Max Longitude: -116.285
    Min Magnitude: 4.95
    Found 19 events in the ComCat catalog.

Number Test
~~~~~~~~~~~

Aim: As above, the number test aims to evaluate whether the number of
observed events is consistent with the forecast.

Method: The observed statistic in this case is given by
:math:`N_{obs} = |\Omega|`, which is simply the number of events in the
observed catalog. To build the test distribution from the forecast, we
count the number of events in each simulated catalog:

.. math:: N_{j} = |\Lambda_j|; j = 1...J

As in the gridded test above, we can then evaluate the probabilities of
at least and at most :math:`N_{obs}` events, in this case using the
empirical cumulative distribution function :math:`F_N`:

.. math:: \delta_1 = P(N_j \geq N_{obs}) = 1 - F_N(N_{obs}-1)

and

.. math:: \delta_2 = P(N_j \leq N_{obs}) = F_N(N_{obs})
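
Because the test distribution here is just the set of event counts from
the simulated catalogs, the two quantiles reduce to an empirical CDF
evaluation, as in the sketch below. The counts are invented for
illustration; ``catalog_evaluations.number_test`` computes these values
from the actual forecast.

.. code:: python

    import numpy as np

    # Hypothetical event counts from each simulated catalog in a forecast
    n_sim = np.array([12, 18, 25, 31, 22, 17, 28, 40, 15, 21])
    n_obs = 19  # illustrative observed count

    # Empirical probabilities of at least / at most n_obs events
    delta_1 = np.mean(n_sim >= n_obs)
    delta_2 = np.mean(n_sim <= n_obs)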

Implementation in pyCSEP

.. code:: ipython3

    number_test_result = catalog_evaluations.number_test(
        forecast,
        comcat_catalog,
        verbose=False
    )
    ax = number_test_result.plot()

.. image:: output_27_0.png

Plotting the number test result of a simulated catalog forecast displays
a histogram of the numbers of events :math:`\hat{N}_j` in each simulated
catalog :math:`j`, which make up the test distribution. The test
statistic is shown by the dashed line - in this case it is the number of
observed events in the catalog, :math:`N_{obs}`.

Magnitude Test
~~~~~~~~~~~~~~

Aim: The magnitude test aims to test the consistency of the observed
frequency-magnitude distribution with that in the simulated catalogs
that make up the forecast.

Method: The catalog-based magnitude test is implemented quite
differently from the grid-based equivalent. We first define the union
catalog :math:`\Lambda_U` as the union of all simulated catalogs in the
forecast. Formally:

.. math:: \Lambda_U = \lambda_1 \cup \lambda_2 \cup ... \cup \lambda_J,

so that the union catalog contains all events across all simulated
catalogs for a total of :math:`N_U = \sum_{j=1}^{J} |\lambda_j|` events.
We then compute the following histograms, discretised to the magnitude
range and magnitude step size specified earlier for pyCSEP:

1. the histogram of the union catalog magnitudes :math:`\Lambda_U^{(m)}`
2. histograms of the magnitudes in each of the individual simulated
   catalogs :math:`\lambda_j^{(m)}`
3. the histogram of the observed catalog magnitudes :math:`\Omega^{(m)}`

The histograms are normalized so that the total number of events across
all bins is equal to the observed number. The observed statistic is then
calculated as the sum of squared logarithmic residuals between the union
histogram, normalised to the observed number of events, and the observed
magnitude histogram. This statistic is related to the Cramér-von Mises
statistic.

.. math:: d_{obs}= \sum_{k}\Bigg(\log\Bigg[\frac{N_{obs}}{N_U} \Lambda_U^{(m)}(k) + 1\Bigg]- \log\Big[\Omega^{(m)}(k) + 1\Big]\Bigg)^2

where :math:`\Lambda_U^{(m)}(k)` and :math:`\Omega^{(m)}(k)` represent
the count in the :math:`k`\ th bin of the magnitude-frequency
distribution in the union and observed catalogs respectively. We add
unity to each bin to avoid :math:`\log(0)`. We then build the test
distribution from the catalogs in :math:`\boldsymbol{\Lambda}`:

.. math:: D_j = \sum_{k}\Bigg(\log\Bigg[\frac{N_{obs}}{N_U} \Lambda_U^{(m)}(k) + 1\Bigg]- \log\Bigg[\frac{N_{obs}}{N_j}\lambda_j^{(m)}(k) + 1\Bigg]\Bigg)^2; j= 1...J

where :math:`\lambda_j^{(m)}(k)` represents the count in the
:math:`k`\ th bin of the magnitude-frequency distribution of the
:math:`j`\ th catalog.

The quantile score can then be calculated using the empirical CDF such
that

.. math:: \gamma_m = F_D(d_{obs})= P(D_j \leq d_{obs})

Implementation in pyCSEP

Hopefully you now see why it was necessary to specify our magnitude
range explicitly when we set up the catalog-type testing - we need to
make sure the magnitudes are properly discretised for the model we want
to test.

.. code:: ipython3

    magnitude_test_result = catalog_evaluations.magnitude_test(
        forecast,
        comcat_catalog,
        verbose=False
    )
    ax = magnitude_test_result.plot(plot_args={'xy': (0.6,0.7)})

.. image:: output_30_0.png

The histogram shows the resulting test distribution with :math:`D_j`
calculated for each simulated catalog as described in the method above.
The test statistic :math:`\omega = d_{obs}` is shown with the dashed
vertical line. The quantile score for this forecast is
:math:`\gamma = 0.29`.
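
As a rough sketch of the statistic defined above, the snippet below
evaluates :math:`d_{obs}` from two toy magnitude histograms. The bin
counts are invented, and the test distribution :math:`D_j` would be
built in the same way from each simulated catalog's histogram;
``catalog_evaluations.magnitude_test`` handles all of this internally.

.. code:: python

    import numpy as np

    # Toy magnitude histograms on a shared set of bins (counts per bin)
    union_hist = np.array([4000.0, 2200.0, 1100.0, 500.0, 150.0])  # union catalog
    obs_hist = np.array([9.0, 5.0, 3.0, 1.0, 1.0])                 # observed catalog
    n_union = union_hist.sum()
    n_obs = obs_hist.sum()

    # Observed statistic: squared residuals between the log of the union
    # histogram (scaled to the observed event count) and the log of the
    # observed histogram, with unity added to each bin to avoid log(0)
    d_obs = np.sum((np.log(n_obs / n_union * union_hist + 1.0)
                    - np.log(obs_hist + 1.0)) ** 2)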

Pseudo-likelihood test
~~~~~~~~~~~~~~~~~~~~~~

Aim: The pseudo-likelihood test aims to evaluate the likelihood of a
forecast given an observed catalog.

Method: The pseudo-likelihood test has similar aims to the grid-based
likelihood test above, but its implementation differs in a few
significant ways. Firstly, it does not compute an actual likelihood
(hence the name pseudo-likelihood), and instead of aggregating over
cells as in the grid-based case, the pseudo-likelihood test aggregates
over target event likelihood scores (so a likelihood score per target
event, rather than a likelihood score per grid cell). The most important
difference, however, is that the pseudo-likelihood tests do not use a
Poisson likelihood.

The pseudo-likelihood approach is based on the continuous point process
likelihood function. A continuous marked space-time point process can be
specified by a conditional intensity function
:math:`\lambda(\boldsymbol{e}|H_t)`, in which :math:`H_t` describes the
history of the process in time. The log-likelihood function for any
point process in :math:`\boldsymbol{R}` is given by

.. math:: L = \sum_{i=1}^{N} \log \lambda(e_i|H_t) - \int_{\boldsymbol{R}}\lambda(\boldsymbol{e}|H_t)d\boldsymbol{R}

Not all models will have an explicit likelihood function, so instead we
approximate the expectation of :math:`\lambda(e|H_t)` using the forecast
catalogs. The approximate rate density is defined as the conditional
expectation, given a discretised region :math:`R_d`, of the continuous
rate

.. math:: \hat{\lambda}(\boldsymbol{e}|H_t) = E\big[\lambda(\boldsymbol{e}|H_t)|R_d\big]

We still regard the model as continuous, but the rate density is
approximated within a single cell. This is analogous to the gridded
approach where we count the number of events in discrete cells. The
pseudo-loglikelihood is then

.. math:: \hat{L} = \sum_{i=1}^N \log \hat{\lambda}(e_i|H_t) - \int_R \hat{\lambda}(\boldsymbol{e}|H_t) dR

and we can write the spatial approximate rate density as

.. math:: \hat{\lambda}_s(\boldsymbol{e}|H_t) = \sum_{\boldsymbol{M}} \hat{\lambda}(\boldsymbol{e}|H_t),

where we take the sum over all magnitude bins :math:`\boldsymbol{M}`. We
can calculate the observed pseudolikelihood as

.. math:: \hat{L}_{obs} = \sum_{i=1}^{N_{obs}} \log \hat{\lambda}_s(k_i) - \bar{N},

where :math:`\hat{\lambda}_s(k_i)` is the approximate rate density in
the spatial cell :math:`k_i` in which the :math:`i`\ th event occurs,
and :math:`\bar{N}` is the expected number of events in :math:`R_d`.
Similarly, we calculate the test distribution as

.. math:: \hat{L}_{j} = \Bigg[\sum_{i=1}^{N_{j}} \log\hat{\lambda}_s(k_{ij}) - \bar{N}\Bigg]; j = 1...J,

where :math:`\hat{\lambda}_s(k_{ij})` describes the approximate rate
density at the :math:`i`\ th event in the :math:`j`\ th catalog. We can
then calculate the quantile score as

.. math:: \gamma_L = F_L(\hat{L}_{obs})= P(\hat{L}_j \leq \hat{L}_{obs}).

Implementation in pyCSEP

.. code:: ipython3

    pseudolikelihood_test_result = catalog_evaluations.pseudolikelihood_test(
        forecast,
        comcat_catalog,
        verbose=False
    )
    ax = pseudolikelihood_test_result.plot()

.. image:: output_33_0.png

The histogram shows the test distribution of pseudolikelihood as
calculated above for each catalog :math:`j`. The dashed vertical line
shows the observed statistic :math:`\hat{L}_{obs} = \omega`. It is clear
that the observed statistic falls within the critical region of the test
distribution, as reflected in the quantile score of
:math:`\gamma_L = 0.02`.
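
To connect the expressions above to arrays, a minimal sketch of the
observed pseudolikelihood might look as follows. The rate densities and
event cell indices are invented, and :math:`\bar{N}` is taken here as
the sum of the per-cell expected counts;
``catalog_evaluations.pseudolikelihood_test`` computes the real
quantities from the forecast.

.. code:: python

    import numpy as np

    # Toy approximate rate densities per spatial cell (summed over magnitude bins)
    lambda_s = np.array([0.5, 1.2, 0.8, 0.3, 2.1])
    n_bar = lambda_s.sum()  # expected number of events in the discretised region

    # Spatial cell index k_i of each observed event (illustrative only)
    event_cells = np.array([1, 4, 4, 2, 0])

    # Observed pseudolikelihood: log rate density summed over target events,
    # minus the expected number of events
    l_obs = np.sum(np.log(lambda_s[event_cells])) - n_bar

    # gamma_L is the fraction of simulated catalogs whose pseudolikelihood
    # is less than or equal to l_obs (empirical CDF of the test distribution)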

Spatial test
~~~~~~~~~~~~

Aim: The spatial test again aims to isolate the spatial component of the
forecast and test the consistency of spatial rates with observed events.

Method: We perform the spatial test in the catalog-based approach in a
similar way to the grid-based spatial test: by normalising the
approximate rate density. In this case, we use the normalisation
:math:`\hat{\lambda}_s^* = \hat{\lambda}_s \big/ \sum_{\boldsymbol{R}} \hat{\lambda}_s`.
Then the observed spatial test statistic is calculated as

.. math:: S_{obs} = \Bigg[\sum_{i=1}^{N_{obs}} \log \hat{\lambda}_s^*(k_i)\Bigg]N_{obs}^{-1}

in which :math:`\hat{\lambda}_s^*(k_i)` is the normalised approximate
rate density in the spatial cell :math:`k_i` corresponding to the
:math:`i`\ th event in the observed catalog :math:`\Omega`. Similarly,
we define the test distribution using

.. math:: S_{j} = \bigg[\sum_{i=1}^{N_{j}} \log \hat{\lambda}_s^*(k_{ij})\bigg]N_{j}^{-1}; j= 1...J

for each catalog :math:`j`. Finally, the quantile score for the spatial
test is determined by once again comparing the observed and test
distribution statistics:

.. math:: \gamma_s = F_S(S_{obs}) = P(S_j \leq S_{obs})

Implementation in pyCSEP

.. code:: ipython3

    spatial_test_result = catalog_evaluations.spatial_test(
        forecast,
        comcat_catalog,
        verbose=False
    )
    ax = spatial_test_result.plot()

.. image:: output_36_0.png

The histogram shows the test distribution of normalised
pseudo-likelihood computed for each simulated catalog :math:`j`. The
dashed vertical line shows the observed test statistic
:math:`S_{obs} = \omega = -5.88`, which is clearly within the test
distribution. The quantile score :math:`\gamma_s = 0.36` is also printed
on the figure by default.

References
----------

Field, E. H., K. R. Milner, J. L. Hardebeck, M. T. Page, N. J. van der
Elst, T. H. Jordan, A. J. Michael, B. E. Shaw, and M. J. Werner (2017).
A spatiotemporal clustering model for the third Uniform California
Earthquake Rupture Forecast (UCERF3-ETAS): Toward an operational
earthquake forecast, Bull. Seismol. Soc. Am. 107, 1049–1081.

Harte, D., and D. Vere-Jones (2005), The entropy score and its uses in
earthquake forecasting, Pure Appl. Geophys. 162, 6-7, 1229-1253, DOI:
10.1007/s00024-004-2667-2.

Helmstetter, A., Y. Y. Kagan, and D. D. Jackson (2006). Comparison of
short-term and time-independent earthquake forecast models for southern
California, Bulletin of the Seismological Society of America 96, 90-106.

Imoto, M., and N. Hurukawa (2006), Assessing potential seismic activity
in Vrancea, Romania, using a stress-release model, Earth Planets Space
58, 1511-1514.

Imoto, M., and D.A. Rhoades (2010), Seismicity models of moderate
earthquakes in Kanto, Japan utilizing multiple predictive parameters,
Pure Appl. Geophys. 167, 6-7, 831-843, DOI: 10.1007/s00024-010-0066-4.

Rhoades, D.A., D. Schorlemmer, M.C. Gerstenberger, A. Christophersen,
J.D. Zechar and M. Imoto (2011), Efficient testing of earthquake
forecasting models, Acta Geophysica 59.

Savran, W., M. J. Werner, W. Marzocchi, D. Rhoades, D. D. Jackson, K. R.
Milner, E. H. Field, and A. J. Michael (2020). Pseudoprospective
evaluation of UCERF3-ETAS forecasts during the 2019 Ridgecrest Sequence,
Bulletin of the Seismological Society of America.
+ +Schorlemmer, D., and M.C. Gerstenberger (2007), RELM testing center, +Seismol. Res. Lett. 78, 30–36. + +Schorlemmer, D., M.C. Gerstenberger, S. Wiemer, D.D. Jackson, and D.A. +Rhoades (2007), Earthquake likelihood model testing, Seismol. Res. Lett. +78, 17–29. + +Schorlemmer, D., A. Christophersen, A. Rovida, F. Mele, M. Stucci and W. +Marzocchi (2010a). Setting up an earthquake forecast experiment in +Italy, Annals of Geophysics, 53, no.3 + +Schorlemmer, D., J.D. Zechar, M.J. Werner, E.H. Field, D.D. Jackson, and +T.H. Jordan (2010b), First results of the Regional Earthquake Likelihood +Models experiment, Pure Appl. Geophys., 167, 8/9, +doi:10.1007/s00024-010-0081-5. + +M. Taroni, W. Marzocchi, D. Schorlemmer, M. J. Werner, S. Wiemer, J. D. +Zechar, L. Heiniger, F. Euchner; Prospective CSEP Evaluation of 1‐Day, +3‐Month, and 5‐Yr Earthquake Forecasts for Italy. Seismological Research +Letters 2018;; 89 (4): 1251–1261. doi: +https://doi.org/10.1785/0220180031 + +Werner, M. J., A. Helmstetter, D. D. Jackson, and Y. Y. Kagan (2011a). +High-Resolution Long-Term and Short-Term Earthquake Forecasts for +California, Bulletin of the Seismological Society of America 101 +1630-1648 + +Werner, M.J. J.D. Zechar, W. Marzocchi, and S. Wiemer (2011b), +Retrospective evaluation of the five-year and ten-year CSEP-Italy +earthquake forecasts, Annals of Geophysics 53, no. 3, 11–30, +doi:10.4401/ag-4840. + +Zechar, 2011: Evaluating earthquake predictions and earthquake +forecasts: a guide for students and new researchers, CORSSA +(http://www.corssa.org/en/articles/theme_6/) + +Zechar, J.D., M.C. Gerstenberger, and D.A. Rhoades (2010a), +Likelihood-based tests for evaluating space-rate-magnitude forecasts, +Bull. Seis. Soc. Am., 100(3), 1184—1195, doi:10.1785/0120090192. + +Zechar, J.D., D. Schorlemmer, M. Liukis, J. Yu, F. Euchner, P.J. +Maechling, and T.H. Jordan (2010b), The Collaboratory for the Study of +Earthquake Predictability perspective on computational earthquake +science, Concurr. Comp-Pract. E., doi:10.1002/cpe.1519. diff --git a/docs/index.rst b/docs/index.rst index 83e37b76..3a91d53d 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -8,6 +8,7 @@ pyCSEP: Tools for Earthquake Forecast Developers getting_started/installing getting_started/core_concepts + getting_started/theory .. toctree:: :maxdepth: 2