Merge pull request #1110 from NNPDF/thcovtutorial
Docs: tutorial for running fit with scale variation theory covmat
voisey authored Mar 15, 2021
2 parents 17575ad + 2a67a1f commit aa06460
Showing 9 changed files with 497 additions and 26 deletions.
3 changes: 3 additions & 0 deletions doc/sphinx/source/theory/theoryparamsinfo.md
@@ -1,3 +1,6 @@
```eval_rst
.. _th_parameter_info:
```
# Looking up the parameters of a theory

The parameters for all of the theories can be found in the `theory.db` file,
1 change: 1 addition & 0 deletions doc/sphinx/source/tutorials/index.rst
@@ -14,6 +14,7 @@ Running fits
./run-fit.md
./run-legacy-fit.rst
./run-iterated-fit.rst
./thcov_tutorial.rst

Analysing results
-----------------
258 changes: 258 additions & 0 deletions doc/sphinx/source/tutorials/thcov_tutorial.rst
@@ -0,0 +1,258 @@
How to include a theory covariance matrix in a fit
==================================================
:Author: Contact Rosalyn (r.l.pearson@ed.ac.uk) for further information.

This section details how to include :ref:`scale variation covariance matrices (covmats) <vptheorycov-index>`
in a PDF fit. At present this can only be done at next-to-leading order (NLO), for which the
central theory is :ref:`theory 163 <theory-indexes>`.

First, decide which theory covmat you want
------------------------------------------
- Choose the desired point prescription from those listed :ref:`here <prescrips>`.
- Each prescription corresponds to a ``point_prescription`` flag to include in
  the runcard, one of ``"3 point"``, ``"5 point"``, ``"5bar point"``, ``"7 point"`` or ``"9 point"``.
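Internally, each ``point_prescription`` flag is translated into the list of scale-varied theory IDs needed to build the covmat. A minimal sketch of that lookup is below; note that the theory IDs in the dictionary are hypothetical placeholders, not the actual database entries (the real mapping lives in the ``scalevariations`` module of ``validphys``):

```python
# Sketch of how a point-prescription flag could map to scale-varied theory
# IDs. The IDs below are HYPOTHETICAL placeholders; the real mapping is
# defined in validphys2/src/validphys/scalevariations.
PRESCRIPTION_TO_THEORYIDS = {
    "3 point": [163, 180, 173],            # central, (+,+), (-,-) -- placeholders
    "5 point": [163, 177, 176, 179, 174],  # central plus four variations -- placeholders
}

def scale_variation_theoryids(point_prescription):
    """Return the theory IDs needed to build the covmat for a prescription."""
    try:
        return PRESCRIPTION_TO_THEORYIDS[point_prescription]
    except KeyError:
        raise ValueError(f"unknown point prescription: {point_prescription!r}")
```

This is why the runcard only needs the ``point_prescription`` string: the user never lists the scale-varied theories by hand.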

Next, add necessary flags to the runcard
----------------------------------------
- Remember to list the required datasets using ``dataset_inputs`` (see :ref:`data_specification`).
- Add ``theorycovmatconfig`` to the runcard. An example is in the following code snippet:

.. code:: yaml

   ############################################################
   theory:
     theoryid: 163                    # database id

   theorycovmatconfig:
     point_prescription: "3 point"
     theoryids:
       from_: scale_variation_theories
     pdf: NNPDF31_nlo_as_0118
     use_thcovmat_in_fitting: true
     use_thcovmat_in_sampling: true
   ############################################################
- ``pdf`` is the PDF used to generate the scale-varied predictions from which
  the theory covmat is constructed. Choose something close to the PDF you are
  trying to fit, such as a previous iteration if available.
- ``theoryids`` are necessary for the construction of the theory covmat.
  To avoid user error in entering them in the correct configuration and order,
  this is handled by the ``produce_scale_variation_theories`` action in
  `config <https://github.com/NNPDF/nnpdf/tree/master/validphys2/src/validphys/config.py>`_,
  using the information in
  `the scalevariations module <https://github.com/NNPDF/nnpdf/tree/master/validphys2/src/validphys/scalevariations>`_.
- The flags ``use_thcovmat_in_fitting`` and ``use_thcovmat_in_sampling`` specify
  where the theory covmat is used in the code. There are two possible places:
  the fitting (i.e. the :math:`\chi^2` minimiser) and the sampling (i.e. pseudodata
  generation). The default is ``True`` for both.

.. warning::
   Changing either of these to ``False`` will affect the fit outcome and should
   be avoided unless you know what you are doing.
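Conceptually, when the theory covmat enters the fitting it is simply added to the experimental covariance matrix before the :math:`\chi^2` is evaluated. The following is a minimal numerical sketch of that idea with toy numbers, not the actual NNPDF implementation:

```python
import numpy as np

# Minimal sketch (NOT the NNPDF implementation) of how a theory covariance
# matrix S enters the chi2: it is added to the experimental covmat C, and
# the total covariance is inverted in the usual quadratic form.
def chi2_with_thcovmat(data, theory, C, S):
    """chi2 = (D - T)^T (C + S)^{-1} (D - T)."""
    diff = data - theory
    total_cov = C + S
    # Solve (C + S) x = diff instead of forming the inverse explicitly.
    return diff @ np.linalg.solve(total_cov, diff)

# Toy numbers: two data points with uncorrelated experimental errors and
# a fully correlated theory uncertainty.
D = np.array([1.0, 2.0])            # data
T = np.array([1.1, 1.9])            # theory predictions
C = np.diag([0.04, 0.04])           # experimental covmat
S = 0.01 * np.ones((2, 2))          # theory covmat (fully correlated)
print(chi2_with_thcovmat(D, T, C, S))
```

The sampling case is analogous: pseudodata replicas are drawn with fluctuations governed by the total covariance :math:`C + S` rather than :math:`C` alone.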

If you want to compare data to another fit
------------------------------------------
- Sometimes we want to compare data to another fit for validation; for example,
  we might want to compare predictions for the NLO fit with MHOUs to the known
  NNLO fit (see :ref:`vptheorycov-tests`).
- To make sure the cuts match between the two fits, edit the ``datacuts``
  section of the runcard to include the following:

.. code:: yaml

   use_cuts: fromintersection
   cuts_intersection_spec:
     - theoryid: 163
     - theoryid: 53
- This ensures that the cuts on the data are the intersection of the cuts in
  theory 53 (default NNLO) and theory 163 (central scale-variation NLO). See
  :ref:`here <theory-indexes>` for theory definitions.
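The intersection logic can be pictured as a set operation over the data points passing each theory's kinematic cuts. This is an illustrative sketch only; the real implementation lives in ``validphys``:

```python
# Sketch of what "use_cuts: fromintersection" means conceptually: a data
# point survives only if it passes the cuts of EVERY theory listed under
# cuts_intersection_spec. Illustrative only -- not the validphys code.
def intersected_cuts(passing_points_per_theory):
    """Each entry is the collection of data-point indices passing one
    theory's kinematic cuts; return the indices passing all of them."""
    sets = [set(points) for points in passing_points_per_theory]
    return sorted(set.intersection(*sets))

# Toy example: one theory keeps points {0, 1, 2, 3}, another keeps {1, 2, 4};
# only points 1 and 2 survive the intersection.
print(intersected_cuts([[0, 1, 2, 3], [1, 2, 4]]))
```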

Example runcard
---------------
The following is an example runcard for an NLO NNPDF3.1-style fit with a 3 point theory covmat.
It can be found `here <https://github.com/NNPDF/nnpdf/tree/master/validphys2/examples/theory_covariance/fit_with_thcovmat.yaml>`_.

.. code:: yaml

   #
   # Configuration file for NNPDF++
   #
   ##########################################################################################
   description: Example runcard for NLO NNPDF3.1 style fit with 3pt theory covariance matrix
   ##########################################################################################
   # frac: training fraction
   # ewk: apply ewk k-factors
   # sys: systematics treatment (see systypes)
   dataset_inputs:
   - {dataset: NMCPD, frac: 0.5}
   - {dataset: NMC, frac: 0.5}
   - {dataset: SLACP, frac: 0.5}
   - {dataset: SLACD, frac: 0.5}
   - {dataset: BCDMSP, frac: 0.5}
   - {dataset: BCDMSD, frac: 0.5}
   - {dataset: CHORUSNU, frac: 0.5}
   - {dataset: CHORUSNB, frac: 0.5}
   - {dataset: NTVNUDMN, frac: 0.5}
   - {dataset: NTVNBDMN, frac: 0.5}
   - {dataset: HERACOMBNCEM, frac: 0.5}
   - {dataset: HERACOMBNCEP460, frac: 0.5}
   - {dataset: HERACOMBNCEP575, frac: 0.5}
   - {dataset: HERACOMBNCEP820, frac: 0.5}
   - {dataset: HERACOMBNCEP920, frac: 0.5}
   - {dataset: HERACOMBCCEM, frac: 0.5}
   - {dataset: HERACOMBCCEP, frac: 0.5}
   - {dataset: HERAF2CHARM, frac: 0.5}
   - {dataset: CDFZRAP, frac: 1.0}
   - {dataset: D0ZRAP, frac: 1.0}
   - {dataset: D0WEASY, frac: 1.0}
   - {dataset: D0WMASY, frac: 1.0}
   - {dataset: ATLASWZRAP36PB, frac: 1.0}
   - {dataset: ATLASZHIGHMASS49FB, frac: 1.0}
   - {dataset: ATLASLOMASSDY11EXT, frac: 1.0}
   - {dataset: ATLASWZRAP11, frac: 0.5}
   - {dataset: ATLAS1JET11, frac: 0.5}
   - {dataset: ATLASZPT8TEVMDIST, frac: 0.5}
   - {dataset: ATLASZPT8TEVYDIST, frac: 0.5}
   - {dataset: ATLASTTBARTOT, frac: 1.0}
   - {dataset: ATLASTOPDIFF8TEVTRAPNORM, frac: 1.0}
   - {dataset: CMSWEASY840PB, frac: 1.0}
   - {dataset: CMSWMASY47FB, frac: 1.0}
   - {dataset: CMSDY2D11, frac: 0.5}
   - {dataset: CMSWMU8TEV, frac: 1.0}
   - {dataset: CMSZDIFF12, frac: 1.0, cfac: [NRM]}
   - {dataset: CMSJETS11, frac: 0.5}
   - {dataset: CMSTTBARTOT, frac: 1.0}
   - {dataset: CMSTOPDIFF8TEVTTRAPNORM, frac: 1.0}
   - {dataset: LHCBZ940PB, frac: 1.0}
   - {dataset: LHCBZEE2FB, frac: 1.0}
   - {dataset: LHCBWZMU7TEV, frac: 1.0, cfac: [NRM]}
   - {dataset: LHCBWZMU8TEV, frac: 1.0, cfac: [NRM]}
   ############################################################
   datacuts:
     t0pdfset: 190310-tg-nlo-global # PDF set to generate t0 covmat
     q2min: 13.96                   # Q2 minimum
     w2min: 12.5                    # W2 minimum
     combocuts: NNPDF31             # NNPDF3.0 final kin. cuts
     jetptcut_tev: 0                # jet pt cut for tevatron
     jetptcut_lhc: 0                # jet pt cut for lhc
     wptcut_lhc: 30.0               # Minimum pT for W pT diff distributions
     jetycut_tev: 1e30              # jet rap. cut for tevatron
     jetycut_lhc: 1e30              # jet rap. cut for lhc
     dymasscut_min: 0               # dy inv.mass. min cut
     dymasscut_max: 1e30            # dy inv.mass. max cut
     jetcfactcut: 1e30              # jet cfact. cut
     use_cuts: fromintersection
     cuts_intersection_spec:
       - theoryid: 163
       - theoryid: 53
   ############################################################
   theory:
     theoryid: 163                  # database id

   theorycovmatconfig:
     point_prescription: "3 point"
     theoryids:
       from_: scale_variation_theories
     fivetheories: None
     pdf: NNPDF31_nlo_as_0118
     use_thcovmat_in_fitting: true
     use_thcovmat_in_sampling: true

   sampling_t0:
     use_t0: false

   fitting_t0:
     use_t0: true
   ############################################################
   fitting:
     seed: 65532133530              # set the seed for the random generator
     genrep: on                     # on = generate MC replicas, off = use real data
     rngalgo: 0                     # 0 = ranlux, 1 = cmrg, see randomgenerator.cc
     fitmethod: NGA                 # Minimization algorithm
     ngen: 30000                    # Maximum number of generations
     nmutants: 80                   # Number of mutants for replica
     paramtype: NN
     nnodes: [2, 5, 3, 1]
     # NN23(QED) = sng=0,g=1,v=2,t3=3,ds=4,sp=5,sm=6,(pht=7)
     # EVOL(QED) = sng=0,g=1,v=2,v3=3,v8=4,t3=5,t8=6,(pht=7)
     # EVOLS(QED)= sng=0,g=1,v=2,v8=4,t3=4,t8=5,ds=6,(pht=7)
     # FLVR(QED) = g=0, u=1, ubar=2, d=3, dbar=4, s=5, sbar=6, (pht=7)
     fitbasis: NN31IC               # EVOL (7), EVOLQED (8), etc.
     basis:
     # remember to change the name of PDF accordingly with fitbasis
     # pos: on for NN squared
     # mutsize: mutation size
     # mutprob: mutation probability
     # smallx, largex: preprocessing ranges
     - {fl: sng, pos: off, mutsize: [15], mutprob: [0.05], smallx: [1.046, 1.188], largex: [1.437, 2.716]}
     - {fl: g, pos: off, mutsize: [15], mutprob: [0.05], smallx: [0.9604, 1.23], largex: [0.08459, 6.137]}
     - {fl: v, pos: off, mutsize: [15], mutprob: [0.05], smallx: [0.5656, 0.7242], largex: [1.153, 2.838]}
     - {fl: v3, pos: off, mutsize: [15], mutprob: [0.05], smallx: [0.1521, 0.5611], largex: [1.236, 2.976]}
     - {fl: v8, pos: off, mutsize: [15], mutprob: [0.05], smallx: [0.5264, 0.7246], largex: [0.6919, 3.198]}
     - {fl: t3, pos: off, mutsize: [15], mutprob: [0.05], smallx: [-0.3687, 1.459], largex: [1.664, 3.373]}
     - {fl: t8, pos: off, mutsize: [15], mutprob: [0.05], smallx: [0.5357, 1.267], largex: [1.433, 2.866]}
     - {fl: cp, pos: off, mutsize: [15], mutprob: [0.05], smallx: [-0.09635, 1.204], largex: [1.654, 7.456]}
   ############################################################
   stopping:
     stopmethod: LOOKBACK           # Stopping method
     lbdelta: 0                     # Delta for look-back stopping
     mingen: 0                      # Minimum number of generations
     window: 500                    # Window for moving average
     minchi2: 3.5                   # Minimum chi2
     minchi2exp: 6.0                # Minimum chi2 for experiments
     nsmear: 200                    # Smear for stopping
     deltasm: 200                   # Delta smear for stopping
     rv: 2                          # Ratio for validation stopping
     rt: 0.5                        # Ratio for training stopping
     epsilon: 1e-6                  # Gradient epsilon
   ############################################################
   positivity:
     posdatasets:
     - {dataset: POSF2U, poslambda: 1e6}  # Positivity Lagrange Multiplier
     - {dataset: POSF2DW, poslambda: 1e6}
     - {dataset: POSF2S, poslambda: 1e6}
     - {dataset: POSFLL, poslambda: 1e6}
     - {dataset: POSDYU, poslambda: 1e10}
     - {dataset: POSDYD, poslambda: 1e10}
     - {dataset: POSDYS, poslambda: 1e10}
   ############################################################
   closuretest:
     filterseed: 0                  # Random seed to be used in filtering data partitions
     fakedata: off                  # on = to use FAKEPDF to generate pseudo-data
     fakepdf: MSTW2008nlo68cl       # Theory input for pseudo-data
     errorsize: 1.0                 # uncertainties rescaling
     fakenoise: off                 # on = to add random fluctuations to pseudo-data
     rancutprob: 1.0                # Fraction of data to be included in the fit
     rancutmethod: 0                # Method to select rancutprob data fraction
     rancuttrnval: off              # 0(1) to output training(validation) chi2 in report
     printpdf4gen: off              # To print info on PDFs during minimization
   ############################################################
   lhagrid:
     nx: 150
     xmin: 1e-9
     xmed: 0.1
     xmax: 1.0
     nq: 50
     qmax: 1e5
   ############################################################
   debug: off
14 changes: 0 additions & 14 deletions doc/sphinx/source/vp/dataspecification.rst
@@ -390,8 +390,6 @@ input

.. code:: yaml

   metadata_group: nnpdf31_process
   experiments:
   - experiment: NMC
     datasets:
@@ -418,18 +416,6 @@ The user should be aware, however, that any grouping introduced in this way is
purely superficial and will be ignored in favour of the experiments defined by
the metadata of the datasets.

*IMPORTANT*: Note that all theory uncertainties runcards will need to be
updated to explicitly set ``metadata_group: nnpdf31_process``, or else the
prescriptions for scale variations will not vary scales coherently for data
within the same process type, as usually desired, but rather for data within
the same experiment. When running the examples in
:ref:`theory-covmat-examples`, it should be obvious if this has been set
because the outputs will be plots grouped by experiment rather than by process
type. However, care must be taken when using the theory covariance matrix but
not plotting anything, since the aforementioned check is not relevant. For
example, if you only want to produce a :math:`\chi^2` you must be careful to set the
``metadata_group`` key as above.

Runcards that request actions that have been renamed will not work anymore.
Generally, actions that were previously named ``experiments_*`` have been
renamed to highlight the fact that they work with more general groupings.
15 changes: 10 additions & 5 deletions doc/sphinx/source/vp/theorycov/examples.rst
@@ -14,6 +14,10 @@ You need to provide the central theory under the ``default_theory`` flag,
corresponding to :math:`(\mu_F, \mu_R) = (0,0)`,
which for NLO is theory 163.

You need to provide the required point prescription using the ``point_prescription``
flag described in :ref:`this section <pointprescrips>`, e.g. ``point_prescription: "3 point"``
in the case below.

``dataspecs`` associates a chosen label (``speclabel``) with each of the theory
choices, stating which scale variation each theory corresponds to.

@@ -22,11 +26,12 @@ Here the cuts and PDF are taken from the central NLO scale-varied fit.
You must also list all the experiments you wish to include, along with any
relevant c-factors.

*IMPORTANT*: In order to ensure backwards compatibility now that the structure
of data in runcards has been updated and ``experiments`` is deprecated, you must
also include ``metadata_group: nnpdf31_process`` in the runcards, so that the
scale variation prescriptions are done by process rather than by experiment. See
:ref:`backwards-compatibility` for more details.
.. warning::
   In order to ensure backwards compatibility now that the structure
   of data in runcards has been updated and ``experiments`` is deprecated, you must
   also include ``metadata_group: nnpdf31_process`` in the runcards, so that the
   scale variation prescriptions are done by process rather than by experiment. See
   :ref:`backwards-compatibility` for more details.

.. code-block:: yaml
   :linenos:
5 changes: 3 additions & 2 deletions doc/sphinx/source/vp/theorycov/index.rst
@@ -3,7 +3,6 @@
The theorycovariance module
===============================


:Author: Rosalyn Pearson (r.l.pearson@ed.ac.uk)

The ``theorycovariance`` module deals with constructing, testing and
@@ -32,7 +31,9 @@ Summary
- Theoretical covariance matrices are built according to the various prescriptions
in :ref:`prescrips`.

- The prescription must be one of 3 point, 5 point, 5bar point, 7 point or 9 point.
- The prescription must be one of 3 point, 5 point, 5bar point, 7 point or 9 point. You can specify
  this using ``point_prescription: "x point"`` in the runcard. The translation of this flag
  into the relevant ``theoryids`` is handled by the ``scalevariations`` module in ``validphys``.

- As input you need theories for the relevant scale combinations which
correspond to the prescription. This information is taken from the
