Add docs for the different losses/statistics

NNPDF · Mar 6, 2024 · f88fbb5 · f88fbb5
1 parent 6bafedb
commit f88fbb5
Showing 1 changed file with 94 additions and 0 deletions.
diff --git a/doc/sphinx/source/n3fit/hyperopt.rst b/doc/sphinx/source/n3fit/hyperopt.rst
@@ -34,6 +34,8 @@ The desired features of this figure of merit can be summarized as:
 3. Be reliable even when the number of points is not very large.
 
 
+.. _hyperkfolding-label:
+
 K-folding cross-validation
 --------------------------
 A good compromise between all previous points is the usage of the cross-validation technique
@@ -320,6 +322,10 @@ Changing the hyperoptimization target
 -----------------------------------
 
 Beyond the usual :math:`\chi2`-based optimization figures above, it is possible to utilize other measures as the target for hyperoptimization.
+
+Future tests
+~~~~~~~~~~~~
+
 One possibility is to use a :ref:`future test<futuretests>`-based metric for which the goal is not to get the minimum :math:`\chi2` but to get the same :math:`\chi2` (with PDF errors considered) for different datasets. The idea is that this way we select models of which the prediction is stable upon variations in the dataset.
 In order to obtain the PDF errors used in the figure of merit it is necessary to run multiple replicas, luckily ``n3fit`` provides such a possibility also during hyperoptimization.
 
@@ -372,6 +378,94 @@ The figure of merit will be the difference between the :math:`\chi2` of the seco
    L_{\rm hyperopt} = \chi^{2}_{(1) \rm pdferr} - \chi^{2}_{(2)}
 
 
+New hyperoptimization metrics with fold and replica statistics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The combination of :ref:`k-folding <hyperkfolding-label>` and multi-replica experiments
+opens several possibilities for the choice of figure of merit. The simplest option would be to minimize
+the average of :math:`\chi2` across both replicas and k-folds, *i.e.*,
+
+.. math::
+    L_{1} = \frac{1}{n_{\rm fold}} \sum_{k=1}^{n_{\rm fold}} \left< \chi^2_{k} \right>_{\rm rep}.
+
+In NNPDF, this hyperoptimization metric is selected via the following generic runcard:
+
+.. code-block:: yaml
+
+        dataset_inputs:
+        ...
+
+        kfold:
+          loss_type: chi2
+          replica_statistic: average
+          fold_statistic: average
+          partitions:
+          - datasets:
+            ...
+          - datasets:
+            ...
+
+        parallel_models: true
+        same_trvl_per_replica: true
+
+By combining the ``average``, ``best_worst``, and ``std`` figures of merit discussed in :ref:`hyperkfolding-label`,
+several alternatives can be constructed. For example, one approach is to minimize
+the maximum of the averaged-over-replicas :math:`\chi2` values across folds,
+
+.. math::
+    L_{2} = \max \left ( \left< \chi^2_{1} \right>_{\rm rep}, \left< \chi^2_{2} \right>_{\rm rep}, \ldots, \left< \chi^2_{n_{\rm fold}} \right>_{\rm rep}\right),
+
+with the corresponding runcard ``kfold`` settings:
+
+.. code-block:: yaml
+
+        dataset_inputs:
+        ...
+
+        kfold:
+          loss_type: chi2
+          replica_statistic: average
+          fold_statistic: best_worst
+          partitions:
+          - datasets:
+            ...
+          - datasets:
+            ...
+
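The two reductions :math:`L_1` and :math:`L_2` can be illustrated with a minimal sketch. It is not the ``n3fit`` implementation: the function name ``hyper_loss`` and the input numbers are hypothetical, and only the ``average`` and ``best_worst`` fold statistics named above are covered, with the replica statistic fixed to the average.

```python
# Hypothetical chi2 values: one row per k-fold, one entry per replica.
chi2_table = [
    [1.0, 1.5, 2.0],  # fold 1
    [2.0, 2.5, 3.0],  # fold 2
    [1.0, 1.0, 1.0],  # fold 3
]


def hyper_loss(chi2_per_fold, fold_statistic="average"):
    """Reduce per-fold, per-replica chi2 values to a single number.

    Illustrative helper (not part of n3fit): the replica statistic is
    always the average, the fold statistic is chosen by name.
    """
    # <chi2_k>_rep: average over replicas within each fold
    per_fold = [sum(row) / len(row) for row in chi2_per_fold]
    if fold_statistic == "average":
        # L_1: average of the replica-averaged chi2 over folds
        return sum(per_fold) / len(per_fold)
    if fold_statistic == "best_worst":
        # L_2: the worst (largest) replica-averaged chi2 among the folds
        return max(per_fold)
    raise ValueError(f"unsupported fold_statistic: {fold_statistic}")


loss_l1 = hyper_loss(chi2_table, "average")
loss_l2 = hyper_loss(chi2_table, "best_worst")
```

With these numbers the replica averages per fold are 1.5, 2.5 and 1.0, so :math:`L_1` is their mean and :math:`L_2` picks out the second fold. The ``best_worst`` choice penalizes a setup that performs badly on any single fold, even if its average is good.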
+An alternative metric that is
+sensitive to higher moments of the probability distribution
+has been defined in `NNPDF3.0 <https://link.springer.com/article/10.1007/JHEP04(2015)040>`_ [see Eq. (4.6) therein],
+namely, the :math:`\varphi` estimator. In the context of hyperopt, :math:`\varphi^{2}` can be calculated for each k-fold as
+
+.. math::
+  \varphi_{k}^2 = \langle \chi^2_k  [ \mathcal{T}[f_{\rm fit}], \mathcal{D} ] \rangle_{\rm rep} - \chi^2_k [ \langle \mathcal{T}[f_{\rm fit}] \rangle_{\rm rep}, \mathcal{D} ],
+
+where the first term represents the usual averaged-over-replicas :math:`\left< \chi^2_{k} \right>_{\rm rep}`
+calculated based on the dataset used in the fit (:math:`\mathcal{D}`) and
+the theory predictions from each fitted PDF (:math:`f_{\rm fit}`) replica.
+The second term involves the calculation of :math:`\chi2` but now with respect to the theory predictions from the central PDF.
+
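As a worked illustration of the two terms in :math:`\varphi_{k}^2`, the following sketch computes it for a toy fold. It rests on assumptions that are not in the text above: the experimental uncertainties are taken as uncorrelated (diagonal covariance matrix), the :math:`\chi2` is normalized per data point, and the helper names ``chi2`` and ``phi2`` as well as all numbers are hypothetical.

```python
def chi2(theory, data, sigma):
    """chi2 per data point, assuming uncorrelated uncertainties (assumption)."""
    total = sum(((t - d) / s) ** 2 for t, d, s in zip(theory, data, sigma))
    return total / len(data)


def phi2(replica_predictions, data, sigma):
    """phi^2 for one fold: <chi2[T[f], D]>_rep - chi2[<T[f]>_rep, D]."""
    n_rep = len(replica_predictions)
    # First term: chi2 of each replica, then averaged over replicas
    mean_chi2 = sum(chi2(pred, data, sigma) for pred in replica_predictions) / n_rep
    # Second term: chi2 of the central (replica-averaged) prediction
    central = [sum(column) / n_rep for column in zip(*replica_predictions)]
    return mean_chi2 - chi2(central, data, sigma)


# Toy fold with two data points and two replicas (hypothetical numbers)
data = [1.0, 2.0]
sigma = [1.0, 1.0]
replicas = [[0.0, 2.0], [2.0, 2.0]]

phi2_k = phi2(replicas, data, sigma)
phi2_no_spread = phi2([[1.5, 2.5], [1.5, 2.5]], data, sigma)
```

Under the diagonal-covariance assumption the difference of the two terms reduces to the variance of the replica predictions in units of the experimental variance, averaged over data points. This is why identical replicas give zero regardless of how far they sit from the data.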
+On the basis of :math:`\varphi`, we define the loss as
+
+.. math::
+    L_{3} = \left (\frac{1}{n_{\rm fold}} \sum_{k=1}^{n_{\rm fold}} \varphi_{k}^2 \right)^{-1},
+
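+Since :math:`L_3` is the inverse of the fold-averaged :math:`\varphi^2`, minimizing it favors
+hyperparameters with a larger spread among replicas, that is, the most conservative extrapolation.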
+In NNPDF, this figure of merit is chosen using the following settings:
+
+.. code-block:: yaml
+
+        kfold:
+          loss_type: phi2
+          fold_statistic: average
+          ...
+
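The behaviour of :math:`L_3` can be checked with a short sketch. The function name ``loss_l3`` and the per-fold :math:`\varphi^2` values are hypothetical and serve only to show the direction of the optimization.

```python
def loss_l3(phi2_per_fold):
    """Inverse of the fold-averaged phi^2 (illustrative helper, not n3fit code)."""
    mean_phi2 = sum(phi2_per_fold) / len(phi2_per_fold)
    return 1.0 / mean_phi2


# Hypothetical per-fold phi^2 values for two hyperparameter setups
phi2_small_spread = [0.25, 0.25, 0.25]
phi2_large_spread = [0.5, 0.25, 0.75]

loss_small_spread = loss_l3(phi2_small_spread)
loss_large_spread = loss_l3(phi2_large_spread)
```

The setup with the larger average :math:`\varphi^2` obtains the smaller loss, so a minimizer of :math:`L_3` prefers it.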
+It is also possible to combine :math:`L_1` (which is only sensitive to the first moment)
+with :math:`L_3` (which provides information on the second moment).
+For example, one might minimize :math:`L_1` while monitoring the values of :math:`L_3` for a final model selection.
+The optimal approach for this combination is still under development.
+All the above options are implemented in the :class:`~n3fit.hyper_optimization.rewards.HyperLoss` class,
+which is instantiated and monitored within the :meth:`~n3fit.model_trainer.ModelTrainer.hyperparametrizable` method.
+
 Restarting hyperoptimization runs
 ---------------------------------