Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hyperopt loss #1726

Merged
merged 20 commits into from
Mar 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 98 additions & 0 deletions doc/sphinx/source/n3fit/hyperopt.rst
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@ The desired features of this figure of merit can be summarized as:
3. Be reliable even when the number of points is not very large.


.. _hyperkfolding-label:

K-folding cross-validation
--------------------------
A good compromise between all previous points is the usage of the cross-validation technique
Expand Down Expand Up @@ -320,6 +322,10 @@ Changing the hyperoptimization target
-----------------------------------

Beyond the usual :math:`\chi2`-based optimization figures above, it is possible to utilize other measures as the target for hyperoptimization.

Future tests
~~~~~~~~~~~~

One possibility is to use a :ref:`future test<futuretests>`-based metric for which the goal is not to get the minimum :math:`\chi2` but to get the same :math:`\chi2` (with PDF errors considered) for different datasets. The idea is that this way we select models of which the prediction is stable upon variations in the dataset.
In order to obtain the PDF errors used in the figure of merit it is necessary to run multiple replicas, luckily ``n3fit`` provides such a possibility also during hyperoptimization.

Expand Down Expand Up @@ -372,6 +378,98 @@ The figure of merit will be the difference between the :math:`\chi2` of the seco
L_{\rm hyperopt} = \chi^{2}_{(1) \rm pdferr} - \chi^{2}_{(2)}


New hyperoptimization metrics with fold and replica statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The hyperopt measures discussed above are all based on performing a single replica fit per fold.
However, one may also wish to run the hyperoptimization algorithm on fits consisting of many
replicas per fold. This is a feasible option in ``n3fit``, since it has been optimised to
efficiently run many replica fits in parallel on GPU.
Cmurilochem marked this conversation as resolved.
Show resolved Hide resolved

The combination of :ref:`k-folding <hyperkfolding-label>` and multi-replica experiments
Cmurilochem marked this conversation as resolved.
Show resolved Hide resolved
opens several possibilities for the choice of figure of merit. The simplest option would be to minimize
the average of :math:`\chi^2` across both replica and k folds, *i.e.*,

.. math::
L_{1} = \frac{1}{n_{\rm fold}} \sum_{k=1}^{n_{\rm fold}} \left< \chi^2_{k} \right>_{\rm rep}.

In NNPDF, this hyperoptimisation metrics is selected via the following generic runcard:

.. code-block:: yaml

dataset_inputs:
...

kfold:
loss_type: chi2
replica_statistic: average
fold_statistic: average
partitions:
- datasets:
...
- datasets:
...

parallel_models: true

By combining the ``average``, ``best_worst``, and ``std`` figures of merit discussed in :ref:`hyperkfolding-label`,
several alternatives may arise. For example, one approach could involve minimizing
the maximum value of the set of averaged-over-replicas :math:`\chi^2`,

.. math::
L_{2} = {\rm max} \left ( \left< \chi^2_{1} \right>_{\rm rep}, \left< \chi^2_{2} \right>_{\rm rep}, ..., \left< \chi^2_{n_{\rm fold}} \right>_{\rm rep}\right),

with correspond runcard ``kfold`` settings:

.. code-block:: yaml

dataset_inputs:
...

kfold:
loss_type: chi2
replica_statistic: average
fold_statistic: best_worst
partitions:
- datasets:
...
- datasets:
...

An alternative metric that is
sensitive to higher moments of the probability distribution
has been defined in `NNPDF3.0 <https://link.springer.com/article/10.1007/JHEP04(2015)040>`_ [see Eq. (4.6) therein],
namely, the :math:`\varphi` estimator. In the context of hyperopt, :math:`\varphi^{2}` can be calculated for each k-fold as

.. math::
\varphi_{k}^2 = \langle \chi^2_k [ \mathcal{T}[f_{\rm fit}], \mathcal{D} ] \rangle_{\rm rep} - \chi^2_k [ \langle \mathcal{T}[f_{\rm fit}] \rangle_{\rm rep}, \mathcal{D} ],

where the first term represents the usual averaged-over-replicas :math:`\left< \chi^2_{k} \right>_{\rm rep}`
calculated based on the dataset used in the fit (:math:`\mathcal{D}`) and
the theory predictions from each fitted PDF (:math:`f_{\rm fit}`) replica.
The second term involves the calculation of :math:`\chi2` but now with respect to the theory predictions from the central PDF.

On the basis of :math:`\varphi`, we define the loss as

.. math::
L_{3} = \left (\frac{1}{n_{\rm fold}} \sum_{k=1}^{n_{\rm fold}} \varphi_{k}^2 \right)^{-1},

which selects hyperparameters that favor the most conservative extrapolation.
In NNPDF, this figure of merit is chosen using the following settings:

.. code-block:: yaml

kfold:
loss_type: phi2
fold_statistic: average
...

Alternatively, it is currently also possible to combine :math:`L_1` (which is only sensitive to the first moment)
with :math:`L_3` (which provides information on the second moment).
For example, one might minimize :math:`L_1` while monitoring the values of :math:`L_3` for a final model selection.
The optimal approach for this combination is still under development.
All the above options are implemented in the :class:`~n3fit.hyper_optimization.rewards.HyperLoss` class
which is instantiated and monitored within :meth:`~n3fit.model_trainer.ModelTrainer.hyperparametrizable` method.

Restarting hyperoptimization runs
---------------------------------

Expand Down
2 changes: 1 addition & 1 deletion n3fit/runcards/examples/Basic_hyperopt.yml
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,7 @@ hyperscan_config:
activations: ['sigmoid', 'tanh']

kfold:
target: average
fold_statistic: average
penalties:
- saturation
- patience
Expand Down
29 changes: 24 additions & 5 deletions n3fit/src/n3fit/checks.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@

from n3fit.hyper_optimization import penalties as penalties_module
from n3fit.hyper_optimization import rewards as rewards_module
from n3fit.hyper_optimization.rewards import IMPLEMENTED_LOSSES, IMPLEMENTED_STATS
from reportengine.checks import CheckError, make_argcheck
from validphys.core import PDF
from validphys.pdfbases import check_basis
Expand Down Expand Up @@ -255,15 +256,33 @@ def check_kfold_options(kfold):
raise CheckError(
f"The penalty '{penalty}' is not recognized, ensure it is implemented in hyper_optimization/penalties.py"
)
loss_target = kfold.get("target")
if loss_target is not None:
if not hasattr(rewards_module, loss_target):

loss_type = kfold.get("loss_type")
if loss_type is not None:
if loss_type not in IMPLEMENTED_LOSSES:
raise CheckError(
f"Loss type '{loss_type}' is not recognized, "
"ensure it is implemented in the HyperLoss class in hyper_optimization/rewards.py."
"Options so far are 'chi2' or 'phi2'."
)
replica_statistic = kfold.get("replica_statistic")
if replica_statistic is not None:
if replica_statistic not in IMPLEMENTED_STATS:
raise CheckError(
f"The hyperoptimization target '{loss_target}' loss is not recognized, "
"ensure it is implemented in hyper_optimization/rewards.py"
f"The replica statistic '{replica_statistic}' is not recognized, "
"ensure it is implemented in the HyperLoss class in hyper_optimization/rewards.py"
)
fold_statistic = kfold.get("fold_statistic")
if fold_statistic is not None:
if fold_statistic not in IMPLEMENTED_STATS:
raise CheckError(
f"The fold statistic '{fold_statistic}' is not recognized, "
"ensure it is implemented in the HyperLoss class in hyper_optimization/rewards.py"
)

partitions = kfold["partitions"]
# Check specific errors for specific targets
loss_target = kfold.get("fold_statistic") # TODO: haven't updated this
if loss_target == "fit_future_tests":
if len(partitions) == 1:
raise CheckError("Cannot use target 'fit_future_tests' with just one partition")
Expand Down
8 changes: 7 additions & 1 deletion n3fit/src/n3fit/hyper_optimization/hyper_scan.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
"""
import copy
import logging
from typing import Callable

import hyperopt
from hyperopt.pyll.base import scope
Expand All @@ -24,6 +25,11 @@

log = logging.getLogger(__name__)

# Hyperopt uses these strings for a passed and failed run
# it also has statuses "new", "running" and "suspended", but we don't use them
HYPEROPT_STATUSES = {True: "ok", False: "fail"}


HYPEROPT_SEED = 42


Expand Down Expand Up @@ -118,7 +124,7 @@ def hyper_scan_wrapper(replica_path_set, model_trainer, hyperscanner, max_evals=
parameters of the best trial as found by ``hyperopt``
"""
# Tell the trainer we are doing hpyeropt
model_trainer.set_hyperopt(True, keys=hyperscanner.hyper_keys, status_ok=hyperopt.STATUS_OK)
model_trainer.set_hyperopt(True, keys=hyperscanner.hyper_keys)
# Generate the trials object
trials = FileTrials(replica_path_set, parameters=hyperscanner.as_dict())
# Initialize seed for hyperopt
Expand Down
43 changes: 35 additions & 8 deletions n3fit/src/n3fit/hyper_optimization/penalties.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,11 @@ def saturation(pdf_model=None, n=100, min_x=1e-6, max_x=1e-4, flavors=None, **_k
flavors: list(int)
indices of the flavors to inspect

Returns
-------
NDArray
array of saturation penalties for each replica

Example
-------
>>> from n3fit.hyper_optimization.penalties import saturation
Expand Down Expand Up @@ -72,8 +77,9 @@ def saturation(pdf_model=None, n=100, min_x=1e-6, max_x=1e-4, flavors=None, **_k
return extra_loss


def patience(stopping_object=None, alpha=1e-4, **_kwargs):
"""Adds a penalty for fits that have finished too soon, which
def patience(stopping_object, alpha: float = 1e-4, **_kwargs):
"""
Adds a penalty for fits that have finished too soon, which
means the number of epochs or its patience is not optimal.
The penalty is proportional to the validation loss and will be 0
when the best epoch is exactly at max_epoch - patience
Expand All @@ -85,6 +91,11 @@ def patience(stopping_object=None, alpha=1e-4, **_kwargs):
alpha: float
dumping factor for the exponent

Returns
-------
NDArray
patience penalty for each replica

Example
-------
>>> from n3fit.hyper_optimization.penalties import patience
Expand All @@ -94,11 +105,11 @@ def patience(stopping_object=None, alpha=1e-4, **_kwargs):
3.434143467595683

"""
epoch_best = np.take(stopping_object.e_best_chi2, 0)
epoch_best = np.array(stopping_object.e_best_chi2)
patience = stopping_object.stopping_patience
max_epochs = stopping_object.total_epochs
diff = abs(max_epochs - patience - epoch_best)
vl_loss = np.take(stopping_object.vl_chi2, 0)
vl_loss = np.array(stopping_object.vl_chi2)
return vl_loss * np.exp(alpha * diff)


Expand All @@ -109,6 +120,11 @@ def integrability(pdf_model=None, **_kwargs):

The penalty increases exponentially with the growth of the integrability number

Returns
-------
NDArray
array of integrability penalties for each replica

Example
-------
>>> from n3fit.hyper_optimization.penalties import integrability
Expand All @@ -121,8 +137,19 @@ def integrability(pdf_model=None, **_kwargs):
"""
pdf_instance = N3PDF(pdf_model.split_replicas())
integ_values = integrability_numbers(pdf_instance)
integ_overflow = np.sum(integ_values[integ_values > fitveto.INTEG_THRESHOLD])
if integ_overflow > 50.0:
# before reaching an overflow, just give a stupidly big number
return np.exp(50.0)

# set components under the threshold to 0
integ_values[integ_values <= fitveto.INTEG_THRESHOLD] = 0.0

# sum over flavours
integ_overflow = np.sum(integ_values, axis=-1) # -1 rather than 1 so it works with 1 replica

# Limit components to 50 to avoid overflow
if isinstance(integ_overflow, np.ndarray):
# Case: multi-replica scenario
integ_overflow[integ_overflow > 50.0] = 50.0
elif isinstance(integ_overflow, (float, np.float64)):
# Case: single replica scenario
integ_overflow = min(integ_overflow, 50.0)

return np.exp(integ_overflow) - 1.0
Loading