This repository is a collection of machine learning benchmarks for DeepHyper.
The repository follows this organization:
# Python package containing utility code
deephyper_benchmark/
# Library of benchmarks
lib/
To install the DeepHyper benchmark suite, run:
git clone https://github.com/deephyper/benchmark.git deephyper_benchmark
cd deephyper_benchmark/
pip install -e "."
A benchmark is defined as a sub-folder of the `lib/` folder, such as `lib/Benchmark-101/`. A benchmark folder needs to follow a Python package structure and therefore needs to contain an `__init__.py` file at its root. In addition, a benchmark folder needs to define a `benchmark.py` script that defines its requirements.
General benchmark structure:
lib/
    Benchmark-101/
        __init__.py
        benchmark.py
        data.py
        model.py
        hpo.py # Defines hyperparameter optimization inputs (run-function + problem)
        README.md # Description of the benchmark
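The layout requirements above can be checked mechanically. The following is a minimal sketch and not part of the repository: `missing_benchmark_files` is a hypothetical helper name, and it only looks for the two files the text describes as required (`__init__.py` and `benchmark.py`).

```python
from pathlib import Path

# Files the text above describes as mandatory at the root of a benchmark folder.
REQUIRED_FILES = ("__init__.py", "benchmark.py")


def missing_benchmark_files(benchmark_dir):
    """Return the required files that are absent from `benchmark_dir`.

    Hypothetical helper for illustration; it is not provided by deephyper_benchmark.
    """
    root = Path(benchmark_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_benchmark_files("lib/Benchmark-101")` returns an empty list when both files are present.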
Then to use the benchmark:
import deephyper_benchmark as dhb
dhb.install("Benchmark-101")
dhb.load("Benchmark-101")
from deephyper_benchmark.lib.benchmark_101.hpo import problem, run
All `run`-functions (i.e., functions returning the objective(s) to be optimized) should follow the MAXIMIZATION standard. If a benchmark needs minimization, then the negative of the minimized objective can be returned: `return -minimized_objective`.
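As a minimal sketch of the maximization convention, the toy `run`-function below wraps a quantity that should be minimized and returns its negative. The `loss` function is made up for illustration, and `job` is assumed to behave like a mapping from hyperparameter names to values.

```python
def loss(x):
    """Toy quantity to minimize; its minimum is at x = 2."""
    return (x - 2.0) ** 2


def run(job):
    """Run-function following the maximization standard.

    Assumption: `job` behaves like a mapping of hyperparameter names to values.
    """
    minimized_objective = loss(job["x"])
    # Negate so that a smaller loss gives a larger (better) objective.
    return -minimized_objective
```

With this convention, a configuration closer to the minimum of `loss` receives a higher objective value.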
A benchmark inherits from the `Benchmark` class:
import os
from deephyper_benchmark import *
DIR = os.path.dirname(os.path.abspath(__file__))
class Benchmark101(Benchmark):
    version = "0.0.1"
    requires = {
        "bash-install": {"type": "cmd", "cmd": "cd .. && " + os.path.join(DIR, "../install.sh")},
    }
Finally, when testing a benchmark it can be useful to activate the logging:
import logging
logging.basicConfig(
    # filename="deephyper.log", # Uncomment if you want to create a file with the logs
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(filename)s:%(funcName)s - %(message)s",
    force=True,
)
Benchmarks can sometimes be configured. The configuration can use environment variables with the prefix `DEEPHYPER_BENCHMARK_`.
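One way such prefixed variables could be gathered is sketched below. This is an illustration only: `read_benchmark_config` is a hypothetical helper, not a function of `deephyper_benchmark`, and each benchmark decides which variables it actually reads.

```python
import os

PREFIX = "DEEPHYPER_BENCHMARK_"


def read_benchmark_config(environ=None):
    """Collect variables starting with PREFIX, keyed by the name without the prefix.

    Hypothetical helper; values are returned as raw strings.
    """
    environ = os.environ if environ is None else environ
    return {
        key[len(PREFIX):]: value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }
```

For example, if a made-up variable `DEEPHYPER_BENCHMARK_EXAMPLE_OPTION=5` were exported in the shell, the helper would return it under the key `EXAMPLE_OPTION` with the string value `"5"`, leaving type conversion to the caller.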
Benchmarks must return the following standard metadata when it applies; some metadata are specific to neural networks (e.g., `num_parameters`):
- `num_parameters`: integer value of the number of parameters in the neural network.
- `num_parameters_train`: integer value of the number of trainable parameters of the neural network.
- `budget`: scalar value (float/int) of the budget consumed by the neural network. Therefore the budget should be defined for each benchmark (e.g., number of epochs in general).
- `stopped`: boolean value indicating if the evaluation was stopped before consuming the maximum budget.
- `train_X`: scalar value of the training metrics (replace `X` by the metric name, 1 key per metric).
- `valid_X`: scalar value of the validation metrics (replace `X` by the metric name, 1 key per metric).
- `test_X`: scalar value of the testing metrics (replace `X` by the metric name, 1 key per metric).
- `flops`: number of flops of the model such as computed in `fvcore.nn.FlopCountAnalysis(...).total()` (See documentation).
- `latency`: TO BE CLARIFIED
- `lc_train_X`: recorded learning curves of the trained model, the `bi` variables are the budget value (e.g., epochs/batches), and the `yi` values are the recorded metric. `X` in `train_X` is replaced by the name of the metric such as `train_loss` or `train_accuracy`. The format is `[[b0, y0], [b1, y1], ...]`.
- `lc_valid_X`: Same as `lc_train_X` but for validation data.
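To make the metadata keys concrete, here is a minimal sketch of a `run`-function that fills in several of them. It rests on stated assumptions: the losses come from a made-up closed-form curve rather than real training, the budget is counted in epochs, `job` is a mapping with a learning rate under `"lr"`, and the `{"objective": ..., "metadata": ...}` return layout is assumed for illustration.

```python
def run(job, max_epochs=5, target_loss=None):
    """Toy run-function returning an objective together with standard metadata.

    Assumptions: `job` is a mapping with a learning rate under "lr", the budget
    is counted in epochs, and the losses come from a made-up closed-form curve.
    """
    lr = job["lr"]
    lc_train_loss, lc_valid_loss = [], []
    stopped = False

    for epoch in range(1, max_epochs + 1):
        train_loss = 1.0 / (1.0 + lr * epoch)  # stand-in for real training
        valid_loss = train_loss + 0.05  # stand-in for real validation
        # Learning curves use the [[b0, y0], [b1, y1], ...] format.
        lc_train_loss.append([epoch, train_loss])
        lc_valid_loss.append([epoch, valid_loss])
        if target_loss is not None and valid_loss < target_loss:
            # Counts as stopped only if some of the budget is left unused.
            stopped = epoch < max_epochs
            break

    return {
        "objective": -valid_loss,  # maximization standard
        "metadata": {
            "budget": epoch,
            "stopped": stopped,
            "train_loss": train_loss,
            "valid_loss": valid_loss,
            "lc_train_loss": lc_train_loss,
            "lc_valid_loss": lc_valid_loss,
        },
    }
```

Here `X` is instantiated as `loss`, giving one key per metric (`train_loss`, `valid_loss`) and one learning curve per metric (`lc_train_loss`, `lc_valid_loss`).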
The `@profile` decorator should be used on all `run`-functions to collect the `timestamp_start` and `timestamp_end` metadata.
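The idea behind collecting these two timestamps can be illustrated with a small stand-in decorator. This sketch is not DeepHyper's `@profile`: the name `profile_sketch` and the layout of the returned dictionary are assumptions made for this example only.

```python
import functools
import time


def profile_sketch(run_function):
    """Stand-in decorator that records wall-clock timestamps around a call.

    Illustration only: this is not DeepHyper's `@profile`, and the returned
    dictionary layout is an assumption made for this example.
    """

    @functools.wraps(run_function)
    def wrapper(*args, **kwargs):
        timestamp_start = time.time()
        output = run_function(*args, **kwargs)
        timestamp_end = time.time()
        return {
            "output": output,
            "metadata": {
                "timestamp_start": timestamp_start,
                "timestamp_end": timestamp_end,
            },
        }

    return wrapper


@profile_sketch
def run(job):
    # Toy objective following the maximization standard.
    return -((job["x"] - 2.0) ** 2)
```

In an actual benchmark, the decorator shipped with DeepHyper should be used instead of this stand-in so that the timestamps are recorded in the form the rest of the tooling expects.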
In the following table:
- $\mathbb{R}$ denotes real parameters.
- $\mathbb{D}$ denotes discrete parameters.
- $\mathbb{C}$ denotes categorical parameters.
Name | Description | Variable(s) Type | Objective(s) Type | Multi-Objective | Multi-Fidelity | Evaluation Duration |
---|---|---|---|---|---|---|
C-BBO | Continuous Black-Box Optimization problems. | | | ❌ | ❌ | configurable |
DTLZ | The modified DTLZ multiobjective test suite. | | | ✅ | ❌ | configurable |
ECP-Candle | Deep Neural-Networks on multiple "biological" scales of Cancer related data. | | | ✅ | ✅ | min |
HPOBench | Hyperparameter Optimization Benchmark. | | | ✅ | ✅ | ms to min |
JAHSBench | A slightly modified JAHSBench 201 wrapper. | | | ✅ | ❌ | configurable |
LCu | Learning curve hyperparameter optimization benchmark. | | | | | |
LCbench | Multi-fidelity benchmark without hyperparameter optimization. | NA | | ❌ | ✅ | seconds |
PINNBench | Physics Informed Neural Networks Benchmark. | | | ✅ | ✅ | ms |
- COBYQA: `deephyper_benchmark.search.COBYQA(...)`
- PyBOBYQA: `deephyper_benchmark.search.PyBOBYQA(...)`
- TPE: `deephyper_benchmark.search.MPIDistributedOptuna(..., sampler="TPE")`
- BoTorch: `deephyper_benchmark.search.MPIDistributedOptuna(..., sampler="BOTORCH")`
- CMAES: `deephyper_benchmark.search.MPIDistributedOptuna(..., sampler="CMAES")`
- NSGAII: `deephyper_benchmark.search.MPIDistributedOptuna(..., sampler="NSGAII")`
- QMC: `deephyper_benchmark.search.MPIDistributedOptuna(..., sampler="QMC")`
- SMAC: `deephyper_benchmark.search.SMAC(...)`