This repository is a collection of machine learning benchmarks for DeepHyper.
The repository follows this organization:

```
# Python package containing utility code
deephyper_benchmark/

# Library of benchmarks
lib/
```

To install the DeepHyper benchmark suite, run:

```console
git clone https://github.com/deephyper/benchmark.git deephyper_benchmark
cd deephyper_benchmark/
pip install -e "."
```

A benchmark is defined as a sub-folder of the `lib/` folder, such as `lib/Benchmark-101/`. A benchmark folder needs to follow a Python package structure and therefore must contain an `__init__.py` file at its root. In addition, a benchmark folder needs to define a `benchmark.py` script that declares its requirements.
General benchmark structure:

```
lib/
    Benchmark-101/
        __init__.py
        benchmark.py
        data.py
        model.py
        hpo.py       # Defines hyperparameter optimization inputs (run-function + problem)
        README.md    # Description of the benchmark
```
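For illustration, a minimal sketch of what the `hpo.py` module could expose (the hyperparameter `x`, its range, and the toy objective are hypothetical, and depending on the DeepHyper version `HpProblem` lives in `deephyper.hpo` or `deephyper.problem`):

```python
"""Hypothetical hpo.py for Benchmark-101."""

from deephyper.hpo import HpProblem  # older DeepHyper: from deephyper.problem import HpProblem

# Search space with a single real hyperparameter "x" in [0, 1].
problem = HpProblem()
problem.add_hyperparameter((0.0, 1.0), "x")


def run(job):
    # Toy objective standing in for a real training/evaluation pipeline.
    x = job.parameters["x"]
    return 1.0 - (x - 0.5) ** 2  # larger is better (maximization standard, see below)
```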
Then, to use the benchmark:

```python
import deephyper_benchmark as dhb

dhb.install("Benchmark-101")
dhb.load("Benchmark-101")

from deephyper_benchmark.lib.benchmark_101.hpo import problem, run
```

All run-functions (i.e., functions returning the objective(s) to be optimized) should follow the MAXIMIZATION standard. If a benchmark needs minimization, then the negative of the minimized objective can be returned: `return -minimized_objective`.
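For example, a minimal sketch of a run-function wrapping a minimization objective (the quadratic "validation loss" is a placeholder for a real training/evaluation pipeline):

```python
def run(job):
    # Placeholder validation loss that a real benchmark would MINIMIZE.
    x = job.parameters["x"]
    valid_loss = (x - 0.5) ** 2

    # Comply with the MAXIMIZATION standard by returning the negative.
    return -valid_loss
```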
A benchmark inherits from the `Benchmark` class:

```python
import os

from deephyper_benchmark import *

DIR = os.path.dirname(os.path.abspath(__file__))


class Benchmark101(Benchmark):

    version = "0.0.1"

    requires = {
        "bash-install": {
            "type": "cmd",
            "cmd": "cd .. && " + os.path.join(DIR, "../install.sh"),
        },
    }
```

Finally, when testing a benchmark it can be useful to activate logging:
```python
import logging

logging.basicConfig(
    # filename="deephyper.log",  # Uncomment if you want to create a file with the logs
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(filename)s:%(funcName)s - %(message)s",
    force=True,
)
```

Benchmarks can sometimes be configured. The configuration can use environment variables with the prefix `DEEPHYPER_BENCHMARK_`.
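For example, a minimal sketch of how a benchmark module could read such a variable (the `DEEPHYPER_BENCHMARK_N_EPOCHS` name and its default are hypothetical; only the prefix follows the convention above):

```python
import os

# Hypothetical configuration variable: only the DEEPHYPER_BENCHMARK_ prefix
# is prescribed by the convention; the suffix and default are examples.
N_EPOCHS = int(os.environ.get("DEEPHYPER_BENCHMARK_N_EPOCHS", "50"))
```

The variable can then be set before launching an experiment, e.g., `export DEEPHYPER_BENCHMARK_N_EPOCHS=100`.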
Benchmarks must return the following standard metadata when applicable; some metadata are specific to neural networks (e.g., `num_parameters`):

- `num_parameters`: integer value of the number of parameters of the neural network.
- `num_parameters_train`: integer value of the number of trainable parameters of the neural network.
- `budget`: scalar value (float/int) of the budget consumed by the neural network. The budget should therefore be defined for each benchmark (e.g., number of epochs in general).
- `stopped`: boolean value indicating if the evaluation was stopped before consuming the maximum budget.
- `train_X`: scalar value of the training metrics (replace `X` by the metric name, 1 key per metric).
- `valid_X`: scalar value of the validation metrics (replace `X` by the metric name, 1 key per metric).
- `test_X`: scalar value of the testing metrics (replace `X` by the metric name, 1 key per metric).
- `flops`: number of FLOPs of the model, such as computed by `fvcore.nn.FlopCountAnalysis(...).total()` (see documentation).
- `latency`: TO BE CLARIFIED
- `lc_train_X`: recorded learning curves of the trained model, where the `b_i` values are the budget values (e.g., epochs/batches) and the `y_i` values are the recorded metric. `X` in `train_X` is replaced by the name of the metric, such as `train_loss` or `train_accuracy`. The format is `[[b0, y0], [b1, y1], ...]`.
- `lc_valid_X`: same as `lc_train_X` but for validation data.
The `@profile` decorator should be used on all run-functions to collect the `timestamp_start` and `timestamp_end` metadata.
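As an illustration, a minimal sketch of a run-function returning an objective together with some of the standard metadata (the metric values are placeholders, `@profile` is assumed to be importable from `deephyper.evaluator`, and the `"objective"`/`"metadata"` return layout follows DeepHyper's run-function convention):

```python
from deephyper.evaluator import profile


@profile  # records timestamp_start and timestamp_end in the metadata
def run(job):
    # Placeholder values standing in for a real training loop.
    budget = 50  # e.g., number of epochs
    lc_valid_accuracy = [[1, 0.52], [25, 0.81], [budget, 0.90]]  # [[b0, y0], ...]
    valid_accuracy = lc_valid_accuracy[-1][1]

    return {
        "objective": valid_accuracy,  # maximized
        "metadata": {
            "num_parameters": 1_000_000,
            "budget": budget,
            "stopped": False,
            "valid_accuracy": valid_accuracy,
            "lc_valid_accuracy": lc_valid_accuracy,
        },
    }
```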
In the following table:

- $\mathbb{R}$ denotes real parameters.
- $\mathbb{D}$ denotes discrete parameters.
- $\mathbb{C}$ denotes categorical parameters.
| Name | Description | Variable(s) Type | Objective(s) Type | Multi-Objective | Multi-Fidelity | Evaluation Duration |
|---|---|---|---|---|---|---|
| C-BBO | Continuous Black-Box Optimization problems. | | | ❌ | ❌ | configurable |
| DTLZ | The modified DTLZ multiobjective test suite. | | | ✅ | ❌ | configurable |
| ECP-Candle | Deep Neural-Networks on multiple "biological" scales of Cancer related data. | | | ✅ | ✅ | min |
| HPOBench | Hyperparameter Optimization Benchmark. | | | ✅ | ✅ | ms to min |
| JAHSBench | A slightly modified JAHSBench 201 wrapper. | | | ✅ | ❌ | configurable |
| LCu | Learning curve hyperparameter optimization benchmark. | | | | | |
| LCbench | Multi-fidelity benchmark without hyperparameter optimization. | NA | | ❌ | ✅ | seconds |
| PINNBench | Physics Informed Neural Networks Benchmark. | | | ✅ | ✅ | ms |
The following search wrappers are also provided:

- COBYQA: `deephyper_benchmark.search.COBYQA(...)`
- PyBOBYQA: `deephyper_benchmark.search.PyBOBYQA(...)`
- TPE: `deephyper_benchmark.search.MPIDistributedOptuna(..., sampler="TPE")`
- BoTorch: `deephyper_benchmark.search.MPIDistributedOptuna(..., sampler="BOTORCH")`
- CMAES: `deephyper_benchmark.search.MPIDistributedOptuna(..., sampler="CMAES")`
- NSGAII: `deephyper_benchmark.search.MPIDistributedOptuna(..., sampler="NSGAII")`
- QMC: `deephyper_benchmark.search.MPIDistributedOptuna(..., sampler="QMC")`
- SMAC: `deephyper_benchmark.search.SMAC(...)`
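For illustration only, a sketch of how such a wrapper might be used, assuming it follows DeepHyper's usual search interface (a constructor taking the problem and run-function, and a `search(max_evals=...)` method); the actual constructor arguments of these wrappers may differ (e.g., MPI communicator, random seed, log directory):

```python
from deephyper_benchmark.lib.benchmark_101.hpo import problem, run
from deephyper_benchmark.search import MPIDistributedOptuna

# Assumption: the wrapper exposes the usual DeepHyper search interface;
# check its signature for the exact required arguments.
search = MPIDistributedOptuna(problem, run, sampler="TPE")
results = search.search(max_evals=100)
```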