GSoC 2015 Proposal: Global Optimization Based Hyper-Parameter Optimization
Name: Christof Angermueller
Web: http://cangermueller.com
GitHub: cangermueller
Twitter: @cangermueller
I am a PhD student in Bioinformatics at the University of Cambridge and the European Bioinformatics Institute (EBI/EMBL), supervised by Oliver Stegle and Zoubin Ghahramani. My research is about machine learning algorithms for analyzing high-dimensional biological data, including Bayesian matrix factorization methods and deep neural networks. I have been using Python regularly for more than five years, both for my research and as a general-purpose programming language, have implemented a Python package for variational Bayesian matrix factorization (vbmfa), and aim to contribute to open-source development in the long run. To start with, I want to work on the following GSoC topic with scikit-learn. I decided to contribute to scikit-learn since a) I use it on a daily basis for various machine learning tasks, b) it has a very active community, and c) it allows me to combine my interests in machine learning and software engineering.
Scikit-learn is a suite of machine learning algorithms. These algorithms often have hyper-parameters, which are not learnt from the data but specified by the user. Optimizing hyper-parameters is critical for generalization performance, and often time-consuming, since it requires retraining the machine learning algorithm and evaluating its fitness for each hyper-parameter setting.
Currently, scikit-learn only provides GridSearchCV [1] for exhaustive grid search and RandomizedSearchCV [2] for randomized hyper-parameter optimization. While the former is intractable if the hyper-parameter space is large, the latter has been outperformed by more sophisticated sequential model-based optimization (SMBO) algorithms [3], which take previous evaluations of the fitness function into account to choose the parameters with the highest chance of improvement.
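To make the SMBO idea concrete, the sketch below runs the basic loop on a toy 1-D problem: a GP surrogate is refitted after every evaluation, and the next point is chosen by expected improvement. It is a minimal sketch, assuming scikit-learn's GaussianProcessRegressor and Matern kernel classes; the toy fitness function stands in for cross-validated error, and all other names are illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Toy fitness function to be minimized (stands in for cross-validated error).
def fitness(x):
    return np.sin(3 * x) + 0.1 * x ** 2

bounds = (-3.0, 3.0)
rng = np.random.RandomState(0)

# Start from a few random evaluations.
X = rng.uniform(*bounds, size=(3, 1))
y = fitness(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):
    gp.fit(X, y)
    # Candidate grid; a real implementation would optimize the acquisition.
    cand = np.linspace(*bounds, 1000).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    # Expected improvement over the best observation so far (minimization).
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, fitness(x_next))

print('best x = %.3f, best f = %.3f' % (X[np.argmin(y), 0], y.min()))
```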
The goal of the proposed GSoC topic is to implement an SMBO algorithm based on Gaussian Processes (GPs) as described by Snoek et al. [4], which a) explores the hyper-parameter space more efficiently and accurately than GridSearchCV and RandomizedSearchCV, b) is well integrated into the scikit-learn infrastructure, and c) is easy to use for transparently optimizing various scikit-learn algorithms.
GPSearchCV (tentative name) will be a new scikit-learn module for hyper-parameter optimization based on Gaussian Processes [4]. It will be similar to the spearmint Python package [5], but more lightweight and consistent with the scikit-learn framework. It will have the same interface as GridSearchCV and RandomizedSearchCV, and will make it easy to optimize the hyper-parameters of different scikit-learn algorithms such as SVMs and random forests. A Gaussian Process with a Matérn 5/2 kernel will serve as the prior distribution over the fitness function, and the expected improvement acquisition function will be used to choose the next hyper-parameters. In the first version, kernel parameters will be estimated by maximum likelihood, since integrating over them is currently not supported by the scikit-learn Gaussian Process module and would also be computationally more costly. Implementing slice sampling for a Bayesian treatment of these parameters is one of my later goals (see below).
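The intended interface could look as follows. GPSearchCV is not implemented yet, so this call is illustrative only: the class name, the param_bounds argument, and the n_iter budget are assumptions of this proposal, chosen to mirror the RandomizedSearchCV conventions.

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

digits = load_digits()

# Proposed usage; GPSearchCV and its arguments are hypothetical at this
# point. Parameter ranges replace the explicit grids of GridSearchCV,
# and n_iter bounds the number of fitness evaluations.
search = GPSearchCV(
    estimator=SVC(),
    param_bounds={'C': (1e-3, 1e3), 'gamma': (1e-4, 1e1)},  # search bounds
    n_iter=50,  # number of hyper-parameter configurations to evaluate
    cv=5,
)
search.fit(digits.data, digits.target)
print(search.best_params_, search.best_score_)
```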
I will write unit tests to ensure that my implementation works correctly in different scenarios.
The new GPSearchCV module will only become popular if it yields results superior to the existing modules. I will therefore compare its performance with GridSearchCV, RandomizedSearchCV, and spearmint, evaluating how well each optimizes the hyper-parameters of different classification and regression algorithms.
I will follow the scikit-learn conventions to document all interfaces, and write code examples that show how to use GPSearchCV. Links to my benchmark results will allow users to judge which optimization module (GridSearchCV, RandomizedSearchCV, or GPSearchCV) is most appropriate for their task.
If time remains after I have accomplished all milestones described above, I will work on the following points, in the given order:
Bayesian treatment of the kernel parameters, i.e. marginalizing over them instead of maximizing the likelihood, is computationally more costly but improves generalization performance. I will implement slice sampling to approximate the integral over these hyper-parameters, which might also be incorporated into the scikit-learn Gaussian Process module.
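As a sketch of the planned approach, a univariate slice sampler with Neal's stepping-out procedure is shown below. Applied to one kernel hyper-parameter at a time, with the GP log marginal likelihood as the log density, it yields posterior samples over which the acquisition function can be averaged; the function names are illustrative.

```python
import numpy as np

def slice_sample(logp, x0, width=1.0, n_samples=100, rng=None):
    """Univariate slice sampler with stepping-out (Neal, 2003).

    logp: unnormalized log density, e.g. the GP log marginal likelihood
    as a function of one kernel hyper-parameter.
    """
    rng = rng or np.random.RandomState(0)
    samples = []
    x = x0
    for _ in range(n_samples):
        # Draw the slice level: log u = log p(x) - Exp(1).
        log_u = logp(x) - rng.exponential()
        # Step out to find an interval containing the slice.
        left = x - width * rng.rand()
        right = left + width
        while logp(left) > log_u:
            left -= width
        while logp(right) > log_u:
            right += width
        # Shrink the interval until a point inside the slice is found.
        while True:
            x_new = rng.uniform(left, right)
            if logp(x_new) > log_u:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return np.array(samples)

# Example: sample from a standard normal.
draws = slice_sample(lambda x: -0.5 * x ** 2, x0=0.0, n_samples=2000)
print(draws.mean(), draws.std())  # approx. 0 and 1
```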
Instead of waiting until the fitness function has been evaluated for one hyper-parameter configuration before trying out new parameters, several configurations can be evaluated in parallel to speed up the optimization. I will use all completed evaluations, and integrate over pending ones, to approximate the expected acquisition function, from which multiple hyper-parameter configurations can be chosen in parallel.
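A Monte Carlo sketch of this "fantasizing" strategy [4] follows: fantasy outcomes for the pending configurations are drawn from the GP posterior, the GP is refitted on the augmented data, and EI is averaged over the fantasies. It assumes a fitted GaussianProcessRegressor; the helper name expected_ei is hypothetical.

```python
import numpy as np
from scipy.stats import norm
from sklearn.base import clone

def expected_ei(gp, cand, X_pending, y_best, n_fantasies=10, rng=None):
    """Approximate EI at candidate points while other evaluations are
    still running (sketch, minimization setting)."""
    rng = rng or np.random.RandomState(0)
    ei_sum = np.zeros(len(cand))
    for _ in range(n_fantasies):
        # Draw fantasy observations at the pending configurations.
        y_fant = gp.sample_y(X_pending, random_state=rng).ravel()
        # Refit a copy of the GP on observed plus fantasy data.
        gp_f = clone(gp).fit(
            np.vstack([gp.X_train_, X_pending]),
            np.concatenate([gp.y_train_, y_fant]),
        )
        mu, sigma = gp_f.predict(cand, return_std=True)
        z = (y_best - mu) / np.maximum(sigma, 1e-9)
        ei_sum += (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    return ei_sum / n_fantasies
```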
I will implement two alternative acquisition functions, which might be better suited to certain optimization tasks than the expected improvement: 1) the probability of improvement, and 2) the GP upper confidence bound [6], which has an additional parameter to trade off exploitation against exploration.
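Both acquisition functions reduce to simple expressions in the GP posterior mean and standard deviation; a sketch follows, written for the minimization setting used in the earlier SMBO sketch (function names are illustrative).

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, y_best):
    # PI: probability that a candidate improves on the best value so far;
    # pick the candidate with the largest PI.
    z = (y_best - mu) / np.maximum(sigma, 1e-9)
    return norm.cdf(z)

def gp_confidence_bound(mu, sigma, kappa=2.0):
    # GP-UCB [6], written for minimization as a lower confidence bound;
    # pick the candidate with the smallest bound. kappa trades off
    # exploitation (small kappa) against exploration (large kappa).
    return mu - kappa * sigma
```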
The time for evaluating the fitness function can vary considerably across hyper-parameters, e.g. with the number of hidden layers in a neural network. I will therefore use a second Gaussian Process to model the time needed to evaluate the fitness function, and compute the expected improvement per second.
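Concretely, this amounts to dividing EI by the predicted duration; a minimal sketch, assuming the second GP is fitted on log evaluation times:

```python
import numpy as np

def ei_per_second(ei, mu_log_time):
    # Expected improvement per second [4]: EI divided by the predicted
    # evaluation time. The duration GP models log seconds, so its mean
    # prediction is exponentiated; cheap configurations are preferred
    # when their expected improvement is comparable.
    return ei / np.exp(mu_log_time)

# Usage sketch, reusing names from the earlier SMBO sketch:
# x_next = cand[np.argmax(ei_per_second(ei, gp_time.predict(cand)))]
```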
My timeline is as follows:
- Submit several pull requests, if possible related to Gaussian Processes and hyper-parameter optimization
- Familiarize myself with the implementations of GridSearchCV, RandomizedSearchCV, and spearmint
- Read more about hyper-parameter optimization and Gaussian Processes
- Implement the Matérn 5/2 kernel and maximum likelihood parameter estimation
- Extend the existing Gaussian Process module and improve its performance if necessary
- Implement expected improvement acquisition function
- Implement search over hyper-parameter space
- Build the optimization pipeline from the parts implemented so far
- 1 July: Submit mid-term evaluation
- Sanity-check the first prototype by comparing its results with spearmint
- Refactor code
- Test basic features outlined in 3.1
- Slice sampling of hyper-parameters
- Parallelization
- Profile implementation and improve performance
- Perform benchmarking
- Document interfaces
- Write usage examples
- 28 August: Submit code examples to Google
[1] GridSearchCV: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html#sklearn.grid_search.GridSearchCV
[2] RandomizedSearchCV: http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.RandomizedSearchCV.html#sklearn.grid_search.RandomizedSearchCV
[3] Bergstra et al., “Algorithms for Hyper-Parameter Optimization.”
[4] Snoek, Larochelle, and Adams, “Practical Bayesian Optimization of Machine Learning Algorithms.”
[5] Spearmint: https://github.com/JasperSnoek/spearmint
[6] Srinivas et al., “Gaussian Process Optimization in the Bandit Setting.”