Remark: The code has been updated from the ICML version, which corresponds to a commit of May 25, 2018.
This is a Theano (>=1.0.0) implementation of "Functional Gradient Boosting based on Residual Network Perception".
ResFGB is a functional gradient boosting method that learns a resnet-like deep neural network for non-linear classification problems. The model consists of a linear classifier, such as logistic regression or a support vector machine, and a feature extraction map. In each iteration these two components are trained by alternating optimization: the linear classifier is trained on the samples transformed by the current feature extraction map, and the map is then updated by stacking a resnet-type layer that moves the samples in the direction of increasing linear separability. The final result is a highly non-linear classifier forming a residual network.
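The sketch below illustrates this idea for binary classification using plain numpy. It is only schematic: for simplicity it replaces the trained resblock with a direct functional-gradient step on the training samples, whereas ResFGB fits a multi-layer network to approximate that direction so the update extends to unseen points.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def resfgb_sketch(X, Y, max_iters=5, fg_eta=0.1, lr=0.5, epochs=200):
    """Schematic ResFGB loop for binary labels Y in {0, 1} (illustrative only)."""
    Z = X.astype(float)          # current feature representation of the samples
    w = np.zeros(Z.shape[1])
    for _ in range(max_iters):
        # (1) Train a linear classifier (here logistic regression) on the features.
        for _ in range(epochs):
            p = sigmoid(Z @ w)
            w -= lr * Z.T @ (p - Y) / len(Y)
        # (2) Functional gradient of the logistic loss w.r.t. each sample:
        #     d/dz [-y log p - (1-y) log(1-p)] = (p - y) * w.
        grad_z = np.outer(sigmoid(Z @ w) - Y, w)
        # (3) Resnet-type update moving samples toward higher linear separability.
        #     ResFGB instead stacks a trained layer z -> z + g(z) approximating
        #     this direction, rather than applying the raw gradient as done here.
        Z = Z - fg_eta * grad_z
    return Z, w
```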
A simple usage example is provided below.
Note: `(X, Y)` denotes the training data, `(Xv, Yv)` the validation data, and `(Xt, Yt)` the test data; all of these are numpy arrays. `n_data` is the number of training samples, `input_dim` the dimension of the input space, and `n_class` the number of classes. Labels must form an integer sequence starting from zero.
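For instance, a random dataset in this format could be generated as follows. The sizes and dtypes here are arbitrary illustrative choices (float32/int32 are typical for Theano):

```python
import numpy as np

n_data, input_dim, n_class = 1000, 20, 3

# Features as float arrays, labels as integers in {0, ..., n_class - 1}.
X  = np.random.randn(n_data, input_dim).astype(np.float32)
Y  = np.random.randint(n_class, size=n_data).astype(np.int32)
Xv = np.random.randn(200, input_dim).astype(np.float32)
Yv = np.random.randint(n_class, size=200).astype(np.int32)
Xt = np.random.randn(200, input_dim).astype(np.float32)
Yt = np.random.randint(n_class, size=200).astype(np.int32)
```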
```python
from resfgb.models import ResFGB, get_hyperparams

# Build the default hyperparameters for the given data size.
hparams = get_hyperparams(n_data, input_dim, n_class)
model = ResFGB(**hparams)

# Train on (X, Y); the validation set (Xv, Yv) is used to select the best iteration.
best_iters, _, _ = model.fit(X, Y, Xv, Yv, use_best_iter=True)

train_loss, train_acc = model.evaluate(X, Y)
print('train_loss: {0}, train_acc: {1}'.format(train_loss, train_acc))

test_loss, test_acc = model.evaluate(Xt, Yt)
print('test_loss : {0}, test_acc : {1}'.format(test_loss, test_acc))
```
See `examples/sample_resfgb.py` for more details.
The hyperparameters of ResFGB fall into three groups: those for learning the linear classifier, those for learning the multi-layer network used as a resblock, and those for the functional gradient method itself. They are listed below; 'default' denotes the value set by the function `resfgb.models.get_hyperparams`, and `input_dim` and `n_class` stand for the dimension of the input space and the number of classes, respectively. A sketch of the assumed dictionary layout follows each list.
Hyperparameters for the linear classifier:

- `shape` [default=`(input_dim, n_class)`]: Shape of the linear model; this should not be changed.
- `wr` [default=`1/n_data`]: L2-regularization parameter.
- `bias` [default=`True`]: Whether to include a bias term.
- `eta` [default=`1e-2`]: Learning rate for Nesterov's momentum method.
- `momentum` [default=`0.9`]: Momentum parameter for Nesterov's momentum method.
- `minibatch_size` [default=`100`]: Minibatch size for computing stochastic gradients.
- `max_epoch` [default=`100`]: Number of epochs for training the linear model.
- `tune_eta` [default=`True`]: Whether to tune the learning rate automatically.
- `scale` [default=`1.0`]: Positive factor by which the tuned learning rate is multiplied.
- `eval_iters` [default=`1000`]: Number of iterations per trial when tuning the learning rate.
- `early_stop` [default=`10`]: Training stops when the training loss has not improved for this number of epochs.
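Assuming these settings are collected in the `model_hparams` dictionary referenced further below, the defaults above would correspond to something like this (key names inferred from the listing, not verified against the source):

```python
# Assumed layout of the linear-classifier hyperparameters.
model_hparams = {
    'shape': (input_dim, n_class),
    'wr': 1. / n_data,
    'bias': True,
    'eta': 1e-2,
    'momentum': 0.9,
    'minibatch_size': 100,
    'max_epoch': 100,
    'tune_eta': True,
    'scale': 1.0,
    'eval_iters': 1000,
    'early_stop': 10,
}
```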
Hyperparameters for the resblock (multi-layer perceptron):

- `shape` [default=`(input_dim, 100, 100, 100, 100, input_dim)`]: Shape of the multi-layer perceptron. The dimensions of the input and last layers must be set to `input_dim`.
- `wr` [default=`1/n_data`]: L2-regularization parameter.
- `eta` [default=`1e-2`]: Learning rate for Nesterov's momentum method.
- `momentum` [default=`0.9`]: Momentum parameter for Nesterov's momentum method.
- `minibatch_size` [default=`100`]: Minibatch size for computing stochastic gradients.
- `max_epoch` [default=`50`]: Number of epochs for training the resblock.
- `tune_eta` [default=`True`]: Whether to tune the learning rate automatically.
- `scale` [default=`1.0`]: Positive factor by which the tuned learning rate is multiplied.
- `eval_iters` [default=`1000`]: Number of iterations per trial when tuning the learning rate.
- `early_stop` [default=`10`]: Training stops when the training loss has not improved for this number of epochs.
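Likewise, the resblock settings are assumed to form the `resblock_hparams` dictionary (again, key names inferred from the listing above):

```python
# Assumed layout of the resblock hyperparameters.
resblock_hparams = {
    'shape': (input_dim, 100, 100, 100, 100, input_dim),
    'wr': 1. / n_data,
    'eta': 1e-2,
    'momentum': 0.9,
    'minibatch_size': 100,
    'max_epoch': 50,
    'tune_eta': True,
    'scale': 1.0,
    'eval_iters': 1000,
    'early_stop': 10,
}
```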
Hyperparameters for the functional gradient method:

- `model_type` [default=`'logistic'`]: Type of the linear model: `'logistic'` or `'smooth_hinge'`.
- `model_hparams` [default=`model_hparams`]: Dictionary of hyperparameters for the linear model.
- `resblock_hparams` [default=`resblock_hparams`]: Dictionary of hyperparameters for the resblock.
- `fg_eta` [default=`1e-1`]: Learning rate used in the functional gradient method.
- `max_iters` [default=`30`]: Number of iterations of the functional gradient method, which corresponds to the depth of the obtained network.
- `seed` [default=`1`]: Random seed used in the method.
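Putting this together, a typical way to adjust the defaults returned by `get_hyperparams` might look like the following; the nesting of `model_hparams` and `resblock_hparams` inside the returned dictionary is an assumption based on the listings above:

```python
from resfgb.models import ResFGB, get_hyperparams

hparams = get_hyperparams(n_data, input_dim, n_class)

# Hypothetical overrides: a deeper network with a smooth-hinge classifier.
hparams['max_iters'] = 50
hparams['model_type'] = 'smooth_hinge'
hparams['resblock_hparams']['eta'] = 1e-3

model = ResFGB(**hparams)
```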