Layers
All layer configurations go into the netconfig section, in the following format:
netconfig = start
layer[from->to] = layer_type:nick
netconfig = end
- from is an integer; 0 means the input data
- to is an integer; the largest integer in the layer configuration part is the output
- layer_type is one of the layer types described below
- nick is an optional name for the layer
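For example, a small sketch of two such lines (layer types such as fullc and relu are described later on this page; the nicks are arbitrary):
layer[0->1] = fullc:fc1
layer[1->2] = relu:rl1
Here node 0 is the input data, fullc and relu are layer types, and fc1 and rl1 are optional nicks.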
Layers that contain weights (Connection Layers, Convolution Layers) require random weight initialization. By default the following configuration is used globally:
random_type = gaussian
init_sigma = 0.01
We also provide the Xavier initialization method, which can be enabled with the configuration
random_type = xavier
The global setting can be overridden in the layer configuration, e.g.
# global setting
random_type = gaussian
netconfig = start
eta = 0.1
layer[0->1] = fullc:fc1
# local setting start
nhidden = 50
random_type = xavier
# local setting end
layer[1->2] = relu
layer[2->3] = fullc
# local setting start
nhidden = 6
init_sigma = 0.005
wmat:lr = 0.2
# local setting end
netconfig = end
With this configuration, the network globally initializes weights with the Gaussian method, but the fc1 layer overrides this and is initialized with the Xavier method. The fully connected layer without a nick is initialized with Gaussian random numbers with mu=0 and sigma=0.005, and its weight learning rate (wmat:lr = 0.2) also differs from the global setting.
This page introduces the layers supported by cxxnet, including
- Connection Layer
- Activation Layer
- Convolution and Pooling Layer
- Normalization Layer
Connection Layer
Connection Layers are used to connect two nodes. We provide three connection layers: Flatten Layer, Fully Connection Layer, and Drop Connection Layer.
- Flatten Layer is used to flatten the output of a convolution layer. After flattening, the convolution output can be used in a feed-forward neural network (a combined sketch follows this list). Here is an example:
layer[15->16] = flatten
- Fully Connection Layer is the basic element in a feed-forward neural network; nhidden sets the number of hidden units.
layer[18->19] = fullc
nhidden = 1024
- Drop Connection Layer is still experimental. It randomly drops connections between the two layers.
layer[18->19] = dropconn
threshold = 0.5
nhidden = 1024
- threshold is the probability threshold for dropping an edge.
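As a combined illustration, here is a minimal sketch (the node numbers, nick, and nhidden value are purely illustrative) that flattens a convolution output and feeds it into a fully connected layer:
layer[15->16] = flatten
layer[16->17] = fullc:fc_out
nhidden = 1024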
Activation Layer
We provide common activation layers including Softmax, Rectified Linear, Sigmoid, Tanh, Soft Plus, and so on. Here we treat Dropout as a special activation layer. Layer declarations follow the general configuration format; a combined sketch follows the list below.
layer[from_num->to_num] = layer_type:nick
- Rectified Linear needs to_num different from from_num, e.g.
layer[4->5] = relu:rl3
- Tanh needs to_num different from from_num, e.g.
layer[2->3] = tanh:th2
- Sigmoid needs to_num different from from_num, e.g.
layer[2->3] = sigmoid:sg2
- Soft Plus needs to_num different from from_num, e.g.
layer[2->3] = softplus:sp2
- Dropout Layer needs to_num equal to from_num, e.g.
layer[3->3] = dropout:dp
threshold = 0.5
- threshold is the probability threshold for dropping a unit.
- Softmax Layer needs to_num equal to from_num, e.g.
layer[5->5] = softmax:sm
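Putting these pieces together, a minimal sketch of a small feed-forward classifier (node numbers, nicks, and nhidden values chosen purely for illustration) could look like:
netconfig = start
layer[0->1] = fullc:fc1
nhidden = 128
layer[1->2] = relu
layer[2->2] = dropout
threshold = 0.5
layer[2->3] = fullc:fc2
nhidden = 10
layer[3->3] = softmax
netconfig = end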
Convolution and Pooling Layer
Our convolution implementation is the fastest so far, and it is extremely easy to use. The configuration looks like
layer[0->1] = conv
kernel_size = 11
stride = 4
nchannel = 96
- kernel_size is the convolution kernel size
- stride is the stride of the convolution operation
- nchannel is the number of output channels
- temp_col_max is the maximum size of the temporary buffer used in the convolution operation. The default value is 64, meaning the maximum size of temp_col is 64MB. Adjusting this variable may boost training speed, especially when the input size of the convolution network is small.
Currently we provide 3 pooling methods: Sum Pooling, Max Pooling, and Average Pooling. All pooling layers share the same options, kernel_size and stride; a combined convolution and pooling sketch follows the examples below.
- Sum Pooling needs to_num different from from_num, e.g.
layer[4->5] = sum_pooling
kernel_size = 3
stride = 2
- Max Pooling needs to_num different from from_num, e.g.
layer[4->5] = max_pooling
kernel_size = 3
stride = 2
- Average Pooling needs to_num different from from_num, e.g.
layer[4->5] = avg_pooling
kernel_size = 3
stride = 2
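For example, a minimal sketch (node numbers and parameter values are illustrative, loosely following the convolution example above) that stacks a convolution layer, an activation, and a max pooling layer:
layer[0->1] = conv
kernel_size = 11
stride = 4
nchannel = 96
layer[1->2] = relu
layer[2->3] = max_pooling
kernel_size = 3
stride = 2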
Normalization Layer
Currently we provide Local Response Normalization (LRN) for convolution layers. LRN normalizes the response across nearby kernels. Details can be found in Alex Krizhevsky's ImageNet (AlexNet) paper.
layer[3->4] = lrn
local_size = 5
alpha = 0.001
beta = 0.75
knorm = 1
- local_size is the number of nearby kernel maps included in the normalization
- alpha, beta, and knorm are the normalization parameters
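For reference, the LRN formula from the AlexNet paper is sketched below; mapping knorm, alpha, and beta onto the paper's constants k, alpha, and beta is our reading of the options above:
b^i_{x,y} = a^i_{x,y} / ( knorm + alpha * \sum_j (a^j_{x,y})^2 )^{beta}
where the sum runs over the local_size kernel maps j adjacent to map i at the same spatial position (x, y).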