Error in MobileNet conversion from TensorFlow to Caffe: different way of padding
Model: MobileNets v1 & MobileNets v2
Source: Tensorflow
Destination: Caffe
Author: Jiahao
We tested the TensorFlow parser and Caffe emitter, using the same weights in every layer.
MobileNet v1 gives a low-SNR result:
error: 0.61917245
L1 error: 1431.8882
SNR: 5.100172946375377
PSNR: 19.59308572747237
MobileNet v2 produces an output shape different from the original model's.
Take the first conv layer as an example. It takes a 224x224x3 input and outputs 112x112x32, with kernel size 3 and stride 2. In TensorFlow the padding mode is SAME, which for this layer actually means padding_left=0, padding_right=1, padding_top=0, padding_bottom=1. In Caffe, however, padding is symmetric: p_h = 0, p_w = 0
( i.e. padding_left=0, padding_right=0, padding_top=0, padding_bottom=0
) yields a 111x111x32 output, which is what actually happens when MobileNet v2 is converted to Caffe.
Even though p_h = 1, p_w = 1
( i.e. padding_left=1, padding_right=1, padding_top=1, padding_bottom=1
) does produce the 112x112x32 shape, the output values differ because the convolution windows are shifted relative to TensorFlow's asymmetric padding. That mismatch is likely the reason for the low SNR of the MobileNet v1 conversion.
MXNet also uses symmetric padding in its convolution layer, like Caffe. However, the problem mentioned above can be solved by inserting an explicit padding layer before the convolution layer.
Following the conversion workflow described in the tutorial, one can find this trick in the converted code:
input = mx.sym.var('input')
# Pad asymmetrically (0 on top/left, 1 on bottom/right) to reproduce TensorFlow's SAME padding
MobilenetV2_Conv_Conv2D_pad = mx.sym.pad(data=input, mode='constant', pad_width=(0, 0, 0, 0, 0, 1, 0, 1), constant_value=0.0, name='MobilenetV2/Conv/Conv2D_pad')
MobilenetV2_Conv_Conv2D = mx.sym.Convolution(data=MobilenetV2_Conv_Conv2D_pad, kernel=(3, 3), stride=(2, 2), dilate=(1, 1), num_filter=32, num_group=1, no_bias=True, layout='NCHW', name='MobilenetV2/Conv/Conv2D')
A different padding scheme results in a different shape after convolution.
Since padding in Caffe is symmetric, the possible shapes after this convolution layer are 111x111, 111x112, 112x111, or 112x112 (with different values).
A possible solution is to insert an explicit padding layer before the convolution when the required padding is non-symmetric, once such a padding layer is implemented in Caffe.
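To illustrate that symmetric padding can match the output shape but not the output values, here is a small NumPy sketch. A toy 6x6 input stands in for the real 224x224 one, and the naive convolution helper is written here just for the demonstration:

```python
import numpy as np

def conv2d(x, k, stride):
    """Naive valid cross-correlation of 2-D array x with kernel k."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*stride:i*stride+kh, j*stride:j*stride+kw] * k)
    return out

x = np.arange(36, dtype=float).reshape(6, 6)
k = np.ones((3, 3))

# TensorFlow SAME padding for kernel 3, stride 2: 0 on top/left, 1 on bottom/right
tf_in = np.pad(x, ((0, 1), (0, 1)))
# Symmetric padding of 1 on every side, the only option in Caffe
caffe_in = np.pad(x, ((1, 1), (1, 1)))

out_tf = conv2d(tf_in, k, 2)
out_caffe = conv2d(caffe_in, k, 2)
print(out_tf.shape == out_caffe.shape)  # True: both 3x3
print(np.allclose(out_tf, out_caffe))   # False: values differ
```

The shapes agree, but the convolution windows are shifted by one pixel, so the values do not. This is exactly the mismatch that shows up as low SNR after conversion.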