
ImageNet performance #1

Open
ppwwyyxx opened this issue May 10, 2018 · 6 comments
@ppwwyyxx

This repo uses fb.resnet.torch for its ImageNet experiments. However, the top-1 error of ResNet-50 reported by fb.resnet.torch is 24.01 (https://github.com/facebook/fb.resnet.torch), while the ResNet-50 baseline reported in the DBN paper is 24.87. Why is that?

@huangleiBuaa
Collaborator

@ppwwyyxx, I guess the difference probably comes from different cuDNN versions.
I ran the ResNet-18/ResNet-34 experiments on a machine with cudnn-5.0, while I ran the ResNet-50/ResNet-101 experiments on another machine (I don't remember its cuDNN version, and I currently have no access to that machine). ResNet-18 and ResNet-34 give results similar to the experiments reported at https://github.com/facebook/fb.resnet.torch.
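
(For reference, the cuDNN build that the Torch bindings actually loaded can be printed directly; a minimal check, assuming the cudnn.torch bindings are installed:)

-- run inside th: prints the cuDNN version the bindings loaded
require 'cudnn'
print(cudnn.version)   -- major*1000 + minor*100 + patch, e.g. 5005 for cuDNN 5.0.5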

@ppwwyyxx
Author

ppwwyyxx commented May 12, 2018

Perhaps.

Another question: in this repo there are resnet_BN vs resnet_DBN_scale_L1 and preresnet_BN vs preresnet_DBN_scale_L1. Which pair was used for the experiments in the paper? I didn't see the paper mention the use of preresnet, but the README here mentions it.

@huangleiBuaa
Collaborator

@ppwwyyxx
It's resnet_BN vs resnet_DBN_scale_L1, as described in the paper. Thanks for pointing that out; I will revise the README.

Actually, we also ran preresnet-18 and preresnet-34 (with the same configuration as the res-18 and res-34 experiments described in this repo). The respective results are: 30.44 vs 29.97 (preresnet-18 vs preresnet-DBN-scale-L1-18) and 26.76 vs 26.44 (preresnet-34 vs preresnet-DBN-scale-L1-34). We also ran an extra 10 epochs with the learning rate divided by 10 (100 epochs in total, which is the common setup in recent papers), and got: 29.79 vs 29.31 (preresnet-18 vs preresnet-DBN-scale-L1-18) and 26.01 vs 25.76 (preresnet-34 vs preresnet-DBN-scale-L1-34).

@ppwwyyxx
Author

Thanks. The diff shows that resnet_DBN_scale_L1 has one more convolution layer, which means the comparison between it and resnet_BN is not fair:

$ diff resnet_BN.lua resnet_DBN_scale_L1.lua
13a14
> require 'cudbn'
121a123,126
>       model:add(Convolution(64,64,3,3,1,1,1,1))
>       model:add(nn.Spatial_DBN_opt(64,opt.m_perGroup, opt.eps,_,true))
>       model:add(ReLU(true))
>  
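
(One way to confirm this structural difference is to load both model definitions and count their convolution modules. A rough sketch, not from the repo: it assumes fb.resnet.torch's convention that each file under models/ returns a createModel(opt) function, that Convolution aliases cudnn.SpatialConvolution, and that the opt fields below are the ones the constructors read.)

-- Sketch under the assumptions above; run from the repo root inside th.
require 'nn'
require 'cunn'
require 'cudnn'

local opt = { depth = 50, dataset = 'imagenet',   -- standard fb.resnet.torch fields
              m_perGroup = 16, eps = 1e-5 }       -- DBN fields guessed from the diff

local function countConvs(name)
   local createModel = require('models/' .. name) -- each model file returns a constructor
   local model = createModel(opt)
   local convs = model:findModules('cudnn.SpatialConvolution')
   print(string.format('%s: %d convolution layers', name, #convs))
end

countConvs('resnet_BN')
countConvs('resnet_DBN_scale_L1')                 -- expected: one more than resnet_BN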

@huangleiBuaa
Collaborator

There is no big difference, I guess. You can check the models in preresnet.lua and preresnet-DBN-scale.lua (they have the same number of convolutions). I also ran experiments with the original preresnet (without this extra conv, for 18 and 34 layers); the original preresnet has slightly better performance (30.38 vs 30.44 for 18 layers, 26.66 vs 26.76 for 34 layers). So I guess this extra convolution is not a big deal for the original residual network. If you are interested, you can validate it. I can also run the experiments; however, I only have one machine with 8 GPUs available (shared with other lab members), so it may take a long time to get the results.

@JaeDukSeo

Thanks for this.
