SSD300 Detection Time #123

Open
CHUNYUWANG opened this issue Aug 18, 2016 · 22 comments

Comments

@CHUNYUWANG

I was testing SSD300 on a new dataset with 30 object classes. Testing on one image (cropped to 300x300) takes about 0.04 seconds (25 fps), which is less than half of the 58 fps figure. I am using cuDNN v4 and a Titan X on Windows Server 2012. Do you know what factors could cause the difference?

@weiliu89
Owner

Did you use the ssd_pascal_speed.py?

@CHUNYUWANG
Author

Thanks for the quick response. No, I used the ssd_pascal.py instead. Can you briefly talk about the difference?

@weiliu89
Owner

Use vimdiff to compare the two scripts if you are a vim user.

@Li1991

Li1991 commented Aug 23, 2016

When I run detection with the model we trained, I only get 26 fps. I remember that you got 58 fps in your paper, and I would like to know why yours is so much faster. We use 300x300 images, and our GPU is a K40. The C++ file we use is ssd_detect.cpp, from the code we downloaded from your GitHub repository.
You said you use ssd_pascal_speed.py. How do I use it?

@thomaspreece

The author suggested in another issue that the Titan X (the card used by the author) is about twice as powerful as the K40, which would explain your 26 fps.
What is not clear, however, is why the author gets roughly the same fps for Faster R-CNN as the figures quoted in the Faster R-CNN paper (5 fps for VGG16, 17 fps for ZF), given that the author is using a Titan X whereas the Faster R-CNN paper used a K40.
Can you please explain this?

@weiliu89
Owner

https://github.com/ShaoqingRen/faster_rcnn lists the speed on both the K40 and the Titan X. I haven't tested the speed of Faster R-CNN myself.

@thomaspreece

Thanks for that. I find it very odd that using the Titan X with Faster R-CNN produces so little improvement. That's a question for the Faster R-CNN folks, though.

@Li1991

Li1991 commented Aug 25, 2016

I have used the same as you, but I only get 26 fps. Can you tell me how to make it faster? I use ssd_detect.cpp. @weiliu89

@weiliu89
Owner

@Li1991 You could try ssd_pascal_speed.py to see how long it takes to process the whole VOC07 test set. Also, in your previous comment you said you are using a K40, which is slower than a Titan X.

@Li1991

Li1991 commented Aug 25, 2016

OK, thank you very much! I will give it a try! @weiliu89

@jzwang1

jzwang1 commented Sep 21, 2016

Hi @weiliu89 ,

I was benchmarking SSD on VOC07 using a Titan X and cuDNN. SSD500 runs at about 13 fps and SSD300 at about 23 fps (similar to @CHUNYUWANG). Since both are about half of what you reported, I guess I might have done something wrong. Any insight into the reasons would be much appreciated!

@weiliu89
Owner

@jzwang1 Did you use ssd_pascal_speed.py to see how much time it takes to process the 4952 images? I would also suggest setting debug_info to True to see which step is time-consuming. Note that the speed reported in the paper uses a batch size of 8, and I ran it on a server (with a powerful CPU as well). So I would guess your speed is limited by I/O, data processing, or the non-maximum suppression step. Also make sure no one else is running on the same GPU.
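
To see which step dominates (I/O, data processing, the forward pass, or non-maximum suppression), one option is to time each stage separately and divide by the number of images. The following is only a minimal sketch, not code from the SSD repository: `StageTimer`, `run_benchmark`, and the `load`, `forward`, and `nms` callables are hypothetical placeholders for whatever your own pipeline uses, and each callable is assumed to block until its result is ready.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time even if the stage raised an exception.
            self.totals[name] += time.perf_counter() - start

    def report(self, num_images):
        """Return ({stage: ms per image}, end-to-end frames per second)."""
        per_image_ms = {k: 1000.0 * v / num_images for k, v in self.totals.items()}
        total_s = sum(self.totals.values())
        fps = num_images / total_s if total_s > 0 else float("inf")
        return per_image_ms, fps


def run_benchmark(batches, load, forward, nms):
    """Time the three stages over all batches; the callables are placeholders."""
    timer = StageTimer()
    num_images = 0
    for batch in batches:
        with timer.stage("load"):
            data = load(batch)
        with timer.stage("forward"):
            raw = forward(data)  # assumed to block until the GPU work is done
        with timer.stage("nms"):
            nms(raw)
        num_images += len(batch)
    return timer.report(num_images)


# Dummy stages, just to show the call shape (8 images per batch).
dummy_batches = [list(range(8)), list(range(8))]
per_image_ms, fps = run_benchmark(
    dummy_batches, load=lambda b: b, forward=lambda d: d, nms=lambda r: None
)
```

With real stages plugged in, a large "load" or "nms" share would point at the CPU side rather than the GPU.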

@jzwang1

jzwang1 commented Sep 21, 2016

Thank you for your suggestions, @weiliu89. Much appreciated! I was using the following Caffe command to benchmark the time taken by one forward pass:

./build/tools/caffe time --model models/VGGNet/VOC0712/SSD_300x300/deploy.prototxt --gpu 0

The above command gives:

I0920 17:00:35.315847 47696 caffe.cpp:401] Average time per layer:
I0920 17:00:35.315856 47696 caffe.cpp:404] input forward: 0.00172544 ms.
I0920 17:00:35.315866 47696 caffe.cpp:407] input backward: 0.00186176 ms.
I0920 17:00:35.315876 47696 caffe.cpp:404] data_input_0_split forward: 0.0023264 ms.
I0920 17:00:35.315893 47696 caffe.cpp:407] data_input_0_split backward: 0.0020096 ms.
I0920 17:00:35.315903 47696 caffe.cpp:404] conv1_1 forward: 0.409583 ms.
I0920 17:00:35.315922 47696 caffe.cpp:407] conv1_1 backward: 0.00763904 ms.
I0920 17:00:35.315930 47696 caffe.cpp:404] relu1_1 forward: 0.196339 ms.
I0920 17:00:35.315942 47696 caffe.cpp:407] relu1_1 backward: 0.00185792 ms.
I0920 17:00:35.315956 47696 caffe.cpp:404] conv1_2 forward: 0.973155 ms.
I0920 17:00:35.315966 47696 caffe.cpp:407] conv1_2 backward: 0.00775616 ms.
I0920 17:00:35.315979 47696 caffe.cpp:404] relu1_2 forward: 0.195208 ms.
I0920 17:00:35.315994 47696 caffe.cpp:407] relu1_2 backward: 0.00193664 ms.
I0920 17:00:35.316005 47696 caffe.cpp:404] pool1 forward: 0.142433 ms.
I0920 17:00:35.316018 47696 caffe.cpp:407] pool1 backward: 0.00192 ms.
I0920 17:00:35.316030 47696 caffe.cpp:404] conv2_1 forward: 0.50658 ms.
I0920 17:00:35.316040 47696 caffe.cpp:407] conv2_1 backward: 0.00699904 ms.
I0920 17:00:35.316051 47696 caffe.cpp:404] relu2_1 forward: 0.105338 ms.
I0920 17:00:35.316061 47696 caffe.cpp:407] relu2_1 backward: 0.00192704 ms.
I0920 17:00:35.316071 47696 caffe.cpp:404] conv2_2 forward: 0.762468 ms.
I0920 17:00:35.316081 47696 caffe.cpp:407] conv2_2 backward: 0.00705088 ms.
I0920 17:00:35.316089 47696 caffe.cpp:404] relu2_2 forward: 0.10462 ms.
I0920 17:00:35.316102 47696 caffe.cpp:407] relu2_2 backward: 0.0019936 ms.
I0920 17:00:35.316112 47696 caffe.cpp:404] pool2 forward: 0.0754099 ms.
I0920 17:00:35.316120 47696 caffe.cpp:407] pool2 backward: 0.0018592 ms.
I0920 17:00:35.316130 47696 caffe.cpp:404] conv3_1 forward: 0.423645 ms.
I0920 17:00:35.316140 47696 caffe.cpp:407] conv3_1 backward: 0.869139 ms.
I0920 17:00:35.316150 47696 caffe.cpp:404] relu3_1 forward: 0.0545638 ms.
I0920 17:00:35.316160 47696 caffe.cpp:407] relu3_1 backward: 0.00188608 ms.
I0920 17:00:35.316170 47696 caffe.cpp:404] conv3_2 forward: 0.745635 ms.
I0920 17:00:35.316181 47696 caffe.cpp:407] conv3_2 backward: 1.90238 ms.
I0920 17:00:35.316190 47696 caffe.cpp:404] relu3_2 forward: 0.0541421 ms.
I0920 17:00:35.316200 47696 caffe.cpp:407] relu3_2 backward: 0.0018272 ms.
I0920 17:00:35.316210 47696 caffe.cpp:404] conv3_3 forward: 0.74524 ms.
I0920 17:00:35.316220 47696 caffe.cpp:407] conv3_3 backward: 1.90659 ms.
I0920 17:00:35.316231 47696 caffe.cpp:404] relu3_3 forward: 0.0540499 ms.
I0920 17:00:35.316241 47696 caffe.cpp:407] relu3_3 backward: 0.00180672 ms.
I0920 17:00:35.316251 47696 caffe.cpp:404] pool3 forward: 0.0481709 ms.
I0920 17:00:35.316260 47696 caffe.cpp:407] pool3 backward: 0.00224896 ms.
I0920 17:00:35.316270 47696 caffe.cpp:404] conv4_1 forward: 0.50175 ms.
I0920 17:00:35.316279 47696 caffe.cpp:407] conv4_1 backward: 1.79344 ms.
I0920 17:00:35.316289 47696 caffe.cpp:404] relu4_1 forward: 0.0211034 ms.
I0920 17:00:35.316299 47696 caffe.cpp:407] relu4_1 backward: 0.00180608 ms.
I0920 17:00:35.316309 47696 caffe.cpp:404] conv4_2 forward: 1.33381 ms.
I0920 17:00:35.316319 47696 caffe.cpp:407] conv4_2 backward: 3.72427 ms.
I0920 17:00:35.316329 47696 caffe.cpp:404] relu4_2 forward: 0.0218515 ms.
I0920 17:00:35.316339 47696 caffe.cpp:407] relu4_2 backward: 0.00187136 ms.
I0920 17:00:35.316359 47696 caffe.cpp:404] conv4_3 forward: 1.35525 ms.
I0920 17:00:35.316376 47696 caffe.cpp:407] conv4_3 backward: 3.70396 ms.
I0920 17:00:35.316385 47696 caffe.cpp:404] relu4_3 forward: 0.021376 ms.
I0920 17:00:35.316395 47696 caffe.cpp:407] relu4_3 backward: 0.00188928 ms.
I0920 17:00:35.316403 47696 caffe.cpp:404] conv4_3_relu4_3_0_split forward: 0.0019456 ms.
I0920 17:00:35.316413 47696 caffe.cpp:407] conv4_3_relu4_3_0_split backward: 0.00193344 ms.
I0920 17:00:35.316421 47696 caffe.cpp:404] pool4 forward: 0.0261709 ms.
I0920 17:00:35.316429 47696 caffe.cpp:407] pool4 backward: 0.00188224 ms.
I0920 17:00:35.316438 47696 caffe.cpp:404] conv5_1 forward: 0.411267 ms.
I0920 17:00:35.316447 47696 caffe.cpp:407] conv5_1 backward: 1.73606 ms.
I0920 17:00:35.316457 47696 caffe.cpp:404] relu5_1 forward: 0.0127366 ms.
I0920 17:00:35.316467 47696 caffe.cpp:407] relu5_1 backward: 0.00194816 ms.
I0920 17:00:35.316475 47696 caffe.cpp:404] conv5_2 forward: 0.411683 ms.
I0920 17:00:35.316483 47696 caffe.cpp:407] conv5_2 backward: 1.71365 ms.
I0920 17:00:35.316493 47696 caffe.cpp:404] relu5_2 forward: 0.0127213 ms.
I0920 17:00:35.316503 47696 caffe.cpp:407] relu5_2 backward: 0.00185024 ms.
I0920 17:00:35.316512 47696 caffe.cpp:404] conv5_3 forward: 0.410236 ms.
I0920 17:00:35.316520 47696 caffe.cpp:407] conv5_3 backward: 1.69861 ms.
I0920 17:00:35.316529 47696 caffe.cpp:404] relu5_3 forward: 0.012807 ms.
I0920 17:00:35.316537 47696 caffe.cpp:407] relu5_3 backward: 0.00195584 ms.
I0920 17:00:35.316546 47696 caffe.cpp:404] pool5 forward: 0.0300595 ms.
I0920 17:00:35.316555 47696 caffe.cpp:407] pool5 backward: 0.00201216 ms.
I0920 17:00:35.316565 47696 caffe.cpp:404] fc6 forward: 2.18304 ms.
I0920 17:00:35.316575 47696 caffe.cpp:407] fc6 backward: 0.790854 ms.
I0920 17:00:35.316582 47696 caffe.cpp:404] relu6 forward: 0.0155597 ms.
I0920 17:00:35.316591 47696 caffe.cpp:407] relu6 backward: 0.00191488 ms.
I0920 17:00:35.316599 47696 caffe.cpp:404] fc7 forward: 0.194488 ms.
I0920 17:00:35.316608 47696 caffe.cpp:407] fc7 backward: 0.912402 ms.
I0920 17:00:35.316618 47696 caffe.cpp:404] relu7 forward: 0.0144832 ms.
I0920 17:00:35.316627 47696 caffe.cpp:407] relu7 backward: 0.00189952 ms.
I0920 17:00:35.316637 47696 caffe.cpp:404] fc7_relu7_0_split forward: 0.00211456 ms.
I0920 17:00:35.316644 47696 caffe.cpp:407] fc7_relu7_0_split backward: 0.00189824 ms.
I0920 17:00:35.316654 47696 caffe.cpp:404] conv6_1 forward: 0.179386 ms.
I0920 17:00:35.316664 47696 caffe.cpp:407] conv6_1 backward: 0.265288 ms.
I0920 17:00:35.316673 47696 caffe.cpp:404] conv6_1_relu forward: 0.0107091 ms.
I0920 17:00:35.316684 47696 caffe.cpp:407] conv6_1_relu backward: 0.00189312 ms.
I0920 17:00:35.316691 47696 caffe.cpp:404] conv6_2 forward: 0.205999 ms.
I0920 17:00:35.316699 47696 caffe.cpp:407] conv6_2 backward: 0.454163 ms.
I0920 17:00:35.316709 47696 caffe.cpp:404] conv6_2_relu forward: 0.0105216 ms.
I0920 17:00:35.316720 47696 caffe.cpp:407] conv6_2_relu backward: 0.00192128 ms.
I0920 17:00:35.316730 47696 caffe.cpp:404] conv6_2_conv6_2_relu_0_split forward: 0.00203072 ms.
I0920 17:00:35.316738 47696 caffe.cpp:407] conv6_2_conv6_2_relu_0_split backward: 0.0018784 ms.
I0920 17:00:35.316746 47696 caffe.cpp:404] conv7_1 forward: 0.11219 ms.
I0920 17:00:35.316757 47696 caffe.cpp:407] conv7_1 backward: 0.0719315 ms.
I0920 17:00:35.316768 47696 caffe.cpp:404] conv7_1_relu forward: 0.0102618 ms.
I0920 17:00:35.316777 47696 caffe.cpp:407] conv7_1_relu backward: 0.0018816 ms.
I0920 17:00:35.316787 47696 caffe.cpp:404] conv7_2 forward: 0.149859 ms.
I0920 17:00:35.316795 47696 caffe.cpp:407] conv7_2 backward: 0.0821242 ms.
I0920 17:00:35.316804 47696 caffe.cpp:404] conv7_2_relu forward: 0.00955008 ms.
I0920 17:00:35.316813 47696 caffe.cpp:407] conv7_2_relu backward: 0.00202304 ms.
I0920 17:00:35.316823 47696 caffe.cpp:404] conv7_2_conv7_2_relu_0_split forward: 0.00203456 ms.
I0920 17:00:35.316839 47696 caffe.cpp:407] conv7_2_conv7_2_relu_0_split backward: 0.00189376 ms.
I0920 17:00:35.316850 47696 caffe.cpp:404] conv8_1 forward: 0.0757005 ms.
I0920 17:00:35.316859 47696 caffe.cpp:407] conv8_1 backward: 0.0486957 ms.
I0920 17:00:35.316869 47696 caffe.cpp:404] conv8_1_relu forward: 0.00964608 ms.
I0920 17:00:35.316877 47696 caffe.cpp:407] conv8_1_relu backward: 0.00190016 ms.
I0920 17:00:35.316887 47696 caffe.cpp:404] conv8_2 forward: 0.144488 ms.
I0920 17:00:35.316896 47696 caffe.cpp:407] conv8_2 backward: 0.0801523 ms.
I0920 17:00:35.316905 47696 caffe.cpp:404] conv8_2_relu forward: 0.00934784 ms.
I0920 17:00:35.316912 47696 caffe.cpp:407] conv8_2_relu backward: 0.00193024 ms.
I0920 17:00:35.316921 47696 caffe.cpp:404] conv8_2_conv8_2_relu_0_split forward: 0.00202176 ms.
I0920 17:00:35.316931 47696 caffe.cpp:407] conv8_2_conv8_2_relu_0_split backward: 0.00190016 ms.
I0920 17:00:35.316941 47696 caffe.cpp:404] pool6 forward: 0.0162157 ms.
I0920 17:00:35.316949 47696 caffe.cpp:407] pool6 backward: 0.0249178 ms.
I0920 17:00:35.316957 47696 caffe.cpp:404] pool6_pool6_0_split forward: 0.00201152 ms.
I0920 17:00:35.316967 47696 caffe.cpp:407] pool6_pool6_0_split backward: 0.00191232 ms.
I0920 17:00:35.316977 47696 caffe.cpp:404] conv4_3_norm forward: 0.140894 ms.
I0920 17:00:35.316987 47696 caffe.cpp:407] conv4_3_norm backward: 0.0860666 ms.
I0920 17:00:35.316995 47696 caffe.cpp:404] conv4_3_norm_conv4_3_norm_0_split forward: 0.00202688 ms.
I0920 17:00:35.317004 47696 caffe.cpp:407] conv4_3_norm_conv4_3_norm_0_split backward: 0.00186752 ms.
I0920 17:00:35.317013 47696 caffe.cpp:404] conv4_3_norm_mbox_loc forward: 0.135181 ms.
I0920 17:00:35.317023 47696 caffe.cpp:407] conv4_3_norm_mbox_loc backward: 0.223034 ms.
I0920 17:00:35.317031 47696 caffe.cpp:404] conv4_3_norm_mbox_loc_perm forward: 0.455466 ms.
I0920 17:00:35.317041 47696 caffe.cpp:407] conv4_3_norm_mbox_loc_perm backward: 0.0206432 ms.
I0920 17:00:35.317049 47696 caffe.cpp:404] conv4_3_norm_mbox_loc_flat forward: 0.0023552 ms.
I0920 17:00:35.317057 47696 caffe.cpp:407] conv4_3_norm_mbox_loc_flat backward: 0.00193216 ms.
I0920 17:00:35.317067 47696 caffe.cpp:404] conv4_3_norm_mbox_conf forward: 0.187476 ms.
I0920 17:00:35.317075 47696 caffe.cpp:407] conv4_3_norm_mbox_conf backward: 0.342875 ms.
I0920 17:00:35.317085 47696 caffe.cpp:404] conv4_3_norm_mbox_conf_perm forward: 0.462397 ms.
I0920 17:00:35.317095 47696 caffe.cpp:407] conv4_3_norm_mbox_conf_perm backward: 0.039639 ms.
I0920 17:00:35.317103 47696 caffe.cpp:404] conv4_3_norm_mbox_conf_flat forward: 0.00211136 ms.
I0920 17:00:35.317111 47696 caffe.cpp:407] conv4_3_norm_mbox_conf_flat backward: 0.00211584 ms.
I0920 17:00:35.317121 47696 caffe.cpp:404] conv4_3_norm_mbox_priorbox forward: 0.11353 ms.
I0920 17:00:35.317131 47696 caffe.cpp:407] conv4_3_norm_mbox_priorbox backward: 0.00200896 ms.
I0920 17:00:35.317142 47696 caffe.cpp:404] fc7_mbox_loc forward: 0.21679 ms.
I0920 17:00:35.317150 47696 caffe.cpp:407] fc7_mbox_loc backward: 0.171496 ms.
I0920 17:00:35.317158 47696 caffe.cpp:404] fc7_mbox_loc_perm forward: 0.502195 ms.
I0920 17:00:35.317167 47696 caffe.cpp:407] fc7_mbox_loc_perm backward: 0.0132128 ms.
I0920 17:00:35.317176 47696 caffe.cpp:404] fc7_mbox_loc_flat forward: 0.00211456 ms.
I0920 17:00:35.317188 47696 caffe.cpp:407] fc7_mbox_loc_flat backward: 0.00196736 ms.
I0920 17:00:35.317198 47696 caffe.cpp:404] fc7_mbox_conf forward: 0.265835 ms.
I0920 17:00:35.317206 47696 caffe.cpp:407] fc7_mbox_conf backward: 0.795565 ms.
I0920 17:00:35.317215 47696 caffe.cpp:404] fc7_mbox_conf_perm forward: 0.475324 ms.
I0920 17:00:35.317224 47696 caffe.cpp:407] fc7_mbox_conf_perm backward: 0.0235616 ms.
I0920 17:00:35.317231 47696 caffe.cpp:404] fc7_mbox_conf_flat forward: 0.00202304 ms.
I0920 17:00:35.317240 47696 caffe.cpp:407] fc7_mbox_conf_flat backward: 0.00193792 ms.
I0920 17:00:35.317247 47696 caffe.cpp:404] fc7_mbox_priorbox forward: 0.0556474 ms.
I0920 17:00:35.317261 47696 caffe.cpp:407] fc7_mbox_priorbox backward: 0.00192448 ms.
I0920 17:00:35.317278 47696 caffe.cpp:404] conv6_2_mbox_loc forward: 0.120418 ms.
I0920 17:00:35.317288 47696 caffe.cpp:407] conv6_2_mbox_loc backward: 0.0844448 ms.
I0920 17:00:35.317297 47696 caffe.cpp:404] conv6_2_mbox_loc_perm forward: 0.556711 ms.
I0920 17:00:35.317306 47696 caffe.cpp:407] conv6_2_mbox_loc_perm backward: 0.0128154 ms.
I0920 17:00:35.317314 47696 caffe.cpp:404] conv6_2_mbox_loc_flat forward: 0.00194624 ms.
I0920 17:00:35.317323 47696 caffe.cpp:407] conv6_2_mbox_loc_flat backward: 0.00190016 ms.
I0920 17:00:35.317332 47696 caffe.cpp:404] conv6_2_mbox_conf forward: 0.146482 ms.
I0920 17:00:35.317342 47696 caffe.cpp:407] conv6_2_mbox_conf backward: 0.164652 ms.
I0920 17:00:35.317353 47696 caffe.cpp:404] conv6_2_mbox_conf_perm forward: 0.605221 ms.
I0920 17:00:35.317361 47696 caffe.cpp:407] conv6_2_mbox_conf_perm backward: 0.0131059 ms.
I0920 17:00:35.317370 47696 caffe.cpp:404] conv6_2_mbox_conf_flat forward: 0.00196736 ms.
I0920 17:00:35.317378 47696 caffe.cpp:407] conv6_2_mbox_conf_flat backward: 0.0019904 ms.
I0920 17:00:35.317386 47696 caffe.cpp:404] conv6_2_mbox_priorbox forward: 0.0176922 ms.
I0920 17:00:35.317395 47696 caffe.cpp:407] conv6_2_mbox_priorbox backward: 0.00191232 ms.
I0920 17:00:35.317405 47696 caffe.cpp:404] conv7_2_mbox_loc forward: 0.0833082 ms.
I0920 17:00:35.317416 47696 caffe.cpp:407] conv7_2_mbox_loc backward: 0.0571725 ms.
I0920 17:00:35.317425 47696 caffe.cpp:404] conv7_2_mbox_loc_perm forward: 0.53201 ms.
I0920 17:00:35.317433 47696 caffe.cpp:407] conv7_2_mbox_loc_perm backward: 0.0118419 ms.
I0920 17:00:35.317441 47696 caffe.cpp:404] conv7_2_mbox_loc_flat forward: 0.00195328 ms.
I0920 17:00:35.317451 47696 caffe.cpp:407] conv7_2_mbox_loc_flat backward: 0.00196032 ms.
I0920 17:00:35.317461 47696 caffe.cpp:404] conv7_2_mbox_conf forward: 0.087399 ms.
I0920 17:00:35.317471 47696 caffe.cpp:407] conv7_2_mbox_conf backward: 0.0832819 ms.
I0920 17:00:35.317479 47696 caffe.cpp:404] conv7_2_mbox_conf_perm forward: 0.664561 ms.
I0920 17:00:35.317487 47696 caffe.cpp:407] conv7_2_mbox_conf_perm backward: 0.0126438 ms.
I0920 17:00:35.317497 47696 caffe.cpp:404] conv7_2_mbox_conf_flat forward: 0.00196864 ms.
I0920 17:00:35.317505 47696 caffe.cpp:407] conv7_2_mbox_conf_flat backward: 0.0024416 ms.
I0920 17:00:35.317515 47696 caffe.cpp:404] conv7_2_mbox_priorbox forward: 0.00607936 ms.
I0920 17:00:35.317524 47696 caffe.cpp:407] conv7_2_mbox_priorbox backward: 0.00193984 ms.
I0920 17:00:35.317533 47696 caffe.cpp:404] conv8_2_mbox_loc forward: 0.0765971 ms.
I0920 17:00:35.317541 47696 caffe.cpp:407] conv8_2_mbox_loc backward: 0.0570688 ms.
I0920 17:00:35.317550 47696 caffe.cpp:404] conv8_2_mbox_loc_perm forward: 0.45165 ms.
I0920 17:00:35.317564 47696 caffe.cpp:407] conv8_2_mbox_loc_perm backward: 0.0115981 ms.
I0920 17:00:35.317574 47696 caffe.cpp:404] conv8_2_mbox_loc_flat forward: 0.00196864 ms.
I0920 17:00:35.317582 47696 caffe.cpp:407] conv8_2_mbox_loc_flat backward: 0.00195072 ms.
I0920 17:00:35.317590 47696 caffe.cpp:404] conv8_2_mbox_conf forward: 0.0897779 ms.
I0920 17:00:35.317600 47696 caffe.cpp:407] conv8_2_mbox_conf backward: 0.0641338 ms.
I0920 17:00:35.317610 47696 caffe.cpp:404] conv8_2_mbox_conf_perm forward: 0.295352 ms.
I0920 17:00:35.317620 47696 caffe.cpp:407] conv8_2_mbox_conf_perm backward: 0.0119674 ms.
I0920 17:00:35.317628 47696 caffe.cpp:404] conv8_2_mbox_conf_flat forward: 0.00193024 ms.
I0920 17:00:35.317637 47696 caffe.cpp:407] conv8_2_mbox_conf_flat backward: 0.00194176 ms.
I0920 17:00:35.317647 47696 caffe.cpp:404] conv8_2_mbox_priorbox forward: 0.00364544 ms.
I0920 17:00:35.317658 47696 caffe.cpp:407] conv8_2_mbox_priorbox backward: 0.00192 ms.
I0920 17:00:35.317667 47696 caffe.cpp:404] pool6_mbox_loc forward: 0.0763162 ms.
I0920 17:00:35.317675 47696 caffe.cpp:407] pool6_mbox_loc backward: 0.0550477 ms.
I0920 17:00:35.317683 47696 caffe.cpp:404] pool6_mbox_loc_perm forward: 0.285896 ms.
I0920 17:00:35.317690 47696 caffe.cpp:407] pool6_mbox_loc_perm backward: 0.0120141 ms.
I0920 17:00:35.317698 47696 caffe.cpp:404] pool6_mbox_loc_flat forward: 0.00216768 ms.
I0920 17:00:35.317713 47696 caffe.cpp:407] pool6_mbox_loc_flat backward: 0.0019424 ms.
I0920 17:00:35.317723 47696 caffe.cpp:404] pool6_mbox_conf forward: 0.0890682 ms.
I0920 17:00:35.317731 47696 caffe.cpp:407] pool6_mbox_conf backward: 0.0671814 ms.
I0920 17:00:35.317740 47696 caffe.cpp:404] pool6_mbox_conf_perm forward: 0.328555 ms.
I0920 17:00:35.317749 47696 caffe.cpp:407] pool6_mbox_conf_perm backward: 0.0117453 ms.
I0920 17:00:35.317756 47696 caffe.cpp:404] pool6_mbox_conf_flat forward: 0.00246592 ms.
I0920 17:00:35.317764 47696 caffe.cpp:407] pool6_mbox_conf_flat backward: 0.00194752 ms.
I0920 17:00:35.317772 47696 caffe.cpp:404] pool6_mbox_priorbox forward: 0.00213952 ms.
I0920 17:00:35.317780 47696 caffe.cpp:407] pool6_mbox_priorbox backward: 0.00195264 ms.
I0920 17:00:35.317788 47696 caffe.cpp:404] mbox_loc forward: 0.031392 ms.
I0920 17:00:35.317796 47696 caffe.cpp:407] mbox_loc backward: 0.0022496 ms.
I0920 17:00:35.317806 47696 caffe.cpp:404] mbox_conf forward: 0.0349331 ms.
I0920 17:00:35.317816 47696 caffe.cpp:407] mbox_conf backward: 0.00778304 ms.
I0920 17:00:35.317824 47696 caffe.cpp:404] mbox_priorbox forward: 0.681747 ms.
I0920 17:00:35.317834 47696 caffe.cpp:407] mbox_priorbox backward: 0.00247872 ms.
I0920 17:00:35.317842 47696 caffe.cpp:404] mbox_conf_reshape forward: 0.00217984 ms.
I0920 17:00:35.317852 47696 caffe.cpp:407] mbox_conf_reshape backward: 0.00191744 ms.
I0920 17:00:35.317862 47696 caffe.cpp:404] mbox_conf_softmax forward: 0.169134 ms.
I0920 17:00:35.317869 47696 caffe.cpp:407] mbox_conf_softmax backward: 0.00193792 ms.
I0920 17:00:35.317878 47696 caffe.cpp:404] mbox_conf_flatten forward: 0.00194176 ms.
I0920 17:00:35.317888 47696 caffe.cpp:407] mbox_conf_flatten backward: 0.00174656 ms.
I0920 17:00:35.317910 47696 caffe.cpp:412] Average Forward pass: 43.5817 ms.
I0920 17:00:35.317921 47696 caffe.cpp:414] Average Backward pass: 40.3919 ms.
I0920 17:00:35.317931 47696 caffe.cpp:416] Average Forward-Backward: 84.5878 ms.
I0920 17:00:35.317939 47696 caffe.cpp:418] Total Time: 4229.39 ms.

In the summary lines at the end, we see that the average forward pass takes about 43 ms for SSD300. I guess this time is independent of the batch size? I am also running this on a server, and no one else is using the same GPU.
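
The per-layer lines in the `caffe time` output above can be ranked to see which layers take the most forward time. This is a rough sketch that assumes the log format shown in this thread; `parse_layer_times` and `slowest_layers` are names made up for illustration, and the sample lines are copied from the log above.

```python
import re

# Matches per-layer lines such as:
#   I0920 17:00:35.316565 47696 caffe.cpp:404] fc6 forward: 2.18304 ms.
# The lowercase "forward"/"backward" keeps the "Average Forward pass" summary out.
LAYER_RE = re.compile(r"\]\s+(\S+)\s+(forward|backward):\s+([0-9.eE+-]+)\s+ms\.")


def parse_layer_times(log_text):
    """Return {"forward": {layer: ms}, "backward": {layer: ms}}."""
    times = {"forward": {}, "backward": {}}
    for line in log_text.splitlines():
        match = LAYER_RE.search(line)
        if match:
            layer, direction, ms = match.groups()
            times[direction][layer] = float(ms)
    return times


def slowest_layers(times, direction="forward", top=5):
    """Return the `top` (layer, ms) pairs with the largest time."""
    ranked = sorted(times[direction].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]


sample = """\
I0920 17:00:35.316565 47696 caffe.cpp:404] fc6 forward: 2.18304 ms.
I0920 17:00:35.316575 47696 caffe.cpp:407] fc6 backward: 0.790854 ms.
I0920 17:00:35.316309 47696 caffe.cpp:404] conv4_2 forward: 1.33381 ms.
I0920 17:00:35.317910 47696 caffe.cpp:412] Average Forward pass: 43.5817 ms.
"""
layer_times = parse_layer_times(sample)
top_forward = slowest_layers(layer_times, "forward", top=2)
```

Feeding the full log through `parse_layer_times` and summing the forward values gives a figure to compare against the reported average forward pass.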

I also tested with ssd_pascal_speed.py just now. It takes 101 s to finish 4952 images (see below), so roughly 50 fps, which is about twice as fast as what the caffe time command above reports. I am not sure which one measures the running time more appropriately. Or is there anything special in ssd_pascal_speed.py that causes such a difference?

I0921 09:09:09.878713 48790 net.cpp:693] Ignoring source layer mbox_loss
I0921 09:09:12.580425 48790 blocking_queue.cpp:50] Data layer prefetch queue empty
I0921 09:10:53.682924 48790 solver.cpp:325] Optimization Done.
I0921 09:10:53.683087 48790 caffe.cpp:254] Optimization Done.
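
The 101 s figure and the two frame rates being compared can be reproduced from the numbers above. This is a sketch under a few assumptions: the year (glog omits it) is taken from the issue date, the "prefetch queue empty" line is treated as the start of the run, and the `caffe time` forward pass is assumed to process one image per iteration, which depends on the input shape in deploy.prototxt. The helper names are hypothetical.

```python
from datetime import datetime


def glog_time(line, year=2016):
    """Parse the timestamp of a glog line such as 'I0921 09:09:12.580425 ...'.

    glog omits the year, so it is passed in (2016 is an assumption).
    """
    date_part, time_part = line.split()[:2]
    stamp = "%d%s %s" % (year, date_part[1:], time_part)  # drop the severity letter
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


def throughput(start_line, end_line, num_images):
    """Return (elapsed seconds, images per second) between two log lines."""
    elapsed = (glog_time(end_line) - glog_time(start_line)).total_seconds()
    return elapsed, num_images / elapsed


def fps_from_ms(ms_per_iteration, batch_size=1):
    """Convert a per-iteration time in ms to images per second."""
    return batch_size * 1000.0 / ms_per_iteration


start = "I0921 09:09:12.580425 48790 blocking_queue.cpp:50] Data layer prefetch queue empty"
end = "I0921 09:10:53.682924 48790 solver.cpp:325] Optimization Done."

elapsed_s, script_fps = throughput(start, end, num_images=4952)
time_fps = fps_from_ms(43.5817)  # average forward pass from the caffe time log

print("ssd_pascal_speed.py: %.1f s, %.1f fps" % (elapsed_s, script_fps))
print("caffe time (assuming batch size 1): %.1f fps" % time_fps)
```

If the `caffe time` run really does use one image per forward pass while the speed script uses larger batches, the two numbers are not measuring the same thing, which may account for part of the gap.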

@weiliu89
Owner

weiliu89 commented Sep 21, 2016

Here is what I get when I run:

./build/tools/caffe time --model models/VGGNet/VOC0712/SSD_300x300/deploy.prototxt --gpu 0

I0921 12:44:57.308749 93668 caffe.cpp:369] *** Benchmark begins ***
I0921 12:44:57.308753 93668 caffe.cpp:370] Testing for 50 iterations.
I0921 12:44:57.374699 93668 caffe.cpp:398] Iteration: 1 forward-backward time: 65.7729 ms.
I0921 12:44:57.417562 93668 caffe.cpp:398] Iteration: 2 forward-backward time: 42.8266 ms.
I0921 12:44:57.458479 93668 caffe.cpp:398] Iteration: 3 forward-backward time: 40.8865 ms.
I0921 12:44:57.500021 93668 caffe.cpp:398] Iteration: 4 forward-backward time: 41.5003 ms.
I0921 12:44:57.541098 93668 caffe.cpp:398] Iteration: 5 forward-backward time: 41.0242 ms.
I0921 12:44:57.581739 93668 caffe.cpp:398] Iteration: 6 forward-backward time: 40.5996 ms.
I0921 12:44:57.620677 93668 caffe.cpp:398] Iteration: 7 forward-backward time: 38.898 ms.
I0921 12:44:57.659832 93668 caffe.cpp:398] Iteration: 8 forward-backward time: 39.1141 ms.
I0921 12:44:57.698798 93668 caffe.cpp:398] Iteration: 9 forward-backward time: 38.9249 ms.
I0921 12:44:57.737757 93668 caffe.cpp:398] Iteration: 10 forward-backward time: 38.9188 ms.
I0921 12:44:57.776656 93668 caffe.cpp:398] Iteration: 11 forward-backward time: 38.8588 ms.
I0921 12:44:57.815783 93668 caffe.cpp:398] Iteration: 12 forward-backward time: 39.084 ms.
I0921 12:44:57.855101 93668 caffe.cpp:398] Iteration: 13 forward-backward time: 39.2744 ms.
I0921 12:44:57.894239 93668 caffe.cpp:398] Iteration: 14 forward-backward time: 39.0862 ms.
I0921 12:44:57.933421 93668 caffe.cpp:398] Iteration: 15 forward-backward time: 39.1409 ms.
I0921 12:44:57.972398 93668 caffe.cpp:398] Iteration: 16 forward-backward time: 38.9353 ms.
I0921 12:44:58.011224 93668 caffe.cpp:398] Iteration: 17 forward-backward time: 38.7851 ms.
I0921 12:44:58.050153 93668 caffe.cpp:398] Iteration: 18 forward-backward time: 38.8893 ms.
I0921 12:44:58.089375 93668 caffe.cpp:398] Iteration: 19 forward-backward time: 39.1794 ms.
I0921 12:44:58.128445 93668 caffe.cpp:398] Iteration: 20 forward-backward time: 39.0299 ms.
I0921 12:44:58.167655 93668 caffe.cpp:398] Iteration: 21 forward-backward time: 39.1678 ms.
I0921 12:44:58.207145 93668 caffe.cpp:398] Iteration: 22 forward-backward time: 39.449 ms.
I0921 12:44:58.246228 93668 caffe.cpp:398] Iteration: 23 forward-backward time: 39.0412 ms.
I0921 12:44:58.285327 93668 caffe.cpp:398] Iteration: 24 forward-backward time: 39.0581 ms.
I0921 12:44:58.324494 93668 caffe.cpp:398] Iteration: 25 forward-backward time: 39.126 ms.
I0921 12:44:58.363543 93668 caffe.cpp:398] Iteration: 26 forward-backward time: 39.0065 ms.
I0921 12:44:58.402640 93668 caffe.cpp:398] Iteration: 27 forward-backward time: 39.0564 ms.
I0921 12:44:58.441819 93668 caffe.cpp:398] Iteration: 28 forward-backward time: 39.1377 ms.
I0921 12:44:58.480762 93668 caffe.cpp:398] Iteration: 29 forward-backward time: 38.903 ms.
I0921 12:44:58.519987 93668 caffe.cpp:398] Iteration: 30 forward-backward time: 39.1828 ms.
I0921 12:44:58.559106 93668 caffe.cpp:398] Iteration: 31 forward-backward time: 39.0784 ms.
I0921 12:44:58.598469 93668 caffe.cpp:398] Iteration: 32 forward-backward time: 39.3226 ms.
I0921 12:44:58.637629 93668 caffe.cpp:398] Iteration: 33 forward-backward time: 39.1198 ms.
I0921 12:44:58.676589 93668 caffe.cpp:398] Iteration: 34 forward-backward time: 38.9187 ms.
I0921 12:44:58.715849 93668 caffe.cpp:398] Iteration: 35 forward-backward time: 39.2193 ms.
I0921 12:44:58.755058 93668 caffe.cpp:398] Iteration: 36 forward-backward time: 39.1665 ms.
I0921 12:44:58.794318 93668 caffe.cpp:398] Iteration: 37 forward-backward time: 39.2167 ms.
I0921 12:44:58.833540 93668 caffe.cpp:398] Iteration: 38 forward-backward time: 39.192 ms.
I0921 12:44:58.872761 93668 caffe.cpp:398] Iteration: 39 forward-backward time: 39.1796 ms.
I0921 12:44:58.911931 93668 caffe.cpp:398] Iteration: 40 forward-backward time: 39.1294 ms.
I0921 12:44:58.951001 93668 caffe.cpp:398] Iteration: 41 forward-backward time: 39.03 ms.
I0921 12:44:58.990162 93668 caffe.cpp:398] Iteration: 42 forward-backward time: 39.119 ms.
I0921 12:44:59.029446 93668 caffe.cpp:398] Iteration: 43 forward-backward time: 39.2429 ms.
I0921 12:44:59.068653 93668 caffe.cpp:398] Iteration: 44 forward-backward time: 39.1658 ms.
I0921 12:44:59.107841 93668 caffe.cpp:398] Iteration: 45 forward-backward time: 39.1468 ms.
I0921 12:44:59.146932 93668 caffe.cpp:398] Iteration: 46 forward-backward time: 39.0368 ms.
I0921 12:44:59.185886 93668 caffe.cpp:398] Iteration: 47 forward-backward time: 38.9128 ms.
I0921 12:44:59.224895 93668 caffe.cpp:398] Iteration: 48 forward-backward time: 38.9667 ms.
I0921 12:44:59.263950 93668 caffe.cpp:398] Iteration: 49 forward-backward time: 39.0142 ms.
I0921 12:44:59.302883 93668 caffe.cpp:398] Iteration: 50 forward-backward time: 38.893 ms.
I0921 12:44:59.302913 93668 caffe.cpp:401] Average time per layer:
I0921 12:44:59.302922 93668 caffe.cpp:404] input forward: 0.00154816 ms.
I0921 12:44:59.302932 93668 caffe.cpp:407] input backward: 0.00151744 ms.
I0921 12:44:59.302938 93668 caffe.cpp:404] data_input_0_split forward: 0.00227136 ms.
I0921 12:44:59.302947 93668 caffe.cpp:407] data_input_0_split backward: 0.00154176 ms.
I0921 12:44:59.302954 93668 caffe.cpp:404] conv1_1 forward: 0.424646 ms.
I0921 12:44:59.302963 93668 caffe.cpp:407] conv1_1 backward: 0.016128 ms.
I0921 12:44:59.302969 93668 caffe.cpp:404] relu1_1 forward: 0.197293 ms.
I0921 12:44:59.302978 93668 caffe.cpp:407] relu1_1 backward: 0.00149056 ms.
I0921 12:44:59.302985 93668 caffe.cpp:404] conv1_2 forward: 0.998214 ms.
I0921 12:44:59.302991 93668 caffe.cpp:407] conv1_2 backward: 0.0247533 ms.
I0921 12:44:59.302999 93668 caffe.cpp:404] relu1_2 forward: 0.196303 ms.
I0921 12:44:59.303005 93668 caffe.cpp:407] relu1_2 backward: 0.00150784 ms.
I0921 12:44:59.303014 93668 caffe.cpp:404] pool1 forward: 0.139887 ms.
I0921 12:44:59.303021 93668 caffe.cpp:407] pool1 backward: 0.00152064 ms.
I0921 12:44:59.303028 93668 caffe.cpp:404] conv2_1 forward: 0.528038 ms.
I0921 12:44:59.303035 93668 caffe.cpp:407] conv2_1 backward: 0.0148608 ms.
I0921 12:44:59.303043 93668 caffe.cpp:404] relu2_1 forward: 0.106312 ms.
I0921 12:44:59.303050 93668 caffe.cpp:407] relu2_1 backward: 0.00150592 ms.
I0921 12:44:59.303058 93668 caffe.cpp:404] conv2_2 forward: 0.807416 ms.
I0921 12:44:59.303066 93668 caffe.cpp:407] conv2_2 backward: 0.0158195 ms.
I0921 12:44:59.303073 93668 caffe.cpp:404] relu2_2 forward: 0.106395 ms.
I0921 12:44:59.303086 93668 caffe.cpp:407] relu2_2 backward: 0.00150976 ms.
I0921 12:44:59.303092 93668 caffe.cpp:404] pool2 forward: 0.0754982 ms.
I0921 12:44:59.303100 93668 caffe.cpp:407] pool2 backward: 0.00153792 ms.
I0921 12:44:59.303113 93668 caffe.cpp:404] conv3_1 forward: 0.446676 ms.
I0921 12:44:59.303122 93668 caffe.cpp:407] conv3_1 backward: 0.947684 ms.
I0921 12:44:59.303134 93668 caffe.cpp:404] relu3_1 forward: 0.053824 ms.
I0921 12:44:59.303141 93668 caffe.cpp:407] relu3_1 backward: 0.00151808 ms.
I0921 12:44:59.303153 93668 caffe.cpp:404] conv3_2 forward: 0.782561 ms.
I0921 12:44:59.303160 93668 caffe.cpp:407] conv3_2 backward: 1.78214 ms.
I0921 12:44:59.303170 93668 caffe.cpp:404] relu3_2 forward: 0.0540435 ms.
I0921 12:44:59.303179 93668 caffe.cpp:407] relu3_2 backward: 0.00153088 ms.
I0921 12:44:59.303186 93668 caffe.cpp:404] conv3_3 forward: 0.780499 ms.
I0921 12:44:59.303195 93668 caffe.cpp:407] conv3_3 backward: 1.78259 ms.
I0921 12:44:59.303202 93668 caffe.cpp:404] relu3_3 forward: 0.0538778 ms.
I0921 12:44:59.303210 93668 caffe.cpp:407] relu3_3 backward: 0.00151104 ms.
I0921 12:44:59.303217 93668 caffe.cpp:404] pool3 forward: 0.0491603 ms.
I0921 12:44:59.303228 93668 caffe.cpp:407] pool3 backward: 0.00149248 ms.
I0921 12:44:59.303236 93668 caffe.cpp:404] conv4_1 forward: 0.532324 ms.
I0921 12:44:59.303247 93668 caffe.cpp:407] conv4_1 backward: 1.56729 ms.
I0921 12:44:59.303254 93668 caffe.cpp:404] relu4_1 forward: 0.0187469 ms.
I0921 12:44:59.303263 93668 caffe.cpp:407] relu4_1 backward: 0.0015136 ms.
I0921 12:44:59.303270 93668 caffe.cpp:404] conv4_2 forward: 1.22821 ms.
I0921 12:44:59.303279 93668 caffe.cpp:407] conv4_2 backward: 3.12129 ms.
I0921 12:44:59.303287 93668 caffe.cpp:404] relu4_2 forward: 0.0185389 ms.
I0921 12:44:59.303295 93668 caffe.cpp:407] relu4_2 backward: 0.00152512 ms.
I0921 12:44:59.303315 93668 caffe.cpp:404] conv4_3 forward: 1.23005 ms.
I0921 12:44:59.303325 93668 caffe.cpp:407] conv4_3 backward: 3.14453 ms.
I0921 12:44:59.303333 93668 caffe.cpp:404] relu4_3 forward: 0.0186202 ms.
I0921 12:44:59.303340 93668 caffe.cpp:407] relu4_3 backward: 0.00156096 ms.
I0921 12:44:59.303349 93668 caffe.cpp:404] conv4_3_relu4_3_0_split forward: 0.00159872 ms.
I0921 12:44:59.303359 93668 caffe.cpp:407] conv4_3_relu4_3_0_split backward: 0.0015488 ms.
I0921 12:44:59.303366 93668 caffe.cpp:404] pool4 forward: 0.027152 ms.
I0921 12:44:59.303375 93668 caffe.cpp:407] pool4 backward: 0.00149056 ms.
I0921 12:44:59.303381 93668 caffe.cpp:404] conv5_1 forward: 0.43581 ms.
I0921 12:44:59.303390 93668 caffe.cpp:407] conv5_1 backward: 1.52789 ms.
I0921 12:44:59.303398 93668 caffe.cpp:404] relu5_1 forward: 0.0124826 ms.
I0921 12:44:59.303406 93668 caffe.cpp:407] relu5_1 backward: 0.00153408 ms.
I0921 12:44:59.303413 93668 caffe.cpp:404] conv5_2 forward: 0.433536 ms.
I0921 12:44:59.303422 93668 caffe.cpp:407] conv5_2 backward: 1.5252 ms.
I0921 12:44:59.303431 93668 caffe.cpp:404] relu5_2 forward: 0.0125158 ms.
I0921 12:44:59.303438 93668 caffe.cpp:407] relu5_2 backward: 0.00150528 ms.
I0921 12:44:59.303447 93668 caffe.cpp:404] conv5_3 forward: 0.432739 ms.
I0921 12:44:59.303455 93668 caffe.cpp:407] conv5_3 backward: 1.52493 ms.
I0921 12:44:59.303463 93668 caffe.cpp:404] relu5_3 forward: 0.0123366 ms.
I0921 12:44:59.303472 93668 caffe.cpp:407] relu5_3 backward: 0.00153216 ms.
I0921 12:44:59.303479 93668 caffe.cpp:404] pool5 forward: 0.0312538 ms.
I0921 12:44:59.303488 93668 caffe.cpp:407] pool5 backward: 0.00154368 ms.
I0921 12:44:59.303495 93668 caffe.cpp:404] fc6 forward: 1.91961 ms.
I0921 12:44:59.303503 93668 caffe.cpp:407] fc6 backward: 0.801802 ms.
I0921 12:44:59.303510 93668 caffe.cpp:404] relu6 forward: 0.0151642 ms.
I0921 12:44:59.303519 93668 caffe.cpp:407] relu6 backward: 0.00149888 ms.
I0921 12:44:59.303527 93668 caffe.cpp:404] fc7 forward: 0.203574 ms.
I0921 12:44:59.303535 93668 caffe.cpp:407] fc7 backward: 0.968319 ms.
I0921 12:44:59.303544 93668 caffe.cpp:404] relu7 forward: 0.0139654 ms.
I0921 12:44:59.303551 93668 caffe.cpp:407] relu7 backward: 0.00154048 ms.
I0921 12:44:59.303560 93668 caffe.cpp:404] fc7_relu7_0_split forward: 0.00183232 ms.
I0921 12:44:59.303567 93668 caffe.cpp:407] fc7_relu7_0_split backward: 0.0015136 ms.
I0921 12:44:59.303575 93668 caffe.cpp:404] conv6_1 forward: 0.186532 ms.
I0921 12:44:59.303583 93668 caffe.cpp:407] conv6_1 backward: 0.276173 ms.
I0921 12:44:59.303591 93668 caffe.cpp:404] conv6_1_relu forward: 0.0103987 ms.
I0921 12:44:59.303597 93668 caffe.cpp:407] conv6_1_relu backward: 0.00150144 ms.
I0921 12:44:59.303606 93668 caffe.cpp:404] conv6_2 forward: 0.21057 ms.
I0921 12:44:59.303614 93668 caffe.cpp:407] conv6_2 backward: 0.457754 ms.
I0921 12:44:59.303622 93668 caffe.cpp:404] conv6_2_relu forward: 0.0104614 ms.
I0921 12:44:59.303629 93668 caffe.cpp:407] conv6_2_relu backward: 0.0015328 ms.
I0921 12:44:59.303637 93668 caffe.cpp:404] conv6_2_conv6_2_relu_0_split forward: 0.00162304 ms.
I0921 12:44:59.303647 93668 caffe.cpp:407] conv6_2_conv6_2_relu_0_split backward: 0.00150976 ms.
I0921 12:44:59.303656 93668 caffe.cpp:404] conv7_1 forward: 0.114451 ms.
I0921 12:44:59.303664 93668 caffe.cpp:407] conv7_1 backward: 0.0725914 ms.
I0921 12:44:59.303673 93668 caffe.cpp:404] conv7_1_relu forward: 0.00943552 ms.
I0921 12:44:59.303680 93668 caffe.cpp:407] conv7_1_relu backward: 0.00155456 ms.
I0921 12:44:59.303689 93668 caffe.cpp:404] conv7_2 forward: 0.152168 ms.
I0921 12:44:59.303699 93668 caffe.cpp:407] conv7_2 backward: 0.0929325 ms.
I0921 12:44:59.303706 93668 caffe.cpp:404] conv7_2_relu forward: 0.00910272 ms.
I0921 12:44:59.303714 93668 caffe.cpp:407] conv7_2_relu backward: 0.00152 ms.
I0921 12:44:59.303722 93668 caffe.cpp:404] conv7_2_conv7_2_relu_0_split forward: 0.00165888 ms.
I0921 12:44:59.303737 93668 caffe.cpp:407] conv7_2_conv7_2_relu_0_split backward: 0.00151936 ms.
I0921 12:44:59.303746 93668 caffe.cpp:404] conv8_1 forward: 0.0767846 ms.
I0921 12:44:59.303756 93668 caffe.cpp:407] conv8_1 backward: 0.0482573 ms.
I0921 12:44:59.303763 93668 caffe.cpp:404] conv8_1_relu forward: 0.00931904 ms.
I0921 12:44:59.303771 93668 caffe.cpp:407] conv8_1_relu backward: 0.00150528 ms.
I0921 12:44:59.303779 93668 caffe.cpp:404] conv8_2 forward: 0.145694 ms.
I0921 12:44:59.303786 93668 caffe.cpp:407] conv8_2 backward: 0.0856538 ms.
I0921 12:44:59.303794 93668 caffe.cpp:404] conv8_2_relu forward: 0.00906112 ms.
I0921 12:44:59.303802 93668 caffe.cpp:407] conv8_2_relu backward: 0.00151488 ms.
I0921 12:44:59.303809 93668 caffe.cpp:404] conv8_2_conv8_2_relu_0_split forward: 0.00163584 ms.
I0921 12:44:59.303817 93668 caffe.cpp:407] conv8_2_conv8_2_relu_0_split backward: 0.00156224 ms.
I0921 12:44:59.303825 93668 caffe.cpp:404] pool6 forward: 0.0168096 ms.
I0921 12:44:59.303833 93668 caffe.cpp:407] pool6 backward: 0.00150016 ms.
I0921 12:44:59.303841 93668 caffe.cpp:404] pool6_pool6_0_split forward: 0.001568 ms.
I0921 12:44:59.303850 93668 caffe.cpp:407] pool6_pool6_0_split backward: 0.00152448 ms.
I0921 12:44:59.303858 93668 caffe.cpp:404] conv4_3_norm forward: 0.146181 ms.
I0921 12:44:59.303866 93668 caffe.cpp:407] conv4_3_norm backward: 0.0898246 ms.
I0921 12:44:59.303874 93668 caffe.cpp:404] conv4_3_norm_conv4_3_norm_0_split forward: 0.00156736 ms.
I0921 12:44:59.303881 93668 caffe.cpp:407] conv4_3_norm_conv4_3_norm_0_split backward: 0.00152192 ms.
I0921 12:44:59.303889 93668 caffe.cpp:404] conv4_3_norm_mbox_loc forward: 0.140976 ms.
I0921 12:44:59.303899 93668 caffe.cpp:407] conv4_3_norm_mbox_loc backward: 0.231925 ms.
I0921 12:44:59.303906 93668 caffe.cpp:404] conv4_3_norm_mbox_loc_perm forward: 0.0306867 ms.
I0921 12:44:59.303915 93668 caffe.cpp:407] conv4_3_norm_mbox_loc_perm backward: 0.023993 ms.
I0921 12:44:59.303922 93668 caffe.cpp:404] conv4_3_norm_mbox_loc_flat forward: 0.00198144 ms.
I0921 12:44:59.303930 93668 caffe.cpp:407] conv4_3_norm_mbox_loc_flat backward: 0.00170816 ms.
I0921 12:44:59.303938 93668 caffe.cpp:404] conv4_3_norm_mbox_conf forward: 0.195863 ms.
I0921 12:44:59.303947 93668 caffe.cpp:407] conv4_3_norm_mbox_conf backward: 0.362148 ms.
I0921 12:44:59.303954 93668 caffe.cpp:404] conv4_3_norm_mbox_conf_perm forward: 0.0530637 ms.
I0921 12:44:59.303962 93668 caffe.cpp:407] conv4_3_norm_mbox_conf_perm backward: 0.0414976 ms.
I0921 12:44:59.303972 93668 caffe.cpp:404] conv4_3_norm_mbox_conf_flat forward: 0.00188032 ms.
I0921 12:44:59.303980 93668 caffe.cpp:407] conv4_3_norm_mbox_conf_flat backward: 0.00150784 ms.
I0921 12:44:59.303988 93668 caffe.cpp:404] conv4_3_norm_mbox_priorbox forward: 0.120214 ms.
I0921 12:44:59.303997 93668 caffe.cpp:407] conv4_3_norm_mbox_priorbox backward: 0.00150144 ms.
I0921 12:44:59.304004 93668 caffe.cpp:404] fc7_mbox_loc forward: 0.219078 ms.
I0921 12:44:59.304013 93668 caffe.cpp:407] fc7_mbox_loc backward: 0.180834 ms.
I0921 12:44:59.304021 93668 caffe.cpp:404] fc7_mbox_loc_perm forward: 0.0277632 ms.
I0921 12:44:59.304029 93668 caffe.cpp:407] fc7_mbox_loc_perm backward: 0.0133946 ms.
I0921 12:44:59.304038 93668 caffe.cpp:404] fc7_mbox_loc_flat forward: 0.00175616 ms.
I0921 12:44:59.304045 93668 caffe.cpp:407] fc7_mbox_loc_flat backward: 0.00150208 ms.
I0921 12:44:59.304054 93668 caffe.cpp:404] fc7_mbox_conf forward: 0.277156 ms.
I0921 12:44:59.304061 93668 caffe.cpp:407] fc7_mbox_conf backward: 0.795271 ms.
I0921 12:44:59.304069 93668 caffe.cpp:404] fc7_mbox_conf_perm forward: 0.0386618 ms.
I0921 12:44:59.304077 93668 caffe.cpp:407] fc7_mbox_conf_perm backward: 0.0239827 ms.
I0921 12:44:59.304085 93668 caffe.cpp:404] fc7_mbox_conf_flat forward: 0.00174016 ms.
I0921 12:44:59.304095 93668 caffe.cpp:407] fc7_mbox_conf_flat backward: 0.0015136 ms.
I0921 12:44:59.304102 93668 caffe.cpp:404] fc7_mbox_priorbox forward: 0.0628819 ms.
I0921 12:44:59.304111 93668 caffe.cpp:407] fc7_mbox_priorbox backward: 0.00152192 ms.
I0921 12:44:59.304124 93668 caffe.cpp:404] conv6_2_mbox_loc forward: 0.122681 ms.
I0921 12:44:59.304133 93668 caffe.cpp:407] conv6_2_mbox_loc backward: 0.0866714 ms.
I0921 12:44:59.304141 93668 caffe.cpp:404] conv6_2_mbox_loc_perm forward: 0.0264058 ms.
I0921 12:44:59.304150 93668 caffe.cpp:407] conv6_2_mbox_loc_perm backward: 0.0120275 ms.
I0921 12:44:59.304158 93668 caffe.cpp:404] conv6_2_mbox_loc_flat forward: 0.00194752 ms.
I0921 12:44:59.304165 93668 caffe.cpp:407] conv6_2_mbox_loc_flat backward: 0.0015392 ms.
I0921 12:44:59.304173 93668 caffe.cpp:404] conv6_2_mbox_conf forward: 0.152012 ms.
I0921 12:44:59.304183 93668 caffe.cpp:407] conv6_2_mbox_conf backward: 0.172544 ms.
I0921 12:44:59.304190 93668 caffe.cpp:404] conv6_2_mbox_conf_perm forward: 0.0295507 ms.
I0921 12:44:59.304198 93668 caffe.cpp:407] conv6_2_mbox_conf_perm backward: 0.013111 ms.
I0921 12:44:59.304205 93668 caffe.cpp:404] conv6_2_mbox_conf_flat forward: 0.00177792 ms.
I0921 12:44:59.304214 93668 caffe.cpp:407] conv6_2_mbox_conf_flat backward: 0.0015392 ms.
I0921 12:44:59.304221 93668 caffe.cpp:404] conv6_2_mbox_priorbox forward: 0.0193306 ms.
I0921 12:44:59.304229 93668 caffe.cpp:407] conv6_2_mbox_priorbox backward: 0.00152256 ms.
I0921 12:44:59.304236 93668 caffe.cpp:404] conv7_2_mbox_loc forward: 0.0842669 ms.
I0921 12:44:59.304244 93668 caffe.cpp:407] conv7_2_mbox_loc backward: 0.057344 ms.
I0921 12:44:59.304253 93668 caffe.cpp:404] conv7_2_mbox_loc_perm forward: 0.0268902 ms.
I0921 12:44:59.304262 93668 caffe.cpp:407] conv7_2_mbox_loc_perm backward: 0.0113158 ms.
I0921 12:44:59.304270 93668 caffe.cpp:404] conv7_2_mbox_loc_flat forward: 0.00173888 ms.
I0921 12:44:59.304278 93668 caffe.cpp:407] conv7_2_mbox_loc_flat backward: 0.00155328 ms.
I0921 12:44:59.304286 93668 caffe.cpp:404] conv7_2_mbox_conf forward: 0.0896371 ms.
I0921 12:44:59.304294 93668 caffe.cpp:407] conv7_2_mbox_conf backward: 0.0863904 ms.
I0921 12:44:59.304302 93668 caffe.cpp:404] conv7_2_mbox_conf_perm forward: 0.0280384 ms.
I0921 12:44:59.304311 93668 caffe.cpp:407] conv7_2_mbox_conf_perm backward: 0.0117581 ms.
I0921 12:44:59.304318 93668 caffe.cpp:404] conv7_2_mbox_conf_flat forward: 0.00170944 ms.
I0921 12:44:59.304327 93668 caffe.cpp:407] conv7_2_mbox_conf_flat backward: 0.00153024 ms.
I0921 12:44:59.304334 93668 caffe.cpp:404] conv7_2_mbox_priorbox forward: 0.00645952 ms.
I0921 12:44:59.304342 93668 caffe.cpp:407] conv7_2_mbox_priorbox backward: 0.00149696 ms.
I0921 12:44:59.304349 93668 caffe.cpp:404] conv8_2_mbox_loc forward: 0.0786605 ms.
I0921 12:44:59.304358 93668 caffe.cpp:407] conv8_2_mbox_loc backward: 0.0562022 ms.
I0921 12:44:59.304364 93668 caffe.cpp:404] conv8_2_mbox_loc_perm forward: 0.0263936 ms.
I0921 12:44:59.304373 93668 caffe.cpp:407] conv8_2_mbox_loc_perm backward: 0.0111635 ms.
I0921 12:44:59.304381 93668 caffe.cpp:404] conv8_2_mbox_loc_flat forward: 0.00170752 ms.
I0921 12:44:59.304390 93668 caffe.cpp:407] conv8_2_mbox_loc_flat backward: 0.00150848 ms.
I0921 12:44:59.304399 93668 caffe.cpp:404] conv8_2_mbox_conf forward: 0.0932525 ms.
I0921 12:44:59.304405 93668 caffe.cpp:407] conv8_2_mbox_conf backward: 0.0670214 ms.
I0921 12:44:59.304414 93668 caffe.cpp:404] conv8_2_mbox_conf_perm forward: 0.0277024 ms.
I0921 12:44:59.304421 93668 caffe.cpp:407] conv8_2_mbox_conf_perm backward: 0.0116134 ms.
I0921 12:44:59.304430 93668 caffe.cpp:404] conv8_2_mbox_conf_flat forward: 0.00179776 ms.
I0921 12:44:59.304437 93668 caffe.cpp:407] conv8_2_mbox_conf_flat backward: 0.0015264 ms.
I0921 12:44:59.304445 93668 caffe.cpp:404] conv8_2_mbox_priorbox forward: 0.0037216 ms.
I0921 12:44:59.304453 93668 caffe.cpp:407] conv8_2_mbox_priorbox backward: 0.00150144 ms.
I0921 12:44:59.304461 93668 caffe.cpp:404] pool6_mbox_loc forward: 0.0778112 ms.
I0921 12:44:59.304469 93668 caffe.cpp:407] pool6_mbox_loc backward: 0.0578381 ms.
I0921 12:44:59.304477 93668 caffe.cpp:404] pool6_mbox_loc_perm forward: 0.0265805 ms.
I0921 12:44:59.304486 93668 caffe.cpp:407] pool6_mbox_loc_perm backward: 0.0115187 ms.
I0921 12:44:59.304493 93668 caffe.cpp:404] pool6_mbox_loc_flat forward: 0.00172224 ms.
I0921 12:44:59.304507 93668 caffe.cpp:407] pool6_mbox_loc_flat backward: 0.00153088 ms.
I0921 12:44:59.304517 93668 caffe.cpp:404] pool6_mbox_conf forward: 0.0918541 ms.
I0921 12:44:59.304524 93668 caffe.cpp:407] pool6_mbox_conf backward: 0.0663642 ms.
I0921 12:44:59.304532 93668 caffe.cpp:404] pool6_mbox_conf_perm forward: 0.0270739 ms.
I0921 12:44:59.304539 93668 caffe.cpp:407] pool6_mbox_conf_perm backward: 0.0114406 ms.
I0921 12:44:59.304548 93668 caffe.cpp:404] pool6_mbox_conf_flat forward: 0.00165056 ms.
I0921 12:44:59.304555 93668 caffe.cpp:407] pool6_mbox_conf_flat backward: 0.0015648 ms.
I0921 12:44:59.304563 93668 caffe.cpp:404] pool6_mbox_priorbox forward: 0.00205952 ms.
I0921 12:44:59.304571 93668 caffe.cpp:407] pool6_mbox_priorbox backward: 0.00151936 ms.
I0921 12:44:59.304580 93668 caffe.cpp:404] mbox_loc forward: 0.0306758 ms.
I0921 12:44:59.304589 93668 caffe.cpp:407] mbox_loc backward: 0.00180864 ms.
I0921 12:44:59.304596 93668 caffe.cpp:404] mbox_conf forward: 0.0345062 ms.
I0921 12:44:59.304605 93668 caffe.cpp:407] mbox_conf backward: 0.0105939 ms.
I0921 12:44:59.304613 93668 caffe.cpp:404] mbox_priorbox forward: 0.104519 ms.
I0921 12:44:59.304620 93668 caffe.cpp:407] mbox_priorbox backward: 0.00199936 ms.
I0921 12:44:59.304630 93668 caffe.cpp:404] mbox_conf_reshape forward: 0.00204288 ms.
I0921 12:44:59.304636 93668 caffe.cpp:407] mbox_conf_reshape backward: 0.00153408 ms.
I0921 12:44:59.304644 93668 caffe.cpp:404] mbox_conf_softmax forward: 0.179105 ms.
I0921 12:44:59.304652 93668 caffe.cpp:407] mbox_conf_softmax backward: 0.00152384 ms.
I0921 12:44:59.304659 93668 caffe.cpp:404] mbox_conf_flatten forward: 0.00178944 ms.
I0921 12:44:59.304668 93668 caffe.cpp:407] mbox_conf_flatten backward: 0.00140544 ms.
I0921 12:44:59.304689 93668 caffe.cpp:412] Average Forward pass: 16.7595 ms.
I0921 12:44:59.304698 93668 caffe.cpp:414] Average Backward pass: 23.0653 ms.
I0921 12:44:59.304708 93668 caffe.cpp:416] Average Forward-Backward: 39.9146 ms.
I0921 12:44:59.304715 93668 caffe.cpp:418] Total Time: 1995.73 ms.
I0921 12:44:59.304721 93668 caffe.cpp:419] *** Benchmark ends ***

@jzwang1

jzwang1 commented Sep 21, 2016

Thanks a lot! @weiliu89

@weiliu89
Owner

The time above is using cuDNN v5. Using v4 has the following timing:

I0921 12:51:19.585435 101116 caffe.cpp:412] Average Forward pass: 19.9982 ms.
I0921 12:51:19.585444 101116 caffe.cpp:414] Average Backward pass: 23.1922 ms.
I0921 12:51:19.585451 101116 caffe.cpp:416] Average Forward-Backward: 43.2689 ms.
I0921 12:51:19.585459 101116 caffe.cpp:418] Total Time: 2163.45 ms.
I0921 12:51:19.585465 101116 caffe.cpp:419] *** Benchmark ends ***

ssd_pascal_speed.py processes images with a batch size of 8, which is slightly faster.
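As a rough cross-check, the average forward-pass times reported above can be converted to frames per second. The sketch below assumes the timings are for a single image per forward pass (batch size 1), which the logs do not state explicitly; the helper name `latency_ms_to_fps` is made up for illustration.

```python
def latency_ms_to_fps(forward_ms, batch_size=1):
    """Convert an average forward-pass time in milliseconds to images per second.

    batch_size is an assumption: the benchmark logs do not say how many
    images each forward pass processes.
    """
    if forward_ms <= 0:
        raise ValueError("forward_ms must be positive")
    return batch_size * 1000.0 / forward_ms


# Average forward-pass times taken from the two benchmark logs above.
fps_cudnn_v5 = latency_ms_to_fps(16.7595)
fps_cudnn_v4 = latency_ms_to_fps(19.9982)

print("cuDNN v5: %.1f fps" % fps_cudnn_v5)
print("cuDNN v4: %.1f fps" % fps_cudnn_v4)
```

Under that assumption, the cuDNN v5 figure lands close to the 58 fps quoted at the top of the thread, and the cuDNN v4 figure is noticeably lower. These numbers cover the network forward pass only, not image loading or post-processing.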

@jzwang1

jzwang1 commented Sep 21, 2016

I see. Thank you for the statistics. I will investigate more.

@gurkirt

gurkirt commented Oct 14, 2016

My backward pass is faster than my forward pass. What am I doing wrong?

I1014 18:31:08.736440 31775 caffe.cpp:404] mbox_priorbox forward: 0.100224 ms.
I1014 18:31:08.736449 31775 caffe.cpp:407] mbox_priorbox backward: 0.0019456 ms.
I1014 18:31:08.736459 31775 caffe.cpp:404] mbox_conf_reshape forward: 0.00163328 ms.
I1014 18:31:08.736466 31775 caffe.cpp:407] mbox_conf_reshape backward: 0.00142784 ms.
I1014 18:31:08.736474 31775 caffe.cpp:404] mbox_conf_softmax forward: 0.0716787 ms.
I1014 18:31:08.736485 31775 caffe.cpp:407] mbox_conf_softmax backward: 0.0588646 ms.
I1014 18:31:08.736492 31775 caffe.cpp:404] mbox_conf_flatten forward: 0.00151168 ms.
I1014 18:31:08.736500 31775 caffe.cpp:407] mbox_conf_flatten backward: 0.00131328 ms.
I1014 18:31:08.736523 31775 caffe.cpp:412] Average Forward pass: 26.6667 ms.
I1014 18:31:08.736532 31775 caffe.cpp:414] Average Backward pass: 14.2217 ms.
I1014 18:31:08.736541 31775 caffe.cpp:416] Average Forward-Backward: 40.97 ms.
I1014 18:31:08.736551 31775 caffe.cpp:418] Total Time: 2048.5 ms.
I1014 18:31:08.736557 31775 caffe.cpp:419] *** Benchmark ends **

@gurkirt

gurkirt commented Oct 14, 2016

It turns out that I was not using cuDNN v5.

With cuDNN v5 enabled, I get your numbers:

I1014 19:37:53.356597 19125 caffe.cpp:412] Average Forward pass: 17.2102 ms.
I1014 19:37:53.356611 19125 caffe.cpp:414] Average Backward pass: 23.4602 ms.
I1014 19:37:53.356621 19125 caffe.cpp:416] Average Forward-Backward: 40.7575 ms.
I1014 19:37:53.356631 19125 caffe.cpp:418] Total Time: 2037.87 ms.
I1014 19:37:53.356637 19125 caffe.cpp:419] *** Benchmark ends ***

But it's weird that the backward pass is faster than the forward pass without cuDNN.

@mxmxlwlw

mxmxlwlw commented Nov 6, 2017

Hi @weiliu89 ,
Why is my training so slow?
Here's my network:

name: "VGG_VOC0712_SSD_100x100_train"
layer {
  name: "data"
  type: "AnnotatedData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    mean_value: 127.5
    mean_value: 127.5
    mean_value: 127.5
    scale: 0.0078125
    resize_param {
      prob: 1.0
      resize_mode: WARP
      height: 100
      width: 100
      interp_mode: LINEAR
      interp_mode: AREA
      interp_mode: NEAREST
      interp_mode: CUBIC
      interp_mode: LANCZOS4
    }
    emit_constraint {
      emit_type: CENTER
    }
    distort_param {
      brightness_prob: 0.5
      brightness_delta: 32.0
      contrast_prob: 0.5
      contrast_lower: 0.5
      contrast_upper: 1.5
      hue_prob: 0.5
      hue_delta: 18.0
      saturation_prob: 0.5
      saturation_lower: 0.5
      saturation_upper: 1.5
      random_order_prob: 0.0
    }
    expand_param {
      prob: 0.5
      max_expand_ratio: 4.0
    }
  }
  data_param {
    source: "/media/em/data/ssd-renlian/renlian/lmdb/trainval_lmdb"
    batch_size: 64
    backend: LMDB
  }
  annotated_data_param {
    batch_sampler {
      max_sample: 1
      max_trials: 1
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.10000000149
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.300000011921
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.5
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.699999988079
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        min_jaccard_overlap: 0.899999976158
      }
      max_sample: 1
      max_trials: 50
    }
    batch_sampler {
      sampler {
        min_scale: 0.300000011921
        max_scale: 1.0
        min_aspect_ratio: 0.5
        max_aspect_ratio: 2.0
      }
      sample_constraint {
        max_jaccard_overlap: 1.0
      }
      max_sample: 1
      max_trials: 50
    }
    label_map_file: "/media/d/datasets/face_dt_jiakao_combine/renlian/lmdb/renlian_test_lmdb"
  }
}

layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu1_1"
  type: "PReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu2_1"
  type: "PReLU"
  bottom: "conv2_1"
  top: "conv2_1"
}

layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2_1"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv3_1"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3_1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu3_1"
  type: "PReLU"
  bottom: "conv3_1"
  top: "conv3_1"
}

layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3_1"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}


layer {
  name: "conv4_3"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4_3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "relu4_3"
  type: "PReLU"
  bottom: "conv4_3"
  top: "conv4_3"
}


layer {
  name: "conv4_3_norm_mbox_loc_1"
  type: "Convolution"
  bottom: "conv4_3"
  top: "conv4_3_norm_mbox_loc"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 48
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv4_3_norm_mbox_loc_perm_1"
  type: "Permute"
  bottom: "conv4_3_norm_mbox_loc"
  top: "conv4_3_norm_mbox_loc_perm"
  permute_param {
    order: 0
    order: 2
    order: 3
    order: 1
  }
}
layer {
  name: "conv4_3_norm_mbox_loc_flat_1"
  type: "Flatten"
  bottom: "conv4_3_norm_mbox_loc_perm"
  top: "conv4_3_norm_mbox_loc_flat"
  flatten_param {
    axis: 1
  }
}
layer {
  name: "conv4_3_norm_mbox_conf_1"
  type: "Convolution"
  bottom: "conv4_3"
  top: "conv4_3_norm_mbox_conf"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 24
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv4_3_norm_mbox_conf_perm_1"
  type: "Permute"
  bottom: "conv4_3_norm_mbox_conf"
  top: "conv4_3_norm_mbox_conf_perm"
  permute_param {
    order: 0
    order: 2
    order: 3
    order: 1
  }
}
layer {
  name: "conv4_3_norm_mbox_conf_flat_1"
  type: "Flatten"
  bottom: "conv4_3_norm_mbox_conf_perm"
  top: "conv4_3_norm_mbox_conf_flat"
  flatten_param {
    axis: 1
  }
}
layer {
  name: "conv4_3_norm_mbox_priorbox_1"
  type: "PriorBox"
  bottom: "conv4_3"
  bottom: "data"
  top: "conv4_3_norm_mbox_priorbox"
  prior_box_param {
    min_size: 20.0
    min_size: 40.0
    max_size: 60.0
    max_size: 80.0
    aspect_ratio: 2.0
    aspect_ratio: 3.0
    flip: true
    clip: false
    variance: 0.10000000149
    variance: 0.10000000149
    variance: 0.20000000298
    variance: 0.20000000298
    offset: 0.5
  }
}

layer {
  name: "mbox_loss_1"
  type: "MultiBoxLoss"
  bottom: "conv4_3_norm_mbox_loc_flat"
  bottom: "conv4_3_norm_mbox_conf_flat"
  bottom: "conv4_3_norm_mbox_priorbox"
  bottom: "label"
  top: "mbox_loss"
  include {
    phase: TRAIN
  }
  propagate_down: true
  propagate_down: true
  propagate_down: false
  propagate_down: false
  loss_param {
    normalization: VALID
  }
  multibox_loss_param {
    loc_loss_type: SMOOTH_L1
    conf_loss_type: SOFTMAX
    loc_weight: 1.0
    num_classes: 2
    share_location: true
    match_type: PER_PREDICTION
    overlap_threshold: 0.5
    use_prior_for_matching: true
    background_label_id: 0
    use_difficult_gt: true
    neg_pos_ratio: 3.0
    neg_overlap: 0.5
    code_type: CENTER_SIZE
    ignore_cross_boundary_bbox: false
    mining_type: MAX_NEGATIVE
  }
}
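A quick consistency check on the prototxt above: the `num_output` of the loc and conf convolutions should match the number of prior boxes that the PriorBox layer generates per feature-map location. The sketch below assumes the counting rule used by the SSD PriorBox layer as I understand it (aspect ratio 1 is always included, `flip` adds the reciprocal of each ratio, each `min_size` gets every ratio, and each `max_size` adds one extra box); the function name is hypothetical.

```python
def priors_per_location(min_sizes, max_sizes, aspect_ratios, flip=True):
    """Count prior boxes per feature-map cell (assumed SSD PriorBox rule)."""
    ratios = [1.0]  # aspect ratio 1 is always present
    for ar in aspect_ratios:
        # Skip ratios that are already in the list.
        if any(abs(ar - r) < 1e-6 for r in ratios):
            continue
        ratios.append(ar)
        if flip:
            ratios.append(1.0 / ar)
    # Every min_size gets every ratio; every max_size adds one more box.
    return len(ratios) * len(min_sizes) + len(max_sizes)


# Values copied from the prior_box_param and multibox_loss_param above.
num_priors = priors_per_location(
    min_sizes=[20.0, 40.0],
    max_sizes=[60.0, 80.0],
    aspect_ratios=[2.0, 3.0],
    flip=True,
)
num_classes = 2

loc_outputs = num_priors * 4             # 4 box offsets per prior
conf_outputs = num_priors * num_classes  # one score per class per prior

print(num_priors, loc_outputs, conf_outputs)
```

Under these assumptions the result agrees with the `num_output: 48` and `num_output: 24` values in the network definition, so the head sizes look consistent with the prior box settings.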

Here's the time test without cuDNN v5:

I1106 11:17:03.249493 31700 caffe.cpp:398] Iteration: 1 forward-backward time: 8281 ms.
I1106 11:17:11.043820 31700 caffe.cpp:398] Iteration: 2 forward-backward time: 7794 ms.
I1106 11:17:19.165422 31700 caffe.cpp:398] Iteration: 3 forward-backward time: 8121 ms.
I1106 11:17:27.455461 31700 caffe.cpp:398] Iteration: 4 forward-backward time: 8290 ms.
I1106 11:17:35.354444 31700 caffe.cpp:398] Iteration: 5 forward-backward time: 7898 ms.
I1106 11:17:43.515108 31700 caffe.cpp:398] Iteration: 6 forward-backward time: 8160 ms.
I1106 11:17:51.328234 31700 caffe.cpp:398] Iteration: 7 forward-backward time: 7812 ms.
I1106 11:17:59.333545 31700 caffe.cpp:398] Iteration: 8 forward-backward time: 8005 ms.
I1106 11:18:07.743176 31700 caffe.cpp:398] Iteration: 9 forward-backward time: 8409 ms.
I1106 11:18:16.014807 31700 caffe.cpp:398] Iteration: 10 forward-backward time: 8271 ms.
I1106 11:18:16.014977 31700 caffe.cpp:401] Average time per layer: 
I1106 11:18:16.014982 31700 caffe.cpp:404]       data	forward: 1.2343 ms.
I1106 11:18:16.014987 31700 caffe.cpp:407]       data	backward: 0.0008 ms.
I1106 11:18:16.014991 31700 caffe.cpp:404] data_data_0_split	forward: 0.0015 ms.
I1106 11:18:16.014994 31700 caffe.cpp:407] data_data_0_split	backward: 0.001 ms.
I1106 11:18:16.014998 31700 caffe.cpp:404]    conv1_1	forward: 385.633 ms.
I1106 11:18:16.015002 31700 caffe.cpp:407]    conv1_1	backward: 385.433 ms.
I1106 11:18:16.015005 31700 caffe.cpp:404]    relu1_1	forward: 334.749 ms.
I1106 11:18:16.015009 31700 caffe.cpp:407]    relu1_1	backward: 678.403 ms.
I1106 11:18:16.015012 31700 caffe.cpp:404]      pool1	forward: 168.716 ms.
I1106 11:18:16.015015 31700 caffe.cpp:407]      pool1	backward: 38.0806 ms.
I1106 11:18:16.015018 31700 caffe.cpp:404]    conv2_1	forward: 1020.33 ms.
I1106 11:18:16.015022 31700 caffe.cpp:407]    conv2_1	backward: 1888.71 ms.
I1106 11:18:16.015025 31700 caffe.cpp:404]    relu2_1	forward: 170.07 ms.
I1106 11:18:16.015028 31700 caffe.cpp:407]    relu2_1	backward: 345.836 ms.
I1106 11:18:16.015031 31700 caffe.cpp:404]      pool2	forward: 87.9664 ms.
I1106 11:18:16.015034 31700 caffe.cpp:407]      pool2	backward: 20.5241 ms.
I1106 11:18:16.015038 31700 caffe.cpp:404]    conv3_1	forward: 494.114 ms.
I1106 11:18:16.015041 31700 caffe.cpp:407]    conv3_1	backward: 957.19 ms.
I1106 11:18:16.015044 31700 caffe.cpp:404]    relu3_1	forward: 41.5649 ms.
I1106 11:18:16.015048 31700 caffe.cpp:407]    relu3_1	backward: 77.7996 ms.
I1106 11:18:16.015051 31700 caffe.cpp:404]      pool3	forward: 25.1124 ms.
I1106 11:18:16.015055 31700 caffe.cpp:407]      pool3	backward: 5.3525 ms.
I1106 11:18:16.015058 31700 caffe.cpp:404]    conv4_3	forward: 182.764 ms.
I1106 11:18:16.015063 31700 caffe.cpp:407]    conv4_3	backward: 356.156 ms.
I1106 11:18:16.015065 31700 caffe.cpp:404]    relu4_3	forward: 11.7312 ms.
I1106 11:18:16.015069 31700 caffe.cpp:407]    relu4_3	backward: 22.5605 ms.
I1106 11:18:16.015072 31700 caffe.cpp:404] conv4_3_relu4_3_0_split	forward: 0.002 ms.
I1106 11:18:16.015075 31700 caffe.cpp:407] conv4_3_relu4_3_0_split	backward: 3.2887 ms.
I1106 11:18:16.015079 31700 caffe.cpp:404] conv4_3_norm_mbox_loc_1	forward: 80.8785 ms.
I1106 11:18:16.015082 31700 caffe.cpp:407] conv4_3_norm_mbox_loc_1	backward: 148.878 ms.
I1106 11:18:16.015085 31700 caffe.cpp:404] conv4_3_norm_mbox_loc_perm_1	forward: 7.2887 ms.
I1106 11:18:16.015089 31700 caffe.cpp:407] conv4_3_norm_mbox_loc_perm_1	backward: 6.8734 ms.
I1106 11:18:16.015091 31700 caffe.cpp:404] conv4_3_norm_mbox_loc_flat_1	forward: 0.0025 ms.
I1106 11:18:16.015095 31700 caffe.cpp:407] conv4_3_norm_mbox_loc_flat_1	backward: 0.0014 ms.
I1106 11:18:16.015099 31700 caffe.cpp:404] conv4_3_norm_mbox_conf_1	forward: 48.3151 ms.
I1106 11:18:16.015102 31700 caffe.cpp:407] conv4_3_norm_mbox_conf_1	backward: 91.9961 ms.
I1106 11:18:16.015106 31700 caffe.cpp:404] conv4_3_norm_mbox_conf_perm_1	forward: 3.5013 ms.
I1106 11:18:16.015110 31700 caffe.cpp:407] conv4_3_norm_mbox_conf_perm_1	backward: 3.4832 ms.
I1106 11:18:16.015112 31700 caffe.cpp:404] conv4_3_norm_mbox_conf_flat_1	forward: 0.0015 ms.
I1106 11:18:16.015116 31700 caffe.cpp:407] conv4_3_norm_mbox_conf_flat_1	backward: 0.0009 ms.
I1106 11:18:16.015120 31700 caffe.cpp:404] conv4_3_norm_mbox_priorbox_1	forward: 0.0262 ms.
I1106 11:18:16.015122 31700 caffe.cpp:407] conv4_3_norm_mbox_priorbox_1	backward: 0.0004 ms.
I1106 11:18:16.015126 31700 caffe.cpp:404] mbox_loss_1	forward: 9.7352 ms.
I1106 11:18:16.015130 31700 caffe.cpp:407] mbox_loss_1	backward: 0.2464 ms.
I1106 11:18:16.015136 31700 caffe.cpp:412] Average Forward pass: 3073.79 ms.
I1106 11:18:16.015138 31700 caffe.cpp:414] Average Backward pass: 5030.86 ms.
I1106 11:18:16.015142 31700 caffe.cpp:416] Average Forward-Backward: 8104.7 ms.
I1106 11:18:16.015146 31700 caffe.cpp:418] Total Time: 81047 ms.
I1106 11:18:16.015149 31700 caffe.cpp:419] *** Benchmark ends ***

Here's my time test with cuDNN v5:


I1106 11:20:06.031409 32010 caffe.cpp:369] *** Benchmark begins ***
I1106 11:20:06.031430 32010 caffe.cpp:370] Testing for 10 iterations.
I1106 11:20:13.964776 32010 caffe.cpp:398] Iteration: 1 forward-backward time: 7933 ms.
I1106 11:20:21.775849 32010 caffe.cpp:398] Iteration: 2 forward-backward time: 7811 ms.
I1106 11:20:29.520659 32010 caffe.cpp:398] Iteration: 3 forward-backward time: 7744 ms.
I1106 11:20:37.396687 32010 caffe.cpp:398] Iteration: 4 forward-backward time: 7875 ms.
I1106 11:20:45.607374 32010 caffe.cpp:398] Iteration: 5 forward-backward time: 8210 ms.
I1106 11:20:53.685642 32010 caffe.cpp:398] Iteration: 6 forward-backward time: 8078 ms.
I1106 11:21:01.820922 32010 caffe.cpp:398] Iteration: 7 forward-backward time: 8135 ms.
I1106 11:21:10.000944 32010 caffe.cpp:398] Iteration: 8 forward-backward time: 8179 ms.
I1106 11:21:17.793071 32010 caffe.cpp:398] Iteration: 9 forward-backward time: 7792 ms.
I1106 11:21:25.497119 32010 caffe.cpp:398] Iteration: 10 forward-backward time: 7704 ms.
I1106 11:21:25.497313 32010 caffe.cpp:401] Average time per layer: 
I1106 11:21:25.497321 32010 caffe.cpp:404]       data	forward: 1.2612 ms.
I1106 11:21:25.497328 32010 caffe.cpp:407]       data	backward: 0.0008 ms.
I1106 11:21:25.497334 32010 caffe.cpp:404] data_data_0_split	forward: 0.0013 ms.
I1106 11:21:25.497337 32010 caffe.cpp:407] data_data_0_split	backward: 0.001 ms.
I1106 11:21:25.497342 32010 caffe.cpp:404]    conv1_1	forward: 396.525 ms.
I1106 11:21:25.497347 32010 caffe.cpp:407]    conv1_1	backward: 375.332 ms.
I1106 11:21:25.497352 32010 caffe.cpp:404]    relu1_1	forward: 334.79 ms.
I1106 11:21:25.497356 32010 caffe.cpp:407]    relu1_1	backward: 613.277 ms.
I1106 11:21:25.497361 32010 caffe.cpp:404]      pool1	forward: 172.288 ms.
I1106 11:21:25.497366 32010 caffe.cpp:407]      pool1	backward: 38.682 ms.
I1106 11:21:25.497370 32010 caffe.cpp:404]    conv2_1	forward: 1015.34 ms.
I1106 11:21:25.497375 32010 caffe.cpp:407]    conv2_1	backward: 1833.98 ms.
I1106 11:21:25.497380 32010 caffe.cpp:404]    relu2_1	forward: 165.413 ms.
I1106 11:21:25.497383 32010 caffe.cpp:407]    relu2_1	backward: 308.408 ms.
I1106 11:21:25.497388 32010 caffe.cpp:404]      pool2	forward: 87.3854 ms.
I1106 11:21:25.497392 32010 caffe.cpp:407]      pool2	backward: 20.7159 ms.
I1106 11:21:25.497398 32010 caffe.cpp:404]    conv3_1	forward: 498.007 ms.
I1106 11:21:25.497402 32010 caffe.cpp:407]    conv3_1	backward: 945.563 ms.
I1106 11:21:25.497407 32010 caffe.cpp:404]    relu3_1	forward: 44.0054 ms.
I1106 11:21:25.497411 32010 caffe.cpp:407]    relu3_1	backward: 79.7896 ms.
I1106 11:21:25.497416 32010 caffe.cpp:404]      pool3	forward: 25.31 ms.
I1106 11:21:25.497421 32010 caffe.cpp:407]      pool3	backward: 5.6864 ms.
I1106 11:21:25.497426 32010 caffe.cpp:404]    conv4_3	forward: 189.265 ms.
I1106 11:21:25.497429 32010 caffe.cpp:407]    conv4_3	backward: 355.1 ms.
I1106 11:21:25.497434 32010 caffe.cpp:404]    relu4_3	forward: 11.5027 ms.
I1106 11:21:25.497438 32010 caffe.cpp:407]    relu4_3	backward: 21.9205 ms.
I1106 11:21:25.497442 32010 caffe.cpp:404] conv4_3_relu4_3_0_split	forward: 0.0023 ms.
I1106 11:21:25.497447 32010 caffe.cpp:407] conv4_3_relu4_3_0_split	backward: 3.3882 ms.
I1106 11:21:25.497452 32010 caffe.cpp:404] conv4_3_norm_mbox_loc_1	forward: 80.9309 ms.
I1106 11:21:25.497457 32010 caffe.cpp:407] conv4_3_norm_mbox_loc_1	backward: 148.428 ms.
I1106 11:21:25.497462 32010 caffe.cpp:404] conv4_3_norm_mbox_loc_perm_1	forward: 7.451 ms.
I1106 11:21:25.497465 32010 caffe.cpp:407] conv4_3_norm_mbox_loc_perm_1	backward: 7.0047 ms.
I1106 11:21:25.497470 32010 caffe.cpp:404] conv4_3_norm_mbox_loc_flat_1	forward: 0.0025 ms.
I1106 11:21:25.497474 32010 caffe.cpp:407] conv4_3_norm_mbox_loc_flat_1	backward: 0.0012 ms.
I1106 11:21:25.497478 32010 caffe.cpp:404] conv4_3_norm_mbox_conf_1	forward: 48.01 ms.
I1106 11:21:25.497483 32010 caffe.cpp:407] conv4_3_norm_mbox_conf_1	backward: 94.6241 ms.
I1106 11:21:25.497489 32010 caffe.cpp:404] conv4_3_norm_mbox_conf_perm_1	forward: 3.673 ms.
I1106 11:21:25.497493 32010 caffe.cpp:407] conv4_3_norm_mbox_conf_perm_1	backward: 3.5457 ms.
I1106 11:21:25.497498 32010 caffe.cpp:404] conv4_3_norm_mbox_conf_flat_1	forward: 0.0022 ms.
I1106 11:21:25.497501 32010 caffe.cpp:407] conv4_3_norm_mbox_conf_flat_1	backward: 0.0009 ms.
I1106 11:21:25.497504 32010 caffe.cpp:404] conv4_3_norm_mbox_priorbox_1	forward: 0.0264 ms.
I1106 11:21:25.497509 32010 caffe.cpp:407] conv4_3_norm_mbox_priorbox_1	backward: 0.0008 ms.
I1106 11:21:25.497512 32010 caffe.cpp:404] mbox_loss_1	forward: 9.5271 ms.
I1106 11:21:25.497515 32010 caffe.cpp:407] mbox_loss_1	backward: 0.2283 ms.
I1106 11:21:25.497521 32010 caffe.cpp:412] Average Forward pass: 3090.78 ms.
I1106 11:21:25.497524 32010 caffe.cpp:414] Average Backward pass: 4855.73 ms.
I1106 11:21:25.497527 32010 caffe.cpp:416] Average Forward-Backward: 7946.6 ms.
I1106 11:21:25.497529 32010 caffe.cpp:418] Total Time: 79466 ms.
I1106 11:21:25.497532 32010 caffe.cpp:419] *** Benchmark ends ***

My ReLU type is PReLU, so it takes some time. Also, when I monitor with nvidia-smi -lms, I find that the GPU is idle most of the time. It usually waits for a long time, then spikes to some level of usage, and then waits again!
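
For anyone who wants to check which layers dominate a benchmark like the one above, here is a minimal sketch that parses the per-layer lines of the log and ranks layers by time. The function names (`parse_layer_times`, `slowest_layers`) and the regular expression are my own assumptions based on the line format shown in this log; they are not part of Caffe.

```python
import re
from collections import defaultdict

# Matches per-layer lines such as:
#   "... caffe.cpp:404]    conv1_1<TAB>forward: 396.525 ms."
# The lowercase "forward"/"backward" keeps the "Average Forward pass"
# summary lines out of the result. (Pattern is an assumption based on
# the log format pasted in this thread.)
LAYER_RE = re.compile(r"\]\s*(\S+)\s+(forward|backward):\s*([\d.]+)\s*ms")


def parse_layer_times(log_text):
    """Return {layer_name: {"forward": ms, "backward": ms}} from benchmark log text."""
    times = defaultdict(dict)
    for line in log_text.splitlines():
        match = LAYER_RE.search(line)
        if match:
            name, direction, ms = match.groups()
            times[name][direction] = float(ms)
    return dict(times)


def slowest_layers(times, direction="forward", top=5):
    """Return the `top` layers with the largest time in the given direction."""
    ranked = sorted(
        times.items(),
        key=lambda item: item[1].get(direction, 0.0),
        reverse=True,
    )
    return [(name, t.get(direction, 0.0)) for name, t in ranked[:top]]


if __name__ == "__main__":
    # Three lines copied from the log above, used as sample input.
    sample = (
        "I1106 11:21:25.497342 32010 caffe.cpp:404]    conv1_1\tforward: 396.525 ms.\n"
        "I1106 11:21:25.497352 32010 caffe.cpp:404]    relu1_1\tforward: 334.79 ms.\n"
        "I1106 11:21:25.497370 32010 caffe.cpp:404]    conv2_1\tforward: 1015.34 ms.\n"
    )
    for name, ms in slowest_layers(parse_layer_times(sample)):
        print(f"{name}: {ms} ms")
```

In the log above, relu1_1 forward (334.79 ms) is nearly as long as conv1_1 forward (396.525 ms), which fits the PReLU observation.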

@ujsyehao

@CHUNYUWANG Hi, I use ssd_detect.py and get 16 fps on a GTX 1080 Ti. Which graphics card do you use? I have submitted a new issue about the speed problem: #832

@HHY-ZJU

HHY-ZJU commented Sep 10, 2018

@mxmxlwlw Hi, did you solve it? I have the same problem now, and I don't know why it is so slow: about 6 fps on a TITAN V. Thanks!
