Identify the critical parts of computation time in GPU mode #102
The speed reported in the decaf paper is single image only. With batches, … — Yangqing

On Wed, Feb 12, 2014 at 9:15 PM, kloudkl notifications@github.com wrote:

> This explains why the convolutional layers dominate the training time.
There are three motivations for this.
First, pull #99 referenced the benchmark results of pull #85. As noted in the latter, the experiments conducted in GPU mode were not very accurate because the batch size was set to 1 due to the limited memory of the GPU in question. This severely reduced the data throughput and probably distorted the layer-wise distribution of computation time. To make fairer comparisons, new benchmarks should use devices with more memory.
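For context, a batch-size-aware timing harness of the kind such a benchmark needs can be sketched as follows. This is a minimal illustration, not Caffe's actual benchmarking code; `forward` stands in for an arbitrary layer or network forward pass:

```python
import time

def benchmark(forward, batch, n_iters=50, warmup=5):
    """Average per-iteration time and throughput (items/s) of forward()."""
    for _ in range(warmup):          # warm-up hides one-time setup cost
        forward(batch)
    start = time.perf_counter()
    for _ in range(n_iters):
        forward(batch)
    per_iter = (time.perf_counter() - start) / n_iters
    # Throughput grows with batch size; a batch of 1 understates what
    # the device can sustain, which is the distortion described above.
    return per_iter, len(batch) / per_iter
```

Running this with batches of 1 versus, say, 256 on the same model makes the throughput gap, and hence the distortion in the earlier numbers, directly visible.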
The second objective is to compare and analyze the distributions of computation time in Caffe [1] and DeCAF [2]. During training on the ImageNet dataset, nearly 60% of the computation time in DeCAF was spent on the last three fully connected layers, which can only run on the CPU. This is not necessarily the case for Caffe, especially in GPU mode.
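The layer-wise distribution itself is just each layer's share of total time. A small sketch, with illustrative numbers only (not measurements) chosen to reproduce the roughly 60% fully connected share reported for DeCAF:

```python
def time_distribution(layer_times):
    """Map {layer: seconds} to {layer: fraction of total time}."""
    total = sum(layer_times.values())
    return {name: t / total for name, t in layer_times.items()}

# Illustrative numbers only, chosen so fc6 + fc7 + fc8 take ~60% of
# the total, matching the CPU-bound DeCAF observation cited above.
example = {"conv": 0.9, "fc6": 0.7, "fc7": 0.5, "fc8": 0.1}
dist = time_distribution(example)
```

Feeding this the per-layer timings from a proper GPU-mode benchmark would answer directly whether Caffe's distribution differs.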
The third, more practical, purpose is to help future optimization efforts avoid the root of all evil (#81). This relates to the first motivation and is of the greatest value among the three.
[1] Yangqing Jia. Caffe: An Open Source Convolutional Architecture for Fast Feature Embedding. http://caffe.berkeleyvision.org/. 2013.
[2] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. arXiv:1310.1531 [cs.CV]. 2013.