Use cuDNN routine FindEx to find the best algorithm. #158

Merged
merged 1 commit into from
Jun 8, 2016

Conversation

pooyadavoodi

FindEx is more stable than Get (heuristic-based) because it runs all the available algorithms and sorts them according to their speed.

Both Get and FindEx are supported now and can be specified through the definition of each layer in prototxt.
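To illustrate the selection step described above, here is a minimal sketch, not the code from this PR. It assumes the benchmark results arrive sorted fastest first, as the description says FindEx provides. `AlgoPerf` and `PickAlgo` are hypothetical names; `AlgoPerf` is a simplified stand-in for a cuDNN perf result, so the sketch compiles without cuDNN.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for one benchmark result entry.
// It is NOT the cuDNN struct; only the fields needed for selection are kept.
struct AlgoPerf {
  int algo;            // algorithm identifier
  bool ok;             // true if the algorithm ran successfully
  float time_ms;       // measured execution time
  std::size_t memory;  // workspace bytes the algorithm requires
};

// Assumes `results` is sorted by time, fastest first.
// Returns the first algorithm that succeeded and fits in the workspace
// limit, or `fallback` when no entry qualifies.
int PickAlgo(const std::vector<AlgoPerf>& results,
             std::size_t workspace_limit, int fallback) {
  for (const AlgoPerf& r : results) {
    if (r.ok && r.memory <= workspace_limit) {
      return r.algo;
    }
  }
  return fallback;
}
```

Because the list is already ordered by speed, the first qualifying entry is also the fastest usable one, so no extra sorting is needed in this sketch.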

@@ -146,7 +149,6 @@ void CuDNNConvolutionLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
// NOLINT_NEXT_LINE(whitespace/operators)
CUDA_CHECK(cudaStreamSynchronize(cudaStreamLegacy));
}
++backward_passed_ctr_;

Need to protect this counter against overflow.
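One possible way to address this, sketched under assumptions: stop incrementing once the counter reaches a cap. The type of `backward_passed_ctr_` is not shown in the hunk, and `SaturatingIncrement` is a hypothetical helper, not something from this PR.

```cpp
#include <limits>

// Hypothetical helper: increments *counter only while it is below `cap`,
// so the value saturates instead of wrapping around (or, for signed types,
// hitting undefined behavior on overflow).
template <typename T>
void SaturatingIncrement(T* counter,
                         T cap = std::numeric_limits<T>::max()) {
  if (*counter < cap) {
    ++(*counter);
  }
}
```

If the counter only needs to distinguish "first few passes" from "later passes", a small cap would be enough, and the call site would become `SaturatingIncrement(&backward_passed_ctr_, cap)` for some chosen `cap`.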

@pooyadavoodi pooyadavoodi force-pushed the caffe-0.15 branch 4 times, most recently from af6772c to 4248b1f Compare June 8, 2016 03:26
FindEx is more stable than Get (heuristic-based) because it runs all the available algorithms and sorts them according to their speed.
Both Get and FindEx are supported now and can be specified through the definition of each layer in prototxt.
In Reshape, check whether the shape of the bottom blobs and the convolution descriptors have changed.
In caffe time, run multiple (instead of one) forward/backward passes in the initialization phase.
This is crucial because FindEx is executed in the first iterations and takes quite a long time.
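The Reshape check mentioned in the commit message could look roughly like the sketch below: remember the last shape seen and report whether it changed. `ShapeCache` is a hypothetical class used only for illustration. The actual layer compares its own blobs and descriptors, which are not shown here.

```cpp
#include <vector>

// Hypothetical cache of the last shape seen by Reshape.
class ShapeCache {
 public:
  // Returns true on the first call or when `shape` differs from the stored
  // one, and stores the new shape. Returns false when nothing changed.
  bool Update(const std::vector<int>& shape) {
    if (valid_ && shape == last_) {
      return false;
    }
    last_ = shape;
    valid_ = true;
    return true;
  }

 private:
  std::vector<int> last_;
  bool valid_ = false;
};
```

With a check like this, an expensive step such as rerunning FindEx could be limited to calls where `Update` returns true.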
@drnikolaev

drnikolaev commented Jun 8, 2016

Travis build is green, merging.

@drnikolaev drnikolaev merged commit 4ff8e07 into NVIDIA:caffe-0.15 Jun 8, 2016