
Caffe OpenCL support #2610

Closed
wants to merge 405 commits into from

Conversation

naibaf7
Member

@naibaf7 naibaf7 commented Jun 16, 2015

DISCONTINUED, now available as an official Caffe branch here: https://github.com/BVLC/caffe/tree/opencl

Technical Report

Available on arXiv:
http://arxiv.org/abs/1509.03371

@@ -29,11 +31,11 @@ include(cmake/Dependencies.cmake)

 # ---[ Flags
 if(UNIX OR APPLE)
-  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC -Wall")
+  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC -Wall -std=c++11")
Contributor

Thanks for the big contribution 👍.

Would -std=c++0x work? I think Travis runs GCC 4.6, and only GCC 4.7 and later support -std=c++11.
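If Travis is indeed stuck on GCC 4.6, one way to handle this (a sketch, not part of this PR; `CMAKE_CXX_COMPILER_VERSION` requires CMake 2.8.10 or newer) would be to pick the flag by compiler version:

```cmake
# Fall back to -std=c++0x on GCC older than 4.7, which lacks -std=c++11.
if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 4.7)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC -Wall -std=c++0x")
else()
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC -Wall -std=c++11")
endif()
```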


FYI: it's trivial to install a newer toolchain on Travis, see krasin/CuraEngine@976b373

@bhack
Contributor

bhack commented Jun 17, 2015

@BlGene At #2537 I'm already building on Travis with GCC 4.8.x.

@bhack
Contributor

bhack commented Jun 17, 2015

@naibaf7 Great effort! @shelhamer What is your opinion?

@bhack bhack mentioned this pull request Jun 17, 2015
@bhack
Contributor

bhack commented Jun 17, 2015

@naibaf7 Please take something from the Travis script I've modified in #2537 to get C++11 support.

@naibaf7
Member Author

naibaf7 commented Jun 17, 2015

@bhack
Which parts exactly are needed? I'm not very familiar with Travis CI... what else is needed to get the Travis CI build working?
Help is appreciated :)

@bhack
Contributor

bhack commented Jun 17, 2015

@naibaf7 Take and adapt the C++11 and g++ version changes in the Makefile, CMakeLists.txt, and travis_install.sh that you see at https://github.com/BVLC/caffe/pull/2537/files

@BlGene
Contributor

BlGene commented Jun 17, 2015

@naibaf7 Try copying over scripts/travis/travis_install.sh from #2537 .

@bhack
Contributor

bhack commented Jun 18, 2015

Remove libboost-python1.54-dev; it is only needed for my pull request.

@bhack
Contributor

bhack commented Jun 18, 2015

Once this builds fine, do we want to extend the Travis build matrix to switch OpenCL on and off and test different vendors (NVIDIA, Intel, AMD)?

@naibaf7
Member Author

naibaf7 commented Jun 18, 2015

@bhack
That's the plan. But I don't know yet how Travis will handle OpenCL, or how we can get ViennaCL and an OpenCL BLAS onto it (probably not hard, but the scripting has to be done and work reliably).
It's not my top priority at the moment, but contributions toward it would happily be merged!

@bhack
Contributor

bhack commented Jun 18, 2015

@naibaf7 Ok let this build correctly first :) .

@bhack
Contributor

bhack commented Jun 18, 2015

Remember also to run make lint locally before pushing.
You could also use a git hook to automate this.
Note: I've updated the link.
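For example, a minimal pre-push hook (a sketch; it assumes the repository's `make lint` target) saved as `.git/hooks/pre-push` and made executable:

```shell
#!/bin/sh
# Abort the push if linting fails.
make lint || { echo "make lint failed; push aborted" >&2; exit 1; }
```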

@jyegerlehner
Contributor

@naibaf7 Thank you for the nice work.
I posted a benchmark comparison over in the other PR thread.

@bhack
Contributor

bhack commented Jun 19, 2015

Travis is still stalled on a test compilation failure.

@bhack
Contributor

bhack commented Jun 20, 2015

@naibaf7 Do you see any prospect in opencv/opencv#4072 for interacting with OpenCV's accelerated image transformation routines during training?

@naibaf7 naibaf7 closed this Jun 20, 2015
@naibaf7 naibaf7 reopened this Jun 20, 2015
@naibaf7
Member Author

naibaf7 commented Jun 20, 2015

@bhack
Actually yes, but attaching the same OpenCL context is not absolutely required; it does not hurt to open a second one.
Or how did you think of using it? As an OpenCL data transformer layer implementation?

@bhack
Contributor

bhack commented Jun 20, 2015

Yes, as an OpenCL data transformation layer. Here is an old, not OpenCL-related discussion: #569. Other transformation PRs are still open if you check around.

@naibaf7
Member Author

naibaf7 commented Jun 21, 2015

I can't get Travis CI to build correctly.
Among other things, this seems to be one of the main issues:
https://devtalk.nvidia.com/default/topic/787442/nvcc-compilation-invalid-qualifiers-on-non-member-function/

Any help is appreciated. The issue is that it builds correctly on all our test systems (including Fedora 21 and 22 and Ubuntu 13.04 and 14.04), but not on Travis (Ubuntu 12.04?).

@bhack
Contributor

bhack commented Jun 21, 2015

The first error I see on the first build in the Travis build matrix is in the linking phase:

.build_release/lib/libcaffe.so: undefined reference to `caffe::Caffe::GetDeviceContext(int)'

@naibaf7
Member Author

naibaf7 commented Jun 21, 2015

@bhack
Yes, but it does not happen on my systems, which makes me wonder whether I am missing something obvious or something is going wrong on Travis CI.
Caffe::GetDeviceContext(int) and Caffe::GetDefaultDeviceContext() are declared in common.hpp and implemented in common.cpp.

Also note that the CUDA builds have further issues with glog's CHECK, as mentioned above. If the g++ from the Ubuntu PPA is 4.8.1 (which is buggy), that's bad; g++ 4.8.2 or higher is required.

@bhack
Contributor

bhack commented Jun 21, 2015

There is also g++-4.9 in the same PPA.

@naibaf7
Member Author

naibaf7 commented Jun 21, 2015

@bhack
Ok thanks, all is well now :)

@bhack
Contributor

bhack commented Jun 21, 2015

Finally :) I'm a little bit worried that, with this and the PR on dependency modularization, we will end up with a sort of preprocessor abuse syndrome.

@naibaf7
Member Author

naibaf7 commented Jun 21, 2015

Yes, but don't worry. I already have plans for a complete device abstraction, which will remove most of the ugly preprocessor macros, and most of the additional code in the layers as well. Some of it is unavoidable, though, if you want to keep the dependencies low for CUDA and CPU_ONLY users and keep OpenCL users from having to install CUDA.
So it will be great :)
I just had to move the side-by-side implementation forward now as a proof of concept.

If anyone has time to look into fixing CMake and Travis to do the OpenCL builds as well, that would be cool.

@bhack
Contributor

bhack commented Jun 21, 2015

What do you need in CMake? Surely @Nerei could also give us some support if he is available.

@Nerei

Nerei commented Jun 22, 2015

I don't think a lot of changes in CMake are required for OpenCL. For instance, OpenCV only provides the OpenCL headers in its 3rdparty subfolder and uses OpenCL via dlopen/LoadLibrary().
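As an illustration of that runtime-loading approach (a sketch in Python via ctypes, not code from this PR or from OpenCV), one can probe for the OpenCL ICD loader without any link-time dependency, which is why the build system barely needs to know OpenCL exists:

```python
import ctypes
import ctypes.util

def probe_opencl():
    """Return True if an OpenCL runtime exposing clGetPlatformIDs can be loaded."""
    name = ctypes.util.find_library("OpenCL")  # e.g. libOpenCL.so.1 on Linux
    if name is None:
        return False
    try:
        lib = ctypes.CDLL(name)  # analogous to dlopen()/LoadLibrary()
    except OSError:
        return False
    # Symbol lookup is analogous to dlsym()/GetProcAddress().
    return hasattr(lib, "clGetPlatformIDs")

print("OpenCL runtime found" if probe_opencl() else "no OpenCL runtime")
```

The program degrades gracefully on machines without any OpenCL driver, which is exactly the point of deferring the dependency to run time.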

@bhack
Contributor

bhack commented Jun 22, 2015

@cypof What do you think of this in the perspective of #2114? What kind of merge conflicts will need to be managed?

@bhack
Contributor

bhack commented Jan 19, 2016

"We believe that AMD has an "unofficial" (not, at least yet, integrated into the main Caffe build) OpenCL version of Caffe available, targeting the company's CPUs, GPUs and integrated APUs". We believe? Are we on The X-Files?

@naibaf7
Member Author

naibaf7 commented Jan 19, 2016

@bhack
Are you referring to http://www.embedded-vision.com/industry-analysis/technical-articles/caffe-deep-learning-framework-interview-core-developers ?
The problem is really just convolution efficiency, with Intel's approach, AMD's, and mine alike. Not having a counterpart to cuDNN to back the OpenCL implementations hurts the adoption of these approaches. My strong opinion is that AMD and Intel should focus their resources on a few good convolution implementations for their newest hardware and leave the framework development to the community. I had talked with Junli Gu (AMD) and Zhigang Gong (Intel) about this, but the slowness of the cooperation and the poor workload distribution are staggering, given how far ahead NVIDIA is.
Do you agree?

@bhack
Contributor

bhack commented Jan 19, 2016

Yes, and I agree with you. But I noted in particular the specific words "we believe". It is rather ironic after all the comments and development in this thread.

@shelhamer
Member

@naibaf7

Please send me a message at shelhamer@imaginarynumber.net so that we can better coordinate the efforts of the BVLC, AMD, Intel, NVIDIA, and yourself. We appreciate your continued OpenCL development and would like to promote this to an official branch. It would still be a work-in-progress branch and not for imminent merge but it should help focus the work. Drop a line when you have the chance!

@bhack
I want to believe
It seems that some content was lost in summarization, but the text will be updated soon to better reflect the ongoing OpenCL Caffe development on GitHub.

@bhack
Contributor

bhack commented Jan 19, 2016

@shelhamer Beyond belief, I want to see some sign of life. It really seems that your team at BVLC lacks the bandwidth to manage contributions. There are some interesting "scaling up" notes in the interview, but I hope it is not too late, given all the new framework offerings.

@olesalscheider
Contributor

There is now also HcCaffe as part of AMD's GPUOpen initiative. It is written in C++ AMP, which can be compiled with HCC to run on top of OpenCL or AMD's HSA stack.
It would be interesting to see a performance comparison...

@naibaf7
Member Author

naibaf7 commented Jan 26, 2016

@olesalscheider
It mostly boils down to how the convolution is implemented. I did not have time to look at their implementation yet.

@olesalscheider
Contributor

@naibaf7
Sure. It seems they have ported a subset of hcBLAS from HC to C++ AMP and use that. But I have not had a closer look yet either.

@naibaf7
Member Author

naibaf7 commented Jan 26, 2016

@olesalscheider
If it is indeed based on a BLAS, it is going to be no faster. A kernel fusion implementation is needed to reach e.g. cuDNN levels, and writing it in GPU assembly is needed to reach Nervana Systems' level. I don't think the HCC/hcBLAS approach will be enough to compete with NVIDIA.
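To illustrate what kernel fusion buys (a NumPy sketch of the idea only, not Caffe or cuDNN code), consider a GEMM followed by bias add and ReLU: a BLAS-based layer launches one kernel per stage with an intermediate result written to and read back from memory between stages, whereas a fused kernel folds the bias and ReLU into the GEMM epilogue so the data is traversed once.

```python
import numpy as np

def forward_unfused(x, w, b):
    y = x @ w                  # GEMM "kernel"; intermediate written to memory
    y = y + b                  # separate bias "kernel"; read back, written out
    return np.maximum(y, 0.0)  # separate ReLU "kernel"; another round-trip

def forward_fused(x, w, b):
    # Fused epilogue: bias and ReLU applied as the GEMM result is produced,
    # saving two full round-trips through memory on a GPU.
    return np.maximum(x @ w + b, 0.0)

rng = np.random.default_rng(0)
x, w, b = rng.normal(size=(4, 8)), rng.normal(size=(8, 3)), rng.normal(size=3)
assert np.allclose(forward_unfused(x, w, b), forward_fused(x, w, b))
```

Both variants compute the same values; on a GPU the fused form wins because memory bandwidth, not arithmetic, dominates these layers.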

@naibaf7
Member Author

naibaf7 commented Jan 30, 2016

Closing this PR, branch is now officially available here:
https://github.com/BVLC/caffe/tree/opencl

@Uchanka

Uchanka commented Sep 1, 2016

@naibaf7
Hey mate, great job out there! I'm still wondering how to port the OpenCL branch to Android phones and get it running on mobile GPUs, for instance Adreno or Mali.
Are there any tutorials or something I can refer to?
Thanks a lot.

@naibaf7
Member Author

naibaf7 commented Sep 1, 2016

@Grillnov
There's a developer, @sh1r0, who is working on the additional toolchain to support Caffe on Android.
See here: https://github.com/sh1r0/caffe-android-lib/tree/opencl_dev
He will probably be able to help you further.
