-
-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GPU Support? #63
Comments
I'm completely out of my depth on this one. @msarahan any knowledge on the subject? |
I'm also interested in this. The trick is most CIs do not provide GPUs. However, if you are willing to work with OpenCL, which works with CPUs and GPUs, then we can work together on getting that support on CIs. |
Unfortunately, CUDA really is the de-facto standard for machine learning. I Question -- is conda and conda-build updated regularly on this system? My On Sun, Mar 27, 2016 at 6:56 PM jakirkham notifications@github.com wrote:
|
I thought you might say that. Unfortunately, without the CI support, we are kind of in a bind on this one. If NVIDIA is willing to work on CI support with GPUs, that would be great. To be completely honest with you, I don't think we should be holding our breath on this. The real problem is that most CI services are leasing time on other infrastructure. Primarily Google and Amazon's infrastructure. Unless someone has the infrastructure with GPUs that they are willing to lease to some CI service for this purpose, we are kind of stuck. I think we can all imagine what they would prefer to do with this infrastructure, right? However, if you figure out something on this, please let us know and we can work on something. I'm guessing you are using Torch then? At a bare minimum, let's work on getting Torch's dependencies in here. At least, that will make your job a little easier, right? For that matter any run-of-the-mill Lua packages that you have would be good to try to get in, as well. It should help you and others looking for more Lua support in conda. How does that sound? |
This repo seems to use NVIDA components for their CI. |
Did you see the link above, @alexbw? This might not be totally impossible after all, but I think we should do some research on how this works. What platforms are you trying to support? Just Linux? Mac also? I'm totally out of my depth on Windows. So we may need someone else to show us the ropes there. |
Saw the link. Looking more into this, but on the timescale of a few weeks. On Sun, Mar 27, 2016 at 6:56 PM jakirkham notifications@github.com wrote:
|
So, I looked a little bit more closely at this and it looks like one could add GPU libs to CentOS 6 (what we are currently using). There is Mac and Windows support too, but IMHO this is secondary to getting Linux up and working. However, I am not seeing any support for CentOS 5 (a platform we were debating switching too), which is something to keep in mind. |
Good to know. We are collecting datapoints on whether continuing with CentOS 5 is a good idea. If anyone knows of definitive reasons to stay with CentOS5, it is currently preventing:
|
Glad you saw this, @msarahan. Was debating cross referencing, but didn't want to have a mess of links. Are there still many customers using a CentOS 5 equivalent Linux? Could we maybe build the compiler on CentOS 5 and somehow add it to CentOS 6? |
Building the compiler on the older architecture doesn't help. What matters is the GLibc version present on the build system when packages are built. We don't have hard data on customers, outside of HTTP headers for downloads of packages. We're digging through that to see how many people have kernels older than the one corresponding to CentOS 6. |
Right. I was just hoping there was a way we could somehow have both. I guess in the worst case some things could be CentOS 6 as needed. Will that have any consequences if we mix the two? Is that already done to some extent (noting that Qt5 was mentioned). Yeah, it seems like it would be good to give a survey. Might need an incentive to make sure it actually gets filled out. |
Also, interesting footnote (though I would appreciate if other people check and make sure I am reading this right as IANAL), it appears that at least some of the CUDA libraries can be shipped. This means we could create a CUDA package that simply makes sure CUDA is installed in the CI and moves them into the conda build prefix for packaging. The resulting package could then be added as a dependency of anything that requires them (e.g. Torch, Caffe, etc.). This would avoid us having to add these hacks in multiple places and risk losing them when we re-render. Furthermore, we would be able to guarantee that the libraries we used to build would be installed on the user's system. |
We should verify whether one version of CUDA for one Linux distro and version can be used on other Linux distros and versions easily or if we need to have multiple flavors. This survey will have to be extended to other OSes at some point, but starting with Linux makes the most sense to me. |
So, I am trying something out in this PR ( conda-forge/caffe-feedstock#1 ). In it, I am installing CUDA libraries into the docker container and attempting to have Caffe build against them. It is possible some tests will fail if we don't have access to an NVIDA GPU so we will have to play with that. Also, we don't have cuDNN as that appears to require some registration process that I have not looked into yet and may be a pain to download in batch mode. In the long run, I expect the CUDA libraries will be wrapped in their own package for installation and packages needing them will simply install these libraries. We may need to features to differentiate different GPU variants (CUDA/OpenCL). However, that CUDA package will probably need to hack the CI script in a similar way. Definitely am interested in feedback. So, please feel free to share. |
Another thought might be that we don't ship the CUDA libraries. Instead we have a package that merely checks to set that they are installed via a pre or post-link step. If it fails to find them, the install fails. This would avoid figuring out where the libraries can or cannot be distributed safely. Hopefully, as we are linking against the CUDA API. All that will matter is that we have an acceptable version of the CUDA libraries to run regardless of what Linux distribution it was initially built on. |
Appears Circle CI does provide GPU support or at least that is what my testing suggests. |
Also, as FYI, in case you didn't already know @msarahan, CentOS 5 maintenance support ends March 2017. In other words, less than a year. That sounds like a pretty big negative to me. Given how many recipes have been added from conda-recipes and how many remain to be added at this point, trying to add a CentOS 5 switch before that point sounds challenging. Not to mention, we may find ourselves needing to migrate back to CentOS 6 by that point. Maybe it is just me, but I'm starting to feel a lot of friction in switching to CentOS 5. Is it reasonable that we consider just accepting CentOS 6 as part of this transition? |
FWIW, we have GPU support on Omnia. Might be worth reading over. |
Thanks for these links @kyleabeauchamp. I'll certainly try to brush up on this. Do you have any thoughts on this PR ( conda-forge/caffe-feedstock#1 )? Also, how do you guys handle the GPU lib dependency? Is that packaged somehow, used from the system (possibly with some sort of check), some other way? |
So AFAIK our main use of GPUs was building a simulation engine OpenMM (openmm.org). OpenMM is a C++ library and can dynamically detect the presence of CUDA support (via shared libraries) at runtime. This means that we did not package and ship anything related to CUDA. We basically just needed CUDA on the build box to build the conda package, then let OpenMM handle things later dynamically. |
Looks like our dockerfile is somewhat similar to your CUDA handling: https://github.com/omnia-md/omnia-build-box/blob/master/Dockerfile#L25 |
Ah, ok, thanks for clarifying. The trick with Caffe, in particular, is that it can use CPU, CUDA, or OpenCL. CPU support is always present; however, a BLAS is required, which includes a user's CPU choice (OpenBLAS, ATLAS, MKL, or possibly some hack to add other options) and a GPU choice (if any) cuBLAS or ViennaCL. Thus, having this dynamically determined ends up not really being as nice as it could be. To allow actual selection will require feature support and possibly multiple rebuilds of Caffe. One simple route might be to just always use ViennaCL, which can abstract the difference between the OpenCL and CUDA options. Also, it can always fallback to the CPU if no GPU support is present. Though I expect this layer of abstraction comes at some penalty, the question is how severe is that penalty. Would a solution like this work with OpenMM? I don't know if its GPU support proceeds primarily through a GPU BLAS or some other mechanisms. For instance, is it using FFTs? If you have deep learning interests, this may be relevant. In the case of Caffe, it can optionally support cuDNN. Researchers will want this support out of the box. Not only is this tricky because it may not be provided for due to hardware or software reasons, it is tricky because downloading cuDNN requires a registration step with unclear licensing restrictions. One way we might resolve this is to request cuDNN be loaded on an appropriate Docker container. NVIDA does do this with Ubuntu 14. However, I don't see a similar container for CentOS 6 and am unclear on whether it would be a supported platform. Ultimately, it will require us to communicate with NVIDA at some point to see what we need to do here to stay above board while providing users state of the art support. Fortunately, NVIDA is very clear as to what parts of the CUDA libraries can be distributed down to file level of what can and cannot be distributed. So, the concerns with cuDNN do not affect this. |
Another thought for more versatile support would be to use the clMath libraries. |
OpenMM dynamically chooses the best platform at runtime, with options including CPU (SSE), CUDA, OpenCL, and CPU (no SSE / reference). It does use FFTs. The idea with OpenMM is to build and ship binaries that support all possible platforms, then select at runtime. |
@scopatz did the legal council state what clause made the cuda toolkit non-redistributable? Are we allowed to link to cuda stuff so long as we pull in the dependency from defaults? Ref: https://docs.nvidia.com/cuda/eula/index.html#attachment-a |
Basically, counsel has said that the EULA - even attachment A - only refers to applications (it does), and conda-forge cannot be considered an application under any reasonable definition. Also, it looks like they have added the following to attachment A, which we don't meet:
|
I guess we can't distribute the Cuda driver. Many people are OK with installing that themselves :/. Second question would be: |
Linking to the deafults libraries is fine for us |
Then we can say goodbye to reproducible workflow involving CUDA. This is really a non sense we can't ship it in conda-forge. I understand the legal aspect of it. What I don't understand is why they took that decision...(unless it's a misunderstanding and we can actually ship it conda?) |
I agree on all counts.
From speaking with NVIDIA, I am reasonably sure that this is not a misunderstanding on their part. The folks who work on ML and cuda and PyData stuff at NVIDIA are different than those in their legal department. I don't know the justification for the license as it stands, but we have to live with it until:
|
Thank you @scopatz for working on all those things. In the meantime for those who want to have reproducible CUDA installation (only Linux unfortunately), see the following gist: https://gist.github.com/hadim/a4fe638587ef0d7baea01e005523e425 |
Now that building package with CUDA is working fine on linux (in case of https://github.com/conda-forge/prismatic_split-feedstock, it went smoothly), how would it be possible to add windows support? Is it worth trying to add a |
PRs welcome. We could use |
Ok, I will give a try. |
@ericpre, for an example of how to build GPU packages would take a look at nvidia-apex as a simple example. Sorry there are no docs atm. |
@jakirkham, does |
I've got an NVIDIA Jetson AGX Xavier (8-core Tegra I also have a laptop with a GTX 1050 Ti GPU / x86_64 CPU but there's plenty of other ways to get at the GPU there. |
Why isn't OpenCL included in OpenCV already. CI support and license issues aren't an issue so why is it blocked on Cuda? |
I've got a couple packages I'm preparing for upload that rely on GPUs. I'm not up to speed on what open-source CI solutions offer, but would building against a VM w/ GPUs be supported? If it would require pitching in or donating to the project, I'm pretty sure I can figure some way to help.
The text was updated successfully, but these errors were encountered: