An example build on AWS using latest generation GPU instances #1092

ghost · 2014-09-16T17:26:43Z

I have been trying, without success, to install Caffe on the latest-generation GPU instances. Is it possible to provide a public AMI that has Caffe pre-installed?

shelhamer · 2014-09-16T17:35:48Z

Make sure you pick a GPU instance that has compute capability >= 3.0.
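One way to check a card against this threshold is to compare its compute capability as a major.minor pair. A minimal sketch, assuming you have already looked up the capability of the card yourself; the function name meets_min_cc is made up for illustration and is not part of Caffe or CUDA:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a compute capability given as
# "major.minor" (e.g. "3.0") is at least the minimum (default "3.0").
meets_min_cc() {
    cc_major=${1%%.*}
    cc_minor=${1#*.}
    min=${2:-3.0}
    min_major=${min%%.*}
    min_minor=${min#*.}
    # Compare the major number first, then the minor number on a tie.
    [ "$cc_major" -gt "$min_major" ] ||
        { [ "$cc_major" -eq "$min_major" ] && [ "$cc_minor" -ge "$min_minor" ]; }
}

# Example: 3.0 meets the threshold, 2.1 does not.
if meets_min_cc 3.0; then echo "3.0: ok"; else echo "3.0: too old"; fi
if meets_min_cc 2.1; then echo "2.1: ok"; else echo "2.1: too old"; fi
```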

The wiki has a reference to an AMI but I'm not sure that it's up-to-date: https://github.com/BVLC/caffe/wiki/Setting-up-Caffe-on-Ubuntu-14.04

@cdoersch could you comment on any instance details from your recent installation?

cdoersch · 2014-09-16T21:00:15Z

I've only gotten it to run on g2.2xlarge instances, which have the newest GPUs on EC2. I was using the StarCluster HVM AMI, which is Ubuntu 12.04. Confusingly, it comes with its own version of CUDA and the NVIDIA driver that is too old to run Caffe. I find that from stock Ubuntu, this does the trick:

wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1204/x86_64/cuda-repo-ubuntu1204_6.5-14_amd64.deb
dpkg -i cuda-repo-ubuntu1204_6.5-14_amd64.deb
apt-get update
apt-get install -y cuda

Otherwise, I just followed the directions from the caffe website.

kloudkl · 2014-09-26T02:17:54Z

There are a few docker files for Caffe.

mmoghimi · 2014-10-15T06:02:01Z

I followed the instructions to setup caffe on aws but still have issues related to CUDA.

@cdoersch do you have an AMI that you can share?

cdoersch · 2014-10-15T15:35:11Z

I unfortunately don't have one at the moment; there are some additional customizations on the machine I'm using (not to mention that I'm currently running out of AWS funds). If you have a more specific issue, post it and I may be able to help.

mmoghimi · 2014-10-15T20:38:16Z

@cdoersch here is the error message.
http://pastebin.com/hfeEMVty

cdoersch · 2014-10-15T20:45:25Z

Looks like your GPU isn't recognized. What's the result of nvidia-smi -a?

mmoghimi · 2014-10-15T20:47:03Z

modprobe: ERROR: could not insert 'nvidia_340': Unknown symbol in module, or unknown parameter (see dmesg)
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
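
The modprobe error above points at dmesg for details. A small sketch of how the kernel log could be narrowed to driver-related lines; the function name nvidia_log_lines is made up for illustration, and the patterns are an assumption about which messages are relevant:

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines that mention the NVIDIA
# driver or unresolved module symbols. Reads the log on standard input.
nvidia_log_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -iE 'nvidia|nvrm|unknown symbol' || true
}

# Usage on the affected instance (may need sudo on some systems):
#   dmesg | nvidia_log_lines | tail -n 20
```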

cdoersch · 2014-10-15T20:50:53Z

Did you get any errors when you ran apt-get install -y cuda? On my system, that installs the 340 driver.

mmoghimi · 2014-10-16T03:42:09Z

I just launched a new instance and did everything from scratch.
apt-get install cuda installs the 340 driver, but I get an error when I try to run make runtest.
I then uninstalled the packages with
sudo apt-get remove --purge nvidia-340 nvidia-modprobe nvidia and reinstalled the driver from the .run file NVIDIA-Linux-x86_64-340.46.run.

Still doesn't work.

E1016 03:36:38.640488 13775 common.cpp:98] Cannot create Curand generator. Curand won't be available.
F1016 03:36:38.640584 13775 benchmark.cpp:87] Check failed: error == cudaSuccess (38 vs. 0) no CUDA-capable device is detected

Any thoughts?

cdoersch · 2014-10-16T04:03:48Z

That apt-get remove would have gotten rid of cuda too, wouldn't it?

For me, cuda installs into /usr/local/cuda-6.5 when done via apt-get install cuda. Make sure it's there.

Which version of ubuntu are you using? If it's a more recent version, you may have an nvidia-340 in the ubuntu repositories, which may cause conflicts with what you would get from nvidia's repositories.

Also, this isn't a caffe problem, it's a problem with your install of cuda. I wouldn't bother running caffe again until you can get nvidia-smi -a to output your graphics card model. That program will hopefully give more helpful error messages.
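
The checks suggested above can be gathered into a short script. This is a sketch rather than a definitive diagnostic: the function name check_cuda_install is made up, and the default prefix /usr/local/cuda-6.5 is simply the location reported in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report whether a CUDA toolkit directory exists and
# whether the driver can see a GPU. Returns non-zero on the first failure.
check_cuda_install() {
    prefix="${1:-/usr/local/cuda-6.5}"   # install location reported above

    if [ ! -d "$prefix" ]; then
        echo "missing CUDA directory: $prefix"
        return 1
    fi
    echo "found CUDA directory: $prefix"

    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found on PATH"
        return 2
    fi
    # nvidia-smi -a prints the card model when the driver is working.
    if ! nvidia-smi -a >/dev/null 2>&1; then
        echo "nvidia-smi cannot talk to the driver"
        return 3
    fi
    echo "driver responds; GPU is visible"
}

# Usage:
#   check_cuda_install                  # default prefix
#   check_cuda_install /some/other/dir  # any other prefix (hypothetical path)
```

To see which repository would supply nvidia-340, apt-cache policy nvidia-340 lists the candidate version and where it comes from.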

kmatzen · 2014-10-16T16:49:11Z

@AKSHAYUBHAT, I have an AMI that works with caffe on g2.2xlarge. We can chat offline if you want access, but I'm pretty sure it's just the HVM Ubuntu 14.04 AMI with cuda 6.5 and docker installed. Then I use my kmatzen/caffe or kmatzen/caffe-debug docker image. Both are available on hub.docker.com.

sudo docker run -t -i --privileged -v /mnt/datastore:/datastore kmatzen/caffe /bin/bash

https://registry.hub.docker.com/u/kmatzen/caffe/dockerfile/
https://registry.hub.docker.com/u/kmatzen/caffe-debug/dockerfile/

I also have a docker image called kmatzen/caffe-base that includes just the dependencies. The Dockerfile can be found in my repo:
https://github.com/kmatzen/caffe/blob/mesos/docker/base/Dockerfile

One thing you might want to change is that this Dockerfile references my mesos-base docker image. You could just change it to reference the ubuntu:14.04 docker image.
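
Swapping the base image amounts to rewriting the Dockerfile's FROM line. A minimal sketch, assuming the Dockerfile has a single FROM line; the function name use_ubuntu_base is made up for illustration, and the rest of the file may still rely on things the original base image provided:

```shell
#!/bin/sh
# Hypothetical helper: print a Dockerfile with its FROM line pointed at
# ubuntu:14.04. The input file itself is left untouched.
use_ubuntu_base() {
    sed -E 's|^FROM[[:space:]]+.*$|FROM ubuntu:14.04|' "$1"
}

# Usage:
#   use_ubuntu_base docker/base/Dockerfile > Dockerfile.ubuntu
```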

shelhamer · 2014-10-16T17:13:13Z

@kmatzen your explanation and docker images could help a lot of new users -- if you have a chance, please add a section (perhaps under installation) to the wiki https://github.com/BVLC/caffe/wiki.

achalddave · 2014-10-20T21:36:12Z

I've made an AMI with the latest version of Caffe for g2.2xlarge instances. I ran into some issues setting up CUDA when starting from the AMI here: https://github.com/BVLC/caffe/wiki/Ubuntu-14.04-ec2-instance, so I thought this might be useful: ami-03f2e746 in N. California.

Starting from the image in that wiki article, I had to do the following (I'm relatively confident I've not missed any steps, but if I did I apologize - feel free to ping me and I can try to help):

1. Get the latest official Caffe repo. The one in that image is an unmaintained fork, as far as I can tell.
2. Run sudo apt-get install libgflags-dev liblmdb-dev (unrelated to the GPU; these were not installed in the image, as I think they are more recent dependencies).
3. Add the following to /etc/modprobe.d/blacklist-nouveau.conf:

blacklist nvidiafb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist vga16fb
options nouveau modeset=0

4. Run sudo update-initramfs -u, then sudo reboot.
5. Run sudo apt-get install linux-image-extra-virtual.
6. Remove gcc-4.6, install gcc-4.8 if necessary, and make sure gcc-4.8 is available:

sudo apt-get remove gcc-4.6
sudo apt-get install gcc-4.8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 20
sudo update-alternatives --install /usr/bin/cc cc /usr/bin/gcc-4.8 20

7. (Unsure.) I ran sudo apt-get install linux-headers-$(uname -r), but I forget whether this was necessary.
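
The blacklist step can be scripted so that it is repeatable. A sketch under the assumption that you run it as root on the instance; the function name write_nouveau_blacklist is made up, and the target path is a parameter so the function can be tried on a scratch file first:

```shell
#!/bin/sh
# Hypothetical helper: write the blacklist entries listed above to a file.
write_nouveau_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    cat > "$target" <<'EOF'
blacklist nvidiafb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist vga16fb
options nouveau modeset=0
EOF
}

# Usage (as root), followed by the initramfs rebuild and reboot:
#   write_nouveau_blacklist
#   update-initramfs -u && reboot
# After the reboot, `lsmod | grep nouveau` should print nothing.
```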

shelhamer · 2014-10-21T00:45:24Z

Thanks for recording the changes to bring the ec2 instance up-to-date!
Could you update the wiki page to reflect your steps?
https://github.com/BVLC/caffe/wiki/Ubuntu-14.04-ec2-instance


achalddave · 2014-10-21T00:59:10Z

Sure. The reason I didn't is that you don't need most of these steps except on a GPU instance, so I wasn't sure whether to add a GPU section to that page, confuse things by including a separate AMI, or create a new page. I'll try to do one of them soon.

Edit (December): Haven't had a chance yet, sorry. If anyone else is up for it, feel free to do so.

shuggiefisher · 2014-10-23T19:18:06Z

Awesome, thanks for sharing the AMI, @achalddave. I found that performance is much better with cuDNN. To recompile Caffe with cuDNN, I had to downgrade to g++-4.6 and upgrade to CUDA 6.5.

./examples/mnist/train_lenet.sh on g2.2xlarge

GPU CUDA 6.0 = 239 secs
CPU = 1075 secs
GPU CUDA 6.5 w/CuDNN g++-4.6 = 47 secs
CPU g++-4.6 = 1052 secs
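
For reference, the speedups implied by these timings can be worked out directly. A small sketch using only the numbers quoted above; the function name speedup is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: ratio of two timings, rounded to one decimal place.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

echo "CUDA 6.0 GPU vs CPU:         $(speedup 1075 239)x"
echo "CUDA 6.5 + cuDNN GPU vs CPU: $(speedup 1052 47)x"
echo "cuDNN GPU vs CUDA 6.0 GPU:   $(speedup 239 47)x"
```

On these numbers the GPU is about 4.5x faster than the CPU with CUDA 6.0 and about 22.4x faster with CUDA 6.5 and cuDNN, so the cuDNN build is roughly 5.1x faster than the CUDA 6.0 GPU run.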

tleyden · 2014-10-27T22:20:51Z

I have been unsuccessfully trying to get Caffe to install on latest GPU instances, is it possible to provide a public AMI that has caffe pre-installed?

@AKSHAYUBHAT I wrote up instructions on how I got it working, including a public AMI with the nvidia kernel module + cuda 6.5 drivers that can be used as an easy starting point for the host OS.

See Running Caffe on AWS GPU Instance via Docker

dylanvaughn · 2014-12-14T09:57:09Z

Thanks for all the great examples! I used this conversation heavily to create a chef cookbook that installs CUDA, cuDNN, and Caffe (with Python bindings) on an AWS g2.2xlarge running Ubuntu 14.04:

https://github.com/robomakery/caffe-cookbook

I am building AMIs with Caffe pre-installed using Packer and this cookbook.

ghost · 2015-01-16T21:50:50Z

Thank you everyone.

shelhamer added question compatibility labels Oct 16, 2014

ghost closed this as completed Jan 16, 2015

beniz mentioned this issue May 31, 2015

Amazon Machine Instance (AMI) on EC2 jolibrain/deepdetect#5

Closed
