Newbie question about CUDA container and SSH #36

spalkovits · 2016-01-18T12:30:06Z

Hello,
I have a machine with a working CUDA and Docker installation. When I start an interactive container and run, for example, nvidia-smi -l, everything looks fine. However, when I add an SSH server (so that in the future other users can also use CUDA without knowing about Docker), nvidia-smi fails in the same container, although the binary is there.
I read about the nvidia-docker-plugin, but I think I need step-by-step instructions on how to use it.
Regards,
Stefan

3XX0 · 2016-01-18T18:35:39Z

I'm not sure I understood your problem correctly.
Where does sshd live: on the host or in the container? Are you using NV_HOST?
Can you give us the list of commands you issued, with their respective output? It would help us reproduce the error.

spalkovits · 2016-01-19T07:40:09Z

Hello,
I did the following:

Prerequisites:

Docker is installed properly on my Ubuntu 14.04 machine; the "Hello World" container works as expected.
The NVIDIA driver on the host machine is working properly. I installed it following the instructions at http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html and everything works fine.
nvidia-docker is installed properly following the instructions at https://github.com/NVIDIA/nvidia-docker. Everything works fine: when I build the example with make and run the "nvidia-smi" example, I get the expected output.
The nvidia-docker-plugin is installed and working. When I run "sudo nvidia-docker-plugin -l :3476" and then, separately, "curl localhost:3476/v1.0/gpu/info", I get the desired output.
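If the plugin check above needs to be repeated from scripts, a small helper could wrap the same curl call. This is only a sketch under assumptions: check_plugin is a hypothetical name, and its default URL is simply the endpoint from the post (plugin started with "-l :3476").

```shell
#!/bin/sh
# Hypothetical helper wrapping the curl check from the post.
# Default URL assumes the plugin was started with "-l :3476".
check_plugin() {
    url="${1:-http://localhost:3476/v1.0/gpu/info}"
    # -f: treat HTTP errors as failures
    # -sS: no progress meter, but still print errors
    # --max-time: give up instead of hanging if nothing answers
    if curl -fsS --max-time 5 "$url"; then
        return 0
    fi
    echo "nvidia-docker-plugin not reachable at $url" >&2
    return 1
}
```

For example, "check_plugin && echo plugin is up"; pass another URL as the first argument if the plugin listens elsewhere.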

Finally my problem:

I create a Docker image with a Dockerfile. I start with "FROM cuda" and add the rest of the Dockerfile from https://docs.docker.com/engine/examples/running_ssh_service/

It looks like this:

FROM cuda
RUN apt-get update && apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN echo 'root:screencast' | chpasswd
RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config

# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd

ENV NOTVISIBLE "in users profile"
RUN echo "export VISIBLE=now" >> /etc/profile

EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]

I changed the password, but that should not be an issue. Then I build the image with "docker build -t image_name_goes_here .".

When I start the container interactively with "nvidia-docker run -it --name name_goes_here -p 10022:22 image_goes_here /bin/bash" I can use "nvidia-smi -q" to get the desired output.

BUT when I ssh into the same running container, even "which nvidia-smi" fails, although the binary is in the right place.

Any ideas what I missed to get the desired behavior? I want the SSH container solution because I do not want every user working on the host machine, although I know it does not completely fit the Docker philosophy.

Regards,

Stefan

3XX0 · 2016-01-19T08:44:11Z

Your issue comes from the fact that the CUDA environment is not passed to the SSH session.
You need to export it in your /etc/profile as shown in your example.
The following should do the trick:

RUN echo "export PATH=$PATH" >> /etc/profile && \
    echo "ldconfig" >> /etc/profile
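A minimal sketch of why this works, assuming a POSIX sh and that the cuda base image puts /usr/local/cuda/bin on PATH through an ENV instruction (that directory is an assumption here). Docker ENV values reach processes started by docker run, but an SSH login shell starts from a clean environment and rebuilds it from /etc/profile. Because RUN executes under /bin/sh -c, the $PATH inside the double quotes is expanded at build time, so the profile receives the literal directory list. The script uses env -i to mimic the clean SSH environment and a temporary file in place of /etc/profile.

```shell
#!/bin/sh
# Sketch only: a temp file stands in for /etc/profile, env -i stands in
# for the clean environment of an SSH login. CUDA path is assumed.
set -e

profile=$(mktemp)

# Simulate the RUN step's environment, where the base image's ENV has
# already put the CUDA binaries on PATH.
PATH="/usr/local/cuda/bin:$PATH"

# Same pattern as the suggested Dockerfile line: inside double quotes,
# $PATH is expanded now, so the file gets the literal directory list.
echo "export PATH=$PATH" >> "$profile"

# Clean environment, no profile: the container's ENV is not visible.
plain_path=$(env -i /bin/sh -c 'printf "%s" "$PATH"')

# Clean environment, profile sourced first, as a login shell would do.
login_path=$(env -i /bin/sh -c ". '$profile'; printf '%s' \"\$PATH\"")

echo "without profile: $plain_path"
echo "with profile:    $login_path"

rm -f "$profile"
```

The "with profile" PATH should start with /usr/local/cuda/bin while the "without profile" one lacks it. A consequence of this approach is that the value is frozen when the image is built, so a later change to the image's ENV PATH only takes effect after a rebuild. The ldconfig line is a separate concern: it refreshes the dynamic linker cache at each login, presumably so that driver libraries mounted into the container at run time are found. In this setup the session logs in as root; a non-root user would presumably not be allowed to rewrite the cache.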

spalkovits · 2016-01-19T12:27:33Z

Indeed that solved it. Thank you very much.

May I ask two more questions, then:

What exactly is the nvidia-docker-plugin meant for? I think I misunderstood something in the wiki, especially the "Running it remotely" part of https://github.com/NVIDIA/nvidia-docker/wiki/Using-nvidia-docker
Can I run two containers with nvidia-docker while having only one GPU?

I hope my questions are not too abstract.

Regards,

Stefan

3XX0 · 2016-01-19T17:46:50Z

The documentation of nvidia-docker and nvidia-docker-plugin explains it. The plugin is needed if you want to deploy NVIDIA Docker on a remote host (say, AWS) or if you don't want to set up your volumes manually.
As for running two containers with one GPU: you can, but your GPU processes will have to share the GPU. You can use NVIDIA MPS (Multi-Process Service) for that purpose.

spalkovits · 2016-01-20T07:47:16Z

Thanks a lot. I think I can go on with your information.

Regards,

Stefan

3XX0 added the question label Jan 18, 2016

3XX0 closed this as completed Jan 20, 2016

This was referenced Mar 12, 2020:
docker: Error response from daemon #1217 (closed)
Unable to create container #1218 (closed)
