GPU Docker Plugin #8
Don't worry, we are already working on the plugin :)
I figured, but you know I couldn't help but ask :P
It's being worked on as we speak :) I should have a working implementation fairly soon for you to play with.
The wrapper script was my biggest concern with this project, but the Docker plugin sounds like the ideal solution. Once this is ready (alongside #7 and hopefully #10) I'll be happy to port over the many DL images built on top of ...
Leverage the Docker volume plugin mechanism introduced with Docker 1.9. This plugin also exports a few REST endpoints to ease remote NVIDIA Docker management. This should address issue #8.
I just pushed an initial implementation of the plugin in the `plugin` directory. In addition it provides a few REST endpoints. Example of running the CUDA runtime with two GPUs 0 and 1:

```sh
make runtime
cd plugin && make
sudo sh -c "./bin/nvidia-docker-plugin &"
gpu(){ curl -s http://localhost:3476/docker/cli?dev=$1\&vol=bin+cuda; }
docker run -ti `gpu 0+1` cuda:runtime
```
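For reference, the gpu helper just splices the plugin's HTTP response into the docker command line. Going by the flags quoted later in this thread, the dev=0+1 request would expand to roughly the following (the nvidia1 device is an inference from dev=0+1, not shown in the source):

```sh
# Approximate expansion of: docker run -ti `gpu 0+1` cuda:runtime
docker run -ti \
  --device=/dev/nvidiactl --device=/dev/nvidia-uvm \
  --device=/dev/nvidia0 --device=/dev/nvidia1 \
  --volume-driver=nvidia \
  --volume=bin:/usr/local/nvidia/bin \
  --volume=cuda:/usr/local/nvidia \
  cuda:runtime
```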
@3XX0 , I've tried running the plugin using the snippet you posted above, but it looks like I'm having some issues:

```
~/git/NVIDIA/nvidia-docker/plugin$ sudo sh -c "./bin/plugin &"
nvidia-docker-plugin | 2015/12/08 11:29:07 Loading NVIDIA management library
nvidia-docker-plugin | 2015/12/08 11:29:07 Loading NVIDIA unified memory module
nvidia-docker-plugin | 2015/12/08 11:29:07 Discovering GPU devices
nvidia-docker-plugin | 2015/12/08 11:29:07 Creating volumes
nvidia-docker-plugin | 2015/12/08 11:29:07 Error: invalid ld.so.cache file
```

I've reproduced the same error on two different systems. Let me know of any more specifics or logs you'd need.
Weird, can you give the output of:

```sh
strings /etc/ld.so.cache | head -n 2
hexdump -C -n 256 /etc/ld.so.cache
hexdump -C /etc/ld.so.cache | grep -A2 glibc
```
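For comparison, on a typical glibc system the first two strings of /etc/ld.so.cache are the cache format magics, so healthy output should look something like this (version strings vary by distribution; this is an illustrative sample, not the reporter's actual output):

```
$ strings /etc/ld.so.cache | head -n 2
ld.so-1.7.0
glibc-ld.so.cache1.1
```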
@3XX0 , I've amended my gist above with the added output.
My bad ... Thanks for the report, it should be fixed now.
Success using the new nvidia-docker-plugin executable! So correct me on what I see going on so far. I like the REST endpoints; it's kind of handy alone just to be able to point a browser at them.
The query string separator is wrong, that's why the ...
@ruffsl Correct, to use it remotely you would do something like:

```sh
# On the docker-machine host
sudo ./bin/nvidia-docker-plugin -l :3476

# On the docker client
gpu(){
  host="$( docker-machine url $1 | sed 's;tcp://\(.*\):[0-9]\+;http://\1:3476;' )";
  curl -s $host/docker/cli?dev=$2\&vol=bin+cuda;
}

eval "$(docker-machine env <MACHINE>)"
docker run -ti `gpu <MACHINE> 0+1` cuda:runtime
```

Note if your ...

Eventually everything should be abstracted within ...
@flx42 , what I'm seeing is that it generates:

```
--device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 --volume-driver=nvidia --volume=bin:/usr/local/nvidia/bin --volume=cuda:/usr/local/nvidia
```

It's not likely that someone would want to use this without mounting the bin+cuda volumes, but I was thinking it'd behave as a parameter (omit it, and it won't be included)?

@3XX0 , wouldn't that assume you'd need to expose port 3476 of the remote machine to the world? If I'm recalling correctly, docker-machine works by binding the daemon to a TCP port secured with key exchange, plus some ssh. How would the request reach the remote REST endpoint from the local client's shell session?
Yes, if you don't specify ...
I see, that does work. Is there a way to specify ...
@ruffsl no, we didn't implement ... Regarding the REST API, if you want remote access, you need to expose it.
Well, it'd be tedious for people to override this while still leveraging the device detection and mounting that the rest of the plugin machinery here has to offer. Niche, I know, but this would be useful for those who'd like to use select NVIDIA devices but needn't use CUDA. Remember, people like me still need to bake the drivers into the container for some apps to get things like OpenGL working; I think these volumes may blow away some files we'd need to preserve at runtime in that scenario.

Yea, regarding remote access, I feel like there might be a better way to go about this. I'm wondering if there would be something better than just port forwarding with docker-machine ssh. Let's ask @psftw , maybe he'd know about this topic or know who to ask.
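One way to avoid exposing port 3476 to the world would be an SSH tunnel to the docker-machine host; a minimal sketch, assuming the docker user and key path that docker-machine provisions by default (both assumptions, untested):

```sh
# Forward the plugin's REST port over SSH instead of exposing it publicly;
# user and key path are docker-machine defaults, adjust as needed.
ssh -i ~/.docker/machine/machines/<MACHINE>/id_rsa \
    -N -L 3476:localhost:3476 \
    docker@$(docker-machine ip <MACHINE>) &

# The gpu helper can then simply target localhost:
gpu(){ curl -s http://localhost:3476/docker/cli?dev=$1\&vol=bin+cuda; }
```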
I don't get it, why would you want no volumes? GPU devices are unusable without at least one NVIDIA volume, and if you really need it then you don't need to use the ...

Speaking of which, I'm wondering if the current volume separation (aka. ...
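If the volume separation were collapsed into a single volume, the query string might shrink to one name; a purely hypothetical sketch by analogy with vol=bin+cuda (nothing in this thread documents a vol=nvidia form):

```sh
# Hypothetical merged-volume request, modeled on the existing vol=bin+cuda helper:
gpu(){ curl -s http://localhost:3476/docker/cli?dev=$1\&vol=nvidia; }
# which could expand to something like:
#   --volume-driver=nvidia --volume=nvidia:/usr/local/nvidia
```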
Is it now possible to use docker-compose along with nvidia-docker? If so, how?
I too would be interested in using docker-compose along with nvidia-docker.
See #39 ;)
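Until #39 lands, the plugin's generated flags can in principle be translated into a Compose file by hand; a minimal untested sketch in Compose v1 syntax, with the devices and mountpoints copied from the flags quoted earlier in this thread:

```yaml
# docker-compose.yml (v1 syntax) mirroring the plugin's generated flags
cuda:
  image: cuda:runtime
  devices:
    - /dev/nvidiactl
    - /dev/nvidia-uvm
    - /dev/nvidia0
  volume_driver: nvidia
  volumes:
    - bin:/usr/local/nvidia/bin
    - cuda:/usr/local/nvidia
```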
I've been looking at ways to use CUDA containers at my workplace, as our lab shares a common NVIDIA workstation, and I'd like to interact with this server in a more abstract manner so that 1) I can more readily port my robotics work to any NVIDIA workstation, and 2) I can minimize the impact of changes affecting others using the shared research workstation.
One gap I'm wrestling with is how to integrate the current NVIDIA Docker wrapper with the rest of the existing Docker ecosystem: Compose, Machine, and Swarm. The current drop-in replacement for the docker run|create CLI is awesome, but it only gets us so far. The moment we need additional tooling for abstracting or scaling up our apps, or for avoiding direct interaction with the host, it's hard to get to that last step.
So I'm thinking this might be a case for making a relevant Docker plugin, harkening back to a recent post on the Docker blog, Extending Docker with Plugins. That post was perhaps geared more towards networking and storage drivers, but perhaps our issue here could be treated as custom volume management. I feel the same level of integration of GPU device options may be called for to achieve the desired user experience in cloud development or cluster computing with NVIDIA. This'll most likely call for something more demanding than shell scripts to extend the needed interfaces, so I'd like to hear the rest of the community's and the NVIDIA devs' take on this.
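For context, the volume plugin mechanism referenced above is a small JSON-over-HTTP protocol that Docker speaks to a socket under /run/docker/plugins/; a sketch of the handshake, with the socket name and mountpoint as illustrative assumptions:

```sh
# Docker discovers a volume plugin through its unix socket and activates it:
curl -s --unix-socket /run/docker/plugins/nvidia.sock \
     -XPOST http://localhost/Plugin.Activate
# -> {"Implements": ["VolumeDriver"]}

# A docker run with --volume-driver=nvidia --volume=cuda:/usr/local/nvidia
# then asks the plugin to mount the named volume:
curl -s --unix-socket /run/docker/plugins/nvidia.sock \
     -XPOST -d '{"Name": "cuda"}' http://localhost/VolumeDriver.Mount
# -> {"Mountpoint": "<path to the cuda volume>", "Err": ""}
```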