This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Using nvidia-docker from third-party tools #39

Closed
hannes-brt opened this issue Jan 25, 2016 · 26 comments

@hannes-brt

It's very easy to use nvidia-docker when running individual containers, but is there a way to run nvidia-docker instead of docker from other Docker tools like docker-compose, Tutum, Rancher, etc?

I am assuming one would just need to specify the nvidia-docker volume to be mounted in the container, but I couldn't find any documentation on the correct syntax.

@flx42
Member

flx42 commented Jan 25, 2016

If the tool supports overriding the docker command, then you should use that to plug in nvidia-docker. For example, we provide this option ourselves with environment variable NV_DOCKER.
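As a minimal sketch of that kind of override (here pointing NV_DOCKER at a specific underlying docker command; adjust to your setup):

# Tell nvidia-docker which docker command to wrap
NV_DOCKER='sudo docker' nvidia-docker run --rm nvidia/cuda nvidia-smi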

If it's not possible, you can query the plugin for the Docker CLI arguments:

$ curl -s localhost:3476/docker/cli
--device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia0 --device=/dev/nvidia1 --volume-driver=nvidia-docker --volume=nvidia_driver_352.68:/usr/local/nvidia:ro

Of course, you will need to transform this into YAML format for docker-compose (for example).
We were wondering what to do in this case and couldn't find a clean solution. Would it help if we added a REST endpoint that returns the CLI arguments above as YAML? Or as JSON?
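In the meantime, transcribed by hand, the compose equivalent of the CLI arguments above might look roughly like this (a sketch; the device and volume names are specific to that machine and driver version):

cuda:
  image: nvidia/cuda
  command: nvidia-smi
  devices:
    - /dev/nvidiactl
    - /dev/nvidia-uvm
    - /dev/nvidia0
    - /dev/nvidia1
  volume_driver: nvidia-docker
  volumes:
    - nvidia_driver_352.68:/usr/local/nvidia:ro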

Thank you!

@flx42 flx42 added the question label Jan 25, 2016
@ruffsl
Contributor

ruffsl commented Jan 26, 2016

If a sort of multiline variable substitution were possible in compose files, I guess you could add a YAML format option for the REST endpoint. But I think the environment variables are substituted after the YAML syntax is parsed, resulting in an "invalid mode" error from the improper structure. This still might be useful for a simple 'copy and paste' approach for compose files, though.

Is there a way we could use the volume-driver option with, say, a set label format to convey the card IDs to export?

@3XX0
Member

3XX0 commented Jan 26, 2016

@ruffsl I like the idea of using variable substitution.
Can't we leverage the one-line YAML syntax? If the REST endpoint outputs something like:

NV_DEVICES="['/dev/nvidia0', '/dev/nvidia1']"

We could easily do something like (the same way docker-machine does it):

eval "$(curl -s localhost:3476/docker/env)"

Given the following docker-compose.yml, it would work, right?

devices: ${NV_DEVICES}

@ruffsl
Contributor

ruffsl commented Jan 26, 2016

I'm just testing a foo bar example:

test:
  image: ubuntu
  volumes: ${FOO_BAR}
  command: ping 127.0.0.1

and am seeing this:

$ mkdir /tmp/foo
$ mkdir /tmp/bar
$ export FOO_BAR="['/tmp/bar:/bar', '/tmp/foo:/foo']"
$ docker-compose up
ERROR: Validation failed in file './docker-compose.yml', reason(s):
Service 'test' configuration key 'volumes' contains an invalid type, it should be an array

Is this the correct one-line YAML syntax? I've never seen it before.
Also, how could you keep other volumes or device lists defined alongside (stuff you'd like to keep in the compose file)?

@3XX0
Member

3XX0 commented Jan 26, 2016

Not sure if docker-compose supports it, but it's part of the YAML spec:
http://yaml.org/spec/1.2/spec.html#id2759963

For other volumes/devices: if docker-compose supports multiple volumes or devices keywords, it would work; otherwise, I suppose you could omit the brackets.

@flx42
Member

flx42 commented Jan 26, 2016

It seems to work without variable substitution:

test:
  image: ubuntu
  devices: ['/dev/nvidiactl', '/dev/nvidia-uvm', '/dev/nvidia0']
  command: ls /dev/

Variable substitution might be designed to generate values, not YAML structure.
But let's wait for an official response from the docker-compose developers.

@ruffsl: I didn't understand the following, could you explain it?

Is there a way we could use the volume-driver option with, say, a set label format to convey the card IDs to export?

@ruffsl
Contributor

ruffsl commented Jan 26, 2016

@flx42 This is an example of what I was thinking:

test:
  image: ubuntu
  volume_driver: nvidia-docker-driver
  labels:
    nvidia.gpu: "0,1"
  command: nvidia-smi

I'm not sure how feasible this would be, but I think it's attractive in its simplicity from a user perspective. You've developed a plugin; could a custom driver be extended to parse the label metadata? This could then be a clean way to define nvidia containers with compose.

You could also use variable substitution with this: export NV_GPU='0,1'

labels:
  nvidia.gpu: ${NV_GPU}

@flx42
Member

flx42 commented Jan 26, 2016

A volume plugin will not be able to mount devices, and actually a plugin can't even inspect the starting image AFAIK.

@flx42
Member

flx42 commented Jan 27, 2016

Good progress on docker/compose#2750.
There is a pending PR to solve this use case (I haven't tested it).
But it will require a bleeding-edge version of docker-compose if it's accepted.

@3XX0
Member

3XX0 commented Jan 28, 2016

@flx42 is correct: currently the only supported plugin type is VolumeDriver, and it only deals with volume names (code).
I think a JSON endpoint could be useful here; however, it means you would need to write small wrappers around docker-compose and other tools to do the conversion (e.g. JSON -> YAML config).
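As an illustration, the JSON from such an endpoint might look roughly like this (a sketch; the key names mirror the /docker/cli/json response used later in this thread, and the values depend on the machine):

{
  "Volumes": ["nvidia_driver_352.68:/usr/local/nvidia:ro"],
  "VolumeDriver": "nvidia-docker",
  "Devices": ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia0", "/dev/nvidia1"]
}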

@3XX0 3XX0 mentioned this issue Mar 2, 2016
@therc

therc commented Apr 7, 2016

Another vote here. I'm trying to get Kubernetes to talk to the plugin. Kubernetes doesn't build command lines; it uses a Docker client API (fsouza's, but there's a migration in progress to the official one). JSON might work. Or perhaps parts of nvidia-docker and nvidia-docker-plugin could be embedded as a library inside kubelet (the daemon that manages the node) or in a helper process running on the same machine.

@3XX0
Member

3XX0 commented Apr 7, 2016

I'm not really familiar with Kubernetes, but we are definitely interested in supporting it. Since it's written in Go, the nvidia package should do it. Alternatively, we could use nvidia-docker-plugin as a flex volume driver (maybe?).
Anyhow, feel free to create a separate issue and we'll address those requirements specifically.

@matthieudelaro

Nut now uses nvidia-docker-plugin to mount GPUs in containers :)
I'm not using the nvidia-docker/nvidia module, though, but rather targeting the REST API directly to retrieve the GPU paths and volume name, and injecting those values into the Docker API using go-dockerclient.
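For reference, the same lookup can be sketched in a few lines of Python (Nut does this in Go via go-dockerclient; the sketch below uses the /docker/cli/json endpoint discussed later in this thread, and the example values are illustrative):

import json
import urllib2

# Query nvidia-docker-plugin for the device paths and driver volume
info = json.loads(urllib2.urlopen("http://localhost:3476/docker/cli/json").read())
gpu_devices = info["Devices"]        # e.g. ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia0"]
driver_volume = info["Volumes"][0]   # e.g. "nvidia_driver_352.68:/usr/local/nvidia:ro"
# These values can then be passed to the Docker API as devices and a volume binding.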

@ruffsl
Contributor

ruffsl commented May 5, 2016

So is there a method to use the nvidia plugin with docker-compose now,
or can that be broken out into its own specific issue?

@3XX0
Member

3XX0 commented May 5, 2016

Well, it's not working out of the box, but with the addition of the /docker/cli/json endpoint you can generate docker-compose files easily. For example:

#! /usr/bin/env python
# Python 2 script: query nvidia-docker-plugin for the Docker CLI arguments
# and write them out as a docker-compose service definition.

import urllib2
import json
import yaml
import sys

if len(sys.argv) == 1:
    print "usage: %s service [key=value]..." % sys.argv[0]
    sys.exit(0)

# Ask the plugin for the CLI arguments in JSON form.
resp = urllib2.urlopen("http://localhost:3476/docker/cli/json").read()

# Rename the keys to match the docker-compose schema.
args = json.loads(resp)
args["volumes"] = args.pop("Volumes")
args["devices"] = args.pop("Devices")
args["volume_driver"] = args.pop("VolumeDriver")

# Merge in any extra key=value pairs passed on the command line (e.g. image, command).
doc = {sys.argv[1]: args}
for arg in sys.argv[2:]:
    k, v = arg.split("=")
    args[k] = v

yaml.safe_dump(doc, open("docker-compose.yml", "w"), default_flow_style=False)
$ ./compose.py cuda image=nvidia/cuda command=nvidia-smi
$ docker-compose up
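The generated docker-compose.yml would then look roughly like this (a sketch; the exact device and volume names depend on your GPUs and driver version):

cuda:
  command: nvidia-smi
  devices:
  - /dev/nvidiactl
  - /dev/nvidia-uvm
  - /dev/nvidia0
  image: nvidia/cuda
  volume_driver: nvidia-docker
  volumes:
  - nvidia_driver_352.68:/usr/local/nvidia:ro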

@anibali

anibali commented May 6, 2016

Whilst I appreciate this as a step in the right direction, this still isn't an ideal solution from my point of view. I'd like to see proper integration with docker-compose stay on the radar.

@MadcowD

MadcowD commented Jun 28, 2016

Agreed, this is not a canonical solution by any means. This issue should be reopened :/

@flx42
Member

flx42 commented Jun 28, 2016

@MadcowD I don't think there is much more we can do right now for better integration. But it's still on our radar, since we have people on our team using docker-compose with nvidia-docker.

@jmerkow

jmerkow commented Jul 29, 2016

@3XX0 I am trying to use your docker-compose example above with a version '2' docker-compose file, but I am running into difficulty.

Here is my docker-compose file:

version: '2'

volumes:
  nvidia_driver_352.63:
      driver: nvidia-docker

services:
  cuda:
    command: nvidia-smi
    devices:
    - /dev/nvidiactl
    - /dev/nvidia-uvm
    - /dev/nvidia0
    image: nvidia/cuda
    volumes:
    - nvidia_driver_352.63:/usr/local/nvidia:ro

I get the following error:

Creating volume "utility_nvidia_driver_352.63" with nvidia-docker driver
ERROR: create utility_nvidia_driver_352.63: unsupported volume: utility_nvidia_driver_352.63

Any thoughts?

@3XX0
Member

3XX0 commented Jul 29, 2016

Last time I tried, I had to create the volume beforehand with docker volume create and specify the volume as external in the compose file (see here). Not really ideal though...

@jmerkow

jmerkow commented Jul 29, 2016

Better than nothing. This worked for me.

Steps:

$ docker volume create --name=nvidia_driver_352.63 -d nvidia-docker # create the driver volume

docker-compose:

version: '2'

volumes:
  nvidia_driver_352.63:
    external: true

services:
  cuda:
    command: nvidia-smi
    devices:
    - /dev/nvidiactl
    - /dev/nvidia-uvm
    - /dev/nvidia0
    image: nvidia/cuda
    volumes:
    - nvidia_driver_352.63:/usr/local/nvidia/:ro

You should be able to generate this YAML file (and create the volume) by modifying compose.py above.
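For instance, a version '2' variant of compose.py might look roughly like this (an untested sketch; it assumes the same /docker/cli/json response shape as above, that the driver volume is the first entry in Volumes, and it shells out to docker volume create):

#! /usr/bin/env python
# Untested sketch: version '2' variant of compose.py.
# Creates the driver volume up front and marks it as external in the compose file.

import urllib2
import json
import yaml
import sys
import subprocess

if len(sys.argv) == 1:
    print "usage: %s service [key=value]..." % sys.argv[0]
    sys.exit(0)

args = json.loads(urllib2.urlopen("http://localhost:3476/docker/cli/json").read())
args["volumes"] = args.pop("Volumes")
args["devices"] = args.pop("Devices")
driver = args.pop("VolumeDriver")

# Assume the driver volume is the first binding, e.g. "nvidia_driver_352.63:/usr/local/nvidia:ro".
volume_name = args["volumes"][0].split(":")[0]

# Create the volume beforehand so the compose file only has to reference it as external.
subprocess.check_call(["docker", "volume", "create", "--name=" + volume_name, "-d", driver])

for arg in sys.argv[2:]:
    k, v = arg.split("=")
    args[k] = v

doc = {
    "version": "2",
    "volumes": {volume_name: {"external": True}},
    "services": {sys.argv[1]: args},
}
yaml.safe_dump(doc, open("docker-compose.yml", "w"), default_flow_style=False)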

Thank you.

@jmerkow

jmerkow commented Sep 22, 2016

FYI. I use a different solution that is a little easier to manage between machines.

In the compose file I set an external volume as before; however, I give the volume a static, common name and set the name alias via variable interpolation, like so:

volumes:
  nvidia_driver:
     external:
       name: ${NVIDIA_DRIVER_VOLUME}

Then in my service, I use this common name (nvidia_driver):

    volumes:
      - nvidia_driver:/usr/local/nvidia/:ro

All that remains is to set the NVIDIA_DRIVER_VOLUME environment variable to your local driver volume name. This can be obtained from docker volume ls, or from @3XX0's example code (just set the environment variable instead of writing a docker-compose file). I just added an export statement to my .bashrc.
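For example, the export might look roughly like this (a sketch; it assumes the driver volume is the only volume reported by docker volume ls with the nvidia-docker driver):

# In .bashrc: pick up the local nvidia-docker driver volume name
export NVIDIA_DRIVER_VOLUME=$(docker volume ls | awk '$1 == "nvidia-docker" {print $2; exit}')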

@flx42
Member

flx42 commented Sep 22, 2016

@jmerkow: @eywalker created a project called nvidia-docker-compose. We haven't tested it, but you might be interested in taking a look at it.

@achimnol

achimnol commented Nov 5, 2016

nvidia-docker ... works fine in my environment, but docker run $(curl -s http://localhost:3476/docker/cli) ... does not work and fails with the following message:

docker: Error response from daemon: create 0cdaed180e31650f260d3902833b65560bf5ba6d995c8450138990711d66be36: bad volume format: 0cdaed180e31650f260d3902833b65560bf5ba6d995c8450138990711d66be36.
See 'docker run --help'.

Another symptom is that tensorflow.Session() hangs in Python 3.5.2 inside containers, but only when the exact same containers are launched via my custom docker-py integration, which interprets and adds configuration arguments from http://localhost:3476/docker/cli. If launched with the nvidia-docker command, it works fine!

I'd like to know what exactly the nvidia-docker command does: not only which volume/binding arguments it adds, but also its internal differences from the plain docker command.
For example, I found that it sets two environment variables:

CUDA_DISABLE_UNIFIED_MEMORY=1
CUDA_CACHE_DISABLE=1

and afterwards loads the NVML C library while the nvidia-docker command is running.
What difference does this make? What are the potential causes of the indefinite hang in tensorflow when it is not launched with nvidia-docker?

@achimnol

achimnol commented Nov 5, 2016

I've found that it's not actually hanging, but becomes very, very slow (e.g., 10 sec on CPU or with nvidia-docker vs. 92 sec on GPU with the docker-py invocation). Maybe related to #224...?

@flx42
Member

flx42 commented Nov 5, 2016

@achimnol We explain what nvidia-docker does on our wiki.

The "bad volume format" error is a limitation of Docker, see #181.

Finally, as I explained in #224, we have heard multiple users claiming their code was slower inside Docker, but every single time it was because they compiled the project with different flags, or they had different settings during execution.
