This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

Caffe models don't work on the nvidia-docker image of DIGITs #556

Closed
6 tasks
apolo74 opened this issue Dec 4, 2017 · 2 comments

apolo74 commented Dec 4, 2017

1. Issue or feature description

Hello, I'm reporting this issue on the recommendation of AastaLLL, following a discussion started here. The error appears in the nvidia-docker image of DIGITS when running any Caffe model:
ERROR: Check failed: error == cudaSuccess (8 vs. 0) invalid device function
Everything else in DIGITS (other models, datasets, and general functions) works smoothly. The error appears only when using Caffe models, so I'm not sure whether this is a DIGITS issue or an issue with the Caffe build embedded in the Docker image.
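
For triage, it can help to pull the numeric CUDA error code and the message out of the Caffe check-failure line. The sketch below is only an illustration: `parse_cuda_check` is a hypothetical helper name, and the pattern assumes the exact log format quoted above. As general background (not confirmed for this report), "invalid device function" is commonly seen when a CUDA binary contains no kernel built for the GPU's architecture.

```shell
#!/bin/sh
# Hypothetical helper: extract the CUDA error code and message from a
# Caffe "Check failed" line. Assumes the format
#   ... error == cudaSuccess (<code> vs. 0) <message>
parse_cuda_check() {
  printf '%s\n' "$1" |
    sed -n 's/.*cudaSuccess (\([0-9][0-9]*\) vs\. 0) *\(.*\)$/code=\1 message=\2/p'
}

# Usage with the line from this report.
# prints: code=8 message=invalid device function
parse_cuda_check 'ERROR: Check failed: error == cudaSuccess (8 vs. 0) invalid device function'
```

Lines that do not match the assumed format produce no output, so the helper can be run over a whole log file line by line.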

2. Steps to reproduce the issue

  1. Download and install Docker.
  2. Download nvidia-docker, following the instructions here.
  3. Download the DIGITS image: docker pull nvidia/digits
  4. Create a DIGITS container: sudo nvidia-docker run --name digits -d -p 5000:5000 nvidia/digits
  5. Start the container: sudo nvidia-docker start digits
  6. Run a Caffe model in DIGITS.
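
The container-related steps above can be collected into one script. This is a minimal sketch, assuming Docker and nvidia-docker are already installed; the `run` and `reproduce` helpers and the `DRY_RUN` variable are hypothetical names introduced here, while the Docker commands are the ones listed in the steps. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction steps. DRY_RUN defaults to 1, so the
# commands are printed instead of executed; set DRY_RUN=0 to run them.
IMAGE="nvidia/digits"
NAME="digits"
PORT="5000"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reproduce() {
  run docker pull "$IMAGE"
  run sudo nvidia-docker run --name "$NAME" -d -p "$PORT:$PORT" "$IMAGE"
  run sudo nvidia-docker start "$NAME"
}

reproduce
```

After the container is up, the last step (running a Caffe model) is done in the DIGITS web interface on the published port.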

3. Information to attach (optional if deemed irrelevant)

  • Kernel version from uname -a
    Linux boris-UX303UB 4.10.0-38-generic #42~16.04.1-Ubuntu SMP Tue Oct 10 16:32:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  • Driver information from nvidia-smi -a

==============NVSMI LOG==============

Timestamp                           : Mon Dec  4 08:55:59 2017
Driver Version                      : 384.90

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Product Name                    : GeForce 940M
    Product Brand                   : GeForce
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 1920
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-34603efe-3839-d579-db59-7783f3f81c15
    Minor Number                    : 0
    VBIOS Version                   : 82.08.3B.00.4B
    MultiGPU Board                  : No

  • Docker version from docker version
NVIDIA Docker: 2.0.0
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:18 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:56 2017
 OS/Arch:      linux/amd64
 Experimental: false
  • NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                       Version            Architecture       Description
+++-==========================-==================-==================-=========================================================
ii  libnvidia-container-tools  1.0.0~alpha.2-1    amd64              NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64 1.0.0~alpha.2-1    amd64              NVIDIA container runtime library
ii  nvidia-375                 384.90-0ubuntu0.16 amd64              Transitional package for nvidia-384
ii  nvidia-384                 384.90-0ubuntu0.16 amd64              NVIDIA binary driver - version 384.90
un  nvidia-common              <none>             <none>             (no description available)
ii  nvidia-container-runtime   1.1.0+docker17.09. amd64              NVIDIA container runtime
un  nvidia-docker              <none>             <none>             (no description available)
ii  nvidia-docker2             2.0.1+docker17.09. all                nvidia-docker CLI wrapper
un  nvidia-driver-binary       <none>             <none>             (no description available)
un  nvidia-legacy-340xx-vdpau- <none>             <none>             (no description available)
un  nvidia-libopencl1-384      <none>             <none>             (no description available)
un  nvidia-libopencl1-dev      <none>             <none>             (no description available)
un  nvidia-opencl-icd          <none>             <none>             (no description available)
rc  nvidia-opencl-icd-375      384.90-0ubuntu0.16 amd64              Transitional package for nvidia-opencl-icd-384
ii  nvidia-opencl-icd-384      384.90-0ubuntu0.16 amd64              NVIDIA OpenCL ICD
un  nvidia-persistenced        <none>             <none>             (no description available)
ii  nvidia-prime               0.8.2              amd64              Tools to enable NVIDIA's Prime
ii  nvidia-settings            361.42-0ubuntu1    amd64              Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary     <none>             <none>             (no description available)
un  nvidia-smi                 <none>             <none>             (no description available)
un  nvidia-vdpau-driver        <none>             <none>             (no description available)
  • NVIDIA container library version from nvidia-container-cli -V
version: 1.0.0
build date: 2017-10-30T23:47+00:00
build revision: ec15c7233bd2de821ad5127cb0de6b52d9d2083c
build compiler: gcc-5 5.4.0 20160609
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
  • Docker command, image and tag used
:~$ sudo nvidia-docker image ls
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
nvidia/digits       latest              9fc89ed01ec0        13 days ago         2.81GB

flx42 (Member) commented Dec 4, 2017

The DIGITS image on DockerHub is due for a refresh with a newer Caffe, cuDNN and CUDA.
But still, given your GPU (940M), it should work.
You should report the issue here: https://gitlab.com/nvidia/digits/

flx42 (Member) commented Dec 4, 2017

You can try the NGC container images and check whether the behavior is different.
Closing this bug; please copy it to GitLab (for DockerHub).
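
Trying an NGC image might look like the sketch below. Everything specific in it is an assumption rather than something stated in this thread: the repository path `nvcr.io/nvidia/digits`, the container name `digits-ngc`, the host port 5001, and the guess that the NGC image also serves on port 5000. The tag is deliberately left as a required argument, and the helper names are hypothetical. The script prints commands by default.

```shell
#!/bin/sh
# Sketch: run DIGITS from an NGC image next to the DockerHub one.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to execute.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build the image reference. Assumed repository path; the tag is required.
ngc_image() {
  [ -n "$1" ] || { echo "usage: ngc_image TAG" >&2; return 1; }
  echo "nvcr.io/nvidia/digits:$1"
}

try_ngc() {
  image=$(ngc_image "$1") || return 1
  run docker login nvcr.io    # NGC registry login (needs NGC credentials)
  run docker pull "$image"
  # Assumed: container port 5000, published on a different host port
  # so it does not clash with the existing "digits" container.
  run sudo nvidia-docker run --name digits-ngc -d -p 5001:5000 "$image"
}
```

Calling `try_ngc` with a tag chosen from the NGC catalog prints the three commands, which can then be compared against the behavior of the DockerHub image.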

flx42 closed this as completed Dec 4, 2017