Can build CUDA kernel module but can NOT load it. #15

Closed
wangkuiyi opened this issue May 4, 2016 · 1 comment

wangkuiyi commented May 4, 2016

I am using AWS EC2 g2.8xlarge instances running CoreOS (stable channel) with kernel 4.3.6:

core@ip-172-31-32-170 ~# uname -a
Linux ip-172-31-32-170.us-west-2.compute.internal 4.3.6-coreos #2 SMP Tue Apr 5 10:32:16 UTC 2016 x86_64 Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz GenuineIntel GNU/Linux

The compiler used to build the kernel is GCC 4.9.3,

core@ip-172-31-32-170 ~# cat /proc/version 
Linux version 4.3.6-coreos (buildbot@ip-10-204-3-57) (gcc version 4.9.3 (Gentoo Hardened 4.9.3 p1.3, pie-0.6.3) ) #2 SMP Tue Apr 5 10:32:16 UTC 2016

which is the same version as the one used in the Dockerfile https://github.com/emergingstack/es-dev-stack/blob/master/corenvidiadrivers/Dockerfile

root@bd330876a124:/# gcc --version
gcc (Ubuntu 4.9.3-8ubuntu2~14.04) 4.9.3
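The comparison above can be scripted. The following is only a sketch: the helper names `kernel_gcc_version` and `local_gcc_version` are made up for illustration, and the parsing assumes `/proc/version` contains the text `gcc version X.Y.Z` (as it does on this 4.3.6 kernel; other kernels may format the string differently) and that the first line of `gcc --version` ends with the version number.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the Dockerfile or of any NVIDIA tooling.

# kernel_gcc_version TEXT
# Extracts X.Y.Z from "gcc version X.Y.Z" in the contents of /proc/version.
kernel_gcc_version() {
    printf '%s\n' "$1" | sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}

# local_gcc_version TEXT
# Takes the output of `gcc --version` and returns the last field of its first line.
local_gcc_version() {
    printf '%s\n' "$1" | awk 'NR==1 {print $NF}'
}

# Run the comparison only when both inputs are available.
if [ -r /proc/version ] && command -v gcc >/dev/null 2>&1; then
    k=$(kernel_gcc_version "$(cat /proc/version)")
    l=$(local_gcc_version "$(gcc --version)")
    if [ -z "$k" ]; then
        echo "could not parse a gcc version from /proc/version"
    elif [ "$k" = "$l" ]; then
        echo "gcc versions match: $k"
    else
        echo "gcc versions differ: kernel=$k local=$l"
    fi
fi
```

Inside a container, `/proc/version` still reports the host kernel, so the script compares the host kernel's compiler with the container's gcc.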

I checked out the same version of the Linux kernel source as the CoreOS kernel:

git clone -b v4.3.6 --depth 1 git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux

I manually ran every step of the Dockerfile inside a container started with docker run -it ubuntu:14.04 /bin/bash. Every step succeeded except the last command, ./NVIDIA-Linux-x86_64-352.39/nvidia-installer -q -a -n -s --kernel-source-path=/usr/src/kernels/linux/, which failed.

For your reference, the tail of /var/log/nvidia-installer.log is as follows:

-> done.
-> Kernel module compilation complete.
-> Unable to determine if Secure Boot is enabled: No such file or directory
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: Operation not permitted
-> Kernel messages:
[16796.891483] docker0: port 1(vethef6dba9) entered forwarding state
[16796.994375] docker0: port 1(vethef6dba9) entered disabled state
[16796.998753] eth0: renamed from vethf74bd80
[16797.011532] docker0: port 1(vethef6dba9) entered forwarding state
[16797.015840] docker0: port 1(vethef6dba9) entered forwarding state
[16812.064059] docker0: port 1(vethef6dba9) entered forwarding state
[17191.805854] docker0: port 1(vethef6dba9) entered disabled state
[17191.805982] vethf74bd80: renamed from eth0
[17191.834880] docker0: port 1(vethef6dba9) entered forwarding state
[17191.839654] docker0: port 1(vethef6dba9) entered forwarding state
[17191.851665] docker0: port 1(vethef6dba9) entered disabled state
[17191.857097] device vethef6dba9 left promiscuous mode
[17191.857100] docker0: port 1(vethef6dba9) entered disabled state
[17321.036031] device veth0ad1bb0 entered promiscuous mode
[17321.036313] IPv6: ADDRCONF(NETDEV_UP): veth0ad1bb0: link is not ready
[17321.038931] IPv6: ADDRCONF(NETDEV_CHANGE): veth0ad1bb0: link becomes ready
[17321.038976] docker0: port 1(veth0ad1bb0) entered forwarding state
[17321.038982] docker0: port 1(veth0ad1bb0) entered forwarding state
[17322.119380] docker0: port 1(veth0ad1bb0) entered disabled state
[17322.123987] eth0: renamed from vethe4e5899
[17322.132413] docker0: port 1(veth0ad1bb0) entered forwarding state
[17322.136489] docker0: port 1(veth0ad1bb0) entered forwarding state
[17337.184133] docker0: port 1(veth0ad1bb0) entered forwarding state
[18408.303762] deprecated_sysctl_warning: 27 callbacks suppressed
[18408.308095] warning: process `nvidia-installe' used the deprecated sysctl system call with 1.23.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

mikeorzel2 (Contributor) commented

Hi,

It looks like you are running the container in non-privileged mode. Can you run your Docker container again with the --privileged flag? That should get you past the "Operation not permitted" error you are seeing when the installer tries to load nvidia.ko.
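A quick way to confirm this from inside the container is to check whether the shell holds CAP_SYS_MODULE, the Linux capability required to load kernel modules. This is only a sketch: the helper name `has_cap_sys_module` is made up for illustration, and it assumes a Linux `/proc` filesystem is mounted. A container started without --privileged normally lacks this capability.

```shell
#!/bin/sh
# Re-running the manual steps in a privileged container would look like:
#   docker run -it --privileged ubuntu:14.04 /bin/bash

# CAP_SYS_MODULE is capability number 16 in linux/capability.h.
CAP_SYS_MODULE_BIT=16

# has_cap_sys_module MASK   (hypothetical helper)
# MASK is a hex capability mask without the 0x prefix, in the form
# printed on the CapEff line of /proc/<pid>/status.
has_cap_sys_module() {
    [ $(( (0x$1 >> CAP_SYS_MODULE_BIT) & 1 )) -eq 1 ]
}

# Read the effective capability mask of the current process;
# fall back to 0 if /proc is not available.
cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status 2>/dev/null)
cap_eff=${cap_eff:-0}

if has_cap_sys_module "$cap_eff"; then
    echo "CAP_SYS_MODULE is present in the effective set"
else
    echo "CAP_SYS_MODULE is missing: module loading will fail with 'Operation not permitted'"
fi
```

If the check reports the capability as missing, the installer can still compile nvidia.ko but the load step will fail, which matches the log above.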

Let me know how it goes. I will close this issue; please re-open it or open a new one if you run into anything else.

Thanks

Mike
