This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

Custom kernel module #589

Closed
crawford opened this issue Sep 23, 2015 · 38 comments · Fixed by coreos/docs#915

Comments

@crawford
Contributor

Issue by trickkiste
Thursday Oct 16, 2014 at 13:29 GMT
Originally opened as https://github.com/coreos/coreos-overlay/issues/924


I would need a facility to add custom kernel modules to CoreOS.

I need to access an SDI-Card (professional video interface standard) from my CoreOS machines. I am running a TV playout software in a docker container, but I need the kernel module to output the video data.

CoreOS is perfect for my infrastructure needs, except for the missing facility to compile/load the needed kernel module for my Blackmagic SDI card.

I can imagine that there is a general need for such a facility. Imagine people wanting to run a CUDA cluster on CoreOS: they would need to compile and install NVIDIA modules for this as well.

@crawford
Contributor Author

Comment by rrichardson
Wednesday Nov 05, 2014 at 00:01 GMT


+1
I am trying to build/include Solarflare Net Driver and OpenOnload support.

After attempting several avenues, I think I can nail this down to this: if the dev image that is created as a result of the SDK guide had the ability to compile the Linux kernel, I would be able to build my drivers.

Currently, my attempts at this have failed because of a lack of bc and perl (and that's as far as I got) inside the dev VM. Perhaps if these (and whatever else is needed) were added, we could use the VM as a spot to at least build .ko's that we could insmod via the OEM interfaces.

Thoughts?

@crawford
Contributor Author

Comment by marineam
Wednesday Nov 05, 2014 at 00:59 GMT


I should add bc and perl to the dev image and container by default, but for now you can install them and other things with the following sequence (the -gk flags fetch and use binary packages):

emerge-gitclone
emerge -gk bc perl

Thereafter, a usual emerge --sync updates against the current master branches of coreos-overlay and portage-stable. The special emerge-gitclone command is required first because emerge --sync only knows how to call git pull to update an existing checkout.
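The sequence above can be sketched as a small script. It is only meaningful inside the CoreOS developer container, where the emerge tooling exists, so this sketch degrades to a no-op elsewhere:

```shell
# Sketch: install build tools inside the CoreOS developer container.
# emerge-gitclone and emerge only exist inside that container, so the
# function skips (with a note) anywhere else.
install_build_tools() {
    if ! command -v emerge-gitclone >/dev/null 2>&1; then
        echo "not inside a CoreOS developer container; skipping"
        return 0
    fi
    emerge-gitclone        # initial git checkout of the portage trees
    emerge -gk bc perl     # -g: fetch binary packages, -k: use local binpkgs
    emerge --sync          # subsequent updates track the master branches
}

install_build_tools
```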

@crawford
Contributor Author

Comment by hookenz
Thursday Nov 06, 2014 at 04:00 GMT


I need this too: the NVIDIA driver, to run CUDA apps. A walk-through example of adding that and building the developer image would be wonderful. I'm still trying to figure it out.

@crawford
Contributor Author

Comment by cnelson
Saturday Feb 28, 2015 at 13:15 GMT


A good blog post on compiling kernel modules (focused on out-of-tree modules, but the initial steps work for in-tree as well):

http://tleyden.github.io/blog/2014/11/04/coreos-with-nvidia-cuda-gpu-drivers/

@crawford
Contributor Author

Comment by h0tbird
Friday May 08, 2015 at 07:01 GMT


+1 I also need this because I run PPPoE inside containers and I need the kernel module.

@crawford
Contributor Author

Comment by hookenz
Monday Jun 15, 2015 at 00:21 GMT


@cnelson - I've been using that approach; unfortunately, it doesn't always work.

@crawford
Contributor Author

Comment by majidaldo
Monday Aug 17, 2015 at 15:51 GMT


Any update on this? CoreOS is no longer using the linux repo, and the CoreOS kernel version is now >4.

@crawford
Contributor Author

Comment by majidaldo
Tuesday Aug 18, 2015 at 16:05 GMT


ok found a hint here http://coreos-dev.narkive.com/n9Yz1JzE/coreos-linux-kernel-tree-v3-19-0

@crawford
Contributor Author

Comment by hookenz
Tuesday Aug 18, 2015 at 21:33 GMT


@majidaldo - That does work, and I've been using it for a while. The hard part is getting the right compiler version: it has to match the kernel's exactly when you compile modules.
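One way to check for the mismatch described above, assuming the usual /proc/version format, which embeds the GCC version the kernel was built with (the sample line below stands in for a real /proc/version):

```shell
# Sketch: extract the GCC version the running kernel was built with so
# it can be compared against the local gcc before building modules.
kernel_gcc_version() {
    # expects a /proc/version-style line on stdin
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p'
}

sample='Linux version 4.1.5-coreos (buildbot) (gcc version 4.9.3 (Gentoo Hardened 4.9.3 p1.5) ) #2 SMP'
kver=$(printf '%s\n' "$sample" | kernel_gcc_version)
echo "kernel built with gcc $kver"
# On a real host, compare against the local compiler:
#   kernel_gcc_version < /proc/version
#   gcc -dumpversion
```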

@crawford
Contributor Author

Comment by majidaldo
Wednesday Aug 19, 2015 at 01:57 GMT


@hookenz what are you referring to? how could it work, since the old coreos linux repo does not go up to the current version 4+?

i'm experimenting with the unmodified linux kernel repo based on the tleyden post... but no luck. do you have more particular steps?

@crawford
Contributor Author

Comment by hookenz
Wednesday Aug 19, 2015 at 02:45 GMT


@majidaldo - It works from the official linux repo. Have a look at my Dockerfile here:
https://github.com/hookenz/coreos-nvidia

This worked on CoreOS 681.2.0. It's really messy, and it's difficult to get just the right compiler build. I'm fully expecting to get stuck in that regard soon.

@crawford
Contributor Author

Comment by majidaldo
Wednesday Aug 19, 2015 at 02:57 GMT


@hookenz Thanks. I'll try that on the latest alpha.

but why is everyone downloading the whole linux repo? i'd just get the linux-x.x.x.tar.xz.

@crawford
Contributor Author

Comment by hookenz
Wednesday Aug 19, 2015 at 04:38 GMT


@majidaldo - Can you tell me how I can do this? > why is everyone downloading the whole linux repo?? i'd just get the linux-x.x.x.tar.xz.

It used to be possible, but GitHub has changed over the last few years and I can't find that feature anymore. It annoys me that I have to clone a 1 GB+ repository when a 30+ MB compressed archive is all I really need.
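For what it's worth, a release tarball never needed to come from GitHub: kernel.org publishes linux-x.x.x.tar.xz archives. A minimal sketch of deriving the download URL from a version string (the cdn.kernel.org path layout here is an assumption based on its v4.x directory scheme):

```shell
# Sketch: derive a kernel.org tarball URL from a kernel version,
# instead of cloning the full linux git repository.
kernel_tarball_url() {
    ver=$1                # e.g. 4.1.5
    major=${ver%%.*}      # leading major number, e.g. 4
    echo "https://cdn.kernel.org/pub/linux/kernel/v${major}.x/linux-${ver}.tar.xz"
}

url=$(kernel_tarball_url 4.1.5)
echo "$url"
# fetch with, e.g.: curl -LO "$url"
```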

@crawford
Contributor Author

Comment by majidaldo
Wednesday Aug 19, 2015 at 13:49 GMT


@hookenz ok i have something "working", but isn't there supposed to be an nvidia-uvm kernel module and/or device? is that a cuda-specific thing?

(sorry going off topic a bit here.)

@crawford
Contributor Author

Comment by majidaldo
Wednesday Aug 19, 2015 at 17:22 GMT


apparently a cuda thing.

@crawford
Contributor Author

Comment by hookenz
Wednesday Aug 19, 2015 at 21:10 GMT


@majidaldo - Ahh, you're using CUDA too. You have to create the device files manually at boot. The driver doesn't create those.

You should be able to put something together based on this:
https://gist.github.com/achimnol/3404967
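A minimal sketch of that boot-time step, in the spirit of the linked gist. Major number 195 is the NVIDIA character-device major; nvidia-uvm's major is allocated dynamically, so it would have to be read from /proc/devices and is left out here. MKNOD defaults to echo, making this a dry run:

```shell
# Sketch: create the NVIDIA device nodes the kernel driver does not
# create on its own. MKNOD defaults to echo (a dry run); set
# MKNOD="mknod -m 0666" and run as root to actually create the nodes.
make_nvidia_nodes() {
    mknod_cmd=${MKNOD:-echo}
    gpus=${NUM_GPUS:-1}
    i=0
    while [ "$i" -lt "$gpus" ]; do
        $mknod_cmd /dev/nvidia$i c 195 $i    # one node per GPU
        i=$((i + 1))
    done
    $mknod_cmd /dev/nvidiactl c 195 255      # control device
    # nvidia-uvm is omitted: its major is dynamic (see /proc/devices)
}

make_nvidia_nodes
```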

@crawford
Contributor Author

Comment by majidaldo
Wednesday Aug 19, 2015 at 21:15 GMT


@hookenz i believe that doesn't address the nvidia_uvm kernel module. i know about making device nodes.

@crawford
Contributor Author

Comment by hookenz
Thursday Aug 20, 2015 at 01:25 GMT


@majidaldo - Sorry, I misread that. I don't know what you're doing differently than I am, but the build script I linked to creates an nvidia_uvm module. As the saying goes these days, "works on my machine" :) Sorry I can't really help further. Maybe the folks from CoreOS will give us a better overview of the SDK and, better still, use the NVIDIA driver as an example! I tried reading the ChromeOS docs about it and didn't get very far at all.

core@node2-1 ~ $ lsmod | grep nvidia
nvidia_uvm             77824  0 
nvidia               8560640  1 nvidia_uvm
i2c_core               45056  3 i2c_i801,ipmi_ssif,nvidia

@crawford
Contributor Author

Comment by majidaldo
Thursday Aug 20, 2015 at 01:52 GMT


@hookenz thx

i can get nvidia_uvm to load if i run deviceQuery from the cuda samples, BUT i can't get the driver version shipped with CUDA to install. i CAN install the nvidia kernel module in a way similar to your Dockerfile.

i thought docker would alleviate 'works on my machine' problems :(
if i could only get the driver to install from CUDA, that would make things easier, as i don't want to build my own cuda images; other docker images supposedly only work when the specific nvidia driver corresponding to the cuda version is installed.

@crawford
Contributor Author

Comment by majidaldo
Friday Aug 21, 2015 at 14:25 GMT


@hookenz

success! see my fork. although it has nvidia's stuff, the non-nvidia steps should be the same for any kernel module. https://github.com/majidaldo/coreos-nvidia/blob/master/Dockerfile

i gave up on trying to figure out why the driver didn't install from CUDA, but i think it might have to do with the new 4.x linux kernel.

@crawford
Contributor Author

Comment by hookenz
Saturday Aug 22, 2015 at 06:16 GMT


@majidaldo - Cool. I found earlier versions just needed the kernel driver. Unfortunately, the new kernel module somehow gets tied up with the driver versions, so CUDA and the kernel drivers get tied to the base system; it's frustrating. My hope is to have a container that will run on most base systems with the kernel driver baked in. I'm netbooting, and I managed to bake the driver into the image after unpacking it all and copying it in. It's a messy process.

@crawford
Contributor Author

Comment by majidaldo
Sunday Aug 23, 2015 at 00:30 GMT


@hookenz geek cred to you! i just need to get my work done!! i'm not even trying earlier driver versions. the latest works and i'm calling it a day!

and people wonder why linux doesn't have desktop penetration :/

@philips

philips commented Feb 11, 2016

@crawford @marineam @vcaputo Has anyone started working on documentation for how to use the containers to do this?

@vcaputo

vcaputo commented Feb 11, 2016

it's on my plate, but I haven't made progress on it yet.

@therc

therc commented Feb 11, 2016

The latest I found is https://gist.github.com/marineam/9914debc25c8d7dc458f

My end goal is to build NVIDIA drivers automatically as soon as a new CoreOS release (or kernel, really) is pushed, then store them somewhere in the cluster. I don't want to build things inside a bloated Ubuntu container and, ideally, I'd start from just the toolchain linked above and a tarball with the sources/headers. I don't want to build the whole OS, either. :-)

I can contribute PRs and testing, especially if you can provide guidance on where you see things headed, namely what the builder looks like and what your recommended storage/delivery mechanism would be. I definitely wouldn't store drivers in etcd and would rather use something P2P-like. Maybe packaging them as containers, allowing docker pull from a local registry, followed by docker save, is a simple starting point.

@therc

therc commented Apr 29, 2016

I'm getting weird failures at the end of emerge-gitclone:

+ sudo systemd-nspawn -i coreos_developer_container.bin --share-system --bind=/home/core/gpu/build.sh:/build.sh --bind=/home/core/gpu/pkg/run_files:/nvidia_installers /bin/sh -x /build.sh 352.39
Spawning container coreos_developer_container.bin on /home/core/gpu/coreos_developer_container.bin.
Press ^] three times within 1s to kill container.
+ VERSION=352.39
+ echo Building 352.39
Building 352.39
+ emerge-gitclone
!!! Section 'portage-stable' in repos.conf has location attribute set to nonexistent directory: '/var/lib/portage/portage-stable'
!!! Section 'x-portage-stable' in repos.conf has location attribute set to nonexistent directory: '/var/lib/portage/portage-stable'
!!! main-repo not set in DEFAULT and PORTDIR is empty.
Unavailable repository 'portage-stable' referenced by masters entry in '/var/lib/portage/coreos-overlay/metadata/layout.conf'
!!! Unable to parse profile: '/etc/portage/make.profile'
!!! ParseError: Parent 'portage-stable:hardened/linux/amd64/no-multilib' not found: '/var/lib/portage/coreos-overlay/profiles/coreos/amd64/parent'
Undefined license group 'EULA'
>>> Cloning repository 'coreos' from 'https://github.com/coreos/coreos-overlay.git'...
>>> Starting git clone in /var/lib/portage/coreos-overlay
Cloning into '/var/lib/portage/coreos-overlay'...
remote: Counting objects: 27133, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 27133 (delta 3), reused 0 (delta 0), pack-reused 27116
Receiving objects: 100% (27133/27133), 9.59 MiB | 5.41 MiB/s, done.
Resolving deltas: 100% (12804/12804), done.
Checking connectivity... done.
>>> Git clone in /var/lib/portage/coreos-overlay successful
Container coreos_developer_container.bin terminated by signal KILL.

Any ideas?

@crawford
Contributor Author

That sounds like it might be related to #1216.

@therc

therc commented Apr 29, 2016

Maybe related, maybe not. I narrowed my case down to emerge.

core-01 / # emerge --check_news --quiet
Container coreos_developer_container.bin terminated by signal KILL.
[...]
core-01 / # emerge
Container coreos_developer_container.bin terminated by signal KILL.

@therc

therc commented Apr 29, 2016

Running strace emerge repeatedly, it seems to crash at different points while loading Python modules. Arrrgh...

@therc

therc commented Apr 29, 2016

I even gave the CoreOS VM 3GB of RAM; still the same thing. From running top inside the nspawn container, it does see all the memory, but it's still getting killed, and I'm assuming it's some kind of OOM.

@therc

therc commented May 31, 2016

I am currently doing things the manual way, mounting the image by hand, etc. I had found that the KILL came from systemd-nspawn itself, in its main loop (it even emitted a newline to stdout if that wasn't the last character the user typed).

Anyway, it's still spewing a lot of emerge-related errors, although it eventually compiles a driver:

+ emerge-gitclone
!!! Invalid PORTDIR_OVERLAY (not a dir): '/var/lib/portage/coreos-overlay'
!!! Section 'portage-stable' in repos.conf has location attribute set to nonexistent directory: '/var/lib/portage/portage-stable'
!!! Section 'coreos' in repos.conf has location attribute set to nonexistent directory: '/var/lib/portage/coreos-overlay'
!!! Section 'x-portage-stable' in repos.conf has location attribute set to nonexistent directory: '/var/lib/portage/portage-stable'
!!! main-repo not set in DEFAULT and PORTDIR is empty.
>>> No git repositories configured.
+ emerge -gKav coreos-sources
!!! Invalid PORTDIR_OVERLAY (not a dir): '/var/lib/portage/coreos-overlay'
!!! Section 'portage-stable' in repos.conf has location attribute set to nonexistent directory: '/var/lib/portage/portage-stable'
!!! Section 'coreos' in repos.conf has location attribute set to nonexistent directory: '/var/lib/portage/coreos-overlay'
!!! Section 'x-portage-stable' in repos.conf has location attribute set to nonexistent directory: '/var/lib/portage/portage-stable'
!!! main-repo not set in DEFAULT and PORTDIR is empty.


!!! /etc/portage/make.profile is not a symlink and will probably prevent most merges.
!!! It should point into a profile within /var/lib/portage/portage-stable/profiles/
!!! (You can safely ignore this message when syncing. It's harmless.)


!!! Your current profile is invalid. If you have just changed your profile
!!! configuration, you should revert back to the previous configuration.
!!! Allowed actions are limited to --help, --info, --search, --sync, and
!!! --version.

@therc

therc commented Jun 28, 2016

This is even more broken now. emerge-gitclone is gone in 1081 (and possibly earlier releases). There's an emerge-webrsync, but it doesn't seem to be a replacement for emerge-gitclone.

@therc

therc commented Jul 15, 2016

I take back all that I said. Things have been working quite reliably with at least 1082 and 1097. I suspect part of my issues were caused by reusing developer containers across multiple attempts. I can now build NVIDIA modules with a couple of scripts (which we plan to release soon and would probably benefit from your feedback). There is life at the end of the tunnel.

@therc

therc commented Jul 16, 2016

And things worked fine with 1109. What's the easiest way to keep up to date with releases? Just scrape the HTML from https://{alpha,beta,stable}.release.core-os.net/amd64-usr/ and pick up new directories since last check?

@crawford
Contributor Author

@therc https://{alpha,beta,stable}.release.core-os.net/amd64-usr/current/version.txt will contain the latest version. You can then use this to grab https://{alpha,beta,stable}.release.core-os.net/amd64-usr/{version}/{file}.
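A sketch of a poll loop built on that, assuming version.txt keeps its COREOS_VERSION=x.y.z line format; the state-file path is an arbitrary choice, and the curl call is shown only in a comment so the parsing and comparison can be exercised with a sample string:

```shell
# Sketch: detect a new CoreOS release by polling version.txt and
# remembering the last version seen in a small state file.
latest_version() {
    # parse a version.txt body on stdin; version.txt is assumed to
    # contain a line like COREOS_VERSION=1122.0.0
    sed -n 's/^COREOS_VERSION=//p'
}

check_new_release() {
    state=$1
    current=$2
    last=$(cat "$state" 2>/dev/null || echo none)
    if [ "$current" != "$last" ]; then
        echo "new release: $current (was $last)"
        printf '%s\n' "$current" > "$state"
    fi
}

# Real usage (network access required):
#   current=$(curl -s https://alpha.release.core-os.net/amd64-usr/current/version.txt | latest_version)
sample='COREOS_VERSION=1122.0.0'
current=$(printf '%s\n' "$sample" | latest_version)
check_new_release /tmp/last-coreos-version.$$ "$current"
```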

@dm0-

dm0- commented Nov 9, 2016

I've created a documentation PR at coreos/docs#915. I also uploaded some notes and scripts on automating building custom modules when new kernels are detected: https://gist.github.com/dm0-/0db058ba3a85d55aca99c93a642b8a20

@alban

alban commented Nov 28, 2016

It would be nice if there were an official container image for each official version of CoreOS Linux, including the kernel headers of the matching version and some compilers (I need gcc, llvm, and clang). That would make it possible to compile kernel modules for a specific CoreOS Linux version using a free CI service such as CircleCI or SemaphoreCI, without having to deploy a CoreOS Linux VM in the CI pipeline (which is usually not possible on free CI services).

I would use that for building eBPF/kprobes programs that depend on a specific kernel-headers version. My workflow would be something like this script in CircleCI:

curl https://coreos.com/releases/releases.json | parse_versions.sh > versions
for VERSION in $(cat versions) ; do
  docker pull quay.io/coreos/coreos-kernel-headers:${VERSION}
  docker run -v $PWD:/src \
        quay.io/coreos/coreos-kernel-headers:${VERSION} \
            gcc -o /src/foo-${VERSION} /src/foo.c
done

Then it would generate one compiled object for each CoreOS Linux version and I could publish all of them.

/cc @iaguis @alepuccetti @2opremio

@alban

alban commented Nov 30, 2016

@crawford: I filed #1683 to capture my previous comment.
