add support for host devices #5607

proppy · 2015-03-18T20:32:30Z

It would be nice if the container api payload had support for exposing host devices to the container (like docker run --device does).

The kubelet could pass it to go-dockerclient once they add support for it (fsouza/go-dockerclient#241), or create the container through the Docker remote API by adding an entry to the Devices array of the /create HostConfig payload:

{
    "PathOnHost": "/dev/deviceName",
    "PathInContainer": "/dev/deviceName",
    "CgroupPermissions": "mrw"
}
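For reference, here is a minimal Go sketch of the Docker-side payload described above. It assumes the Docker Engine API's HostConfig.Devices array. The types are local stand-ins, not go-dockerclient's, and hostConfigJSON is a hypothetical helper. A real /create body carries many more HostConfig fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeviceMapping mirrors one entry of HostConfig.Devices in the Docker
// Engine API. Local type for illustration, not go-dockerclient's.
type DeviceMapping struct {
	PathOnHost        string `json:"PathOnHost"`
	PathInContainer   string `json:"PathInContainer"`
	CgroupPermissions string `json:"CgroupPermissions"`
}

// HostConfig keeps only the Devices member; a real /create payload
// has many more fields.
type HostConfig struct {
	Devices []DeviceMapping `json:"Devices"`
}

// hostConfigJSON (hypothetical helper) builds a HostConfig fragment that
// exposes one host device at the same path inside the container.
func hostConfigJSON(path, perms string) (string, error) {
	hc := HostConfig{Devices: []DeviceMapping{{
		PathOnHost:        path,
		PathInContainer:   path,
		CgroupPermissions: perms,
	}}}
	b, err := json.Marshal(hc)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := hostConfigJSON("/dev/deviceName", "mrw")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```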


proppy · 2015-03-19T00:01:24Z

So it's now fixed on the go-dockerclient side fsouza/go-dockerclient@e4fcc92

therc · 2016-02-10T21:36:18Z

We need to expose GPUs to the containers. I can write the PR and (given my previous experience with a types.go change) rebase it over and over. What are the odds of it getting accepted? The first thought that comes to mind is how to secure it, as in @erictune's linked issue.

therc · 2016-02-10T21:41:29Z

I could place it behind a kubelet or apiserver flag, which is off by default.

osterman · 2016-06-24T18:11:33Z

I need this feature so we can run s3fs inside of k8s. Will have to use fleet for now :(

praoreo · 2016-09-29T02:07:17Z

@proppy , @fsouza
Hi,

What is the syntax for specifying device information in a YAML/JSON file? I tried the following in a .json file, but got a "found invalid field device for v1.PodSpec" error. I am using Kubernetes version 1.3.6.

                                "device": {
                                        "PathOnHost": "/dev"
                                },
                                "nodeSelector": {

maci0 · 2016-10-17T15:29:45Z

I don't think it has been implemented yet. But it seems what @therc wants to do has.
https://github.com/kubernetes/kubernetes/blob/master/pkg/api/types.go has some nvidia stuff.

It's still missing support for other devices.
It used to work in Docker with volume mounts. My guess is that when they introduced --device, they locked down volume mounts using the device cgroup:
https://www.kernel.org/doc/Documentation/cgroup-v1/devices.txt
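The device cgroup (v1) whitelist takes entries of the form "type major:minor access", as the kernel document linked above describes. Type is a, b or c, and access is some combination of r, w and m. This small Go sketch formats one such entry. deviceRule is a hypothetical helper, and the example reuses the kernel doc's "c 1:3 mr" (/dev/null) rule.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceRule (hypothetical helper) formats one cgroup v1 devices.allow /
// devices.deny entry: "<type> <major>:<minor> <access>". type must be
// 'a', 'b' or 'c'; access is built from 'r', 'w' and 'm'. A negative
// major or minor is rendered as "*" (all).
func deviceRule(kind byte, major, minor int, access string) (string, error) {
	if kind != 'a' && kind != 'b' && kind != 'c' {
		return "", fmt.Errorf("invalid device type %q", kind)
	}
	if access == "" || strings.Trim(access, "rwm") != "" {
		return "", fmt.Errorf("invalid access %q", access)
	}
	num := func(n int) string {
		if n < 0 {
			return "*"
		}
		return fmt.Sprint(n)
	}
	return fmt.Sprintf("%c %s:%s %s", kind, num(major), num(minor), access), nil
}

func main() {
	// The kernel documentation's example rule for /dev/null.
	rule, err := deviceRule('c', 1, 3, "mr")
	if err != nil {
		panic(err)
	}
	fmt.Println(rule)
}
```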

farmdawgnation · 2016-11-01T15:14:52Z

Hey what's the status here?

We need this as well. I haven't contributed to Kubernetes before, but it looks like a lightweight way to provide some (initial) support for this is to support a container annotation for it. That looks like it would get piped into the PodSandboxContext which I could in turn use to pass the requisite arguments into the host config for the Docker container creation.
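Here is a rough sketch of the annotation idea. It assumes a hypothetical annotation key (example.com/devices) whose value uses the host[:container[:permissions]] form of docker run --device, with rwm as the default permissions. parseDevices and Device are illustrative names only. Nothing like this exists in Kubernetes, and the sketch leaves out wiring the result into the sandbox or host config.

```go
package main

import (
	"fmt"
	"strings"
)

// Device is a hypothetical host-to-container device mapping.
type Device struct {
	HostPath, ContainerPath, Permissions string
}

// parseDevices parses a comma-separated list of entries in the
// host[:container[:permissions]] form used by `docker run --device`.
func parseDevices(value string) ([]Device, error) {
	var out []Device
	for _, entry := range strings.Split(value, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		parts := strings.Split(entry, ":")
		if len(parts) > 3 || parts[0] == "" {
			return nil, fmt.Errorf("malformed device entry %q", entry)
		}
		// Defaults: same path inside the container, rwm permissions.
		d := Device{HostPath: parts[0], ContainerPath: parts[0], Permissions: "rwm"}
		if len(parts) >= 2 && parts[1] != "" {
			d.ContainerPath = parts[1]
		}
		if len(parts) == 3 {
			if parts[2] == "" || strings.Trim(parts[2], "rwm") != "" {
				return nil, fmt.Errorf("bad permissions in %q", entry)
			}
			d.Permissions = parts[2]
		}
		out = append(out, d)
	}
	return out, nil
}

func main() {
	// Hypothetical annotation; not a real Kubernetes key.
	annotations := map[string]string{"example.com/devices": "/dev/snd, /dev/sdc:/dev/xvdc:r"}
	devs, err := parseDevices(annotations["example.com/devices"])
	if err != nil {
		panic(err)
	}
	for _, d := range devs {
		fmt.Printf("%s -> %s (%s)\n", d.HostPath, d.ContainerPath, d.Permissions)
	}
}
```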

drekle · 2016-11-02T21:18:42Z

I also believe that I need this. I will be trying to deploy a pod to a specific node which uses a napatech card or other networking devices.

jbiel · 2016-11-03T16:25:53Z

FWIW the following was working for me to pass through a sound card device ~3 months ago. Privileged mode was the key and according to the docs it looks like it should still work.

      containers:
      - name: foo
        ...
        volumeMounts:
        - mountPath: /dev/snd
          name: dev-snd
        securityContext:
          privileged: true
      volumes:
      - name: dev-snd
        hostPath:
          path: /dev/snd

farmdawgnation · 2016-11-03T19:23:19Z

Yeah, that'll work with privileged mode. The rub is we run code in a multi-tenant environment so that's a non-starter for our security requirements. Mounting devices using --device is safer.

thockin · 2016-11-03T21:17:42Z

The status of this is that we have not had a proposal for an API to capture this. The issue is that the API needs to be plausible across multiple runtimes.

farmdawgnation · 2016-11-07T15:28:47Z

@thockin Got it. I'm not super familiar with the Kubernetes proposal process, but I'm willing to suggest some things.

Does it need to be plausible across multiple runtimes or implementable across multiple runtimes? The latter would imply that if rkt doesn't support something, then we can't have any kind of support for it in Docker at all.

I know that there's already some pattern of using container annotations for things that are vendor specific. Is that an option here?

maci0 · 2016-12-09T14:01:44Z

+1

tcf909 · 2017-01-22T05:55:33Z

+1

Currently the hostDevice option requires full privileged mode rather than individual cap_adds. I'm very uneasy about this compared with segmented capability permissions.

thockin · 2017-01-23T01:30:00Z

Sorry, I never replied! When I say "plausible" I mean that at least the major runtimes' maintainers have looked at the API proposal and agreed that the semantic is doable, even if they don't have the requisite support yet.

Annotations could work, but I worry that they become a de facto API, so I'd rather think about that up front.

The issue for me is coupling: as soon as you allow users to specify details like this, they are forced to understand the host machines in great detail. Which machines have the device? What is the /dev name of the device? How do I know that nobody else is using the device at the same time? These are the questions that need answering, or rather: what can we do automatically to prevent users from having to know the answers?

In a lot of cases, the upcoming opaque integer resources are better in pretty much every way. If you say "this machine has 1 instance of 'mycompany.com/soundcard'" and provide the metadata for it, we can schedule to that, map the device, and provide mutual exclusion, all automatically.
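This minimal Go sketch makes the scheduling and mutual-exclusion point concrete. It models a node-side allocator for one named integer resource. The name mycompany.com/soundcard comes from the comment above. nodeResource, allocate and release are hypothetical. A real implementation would be split between the scheduler and the kubelet rather than living in one standalone type.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// errExhausted signals that the node has too few free instances left.
var errExhausted = errors.New("not enough free instances")

// nodeResource is a hypothetical per-node view of one named integer
// resource (e.g. "mycompany.com/soundcard") backed by device paths.
type nodeResource struct {
	name  string
	free  map[string]bool   // device path -> currently unassigned
	owner map[string]string // device path -> pod holding it
}

func newNodeResource(name string, devices ...string) *nodeResource {
	r := &nodeResource{name: name, free: map[string]bool{}, owner: map[string]string{}}
	for _, d := range devices {
		r.free[d] = true
	}
	return r
}

// capacity is the integer count the node would advertise for scheduling.
func (r *nodeResource) capacity() int { return len(r.free) + len(r.owner) }

// allocate gives a pod n distinct devices, so no device is shared.
func (r *nodeResource) allocate(pod string, n int) ([]string, error) {
	if n < 1 {
		return nil, fmt.Errorf("invalid request for %d instances of %s", n, r.name)
	}
	if n > len(r.free) {
		return nil, errExhausted
	}
	paths := make([]string, 0, len(r.free))
	for d := range r.free {
		paths = append(paths, d)
	}
	sort.Strings(paths) // deterministic choice among free devices
	paths = paths[:n]
	for _, d := range paths {
		delete(r.free, d)
		r.owner[d] = pod
	}
	return paths, nil
}

// release returns every device held by pod to the free pool.
func (r *nodeResource) release(pod string) {
	for d, p := range r.owner {
		if p == pod {
			delete(r.owner, d)
			r.free[d] = true
		}
	}
}

func main() {
	r := newNodeResource("mycompany.com/soundcard", "/dev/snd0", "/dev/snd1")
	got, err := r.allocate("pod-a", 1)
	fmt.Println(r.capacity(), got, err)
}
```

The user only asks for "1 mycompany.com/soundcard". Which /dev path they get, and the guarantee that nobody else holds it, stay on the node side.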


gavrie · 2017-01-23T11:21:21Z

@thockin: It does sound interesting to use opaque integer resources for this. Is there a way to add metadata to such a resource? I couldn't find one in the documentation.

Your concern about growing a de facto API with annotations is understandable. On the other hand, it might be useful to provide a way to access devices and see how people use it in the real world before designing an API that then captures those real world requirements in a clean way.

Specifically, I'm interested in allocating node-local block devices to containers (or pods). If a node has a certain amount of local SSDs, I want to be able to use such an SSD directly from a pod. Metadata would include the capacity of the SSD, its device node, and maybe other fields such as device type.

The resource model design proposal mentions this, but it seems to be way down the road.

Would there be a simple way to allow usage of local devices on the short term and thereby gather real world requirements, without requiring the use of privileged containers?

thockin · 2017-01-23T18:33:01Z

@ConnorDoyle for opaque resources

@msau42 for local-storage stuff

thockin · 2017-01-23T18:34:23Z

@dchen1107 @yujuhong I am anxious about an annotation for this, but it certainly has come up and we don't have a "real" answer yet.

msau42 · 2017-01-23T20:46:22Z

Local storage will be a long-term project. For the short term, the only ways now to utilize local SSDs are hostpath volumes or a distributed fs like glusterfs.

maci0 · 2017-01-24T08:09:06Z

I also want this so that I can mount /dev/kvm into an unprivileged container.
Currently Kubernetes has an alpha API to mount NVIDIA video cards into the container. Does this work across all runtimes as well? If not, why does --device need to be any different?

yujuhong · 2017-01-24T17:32:41Z

@thockin, there are two APIs involved in this issue: the kubernetes api and the api between kubelet and the container runtime (a.k.a. CRI). For the former I think supporting opaque resources makes sense. As for the latter, the CRI already includes devices in its API in order to support the GPU devices in Alpha. The change was introduced in #35597.
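Roughly, a Docker-backed shim has to translate the CRI device list into Docker device mappings. This minimal sketch assumes local mirrors of the CRI Device message (container_path, host_path, permissions) and of Docker's HostConfig.Devices entry. Defaulting empty permissions to "rwm" is an assumption of this sketch, not behavior taken from the kubelet.

```go
package main

import "fmt"

// criDevice mirrors the CRI Device message fields; local type only.
type criDevice struct {
	ContainerPath, HostPath, Permissions string
}

// dockerDevice mirrors one entry of Docker's HostConfig.Devices.
type dockerDevice struct {
	PathOnHost, PathInContainer, CgroupPermissions string
}

// toDocker translates CRI devices into Docker device mappings. Empty
// permissions default to "rwm" (an assumption of this sketch).
func toDocker(devs []criDevice) []dockerDevice {
	out := make([]dockerDevice, 0, len(devs))
	for _, d := range devs {
		perms := d.Permissions
		if perms == "" {
			perms = "rwm"
		}
		out = append(out, dockerDevice{
			PathOnHost:        d.HostPath,
			PathInContainer:   d.ContainerPath,
			CgroupPermissions: perms,
		})
	}
	return out
}

func main() {
	devs := []criDevice{{ContainerPath: "/dev/nvidia0", HostPath: "/dev/nvidia0", Permissions: "mrw"}}
	fmt.Printf("%+v\n", toDocker(devs))
}
```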

maci0 · 2017-01-24T19:23:28Z

Exactly my point. Most of the code should be there, the kubernetes api just doesn't expose that functionality yet.

thockin · 2017-01-24T21:23:03Z

Supporting named devices in the CRI layer is perhaps appropriate, but supporting it at the cluster API layer isn't good.


ConnorDoyle · 2017-01-25T17:23:39Z

Coming into this late, just adding some notes about opaque resources. Opaque Integer Resources (OIRs) are alpha as of v1.5. The missing feature to support this use case is how to extend node-level isolation to an opaque resource. There are discussions happening this month about how to accomplish that. This is happening in sig-node and the resource management workgroup. At last mention, @derekwaynecarr is working on a proposal for isolation extensions. At the same time, @vishh @dchen1107 and @thockin have asked for a proposal to explore some kind of lifecycle hook to let operators execute extra steps during pod/container setup and teardown.

Agree with @thockin on not putting device names into the pod spec. Unless Kubernetes will include specializations for all sorts of devices, users would need some external way (an API) to do the matchmaking from resource type to concrete device name on the host.

vishh · 2017-01-25T23:39:49Z

I feel https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-hostpath-qualifiers.md can be extended to have kubelet auto whitelist hostpath devices for the respective containers.

guanyuding · 2018-08-21T12:56:18Z

+1, need this for /dev/fuse

micw · 2018-11-05T09:17:07Z

@t3hmrman I found https://github.com/kubevirt/kubernetes-device-plugins/blob/master/docs/README.kvm.md which may solve your particular use case.

micw · 2018-11-05T09:19:50Z

It should be possible to create a generic device plug-in which gets a device node as argument as well as a number of allowed instances.
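Here is a minimal sketch of the generic plug-in idea. The plug-in advertises a fixed number of virtual IDs for one device node and maps any allocation back to that node. The types only mimic the shape of the device plugin API: device IDs with a health string, and per-allocation device specs with container path, host path and permissions. The real API is gRPC-based. genericPlugin, the dev-N IDs and the "rw" permissions are hypothetical.

```go
package main

import "fmt"

// pluginDevice mimics an advertised device: an ID plus a health string.
type pluginDevice struct{ ID, Health string }

// deviceSpec mimics what an allocation tells the runtime to expose.
type deviceSpec struct{ ContainerPath, HostPath, Permissions string }

// genericPlugin is hypothetical: it shares one host device node among a
// fixed number of schedulable instances.
type genericPlugin struct {
	hostPath  string
	instances int
}

// devices lists one virtual ID per allowed instance of the device node.
func (p genericPlugin) devices() []pluginDevice {
	out := make([]pluginDevice, p.instances)
	for i := range out {
		out[i] = pluginDevice{ID: fmt.Sprintf("dev-%d", i), Health: "Healthy"}
	}
	return out
}

// allocate validates the granted IDs; since every instance refers to the
// same node, a single device spec covers the whole request.
func (p genericPlugin) allocate(ids []string) ([]deviceSpec, error) {
	if len(ids) == 0 {
		return nil, fmt.Errorf("no device IDs requested")
	}
	known := make(map[string]bool, p.instances)
	for _, d := range p.devices() {
		known[d.ID] = true
	}
	for _, id := range ids {
		if !known[id] {
			return nil, fmt.Errorf("unknown device ID %q", id)
		}
	}
	return []deviceSpec{{ContainerPath: p.hostPath, HostPath: p.hostPath, Permissions: "rw"}}, nil
}

func main() {
	p := genericPlugin{hostPath: "/dev/fuse", instances: 3}
	fmt.Println(p.devices())
	specs, err := p.allocate([]string{"dev-0"})
	fmt.Println(specs, err)
}
```

The instance count caps how many containers on a node can be scheduled against the device. Because no containers get privileged mode, each one only sees the single node it was granted.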

t3hmrman · 2018-11-05T13:59:18Z

Hi @micw Thanks for the suggestion! That definitely looks like it would solve my problem (and others'), and it explains how kubevirt can get the functionality they provide.

Since posting here I've started using untrusted workload runtimes w/ containerd in combination with the kata-containers project to run pods in VMs, and in the future intend to use the runtime class proposal to solve this instead. These days kata-containers has a super easy to use installer as well, and I can only imagine that it will get better/easier as the runtime class proposal moves towards GA.

As far as running QEMU inside an actual pod, it seems like kubevirt or runtimeClass-annotation enabled controllers are the better way to go for now. That generic device plug-in does sound good, though; it would likely solve all the other cases mentioned.

OJFord · 2018-11-27T19:38:42Z

@thockin, you wrote:

Interestingly, to use /dev/fuse you have to be running with privileges anyway (right?) so you can literally hostPath mount /dev/fuse today. Not a great answer, but it seems to work.

and @dixudx said something similar, but this is not true in Docker: 'just' adding SYS_ADMIN via cap_add is enough.

dinathom · 2018-12-20T23:44:24Z

Is it true, then, that the only way in k8s to access a host block device is to use a privileged container? So volumeMode=Block would not mean anything (for reading/writing the device) unless the pod runs in a privileged container?

kfox1111 · 2018-12-20T23:56:20Z

Your application could read the block device directly, too. Stuff like KVM would do that, or some databases. No privilege is needed in that case.

dinathom · 2018-12-21T00:51:59Z

@kfox1111 that seems different from what I read here: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#privileged
How would the application have the ability to read the device unless the container allows it via some additional capabilities?

The reason kvm works (I did not test this, but going by the documentation) is that device plugins whitelist /dev/kvm.

kfox1111 · 2018-12-21T01:03:32Z

I don't believe you need any special privilege in linux to read from a block device. only unix permissions to the block device. You do need special privilege to mount a block device. If the storage driver plumbs through the device and gives it the right permissions, I think it works.

I believe /dev/kvm is an entirely different thing as it isn't a blockdev.

kfox1111 · 2018-12-21T01:10:48Z

Hmm... no. There seems to be a capability, restricted in Docker by default, that normal users on the host don't have.

bluebeach · 2019-08-10T17:53:08Z

+1
I need access to /dev/mem in my unprivileged container.
Any help?

bluebeach · 2019-08-10T18:08:37Z

I just found a plugin that can add the /dev/mem device without privileged mode!
https://github.com/honkiko/k8s-hostdev-plugin

shufanhao · 2019-10-31T06:04:59Z

I Just find a plugin that can support add device /dev/mem without privileged !!!
https://github.com/honkiko/k8s-hostdev-plugin

Actually, this solution also needs its DaemonSet to run with securityContext: privileged: true.

shufanhao · 2019-10-31T06:07:50Z

+1
I also need access to /dev/mem in an unprivileged container, and I don't want to run any pod with securityContext: privileged: true.

kfox1111 · 2019-10-31T15:54:44Z

What? According to the man page for /dev/mem (http://man7.org/linux/man-pages/man4/mem.4.html)

      "/dev/mem is a character device file that is an image of the main
       memory of the computer.  It may be used, for example, to examine (and
       even patch) the system."

If you can touch that file, you are privileged whether it's flagged or not... That shouldn't be handed over to unprivileged containers, IMO.

xt94c4t9ce · 2020-09-21T20:54:53Z

I'm surprised this bug's closed because the original problem doesn't seem to be fixed.

On Kubernetes 1.18.8 with Docker 19.03.12, I'm not able to use a mapped host block device in a container without running the container in privileged mode.

The original problem here was that Docker's --device functionality wasn't available in Kubernetes, and that problem remains.

Or, is there a solution to this that I've missed? Thank you.

patrijua · 2020-09-22T12:02:36Z

I also find it surprising that there seems to be no way to use host-connected devices from containers without compromising security. We need to access the /dev/ttyUSB0 character device from a container, and we do not want to run anything as privileged. So if there's a solution, please share. Thanks!

YaShanBoy · 2020-12-07T14:28:10Z

Regarding the original request above (exposing host devices to the container the way docker run --device does, via a Devices entry with PathOnHost, PathInContainer and CgroupPermissions in the Docker /create HostConfig payload): how can this be integrated with k8s?

pre · 2021-01-18T17:49:42Z

As of 01/2021 it doesn't seem to be possible to mount e.g. /dev/fuse without privileged: true.

The relevant issue seems to be #7890

pre · 2021-01-23T14:43:28Z

Mounting host devices without privileged: true is possible via the kubelet device plugin API (the Device Manager)!

See details in #7890 (comment)

cjcullen added the priority/backlog and team/cluster labels Mar 18, 2015

erictune mentioned this issue Mar 31, 2015

Authorization and Capabilites #2502

Closed

bgrant0607 added the sig/node label May 17, 2016

ghost mentioned this issue Jun 17, 2016

Passing --device flag to docker run command (and its support in spec file) #17066

Closed

dustymabe mentioned this issue Oct 2, 2018

RFE: Add generic way to specify device nodes to attach to each pod cri-o/cri-o#1821

Closed

opskumu mentioned this issue Oct 30, 2018

学习周报「2018」 opskumu/issues#19

Closed

dgerd mentioned this issue Mar 7, 2019

Add conformance tests to validate OCI devices are disallowed knative/serving#2973

Closed

wuservices mentioned this issue Jul 3, 2019

Mount Bucket on the Kubernetes pod GoogleCloudPlatform/gcsfuse#328

Closed

johnmcollier mentioned this issue Jan 15, 2020

Unprivileged buildah bud in Kube: Overlay fails, but vfs does not containers/buildah#2084

Closed

kurtisvg mentioned this issue Sep 8, 2020

Running in -fuse mode doesn't work between containers/host GoogleCloudPlatform/cloud-sql-proxy#444

Closed

YangKeao mentioned this issue Oct 27, 2020

chaos daemon run with privileged = false chaos-mesh/chaos-mesh#1101

Closed

YaShanBoy mentioned this issue Dec 7, 2020

how to config devices of docker container by k8s? #97079

Closed

rolfw mentioned this issue Feb 23, 2021

[question] Run privileged necessary? OpenZWave/Zwave2Mqtt#913

Open

tyzbit mentioned this issue Mar 11, 2021

Add securityContext specification kfirfer/helm#1

Merged
