add support for host devices #5607
So it's now fixed on the go-dockerclient side fsouza/go-dockerclient@e4fcc92 |
We need to expose GPUs to the containers. I can write the PR and (given my previous experience with a types.go change) rebase it over and over. What are the odds of it getting accepted? The first thought that comes to mind is how to secure it, as in @erictune's linked issue. |
I could place it behind a kubelet or apiserver flag, which is off by default. |
I need this feature so we can run |
I don't think it has been implemented yet, but it seems what @therc wants to do has been. It's still missing support for other devices. |
Hey, what's the status here? We need this as well. I haven't contributed to Kubernetes before, but a lightweight way to provide some (initial) support for this looks like a container annotation. That annotation would presumably get piped into the PodSandboxContext, which I could in turn use to pass the requisite arguments into the host config for the Docker container creation. |
I also believe that I need this. I will be trying to deploy a pod to a specific node that uses a Napatech card or other networking devices. |
FWIW the following was working for me to pass through a sound card device ~3 months ago. Privileged mode was the key and according to the docs it looks like it should still work.
|
Yeah, that'll work with privileged mode. The rub is we run code in a multi-tenant environment so that's a non-starter for our security requirements. Mounting devices using |
The status of this is that we have not had a proposal for an API to capture this. The issue is that the API needs to be plausible across multiple runtimes. |
@thockin Got it. I'm not super familiar with the Kubernetes proposal process, but I'm willing to suggest some things. Does it need to be plausible across multiple runtimes or implementable across multiple runtimes? The latter would imply that if rkt doesn't support something, then we can't have any kind of support for it in Docker at all. I know that there's already some pattern of using container annotations for things that are vendor specific. Is that an option here? |
+1 |
+1. Currently the hostDevice option requires full privileged mode rather than individual cap_add capabilities; I'm very uneasy about this compared with segmented capability permissions. |
Sorry, I never replied!
When I say "plausible" I mean that at least the major runtimes maintainers
have looked at the API proposal and agreed that the semantic is doable,
even if they don't have the requisite support yet.
Annotations could work, but I worry that it becomes a de facto API, so I'd
rather think about that up front.
The issue for me is coupling - as soon as you allow users to specify
details like this, they are forced to understand the host machines in great
detail. Which machines have the device? What is the /dev name of the
device? How do I know that nobody else is using the device at the same
time? These are the questions that need answering, or rather - what can we
do automatically to prevent users from having to know the answers to these?
In a lot of cases, the upcoming opaque integer resources are better in
pretty much every way. If you say "this machine has 1 instance of
'mycompany.com/soundcard'" and provide the metadata for it, we can schedule
to that and map the device and provide mutual exclusion all automatically.
(Replying to Matt Farmer's comment of Mon, Nov 7, 2016.)
|
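A minimal Go sketch of the counting idea, assuming each node simply advertises an integer capacity for a named resource like 'mycompany.com/soundcard'. Node, Schedule and ErrUnschedulable are hypothetical names, not the Kubernetes scheduler's API; the point is that the pod asks for a count, never a /dev path, and the bookkeeping alone prevents two pods from claiming the same instance.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for a cluster node that advertises a
// countable resource such as "mycompany.com/soundcard".
type Node struct {
	Name      string
	Capacity  map[string]int // instances the node advertises
	Allocated map[string]int // instances already bound to pods
}

// Free returns how many instances of resource are still unbound on n.
// Reading a nil map yields 0, so nodes without the resource report 0.
func (n *Node) Free(resource string) int {
	return n.Capacity[resource] - n.Allocated[resource]
}

// ErrUnschedulable is returned when no node has enough free instances.
var ErrUnschedulable = errors.New("no node has enough free instances")

// Schedule binds a request for want instances to the first node that can
// satisfy it and records the allocation, so an instance is never handed
// to two pods at once.
func Schedule(nodes []*Node, resource string, want int) (*Node, error) {
	for _, n := range nodes {
		if n.Free(resource) >= want {
			if n.Allocated == nil {
				n.Allocated = map[string]int{}
			}
			n.Allocated[resource] += want
			return n, nil
		}
	}
	return nil, ErrUnschedulable
}

func main() {
	const soundcard = "mycompany.com/soundcard"
	nodes := []*Node{
		{Name: "node-a"},
		{Name: "node-b", Capacity: map[string]int{soundcard: 1}},
	}

	n, err := Schedule(nodes, soundcard, 1)
	fmt.Println(n.Name, err) // node-b <nil>

	// The only instance is taken, so a second pod cannot be placed.
	_, err = Schedule(nodes, soundcard, 1)
	fmt.Println(err) // no node has enough free instances
}
```

Mapping a granted instance to an actual /dev node is left to the node itself; this sketch only covers the scheduling and mutual-exclusion side.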
@thockin: It does sound interesting to use opaque integer resources for this. Is there a way to add metadata to such a resource? I couldn't find one in the documentation. Your concern about growing a de facto API with annotations is understandable. On the other hand, it might be useful to provide a way to access devices and see how people use it in the real world before designing an API that then captures those real-world requirements in a clean way. Specifically, I'm interested in allocating node-local block devices to containers (or pods). If a node has a certain number of local SSDs, I want to be able to use such an SSD directly from a pod. Metadata would include the capacity of the SSD, its device node, and maybe other fields such as the device type. The resource model design proposal mentions this, but it seems to be far down the road. Would there be a simple way to allow usage of local devices in the short term, and thereby gather real-world requirements, without requiring privileged containers? |
@ConnorDoyle for opaque resources, @msau42 for local-storage stuff |
@dchen1107 @yujuhong I am anxious about an annotation for this, but it certainly has come up and we don't have a "real" answer yet. |
Local storage will be a long-term project. For the short term, the only ways to utilize local SSDs now are hostPath volumes or a distributed filesystem like GlusterFS. |
I also want this so that I can mount /dev/kvm into an unprivileged container. |
@thockin, there are two APIs involved in this issue: the Kubernetes API and the API between the kubelet and the container runtime (a.k.a. the CRI). For the former, I think supporting opaque resources makes sense. As for the latter, the CRI already includes devices in its API in order to support GPU devices in alpha. The change was introduced in #35597. |
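For illustration, the translation a Docker-backed kubelet needs at the CRI layer is roughly field-for-field. The sketch below redeclares local structs shaped like the CRI Device message (container_path, host_path, permissions) and Docker's device mapping (PathOnHost, PathInContainer, CgroupPermissions) rather than importing the real packages; toDockerDevices is a hypothetical helper, and defaulting empty permissions to "rwm" (the docker run --device default) is an assumption of this sketch.

```go
package main

import "fmt"

// Device is a local stand-in shaped like the CRI Device message
// (container_path, host_path, permissions); not the generated CRI type.
type Device struct {
	ContainerPath string
	HostPath      string
	Permissions   string // cgroup permissions such as "rwm"
}

// DeviceMapping is a local stand-in shaped like Docker's HostConfig
// device entry.
type DeviceMapping struct {
	PathOnHost        string
	PathInContainer   string
	CgroupPermissions string
}

// toDockerDevices converts CRI-style devices to Docker device mappings.
// Empty permissions default to "rwm" (an assumption matching the
// docker run --device default).
func toDockerDevices(devs []Device) []DeviceMapping {
	out := make([]DeviceMapping, 0, len(devs))
	for _, d := range devs {
		perms := d.Permissions
		if perms == "" {
			perms = "rwm"
		}
		out = append(out, DeviceMapping{
			PathOnHost:        d.HostPath,
			PathInContainer:   d.ContainerPath,
			CgroupPermissions: perms,
		})
	}
	return out
}

func main() {
	devs := []Device{{ContainerPath: "/dev/nvidia0", HostPath: "/dev/nvidia0"}}
	fmt.Printf("%+v\n", toDockerDevices(devs))
	// [{PathOnHost:/dev/nvidia0 PathInContainer:/dev/nvidia0 CgroupPermissions:rwm}]
}
```

Nothing here names a device at the cluster API level; the host path only appears once something on the node has decided which device to hand out.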
Exactly my point. Most of the code should be there; the Kubernetes API just doesn't expose that functionality yet. |
Supporting named devices in the CRI layer is perhaps appropriate, but
supporting it at the cluster API layer isn't good.
(Replying to Marcel Wysocki's comment of Tue, Jan 24, 2017.)
|
Coming into this late, just adding some notes about opaque resources. Opaque Integer Resources (OIRs) are alpha as of v1.5. The missing feature to support this use case is how to extend node-level isolation to an opaque resource. There are discussions happening this month, in sig-node and the resource management workgroup, about how to accomplish that. At last mention, @derekwaynecarr is working on a proposal for isolation extensions. At the same time, @vishh, @dchen1107, and @thockin have asked for a proposal to explore some kind of lifecycle hook to let operators execute extra steps during pod/container setup and teardown. I agree with @thockin on not putting device names into the pod spec. Unless Kubernetes includes specializations for all sorts of devices, users would need some external way (an API) to do the matchmaking from resource type to concrete device name on the host. |
I feel https://github.com/kubernetes/community/blob/master/contributors/design-proposals/volume-hostpath-qualifiers.md can be extended to have the kubelet automatically whitelist hostPath devices for the respective containers. |
+1, need this for /dev/fuse |
@t3hmrman I found https://github.com/kubevirt/kubernetes-device-plugins/blob/master/docs/README.kvm.md which may solve your particular use case. |
It should be possible to create a generic device plug-in which gets a device node as argument as well as a number of allowed instances. |
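A rough sketch of that idea in Go, without the gRPC plumbing of the real device plugin API: a hypothetical GenericPlugin takes one device node and an instance count, advertises one ID per instance, and resolves any granted IDs back to the same host path. DeviceSpec only mirrors the container path / host path / permissions triple a plugin hands to the kubelet; the ID format and the "rw" permission are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
)

// DeviceSpec mirrors the container path / host path / permissions triple a
// device plugin returns to the kubelet. Redeclared locally for the sketch.
type DeviceSpec struct {
	ContainerPath string
	HostPath      string
	Permissions   string
}

// GenericPlugin is a hypothetical plugin that shares one host device node
// among a fixed number of allowed instances.
type GenericPlugin struct {
	HostPath  string
	Instances int
}

// DeviceIDs returns one advertised ID per allowed instance; these are what
// the scheduler counts against the extended resource.
func (p GenericPlugin) DeviceIDs() []string {
	ids := make([]string, p.Instances)
	for i := range ids {
		ids[i] = p.HostPath + "-" + strconv.Itoa(i)
	}
	return ids
}

// Allocate validates the granted IDs and maps them back to the single
// device node. Every ID resolves to the same host path, so one spec is
// enough.
func (p GenericPlugin) Allocate(ids []string) ([]DeviceSpec, error) {
	valid := map[string]bool{}
	for _, id := range p.DeviceIDs() {
		valid[id] = true
	}
	for _, id := range ids {
		if !valid[id] {
			return nil, fmt.Errorf("unknown device id %q", id)
		}
	}
	if len(ids) == 0 {
		return nil, nil
	}
	return []DeviceSpec{{ContainerPath: p.HostPath, HostPath: p.HostPath, Permissions: "rw"}}, nil
}

func main() {
	p := GenericPlugin{HostPath: "/dev/fuse", Instances: 3}
	fmt.Println(p.DeviceIDs()) // [/dev/fuse-0 /dev/fuse-1 /dev/fuse-2]

	specs, err := p.Allocate([]string{"/dev/fuse-1"})
	if err != nil {
		fmt.Println("allocate failed:", err)
		return
	}
	fmt.Printf("%+v\n", specs)
}
```

Because every ID resolves to the same node, this caps the number of concurrent users rather than giving exclusive access; whether sharing is safe depends on the device (plausible for /dev/fuse or /dev/kvm, probably not for a serial port).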
Hi @micw, thanks for the suggestion! That definitely looks like it would solve my problem (and others'), and it explains how kubevirt can provide the functionality it does. Since posting here I've started using untrusted workload runtimes w/ As far as running QEMU inside an actual pod, it seems like |
@thockin, you wrote:
and @dixudx said something similar, but this is not true in Docker - 'just' |
Is it true, then, that the only way in k8s to access a host block device is to use a privileged container? So volumeMode=Block would not mean anything (for reading/writing the device) unless it is running in a privileged container? |
Your application could read the block device directly too. Things like KVM would do that, as would some databases. No privilege is needed in that case. |
@kfox1111 that seems different from what I read here: https://kubernetes.io/docs/concepts/policy/pod-security-policy/#privileged. The reason kvm works (I did not test this, but going by the documentation) is because of device plugins that whitelist /dev/kvm. |
I don't believe you need any special privilege in Linux to read from a block device, only Unix permissions on the block device. You do need special privilege to mount a block device. If the storage driver plumbs through the device and gives it the right permissions, I think it works. I believe /dev/kvm is an entirely different thing, as it isn't a block device. |
Hmm... no. There seems to be a capability, restricted in Docker by default, that normal users on the host don't have. |
+1 |
I just found a plugin that can add the device /dev/mem without privileged mode! |
Actually, this solution also needs to run the DaemonSet with |
+1 |
What? According to the man page for /dev/mem (http://man7.org/linux/man-pages/man4/mem.4.html)
If you can touch that file, you are privileged whether it's flagged or not... That shouldn't be handed over to unprivileged containers IMO. |
I'm surprised this bug's closed because the original problem doesn't seem to be fixed. On Kubernetes 1.18.8 with Docker 19.03.12, I'm not able to use a mapped host block device in a container without running the container in privileged mode. The original problem here was that Docker's --device functionality wasn't available in Kubernetes, and that problem remains. Or, is there a solution to this that I've missed? Thank you. |
I also find it surprising that there seems to be no way to use host-connected devices from containers without compromising security. We would need to access the /dev/ttyUSB0 character device from a container, and we do not want to run anything as privileged. So if there's a solution, please share. Thanks! |
How do we integrate this with k8s? |
As of 01/2021 it doesn't seem to be possible to mount e.g. /dev/fuse without The relevant issue seems to be #7890 |
Mounting host devices without See details in #7890 (comment) |
It would be nice if the container API payload had support for exposing host devices to the container (like docker run --device does). The kubelet could pass it to go-dockerclient once they add support for it (fsouza/go-dockerclient#241), or create the container with the Docker remote API by passing an additional member in the /create HostConfig payload: