This repository has been archived by the owner on Mar 9, 2022. It is now read-only.

[WIP] adds option for setting default oci hooks #1248

Closed
wants to merge 1 commit into from

Conversation

mikebrow
Member

To address issue containerd/containerd#6645 I've started a prototype. Actually this is a refactor'd prototype from #496 taking into account comments and updating to work with the current codebase. I closed the prior prototype PR because I had deleted that branch.

User must generate a json file for the hooks struct: https://github.com/opencontainers/runtime-spec/blob/master/specs-go/config.go#L114-L130

Hooks explained here: https://github.com/opencontainers/runtime-spec/blob/master/config.md#prestart
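
For illustration, a minimal hooks file of that shape might look like the following (the hook paths, arguments, and environment here are hypothetical examples, not part of this PR; the field names come from the runtime-spec Hooks struct linked above):

```json
{
  "prestart": [
    {
      "path": "/usr/local/bin/prepare-device",
      "args": ["prepare-device", "--setup"],
      "env": ["LOG_LEVEL=info"],
      "timeout": 30
    }
  ],
  "poststop": [
    {
      "path": "/usr/local/bin/release-device"
    }
  ]
}
```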

Signed-off-by: Mike Brown <brownwm@us.ibm.com>

@mikebrow
Member Author

mikebrow commented Aug 27, 2019

Note: there is a PR churning in the OCI runtime spec that will deprecate certain container runtime hooks and add new ones... opencontainers/runtime-spec#1008

When that PR is merged the "hook" discussion should be complete at least at the OCI specification level.

It is my suggestion that we go with a pattern/model where containerd (and/or CRI) has the ability to specify default hooks for OCI compatible runtimes. Side note: cri-o includes the ability to set hooks already.

I also suggest that k8s would benefit from allowing pod/container specs to include hook management directives at the CRI API level for OCI images, and if they do we would likely let new CRI options manage/override the hooks used for PODs and/or containers. For example: these CRI hooks could override any default configs, to the point of allowing CRI to specify that NO hooks will be set, or adding additional hooks, or replacing hooks. I could see these hooks being used to enable sidecar patterns, monitoring, vetting containerd, .... Perhaps there would be interest in hooking to mounted k8s resources?

@alban alban left a comment

Thanks for the work!

@@ -202,6 +202,10 @@ type PluginConfig struct {
// DisableProcMount disables Kubernetes ProcMount support. This MUST be set to `true`
// when using containerd with Kubernetes <=1.11.
DisableProcMount bool `toml:"disable_proc_mount" json:"disableProcMount"`
// DefaultOCIHooks (optional) is a path to a json file that specifies a
// default OCI spec Hooks struct. ** Note: The any hooks set by default can be
// overriden if/when the hooks are set via the CRI.

s/The any/any/?

Can the CRI remove a default hook on poststop but not touch another default hook on prestart? I would prefer not, so that the hooks remain consistent and that any resources allocated in poststart are correctly released in poststop.

Member Author

@mikebrow mikebrow Aug 28, 2019

It's not a map, just the lists... So I was thinking we would need some deterministic rules at the CRI level for whether they are resetting the hooks to a given spec or adding more hooks and don't care what the defaults are. Had they used a map in the OCI hooks it would've been easier. When I said "can" I should probably have said "MAY."

I was just putting a line in the sand; it's OK to cross it :)

So that was what I was thinking... starts to become non-deterministic when mixing up two sets of hook providers. Or at least requires some deterministic rules.

Perhaps we could have overridable and non-overridable, or create our own map of oci-hook lists "default", "other", ... Hmm... let's mull on that.

Member Author

maybe make "container" and "runtime" reserved names in the map for container namespace and runtime namespace ...

return nil, nil
}
hooks := &runtimespec.Hooks{}
f, err := ioutil.ReadFile(profile)

We could set a reasonable file size limit to avoid memory exhaustion, especially with user input.

I thought there was a previous Kubernetes security recommendation about this code pattern (ReadFile + Unmarshal) but I can't find it anymore.
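
A minimal sketch of that idea, assuming a hypothetical helper and an arbitrary limit (not code from this PR):

```go
package hooks

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"os"

	runtimespec "github.com/opencontainers/runtime-spec/specs-go"
)

// maxHooksFileSize is an arbitrary example limit to avoid reading and
// unmarshalling unexpectedly large (possibly user-supplied) files.
const maxHooksFileSize = 1 << 20 // 1 MiB

// loadHooksFile is a hypothetical helper: it refuses oversized files before
// reading and unmarshalling the hooks JSON.
func loadHooksFile(path string) (*runtimespec.Hooks, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return nil, err
	}
	if fi.Size() > maxHooksFileSize {
		return nil, fmt.Errorf("hooks file %q too large: %d bytes", path, fi.Size())
	}
	data, err := ioutil.ReadFile(path)
	if err != nil {
		return nil, err
	}
	hooks := &runtimespec.Hooks{}
	if err := json.Unmarshal(data, hooks); err != nil {
		return nil, err
	}
	return hooks, nil
}
```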

Member Author

@mikebrow mikebrow Sep 4, 2019

const reasonable_filesize = initial_reasonable_filesize^(current_year - 1970) // must build me each year

@mikebrow
Member Author

mikebrow commented Aug 28, 2019

State of k8s lifecycle hooks:
Uses poststart & prestop via exec commands model.
https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/

But, this is only for executing code in the container root via exec into the container (container namespace), and it's not very deterministic wrt when the exec happens (e.g. before or after the container's entry point is called on the pre-start). This can't be used for runtime namespace operations.

@Random-Liu Random-Liu self-assigned this Aug 29, 2019
@Random-Liu Random-Liu added this to the v1.4 milestone Aug 29, 2019
@bart0sh
Contributor

bart0sh commented Sep 4, 2019

@mikebrow thank you for this PR!

Absence of oci hooks in containerd is a show stopper for our project. Without this PR it's impossible for us to use containerd as a CRI runtime.

I've tested the PR and it works just fine for our needs.

One small enhancement proposal: would it make sense to split hooks JSON into multiple files - one file per hook? This would make deploying hooks much easier.

I'll be happy to help with anything to get it merged.

@mikebrow
Member Author

mikebrow commented Sep 4, 2019

@mikebrow thank you for this PR!

Absence of oci hooks in containerd is a show stopper for our project. Without this PR it's impossible for us to use containerd as a CRI runtime.

I've tested the PR and it works just fine for our needs.

One small enhancement proposal: would it make sense to split hooks JSON into multiple files - one file per hook? This would make deploying hooks much easier.

I'll be happy to help with anything to get it merged.

There will be a sig-node discussion on OCI hooks soon, it's in the future section of the agenda.

So, are you suggesting the path be to a directory instead of a json file, where the directory is just the hook spec splayed out into a directory structure? Didn't the cri-o guys do something similar to that?

Perhaps if the path is to a file, parse it; otherwise, if it is to a directory, it would be parsed as the root directory for a set of deployable pre-configured hook providers:

path_to_hooks/

Then under that, key names for a map of various hook providers (I was thinking about adding a map as a prefix anyway):

path_to_hooks/k8s/
/* k8s would be reserved for kubernetes/kubelet hooks, e.g. the exec hooks might be replaceable with oci hooks? */

hooks go in here:

path_to_hooks/vendor_name/hooks/prestart /**deprecated**/
path_to_hooks/vendor_name/hooks/poststart
path_to_hooks/vendor_name/hooks/poststop

and with opencontainers/runtime-spec#1008:

path_to_hooks/vendor_name/hooks/prestart /**deprecated use mount and start**/
path_to_hooks/vendor_name/hooks/mount
path_to_hooks/vendor_name/hooks/start
path_to_hooks/vendor_name/hooks/poststart
path_to_hooks/vendor_name/hooks/poststop

@kad

kad commented Sep 5, 2019

@mikebrow this sounds a bit over-complicated. Why not just have a directory with a set of json files that are read in sorted alphabetical order, parsed to validate that each is a valid hook spec, and then merged in memory into an array that is added to containers' OCI specs?

CRI-O does similar discovery, with one exception: their JSON schema defines filtering, so a hook is added to the OCI spec only if certain criteria match (e.g. some annotation is present).

@mikebrow
Member Author

mikebrow commented Sep 5, 2019

@mikebrow this sounds a bit over-complicated. Why not just have a directory with a set of json files that are read in sorted alphabetical order, parsed to validate that each is a valid hook spec, and then merged in memory into an array that is added to containers' OCI specs?

CRI-O does similar discovery, with one exception: their JSON schema defines filtering, so a hook is added to the OCI spec only if certain criteria match (e.g. some annotation is present).

So that's one request for "one file per hook on disk" for ease of hook deployments and one request for "a set of hooks files on disk" to simplify things.

I don't like hook order by alphanumeric sort very much. Hook order is a pain, there will be some who want/need to be first or last on the pre start hooks and vice-versa for the post stop hooks.

Maybe instead of one file path to a hooks json file managed by the administrator... I should allow a list of file paths to hooks json files and let the order specified therein, by the administrator, decide the order (I would propose appending the hooks from each hooks file, in order, at the back of the list for the start-type hooks, then reversing that order and injecting at the front of the list for the post-stop hooks). Otherwise, if that doesn't work, they can just define one hooks file and merge manually.

Hmm..
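
A rough sketch of one reading of that ordering rule (a hypothetical helper, assuming the runtime-spec Hooks type; not code from this PR):

```go
package hooks

import runtimespec "github.com/opencontainers/runtime-spec/specs-go"

// mergeHookFiles appends start-type hooks in the order the files are listed,
// and inserts each file's poststop hooks at the front, so the poststop hooks
// end up in the reverse of the file order.
func mergeHookFiles(base runtimespec.Hooks, files []runtimespec.Hooks) runtimespec.Hooks {
	merged := base
	for _, f := range files {
		merged.Prestart = append(merged.Prestart, f.Prestart...)
		merged.Poststart = append(merged.Poststart, f.Poststart...)
		// Prepend this file's poststop hooks; later files end up earlier.
		merged.Poststop = append(append([]runtimespec.Hook{}, f.Poststop...), merged.Poststop...)
	}
	return merged
}
```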

@kad

kad commented Sep 5, 2019

@mikebrow yes, my comment was about giving the administrator freedom to select the order of the hooks by using different names for the json files in the hooks discovery directory.
And for cri-containerd it will be the order in which they are added to the list / executed. I don't think the extreme of reversing the list for stop hooks makes much sense. If the administrator wants something done in a particular order for that stage, they would be able to re-order it by renaming the json files (and of course, not mixing start and stop hooks into one json, if the order is so critical).

@mikebrow
Member Author

mikebrow commented Sep 5, 2019

@mikebrow yes, my comment was about giving the administrator freedom to select the order of the hooks by using different names for the json files in the hooks discovery directory.
And for cri-containerd it will be the order in which they are added to the list / executed. I don't think the extreme of reversing the list for stop hooks makes much sense. If the administrator wants something done in a particular order for that stage, they would be able to re-order it by renaming the json files (and of course, not mixing start and stop hooks into one json, if the order is so critical).

Your definition of extreme over-complication might be different from mine :)

One could argue anything but a single file with the entirety of the desired OCI hooks is an over-complication for the admin... and anything but individual hooks files in a directory is an over-complicated deployment solution for vendors that don't care about order or cross-hook impacts from other vendors. Renaming the files to set alpha order, presumably with files named 01_usefulname.json, 02_usefulname.json... I don't know, it still seems arbitrary... the race to name stuff 0000000001 comes to mind.

Best to get the list of requirements out, then decide what's over-complicated and/or extreme.

I'm gravitating toward a list of OCI hooks files in the config, in a desired order, vs. the more arbitrary "include anything in a directory by alpha order". But that would force deployment to append to the list to add a newly deployed set of hooks.

@kad

kad commented Sep 5, 2019

My opinion was mostly based on generic practices across the board, e.g. systemd drop-ins come first to mind, or any other drop-in setup like /etc/sysctl.d/ or /etc/udev/rules.d.

A race between vendors over ordering is possible in theory, but it is something that sysadmins are very familiar with.

@Random-Liu
Member

Random-Liu commented Sep 6, 2019

The hooks run in the host namespace, so they are very specific to the node environment. I really doubt that the hook details will come directly from the Kubernetes api in the future, so the hook details would very likely be configured on the node.

There are 2 ways to configure all the hooks:

  1. Centralized in containerd. If this is the case, we need to support a directory to contain all the hooks, and a key to map to each of the hooks. Maybe just put all hook files under one configurable directory, and use the file name as the key. For example:
containerd_hooks_directory/nvidia.io.gpu.hooks
containerd_hooks_directory/intel.io.fpga.hooks
containerd_hooks_directory/custom.io.random.hooks

And users/device plugins can directly pass the hook name in CRI, e.g. nvidia.io.gpu.hooks.

  2. Not centralized in containerd. Basically vendors or users install a random hook to an arbitrary place on the node, and pass the absolute file path to containerd via CRI.

I prefer 1) more because:

  1. It is much easier to find out what hooks are installed/supported on a node;
  2. It avoids setting absolute path in the API. Today's seccomp api makes the pod less portable across different clusters, because we specify the absolute path to the seccomp profile in the seccomp annotation. This is annoying. Although the hook will mainly be used by device plugins on the node at the beginning, I think it is possible that we may expose the capability to users through Kubernetes api someday. When that happens, I don't want the api to be an absolute path on the node. :P

@mikebrow
Member Author

mikebrow commented Sep 6, 2019

The hooks run in the host namespace, so they are very specific to the node environment. I really doubt that the hook details will come directly from the Kubernetes api in the future, so the hook details would very likely be configured on the node.

There are 2 ways to configure all the hooks:

  1. Centralized in containerd. If this is the case, we need to support a directory to contain all the hooks, and a key to map to each of the hooks. Maybe just put all hook files under one configurable directory, and use the file name as the key. For example:
containerd_hooks_directory/nvidia.io.gpu.hooks
containerd_hooks_directory/intel.io.fpga.hooks
containerd_hooks_directory/custom.io.random.hooks

And users/device plugins can directly pass the hook name in CRI, e.g. nvidia.io.gpu.hooks.

  2. Not centralized in containerd. Basically vendors or users install a random hook to an arbitrary place on the node, and pass the absolute file path to containerd via CRI.

I prefer 1) more because:

  1. It is much easier to find out what hooks are installed/supported on a node;
  2. It avoids setting absolute path in the API. Today's seccomp api makes the pod less portable across different clusters, because we specify the absolute path to the seccomp profile in the seccomp annotation. This is annoying. Although the hook will mainly be used by device plugins on the node at the beginning, I think it is possible that we may expose the capability to users through Kubernetes api someday. When that happens, I don't want the api to be an absolute path on the node. :P

I like #1 as well.

I'll make it so.

WRT absolute path & portability for any existing file pointers... what do you think about checking the path/filename and if it is absolute use it but log a warning that absolute paths are being deprecated to enable portability? otherwise treat the path/filename as a relative path?

@Random-Liu
Member

Random-Liu commented Sep 6, 2019

WRT absolute path & portability for any existing file pointers... what do you think about checking the path/filename and if it is absolute use it but log a warning that absolute paths are being deprecated to enable portability? otherwise treat the path/filename as a relative path?

Actually there is such a thing in kubelet already for seccomp: https://github.com/kubernetes/kubernetes/blob/887edd2273a688f3730244dc781b76b27003d005/cmd/kubelet/app/options/options.go#L397. If you specify an absolute path, I guess kubelet will use it; if you specify a relative path, kubelet will get it from the seccomp root. I think we can do similar things for the OCI hook, and discourage absolute paths.
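
For illustration, that kind of resolution could look roughly like this (a hypothetical function and hooks root directory, not code from this PR):

```go
package hooks

import (
	"path/filepath"

	"github.com/sirupsen/logrus"
)

// resolveHooksPath keeps supporting absolute paths (with a warning, since they
// hurt portability) and resolves relative names against a configured hooks root.
func resolveHooksPath(hooksRoot, name string) string {
	if filepath.IsAbs(name) {
		logrus.Warnf("absolute OCI hooks path %q is discouraged; prefer a name relative to %q", name, hooksRoot)
		return name
	}
	return filepath.Join(hooksRoot, name)
}
```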

I'm not sure whether the hook will be moved into kubelet in the future. Kubelet doesn't understand OCI today and won't only support OCI. If the definition is moved up, we'll have to redefine the hook format in the CRI anyway.

Given so, I think it is fine to start from defining the hook in the CRI plugin. :)

@kad

kad commented Sep 6, 2019

Colleagues, please let's not over complicate :) let's get something minimal implemented in cri/containerd, so other projects can start using it and then adjust based on real user experience.

I don't think that a full path belongs anywhere in any sensible design of APIs between layers, especially for something as sensitive as hooks executed in the node namespace. And especially if it will potentially be exposed somehow to the end user.

Hardcoding any vendor-specific constants like nvidia/intel, or even to some degree kubernetes, is also overkill at the moment.

If we are thinking about how it could later be exposed in the CRI API spec, we have a good example in the "runtimeClass" field, which has conventions on the name + validation, but on lower levels it can be expanded to whatever parameters are configured on the node in the CRI implementation.

Hooks are in some sense similar to the idea of runtimeClass: upper layers might want to add "for this pod/container run hooks $id", and on lower layers (containerd, cri-o, ...) it might be mapping that "$id" actually means "running xyz.sh with arguments as prestart", and "abc.sh as poststop".

@Random-Liu
Member

Random-Liu commented Sep 6, 2019

Hooks are in some sense similar to the idea of runtimeClass: upper layers might want to add "for this pod/container run hooks $id", and on lower layers (containerd, cri-o, ...) it might be mapping that "$id" actually means "running xyz.sh with arguments as prestart", and "abc.sh as poststop".

Agree. And that is just what I proposed in option 1 in #1248 (comment). The "$id" is the well-formatted hook (file) name, similar to today's "profile name" in the apparmor profile and seccomp profile design. https://github.com/njnikoo/Kubernetes/blob/master/docs/design/seccomp.md

As for whether we should support absolute paths: I'm fine with either. We'll discourage it if we support it anyway.

@mikebrow
Member Author

mikebrow commented Sep 6, 2019

Colleagues, please let's not over complicate :) let's get something minimal implemented in cri/containerd, so other projects can start using it and then adjust based on real user experience.

I don't think that a full path belongs anywhere in any sensible design of APIs between layers, especially for something as sensitive as hooks executed in the node namespace. And especially if it will potentially be exposed somehow to the end user.

Hardcoding any vendor-specific constants like nvidia/intel, or even to some degree kubernetes, is also overkill at the moment.

If we are thinking about how it could later be exposed in the CRI API spec, we have a good example in the "runtimeClass" field, which has conventions on the name + validation, but on lower levels it can be expanded to whatever parameters are configured on the node in the CRI implementation.

Hooks are in some sense similar to the idea of runtimeClass: upper layers might want to add "for this pod/container run hooks $id", and on lower layers (containerd, cri-o, ...) it might be mapping that "$id" actually means "running xyz.sh with arguments as prestart", and "abc.sh as poststop".

Again, one person's over-complicated overkill is another person's simplification. No one said let's hardcode vendor-specific constants. Those were examples of relative file names with hooks in a list. Runtimes are exposed through the kubernetes api; hooks are not.

Right now I'm leaning toward an ordered list of hooks files (well-formed key names located relative to a directory) specified within a runtime specification in the cri config; I don't think these should be common for all runtimes, and I don't think they should be stuffed into runc.options. That is my idea of clean and simple, and it provides lots of administrative control. An admin could thus create a custom runtime with custom hooks (configured by the admin for said custom runtime) and/or annotate the runc default with a list of 1 or more hooks files (absolute and/or relatively located for ease of migration... though at this point I don't really see a benefit to absolute paths, especially if we want to treat them as key names as well down the road).
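
For illustration only, a per-runtime list like that might look roughly as follows in the CRI plugin config (the `default_oci_hooks` key is hypothetical, not something this PR defines):

```toml
[plugins.cri.containerd.runtimes.runc-hooked]
  runtime_type = "io.containerd.runc.v1"
  # Hypothetical key: ordered list of hooks files, resolved relative to a hooks directory.
  default_oci_hooks = ["vendor-a.hooks.json", "vendor-b.hooks.json"]
```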

Later on we could let the upstream specs specify hook keys as well.

Thoughts?

@Random-Liu
Member

Random-Liu commented Sep 6, 2019

Right now I'm leaning toward an ordered list of hooks files (well-formed key names located relative to a directory) specified within a runtime specification in the cri config; I don't think these should be common for all runtimes, and I don't think they should be stuffed into runc.options.

@mikebrow There can be one hook directory, so that all hooks can be managed together, but each runtime can have a default hook name/names configured, I think.

Later on we could let the upstream specs specify hook keys as well.

We can start with an annotation in CRI, I think; this needs to be discussed in sig-node.

@Random-Liu
Member

Random-Liu commented Sep 6, 2019

I think I like the newest proposal. :)

@mikebrow Would you like to bring this up in sig-node? We may want to present to people:

  1. What are the planned use cases for these hooks. We may need input from @bart0sh @kad @RenaudWasTaken
  2. How is the hook planned to be supported in containerd. Basically what we discussed here, similar to today's seccomp profile and apparmor profile, because the hook definition is very host-environment specific.
  3. How we expect the hooks to be passed to the container runtime in the future:
  • Start with default hooks configured in containerd;
  • Accept well-known annotations defined in CRI from device plugin, and graduate to fields if it works well.
  • Expose to end user? (TBD, unlikely)

I think 1) is the most important. I've talked with @dchen1107 and @jiayingz about this. Both of them think that the use cases of the hooks are not clear. We all vaguely know that this can be useful for some devices, but we need some more detailed information for some real use cases. We also have some use cases in the comments here #496. :)

@bart0sh What do you plan to use this for? Can you provide a simple doc about the use case?

@bart0sh
Contributor

bart0sh commented Sep 9, 2019

@Random-Liu

What do you plan to use this for? Can you provide a simple doc about the use case?

In short: we're using a CRI prestart hook to reprogram FPGA devices. The simplified workflow is:

  • FPGA device plugin announces FPGA devices as cluster resources
  • Users request an FPGA device and desired function in their pod yaml
  • CRI hook re-programs the device with the requested function
  • User workload runs and uses the device programmed with the requested function

I'll prepare a more detailed description, as you've requested.

@bart0sh
Contributor

bart0sh commented Sep 9, 2019

@Random-Liu Please review the document you've requested.

@jiayingz

jiayingz commented Sep 9, 2019

@bart0sh Is it possible to use a central controller or webhook to observe user pod requests and pre-program the FPGA, then have the FPGA device plugin export the finer-granularity resource names corresponding to the programmed functions before those user pods land on those nodes? I think programming FPGA devices at the last point before starting containers may complicate the failure recovery process. Is the main motivation for programming FPGA devices through prestart hooks to allow multiple containers in the same pod to request different FPGA functions? If so, is it a common use case to support, one that can't be worked around in other ways (like breaking those containers into different pods)?

@Random-Liu
Member

@Random-Liu Please review the document you've requested.

Sorry, I don't have access. :P

@Random-Liu
Member

Random-Liu commented Sep 9, 2019

@bart0sh Is it possible to use a central controller or webhook to observe user pod requests and pre-program the FPGA, then have the FPGA device plugin export the finer-granularity resource names corresponding to the programmed functions before those user pods land on those nodes? I think programming FPGA devices at the last point before starting containers may complicate the failure recovery process. Is the main motivation for programming FPGA devices through prestart hooks to allow multiple containers in the same pod to request different FPGA functions? If so, is it a common use case to support, one that can't be worked around in other ways (like breaking those containers into different pods)?

@bart0sh Similar question here: is it possible to program the FPGA within an init container?

Basically, we want to make sure that the use case is something that can't be covered by today's Kubernetes design. If it can be covered, we prefer reusing the existing mechanism; if not, we are open to adding new capabilities. :)

@kad

kad commented Sep 9, 2019

@Random-Liu no, it is not possible to program the FPGA in an init container, for security reasons.

@Random-Liu
Member

@kad Ha, so we don't want to give the user pod/container environment direct access to program the FPGA, but we do want to make it possible to program the FPGA differently for different pods/containers. This makes sense to me.

Another question: is it possible to provide information about the pod/container to the device plugin, so that the device plugin can program the FPGA?

Just want to understand this more. :)

@bart0sh
Contributor

bart0sh commented Sep 9, 2019

@Random-Liu

Sorry, I don't have access. :P

Sorry. Please, try this one.

@kad

kad commented Sep 10, 2019

@kad Ha, so we don't want to give the user pod/container environment direct access to program the FPGA, but we do want to make it possible to program the FPGA differently for different pods/containers. This makes sense to me.

Yes. The hook validates that the bitstreams (firmware) programmed to the FPGA are only those approved by the system administrator and located in pre-defined locations. Users don't have direct access to the bitstream storage (security, IP leakage protection).
The FPGA management kernel interface is never exposed to the user container, again to prevent the container from getting more privileges on the hardware than it needs to consume accelerator functions.

Overall, what we are talking about here is not only for Kubernetes per se, but for docker+containerd usage in general. The hook(s) are needed to prepare and then cleanly shut down the device.
Allowing hooks to be configured at the containerd level means we also enable a sane way to support simplified docker CLI usage of devices, where hooks are used to enforce device parameters.

Another question: is it possible to provide information about the pod/container to the device plugin, so that the device plugin can program the FPGA?

That was discussed at length last year in the resource management working group. There are several PRs/KEPs about it; I can potentially dig out links from my notes. In short, the device plugins API does not provide enough information about the workload to support "stateful" devices, and no way to pass parameters. It is more or less ok for stateless devices, like Nvidia GPUs, but even Nvidia went the route of shipping "nvidia-runtime", a patched version of runc that forcibly executes their hooks before and after the pivot_root into the container.

You can find a short summary in my booth presentation at KubeCon Seattle: https://speakerdeck.com/kad/extending-kubernetes-with-intel-r-accelerator-devices

@bart0sh
Contributor

bart0sh commented Sep 13, 2019

@mikebrow, @Random-Liu Do you need any help with this PR? Any chance of getting it updated/merged in the near future?

@Random-Liu
Member

We had a discussion in sig-node today, and will continue the discussion next week.

People think that several questions need to be answered:

  1. What use cases can't be supported by today's device plugin interface?
  2. Is it possible to extend today's device plugin interface to support use cases?

If the answer to 2) is that it is impossible or super complicated, then OCI hooks might be a better solution for those use cases.

@kad

kad commented Oct 1, 2019

@Random-Liu as I recall, Dawn said that defining and unifying the configuration of OCI hooks between the two popular CRI implementations is something that needs to be done anyway.

@RenaudWasTaken

RenaudWasTaken commented Oct 2, 2019

What are the planned use cases for these hooks. We may need input from @bart0sh @kad @RenaudWasTaken
How is the hook planned to be supported in containerd. Basically what we discussed here, similar to today's seccomp profile and apparmor profile, because the hook definition is very host-environment specific.

Sorry for the delay! I've been meaning to write a broader document of how I think it would make sense to support "vendor" devices across the stack.

To answer the specific question, there are a number of things we do in the hooks:

  • Run ldconfig
  • Check the container spec to see if it can run on this node (e.g: Needs driver >= 400 but node has driver 300)
  • Mount a number of files, proc entries, device nodes, IPCs, ...

The thing is that these operations will depend on the specific runtime; more accurately, they will depend on whether or not you have a "regular" Linux container or a different kind of container (e.g. Kata, not to mention gVisor).

Thinking about exposing pre-start hooks in Kubernetes for devices, my mental model was to move the concept of device plugins down the stack. In other words, have containerd / docker / podman / ... provide a "device plugin mechanism" where the plugin knows what to do to isolate GPUs, and then have the runtime be able to tell kubelet that it supports GPUs / FPGAs / ...

This "Runtime Device Plugin" would know that a device maps to a specific pre-start hook. The "Kubernetes Device Plugin" would still have the role of informing Kubelet that there are X devices present on the nodes, whether they are healthy or not and performing prestart operations.

This would help solve some of the chicken and egg problems you usually have in the Kubernetes device plugin today. e.g: If you want to check for the health of the GPU then you need your Kubernetes Device Plugin to be GPU enabled, but the device plugin is the one telling kubelet how to create GPU containers.

I'll try to attend next week's sig-node meeting :)

@bart0sh
Contributor

bart0sh commented Oct 2, 2019

Let me explicitly mention that for our use case of reprogramming FPGA devices it's enough to have pre-start and post-stop hooks that are executed in the host namespace. Ideally it would be great to have similar hook configuration for both CRI-O and Containerd.

@mikebrow
Member Author

mikebrow commented Oct 3, 2019

just rebased... will address requested changes ... soon

@mikebrow
Member Author

mikebrow commented Oct 6, 2019

/test pull-cri-containerd-verify

Signed-off-by: Mike Brown <brownwm@us.ibm.com>