
agent: write OCI spec to config.json in bundle path #346

Closed
wants to merge 1 commit into master from write-oci-spec

Conversation

eguzman3
Contributor

In order to comply with the OCI specification, the spec must be written to a file named 'config.json' located in the bundle directory.

Signed-off-by: Edward Guzman eguzman@nvidia.com

@eguzman3 eguzman3 force-pushed the write-oci-spec branch 3 times, most recently from d92a539 to c692c40 on August 30, 2018 21:26
@codecov

codecov bot commented Aug 30, 2018

Codecov Report

Merging #346 into master will decrease coverage by 0.58%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #346      +/-   ##
==========================================
- Coverage   44.78%   44.19%   -0.59%     
==========================================
  Files          15       15              
  Lines        2414     2446      +32     
==========================================
  Hits         1081     1081              
- Misses       1188     1220      +32     
  Partials      145      145

@grahamwhaley
Contributor

Hi @eguzman3, thanks for the contribution :-)
Did you have a specific use case or test that surfaced this? That might help give us some context.
I'm wondering whether this fix is best done in the agent or the runtime - but others have a better view of that than I do - so /cc @jodh-intel @bergwolf @WeiZhang555 for input.

@bergwolf
Member

@eguzman3 The OCI spec is already in the bundle path when it is passed to the OCI runtime implementation. I'm not sure what you are trying to fix here.

Contributor

@jodh-intel left a comment


Hi @eguzman3 - thanks for raising! This is looking good. Out of interest, how did you notice the issue?

Could you please add some unit tests for the new functions, though? You might be able to reuse bits from https://github.com/kata-containers/runtime/blob/master/cli/utils_test.go#L270, possibly?

Also, feel free to prod us directly (https://github.com/kata-containers/community#join-us) 😄

grpc.go - 6 review comments (outdated, resolved)
@eguzman3
Contributor Author

eguzman3 commented Aug 31, 2018

@grahamwhaley @bergwolf
This issue was discovered when trying to use an existing OCI hook within the guest. The hook reads the bundle path from the HookState passed over stdin, then attempts to open /path/to/bundle/config.json, which doesn't exist because the bundle path is wrong and because config.json is never written. The reason the cwd needs to be changed is that CreateLibcontainerConfig sets the bundle path to the cwd:

// runc's cwd will always be the bundle path

which in this case is /, so that is what the OCI hooks end up receiving instead of /run/kata-containers/shared/containers/<hash>/.
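
For context, here is a minimal sketch of what such a hook does on its side, assuming the standard runtime-spec state payload: it decodes the container state from stdin and then expects <bundle>/config.json to exist. Error handling is reduced for brevity.

package main

import (
	"encoding/json"
	"os"
	"path/filepath"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// The runtime passes the container state to the hook over stdin.
	var state specs.State
	if err := json.NewDecoder(os.Stdin).Decode(&state); err != nil {
		os.Exit(1)
	}

	// The hook then looks for the OCI spec at <bundle>/config.json, which is
	// exactly the file this PR writes out inside the guest.
	configPath := filepath.Join(state.Bundle, "config.json")
	f, err := os.Open(configPath)
	if err != nil {
		// Today this fails: the bundle is "/" and config.json is never written.
		os.Exit(1)
	}
	defer f.Close()
	// ... parse the spec and do the hook-specific work ...
}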

@eguzman3 eguzman3 force-pushed the write-oci-spec branch 2 times, most recently from d8e7053 to 29a0fd6 on August 31, 2018 20:54
In order to comply with the OCI specification, the
spec must be written to a file named 'config.json'
located in the bundle directory. To properly set the
bundle path at the time of libcontainer config creation
the cwd also needs to be changed to the bundle path.
CreateContainer was refactored to reduce cyclomatic
complexity.

Fixes: kata-containers#349

Signed-off-by: Edward Guzman <eguzman@nvidia.com>
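
A rough sketch of the agent-side change the commit message describes; the helper name and exact error handling here are illustrative, not the actual diff:

package agent

import (
	"encoding/json"
	"io/ioutil"
	"os"
	"path/filepath"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// writeSpecToBundle persists the OCI spec as <bundle>/config.json and switches
// the cwd to the bundle directory, since libcontainer's CreateLibcontainerConfig
// treats the cwd as the bundle path. It returns the previous cwd so the caller
// can restore it once the container has been created.
func writeSpecToBundle(bundlePath string, spec *specs.Spec) (oldCwd string, err error) {
	if err = os.MkdirAll(bundlePath, 0700); err != nil {
		return "", err
	}

	data, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	if err = ioutil.WriteFile(filepath.Join(bundlePath, "config.json"), data, 0600); err != nil {
		return "", err
	}

	if oldCwd, err = os.Getwd(); err != nil {
		return "", err
	}
	return oldCwd, os.Chdir(bundlePath)
}
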
@bergwolf
Member

bergwolf commented Sep 1, 2018

@eguzman3 I think there are a few things to clarify:

  1. The bundle is never passed to the guest. We only send the rootfs, which is a sub-directory of the bundle.
  2. The config.json is already in the bundle when it is passed to Kata from either containerd or CRI-O. Otherwise kata-runtime could not read the OCI spec in the first place.
  3. If you need the hooks, they are in the OCI spec passed in as CreateContainer arguments.

Another thing to consider is: do we really want to run the hooks inside the container? The hooks are mostly used by Docker to set up network configuration for a container, and in that sense they are only useful when run on the host side. What use case do you have that would require running hooks inside the guest?

@sboeuf

sboeuf commented Sep 4, 2018

There is no reason to reproduce the whole OCI config.json mechanism on the guest OS, and therefore in the agent. The OCI spec only needs to be passed as a structure, which is already done, and that allows the agent to provide libcontainer with the appropriate data.
The hook issue you're mentioning is separate from the config.json question, and is actually not something we need, since we don't want to run the hooks from inside the guest - that makes no sense. The caller (Docker, CRI-O, or something else) expects the hooks to be executed on the host, which is why this is handled directly by the kata-runtime CLI, and also why the Hooks structure is set to nil by kata_agent.go before the spec is passed to the agent.

@sboeuf

sboeuf commented Sep 4, 2018

-1

Rejected with PullApprove

@amshinde
Member

amshinde commented Sep 4, 2018

@sboeuf I think @eguzman3 has a use case that makes it necessary to run hooks inside the guest.
@eguzman3 Can you please describe your use case so that others can get a broader view of what you are trying to do?

@eguzman3
Contributor Author

eguzman3 commented Sep 5, 2018

@sboeuf @amshinde
Yes, the use case requires that the hooks be run inside the guest. These hooks are intended to be separate from those executed on the host, and are meant to be placed into the guest as described in #347. Specifically, we are trying to leverage this existing OCI hook: https://github.com/NVIDIA/nvidia-container-runtime

@sboeuf

sboeuf commented Sep 5, 2018

@eguzman3 what is the hook doing, exactly? I'm trying to understand the real need for this.
Also, one way to work around this more cleanly would be to use a sidecar, instead of relying on the pre-start hooks, which are complex to manage in the case of Kata. And if we can avoid making too many assumptions, that would be great.
Are you using k8s to orchestrate your containers?

@eguzman3
Contributor Author

eguzman3 commented Sep 5, 2018

This hook configures GPU access for a container using libnvidia-container. It is intended to be used in conjunction with GPU passthrough, which is why the hook needs to run inside the guest.

@flx42
Contributor

flx42 commented Sep 5, 2018

As mentioned in #347, the guest hooks will not be part of the Kata API, we want them to be part of the guest rootfs and auto-discovered by the agent.

@sboeuf

sboeuf commented Sep 5, 2018

@eguzman3 @flx42, OK, now I see.
That works, but I have one comment regarding the configuration of such a thing. As this is very specific to running custom binaries directly on the guest, could you add an option to our CLI configuration, so that we don't end up trying to run custom hooks on the agent every time?
Most of the time those hooks won't exist, and there's no need for the agent to look for them. Also, having the path specified from the configuration, rather than hardcoded inside the agent, would be a cleaner approach.

@amshinde
Member

amshinde commented Sep 5, 2018

Yes, it would make sense to configure the path for guest rootfs hooks in the runtime configuration.toml and pass it to the agent as part of the CreateSandbox request.

@eguzman3
Contributor Author

eguzman3 commented Sep 5, 2018

@sboeuf Can I get some clarification on the need for this configuration option? Is this a performance concern? Also, would this runtime configuration option be disabled when it is commented out?

@sboeuf

sboeuf commented Sep 5, 2018

@eguzman3 yes it's about keeping the default path as simple and efficient as possible. And by default the agent would not try to load any custom hook.

@sboeuf

sboeuf commented Sep 10, 2018

OK, let's try to sort this out. As I understand it, you have a very specific use case: having the agent run hooks from inside the VM directly, with those hooks provided by the VM rootfs so that the agent can find them. That is the purpose of PR #347.

Now, regarding this PR, I don't get why we actually need to create a config.json. My understanding is that if you want to leverage libcontainer to run those custom hooks, you simply need to add them to the ociSpec structure defined in the agent. That way, libcontainer will get the information from the structure directly.

@eguzman3 Am I missing something?

@jodh-intel
Contributor

I think I can see why this might be required...

As specified in https://github.com/opencontainers/runtime-spec/blob/master/config.md#posix-platform-hooks,

The state of the container MUST be passed to hooks over stdin so that they may do work appropriate to the current state of the container.

But the state of the container must include:

bundle (string, REQUIRED) is the absolute path to the container's bundle directory. This is provided so that consumers can find the container's configuration and root filesystem on the host.

So the arbitrary hook binaries that need to be run inside the guest must be passed the container state and must therefore have access to the bundle directory. And a bundle is "rootfs + config.json".

@sboeuf

sboeuf commented Sep 10, 2018

@jodh-intel thanks for looking into this. It's still not clear whether this is needed from a libcontainer perspective, though. I mean that libcontainer is given the container state through a structure, not through a CLI, so I don't know how strictly we need to stick to the OCI definition here.

@sboeuf

sboeuf commented Sep 10, 2018

Doing some research in the libcontainer codebase, here's my understanding:
here is the code that gets called to handle OCI hooks (the pre-start case, which is actually the one you're interested in).
If the list of Labels is empty, the function that returns the bundle and the list of Annotations here will simply return an empty bundle string and an empty list of Annotations.
So what I'm trying to say is that it's not a big deal from a libcontainer perspective; we won't get any error with that.
It is then simply up to your hook not to use those fields. Is there any reason why the custom hook would need them?
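
A small sketch of the behavior described above, based on libcontainer's utils.Annotations helper; treat the exact import path and signature as an assumption against the runc version in use:

package main

import (
	"fmt"

	"github.com/opencontainers/runc/libcontainer/utils"
)

func main() {
	// With no labels set, the bundle comes back empty and there are no
	// annotations, so a pre-start hook simply receives an empty bundle path
	// in its state - libcontainer itself does not error out.
	bundle, annotations := utils.Annotations(nil)
	fmt.Printf("bundle=%q annotations=%v\n", bundle, annotations)

	// If the caller sets a "bundle=..." label, that value is what ends up in
	// the state passed to the hook.
	bundle, _ = utils.Annotations([]string{"bundle=/run/kata-containers/shared/containers/abc"})
	fmt.Println(bundle)
}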

@flx42
Contributor

flx42 commented Sep 10, 2018

The config.json file has two fields we need:

  • The path to the rootfs (despite what the spec says, it's not always in the same directory tree as the config file).
  • Environment variables/labels from the container image or set by the user. We use that to tweak the behavior of our prestart hook: https://github.com/nvidia/nvidia-container-runtime#environment-variables-oci-spec

@sboeuf

sboeuf commented Sep 10, 2018

@flx42

The path to the rootfs (despite what the spec says, it's not always in the same directory tree as the config file).

Why do you need the rootfs path here?
Also, just to mention, providing the rootfs path is orthogonal to creating config.json, since the path can be provided simply through the libcontainer Config structure.

Environment variables/labels from the container image or set by the user. We use that to tweak the behavior of our prestart hook: https://github.com/nvidia/nvidia-container-runtime#environment-variables-oci-spec

Environment variables should not be an issue since they will be passed from the initial config.json on the host side, and they'll be carried up to the agent to be applied by libcontainer.

@flx42
Contributor

flx42 commented Sep 10, 2018

Why do you need the rootfs path here?

Our prestart hook needs to know the mount point of the rootfs, to check the content of the filesystem and add mounts.

Also, just to mention, giving the rootfs path is orthogonal from creating config.json since the path can be provided simply through the lbcontainer Config structure.

Sure, but for a hook binary, that's the only way we can get the rootfs path today.

Environment variables should not be an issue since they will be passed from the initial config.json on the host side, and they'll be carried up to the agent to be applied by libcontainer.

Yeah, the environment variables are correctly preserved for the executed process. But again, for a prestart hook, the config.json is the only way to get them (the hooks are usually not executed with the container's environment variables).

@sboeuf

sboeuf commented Sep 10, 2018

@flx42 this is an interesting problem that we have here :)

OK, I think the following could be a solution that would be a good tradeoff on how to handle those customized OCI hooks:

  • First, you detect from the agent that some custom hooks are available.
  • Once detected, the agent will build the appropriate Hook structure, including the Env that it knows from the OCI Spec structure. This way, the hook will be started with the appropriate environment variables.
  • In order to provide the rootfs path, I suggest you slightly modify the NVIDIA hook so that it looks for the rootfs path through the environment variables. By default, it would rely on the bundle path (I don't want to break your current behavior), but if that path is empty, it should check whether it can find an environment variable CONTAINER_ROOTFS. This new environment variable would be added by the agent when creating the hook from scratch, so that the hook gets the information it needs (a rough sketch follows below).

WDYT?
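
A rough Go sketch of that proposal; the CONTAINER_ROOTFS variable name and the buildGuestHook helper are hypothetical, not an existing agent API:

package agent

import (
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// buildGuestHook sketches the idea: the agent constructs the hook entry itself,
// reusing the environment it already knows from the OCI spec and injecting a
// CONTAINER_ROOTFS variable so the hook can locate the rootfs without relying
// on a bundle path.
func buildGuestHook(hookPath, rootfs string, spec *specs.Spec) specs.Hook {
	var env []string
	if spec.Process != nil {
		env = append(env, spec.Process.Env...)
	}
	env = append(env, "CONTAINER_ROOTFS="+rootfs)

	return specs.Hook{
		Path: hookPath,
		Args: []string{hookPath, "prestart"},
		Env:  env,
	}
}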

@flx42
Contributor

flx42 commented Sep 10, 2018

@sboeuf We are flexible; we can definitely have some kind of glue layer that translates the environment variables into something else and then calls into our nvidia-container-runtime-hook or even nvidia-container-cli. What you describe is closer to our LXC hook implementation than to a standard OCI hook:
https://github.com/lxc/lxc/blob/master/hooks/nvidia
Here we clearly needed a custom script, since LXC does not follow the OCI runtime spec.

With that said, I'm not sure I understand why you don't want to provide support for standard OCI runtime hooks; what is the concern with this approach? We were hoping to leverage our existing OCI hook, since we were only missing a small piece to make it happen.

@sboeuf

sboeuf commented Sep 10, 2018

@flx42

With that said, I'm not sure I understand why you don't want to provide support for standard OCI runtime hooks; what is the concern with this approach? We were hoping to leverage our existing OCI hook, since we were only missing a small piece to make it happen.

The main concern I have here is that we're trying to turn the agent into an implementation of the OCI specification, but that's not what the agent is.
Now that we've sorted out the need behind your problem, it'd be nice to get more input from a few people on this topic.
/cc @bergwolf @amshinde @jodh-intel WDYT?

@flx42
Contributor

flx42 commented Sep 10, 2018

@amshinde
Member

Here is my take on this:
I am fine with writing the config.json inside the guest; that way, OCI-compliant hooks that need to be run inside the guest can be executed without any changes.
If we do things differently (let's say via environment variables), then an existing hook that is OCI compliant will need to be changed in order to run it inside the guest.

That said, we should write the config.json inside the guest only if hooks need to be run.
We can provide this information to the agent using a runtime configuration option such as guest_hooks_path, which would be empty by default. This option can then be passed to the agent as part of the CreateSandbox API. Only if this path is non-empty will the agent create the config.json file and execute the hooks at the provided path (a rough sketch follows below).

@bergwolf @WeiZhang555 @sboeuf What do you think?
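
As a rough agent-side sketch of that gating; the guestHooksPath parameter and the directory layout here are assumptions about the proposal, not an existing API:

package agent

import (
	"io/ioutil"
	"path/filepath"
)

// findGuestHooks only does work when the runtime passed a non-empty guest
// hooks path (e.g. via the CreateSandbox request). If the path is empty, the
// agent skips hook discovery entirely, so the default path stays cheap.
func findGuestHooks(guestHooksPath string) ([]string, error) {
	if guestHooksPath == "" {
		return nil, nil
	}

	entries, err := ioutil.ReadDir(guestHooksPath)
	if err != nil {
		return nil, err
	}

	var hooks []string
	for _, entry := range entries {
		// Treat executable regular files in the directory as guest hooks.
		if !entry.IsDir() && entry.Mode()&0111 != 0 {
			hooks = append(hooks, filepath.Join(guestHooksPath, entry.Name()))
		}
	}
	return hooks, nil
}

Only when the returned list is non-empty would the agent go on to write config.json, change cwd, and run the hooks.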

@bergwolf
Member

I agree with @sboeuf that kata-agent is not an OCI-compatible implementation on its own.

@eguzman3 @flx42 Can you do it with a sidecar container? AFAIU, with OCI hooks you need to add whatever is needed to the guest rootfs image. That means that if a user wants to use nvidia-container-runtime-hook, they need to install a customized guest rootfs image rather than the one shipped by Kata Containers. OTOH, if you do it with a sidecar container, it runs on all existing Kata guest rootfs images.

And be aware that whatever you put in the OCI hooks is executed on the host as well. Do you really need the SAME hooks to be run both on the host and in the guest?

@flx42
Contributor

flx42 commented Sep 12, 2018

Can you do it with a sidecar container?

Driver installation is possible with a container, but not really at the container runtime level. And isn't that a Kubernetes construct? It won't work with Docker (or another OCI frontend).

That means if a user wants to use nvidia-container-runtime-hook, they need to install a customized guest rootfs image rather than the one shipped by kata containers. OTOH, if you do it with a sidecar container, it runs on all existing kata guest rootfs images.

Yes, we are fine with that, since the rootfs will also bake our user-space driver libraries, and precompiled objects for the device driver.

And you need to pay attention that whatever you put in the OCI hooks, they are executed on the host as well. Do you really need the SAME hooks to be run both on the host and in the guest?

We only want to execute OCI hooks on the "translated" OCI spec, by discovering them automatically. There is no way to inject hooks into the spec Docker generates today.

@bergwolf
Member

Driver installation is possible with a container, but not really at the container runtime level. And isn't that a Kubernetes construct? It won't work with Docker (or another OCI frontend).

I mean: whatever initialization you do with the current hooks, do it in a sidecar container -- not by running the same hooks there. And Docker does support sidecars; it is just a matter of constructing proper namespace configurations for the different containers. Maybe it's too much complexity for your users, though.

Yes, we are fine with that, since the rootfs will also bake our user-space driver libraries, and precompiled objects for the device driver.

In that case, how about having a built-in container inside the rootfs image? Then we can make Kata Containers create default sidecars when configured to do so. Such a design would also solve the socat sidecar image issue @gnawux is trying to solve for k8s port forwarding.

We only want to execute OCI hooks on the "translated" OCI spec, by discovering them automatically. There is no way to inject hooks into the spec Docker generates today.

Sorry, I kind of miss the whole picture here. If not Docker, who will translate the OCI spec, and how? Do you expect kata-runtime to do the translation and only put translated hooks in the spec when passing it to kata-agent? Please explain a bit more about the architecture you expect this to work with.

@flx42
Contributor

flx42 commented Sep 12, 2018

I mean whatever initialization you do with current hooks, do it in a sidecar container

It's not possible; it would require changing our whole implementation. Our hook performs bind mounts from the runtime's namespace into the container. For instance, it looks for libcuda.so.390.55 in the runtime's mount namespace and bind-mounts it under the container's rootfs mount point.
With a sidecar container, I guess you would have to share a mount point between the two containers and point the application container to the libraries/binaries in this directory.
But that's not all: we also need to bind-mount devices in /dev, which is not possible with a sidecar container.

Maybe it's too much complexity for your users though.

I agree. And the implementation is possible but would be hacky.

In that case, how about having a builtin container inside the rootfs image?

This would be overkill for our use case, and would not solve the problem as well as what we have today, IMO.

Sorry, I kinda miss the whole picture here. If not docker, who will translate the OCI spec and how?

Docker (or something else) gives you the OCI spec, and my understanding is that, once in the guest, you amend this spec before passing it to libcontainer. @sboeuf mentioned that hooks from the original OCI spec are removed: #346 (comment)

Here, we want to add hooks from the guest rootfs; a possible way of doing that is presented in #347.

@bergwolf
Member

@flx42

But that's not all, we also need to bind-mount devices in /dev, that's not possible with a side-car container.

As long as devices share the same major/minor pair, they are shared among the different containers in the same guest, which means you do not need to bind-mount them. Just pass the same major/minor pair in the OCI spec, and they will be treated as the same device by the kernel.
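
To illustrate, a hedged sketch of what such a device entry looks like in the spec; the concrete numbers are only an example (an NVIDIA GPU node typically uses character major 195):

package agent

import specs "github.com/opencontainers/runtime-spec/specs-go"

// gpuDevice returns a device entry that two containers in the same guest can
// both list in their OCI specs. Because the type/major/minor triple is the
// same, the kernel treats the nodes as the same device - no bind mount needed.
func gpuDevice() specs.LinuxDevice {
	return specs.LinuxDevice{
		Path:  "/dev/nvidia0",
		Type:  "c",
		Major: 195,
		Minor: 0,
	}
}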

OTOH, I agree that in your use case the arbitrary rootfs hooks are much simpler than adding implicit sidecars. Thanks for the explanation. Please go ahead with the modification; I'm fine with saving the spec somewhere in the rootfs.

@@ -579,6 +608,13 @@ func (a *agentGRPC) CreateContainer(ctx context.Context, req *pb.CreateContainer
return emptyResp, err
}

// Change cwd because libcontainer sets the bundle path to cwd
oldcwd, err := pb.ChangeToBundlePath(ociSpec)
Member


These are only useful when there are hooks in the rootfs. So please make it optional by:

  1. scanning for rootfs hooks when the agent starts up
  2. changing cwd and saving the spec in the container bundle only when necessary

@eguzman3
Contributor Author

Consolidating this PR with #347

@eguzman3 eguzman3 closed this Sep 12, 2018
@eguzman3 eguzman3 deleted the write-oci-spec branch September 12, 2018 18:59