
agent: write OCI spec to config.json in bundle path #346

Closed
wants to merge 1 commit into master from write-oci-spec

Conversation

eguzman3
Contributor

In order to comply with the OCI specification, the spec must be written to a file named 'config.json' located in the bundle directory.

Signed-off-by: Edward Guzman eguzman@nvidia.com

@eguzman3 eguzman3 force-pushed the write-oci-spec branch 3 times, most recently from d92a539 to c692c40 on August 30, 2018 21:26
@codecov

codecov bot commented Aug 30, 2018

Codecov Report

Merging #346 into master will decrease coverage by 0.58%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #346      +/-   ##
==========================================
- Coverage   44.78%   44.19%   -0.59%     
==========================================
  Files          15       15              
  Lines        2414     2446      +32     
==========================================
  Hits         1081     1081              
- Misses       1188     1220      +32     
  Partials      145      145

@grahamwhaley
Contributor

Hi @eguzman3, thanks for the contribution :-)
Did you have a specific use case or test that surfaced this? That might help give us some context.
I'm wondering whether this fix is best done in the agent or the runtime - but others have a better view of that than I do - so /cc @jodh-intel @bergwolf @WeiZhang555 for input.

@bergwolf
Member

@eguzman3 The OCI spec is already in the bundle path when it is passed to the OCI runtime implementation. I'm not sure what you are trying to fix here.

Contributor

@jodh-intel left a comment


Hi @eguzman3 - thanks for raising! This is looking good. Out of interest, how did you notice the issue?

Could you please add some unit tests for the new functions, though? You might be able to reuse bits from https://github.com/kata-containers/runtime/blob/master/cli/utils_test.go#L270, possibly?

Also, feel free to prod us directly (https://github.com/kata-containers/community#join-us) 😄

grpc.go - 6 review comments (outdated, resolved)
@eguzman3
Contributor Author

eguzman3 commented Aug 31, 2018

@grahamwhaley @bergwolf
This issue was discovered when trying to use an existing OCI hook within the guest. The hook reads the bundle path from the HookState passed over stdin, then attempts to open /path/to/bundle/config.json, which doesn't exist because the bundle path is wrong and because config.json is never written. The reason the cwd needs to be changed is that CreateLibcontainerConfig sets the bundle path to the cwd:

// runc's cwd will always be the bundle path

which in this case is /, so that is what the OCI hooks end up receiving instead of /run/kata-containers/shared/containers/<hash>/.
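
For context, here is a minimal sketch of what such a hook does on its side, assuming the standard runtime-spec state payload: it decodes the container state from stdin and then expects <bundle>/config.json to exist. Error handling is reduced for brevity.

package main

import (
	"encoding/json"
	"os"
	"path/filepath"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// The runtime passes the container state to the hook over stdin.
	var state specs.State
	if err := json.NewDecoder(os.Stdin).Decode(&state); err != nil {
		os.Exit(1)
	}

	// The hook then looks for the OCI spec at <bundle>/config.json, which is
	// exactly the file this PR writes out inside the guest.
	configPath := filepath.Join(state.Bundle, "config.json")
	f, err := os.Open(configPath)
	if err != nil {
		// Today this fails: the bundle is "/" and config.json is never written.
		os.Exit(1)
	}
	defer f.Close()
	// ... parse the spec and do the hook-specific work ...
}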

@eguzman3 eguzman3 force-pushed the write-oci-spec branch 2 times, most recently from d8e7053 to 29a0fd6 on August 31, 2018 20:54
In order to comply with the OCI specification, the
spec must be written to a file named 'config.json'
located in the bundle directory. To properly set the
bundle path at the time of libcontainer config creation
the cwd also needs to be changed to the bundle path.
CreateContainer was refactored to reduce cyclomatic
complexity.

Fixes: kata-containers#349

Signed-off-by: Edward Guzman <eguzman@nvidia.com>
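
A rough sketch of the agent-side change the commit message describes; the helper name and exact error handling here are illustrative, not the actual diff:

package agent

import (
	"encoding/json"
	"io/ioutil"
	"os"
	"path/filepath"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// writeSpecToBundle persists the OCI spec as <bundle>/config.json and switches
// the cwd to the bundle directory, since libcontainer's CreateLibcontainerConfig
// treats the cwd as the bundle path. It returns the previous cwd so the caller
// can restore it once the container has been created.
func writeSpecToBundle(bundlePath string, spec *specs.Spec) (oldCwd string, err error) {
	if err = os.MkdirAll(bundlePath, 0700); err != nil {
		return "", err
	}

	data, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	if err = ioutil.WriteFile(filepath.Join(bundlePath, "config.json"), data, 0600); err != nil {
		return "", err
	}

	if oldCwd, err = os.Getwd(); err != nil {
		return "", err
	}
	return oldCwd, os.Chdir(bundlePath)
}
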
@bergwolf
Member

bergwolf commented Sep 1, 2018

@eguzman3 I think there are a few things to clarify:

  1. The bundle is never passed to the guest. We only send the rootfs, which is a sub-directory of the bundle.
  2. The config.json is already in the bundle when it is passed to Kata from either containerd or CRI-O. Otherwise kata-runtime could not read the OCI spec in the first place.
  3. If you need the hooks, they are in the OCI spec passed in as CreateContainer arguments.

Another thing to consider is: do we really want to run the hooks inside the container? The hooks are mostly used by Docker to set up network configuration for a container, and in that sense they are only useful when run on the host side. What use case do you have that would require running hooks inside the guest?

@sboeuf

sboeuf commented Sep 4, 2018

There is no reason to reproduce the whole OCI config.json mechanism on the guest OS, and therefore in the agent. The OCI spec only needs to be passed as a structure, which is already done, and that allows the agent to provide libcontainer with the appropriate data.
The hook issue you're mentioning is separate from the config.json question, and is actually not something we need, since we don't want to run the hooks from inside the guest - that makes no sense. The caller (Docker, CRI-O, or something else) expects the hooks to be executed on the host, which is why this is handled directly by the kata-runtime CLI, and also why the Hooks structure is set to nil by kata_agent.go before the spec is passed to the agent.

@sboeuf

sboeuf commented Sep 4, 2018

-1

Rejected with PullApprove

@amshinde
Member

amshinde commented Sep 4, 2018

@sboeuf I think @eguzman3 has a use case that makes it necessary to run hooks inside the guest.
@eguzman3 Can you please describe your use case so that others can get a broader view of what you are trying to do?

@eguzman3
Contributor Author

eguzman3 commented Sep 5, 2018

@sboeuf @amshinde
Yes, the use case requires that the hooks be run inside the guest. These hooks are intended to be separate from those executed on the host, and are meant to be placed into the guest as described in #347. Specifically, we are trying to leverage this existing OCI hook: https://github.com/NVIDIA/nvidia-container-runtime

@sboeuf

sboeuf commented Sep 5, 2018

@eguzman3 what is the hook doing, exactly? I'm trying to understand the real need for this.
Also, one way to work around this more cleanly would be to use a sidecar, instead of relying on the pre-start hooks, which are complex to manage in the case of Kata. And if we can avoid making too many assumptions, that would be great.
Are you using k8s to orchestrate your containers?

@eguzman3
Contributor Author

eguzman3 commented Sep 5, 2018

This hook configures GPU access for a container using libnvidia-container. It is intended to be used in conjunction with GPU passthrough, which is why the hook needs to run inside the guest.

@flx42
Contributor

flx42 commented Sep 5, 2018

As mentioned in #347, the guest hooks will not be part of the Kata API, we want them to be part of the guest rootfs and auto-discovered by the agent.

@sboeuf

sboeuf commented Sep 5, 2018

@eguzman3 @flx42, OK, now I see.
That works, but I have one comment regarding the configuration of such a thing. As this is very specific to running custom binaries directly on the guest, could you add an option to our CLI configuration, so that we don't end up trying to run custom hooks on the agent every time?
Most of the time those hooks won't exist, and there's no need for the agent to look for them. Also, having the path specified from the configuration, rather than hardcoded inside the agent, would be a cleaner approach.

@amshinde
Member

amshinde commented Sep 5, 2018

Yes, it would make sense to configure the path for guest rootfs hooks in the runtime configuration.toml and pass it to the agent as part of the CreateSandbox request.

@eguzman3
Contributor Author

eguzman3 commented Sep 5, 2018

@sboeuf Can I get some clarification on the need for this configuration option? Is this a performance concern? Also, would this runtime configuration option be disabled when it is commented out?

@sboeuf

sboeuf commented Sep 5, 2018

@eguzman3 yes it's about keeping the default path as simple and efficient as possible. And by default the agent would not try to load any custom hook.

@sboeuf

sboeuf commented Sep 10, 2018

OK, let's try to sort this out. As I understand it, you have a very specific use case: having the agent run hooks from inside the VM directly, with those hooks provided by the VM rootfs so that the agent can find them. That is the purpose of PR #347.

Now, regarding this PR, I don't get why we actually need to create a config.json. My understanding is that if you want to leverage libcontainer to run those custom hooks, you simply need to add them to the ociSpec structure defined in the agent. That way, libcontainer will get the information from the structure directly.

@eguzman3 Am I missing something?

@jodh-intel
Contributor

I think I can see why this might be required...

As specified in https://github.com/opencontainers/runtime-spec/blob/master/config.md#posix-platform-hooks,

The state of the container MUST be passed to hooks over stdin so that they may do work appropriate to the current state of the container.

But the state of the container must include:

bundle (string, REQUIRED) is the absolute path to the container's bundle directory. This is provided so that consumers can find the container's configuration and root filesystem on the host.

So the arbitrary hook binaries that need to be run inside the guest must be passed the container state and must therefore have access to the bundle directory. And a bundle is "rootfs + config.json".

@sboeuf

sboeuf commented Sep 10, 2018

@jodh-intel thanks for looking into this. It's still not clear whether this is needed from a libcontainer perspective, though. I mean that libcontainer is given the container state through a structure, not through a CLI, so I don't know how strictly we need to stick to the OCI definition here.

@sboeuf

sboeuf commented Sep 10, 2018

Doing some research in the libcontainer codebase, here's my understanding:
here is the code that gets called to handle OCI hooks (the pre-start case, which is actually the one you're interested in).
If the list of Labels is empty, the function that returns the bundle and the list of Annotations here will simply return an empty bundle string and an empty list of Annotations.
So what I'm trying to say is that it's not a big deal from a libcontainer perspective; we won't get any error with that.
It is then simply up to your hook not to use those fields. Is there any reason why the custom hook would need them?
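
A small sketch of the behavior described above, based on libcontainer's utils.Annotations helper; treat the exact import path and signature as an assumption against the runc version in use:

package main

import (
	"fmt"

	"github.com/opencontainers/runc/libcontainer/utils"
)

func main() {
	// With no labels set, the bundle comes back empty and there are no
	// annotations, so a pre-start hook simply receives an empty bundle path
	// in its state - libcontainer itself does not error out.
	bundle, annotations := utils.Annotations(nil)
	fmt.Printf("bundle=%q annotations=%v\n", bundle, annotations)

	// If the caller sets a "bundle=..." label, that value is what ends up in
	// the state passed to the hook.
	bundle, _ = utils.Annotations([]string{"bundle=/run/kata-containers/shared/containers/abc"})
	fmt.Println(bundle)
}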

@flx42
Contributor

flx42 commented Sep 10, 2018

The config.json file has two fields we need:

  • The path to the rootfs (despite what the spec says, it's not always in the same directory tree as the config file).
  • Environment variables/labels from the container image or set by the user. We use that to tweak the behavior of our prestart hook: https://github.com/nvidia/nvidia-container-runtime#environment-variables-oci-spec

@sboeuf

sboeuf commented Sep 10, 2018

@flx42

The path to the rootfs (despite what the spec says, it's not always in the same directory tree as the config file).

Why do you need the rootfs path here?
Also, just to mention, providing the rootfs path is orthogonal to creating config.json, since the path can be provided simply through the libcontainer Config structure.

Environment variables/labels from the container image or set by the user. We use that to tweak the behavior of our prestart hook: https://github.com/nvidia/nvidia-container-runtime#environment-variables-oci-spec

Environment variables should not be an issue since they will be passed from the initial config.json on the host side, and they'll be carried up to the agent to be applied by libcontainer.

@flx42
Contributor

flx42 commented Sep 10, 2018

Why do you need the rootfs path here?

Our prestart hook needs to know the mount point of the rootfs, to check the content of the filesystem and add mounts.

Also, just to mention, giving the rootfs path is orthogonal from creating config.json since the path can be provided simply through the lbcontainer Config structure.

Sure, but for a hook binary, that's the only way we can get the rootfs path today.

Environment variables should not be an issue since they will be passed from the initial config.json on the host side, and they'll be carried up to the agent to be applied by libcontainer.

Yeah, the environment variables are correctly preserved for the executed process. But again, for a prestart hook, the config.json is the only way to get them (the hooks are usually not executed with the container's environment variables).

@sboeuf

sboeuf commented Sep 10, 2018

@flx42 this is an interesting problem that we have here :)

OK, I think the following could be a solution that would be a good tradeoff on how to handle those customized OCI hooks:

  • First, you detect from the agent that some custom hooks are available.
  • Once detected, the agent will build the appropriate Hook structure, including the Env that it knows from the OCI Spec structure. This way, the hook will be started with the appropriate environment variables.
  • In order to provide the rootfs path, I suggest you slightly modify the NVIDIA hook so that it looks for the rootfs path through the environment variables. By default, it would rely on the bundle path (I don't want to break your current behavior), but if that path is empty, it should check whether it can find an environment variable CONTAINER_ROOTFS. This new environment variable would be added by the agent when creating the hook from scratch, so that the hook gets the information it needs (a rough sketch follows below).

WDYT?
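
A rough Go sketch of that proposal; the CONTAINER_ROOTFS variable name and the buildGuestHook helper are hypothetical, not an existing agent API:

package agent

import (
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// buildGuestHook sketches the idea: the agent constructs the hook entry itself,
// reusing the environment it already knows from the OCI spec and injecting a
// CONTAINER_ROOTFS variable so the hook can locate the rootfs without relying
// on a bundle path.
func buildGuestHook(hookPath, rootfs string, spec *specs.Spec) specs.Hook {
	var env []string
	if spec.Process != nil {
		env = append(env, spec.Process.Env...)
	}
	env = append(env, "CONTAINER_ROOTFS="+rootfs)

	return specs.Hook{
		Path: hookPath,
		Args: []string{hookPath, "prestart"},
		Env:  env,
	}
}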

@flx42
Contributor

flx42 commented Sep 10, 2018

@sboeuf We are flexible; we can definitely have some kind of glue layer that translates the environment variables into something else and then calls into our nvidia-container-runtime-hook or even nvidia-container-cli. What you describe is closer to our LXC hook implementation than to a standard OCI hook:
https://github.com/lxc/lxc/blob/master/hooks/nvidia
Here we clearly needed a custom script, since LXC does not follow the OCI runtime spec.

With that said, I'm not sure I understand why you don't want to provide support for standard OCI runtime hooks; what is the concern with this approach? We were hoping to leverage our existing OCI hook, since we were only missing a small piece to make it happen.

@sboeuf

sboeuf commented Sep 10, 2018

@flx42

With that said, I'm not sure I understand why you don't want to provide support for standard OCI runtime hooks; what is the concern with this approach? We were hoping to leverage our existing OCI hook, since we were only missing a small piece to make it happen.

The main concern I have here is that we're trying to turn the agent into an implementation of the OCI specification, but that's not what the agent is.
Now that we've sorted out the need behind your problem, it'd be nice to get more input from a few people on this topic.
/cc @bergwolf @amshinde @jodh-intel WDYT?

@flx42
Contributor

flx42 commented Sep 10, 2018

@amshinde
Member

Here is my take on this:
I am fine with writing the config.json inside the guest; that way, OCI-compliant hooks that need to be run inside the guest can be executed without any changes.
If we do things differently (let's say via environment variables), then an existing hook that is OCI compliant will need to be changed in order to run it inside the guest.

That said, we should write the config.json inside the guest only if hooks need to be run.
We can provide this information to the agent using a runtime configuration option such as guest_hooks_path, which would be empty by default. This option can then be passed to the agent as part of the CreateSandbox API. Only if this path is non-empty will the agent create the config.json file and execute the hooks at the provided path (a rough sketch follows below).

@bergwolf @WeiZhang555 @sboeuf What do you think?
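
As a rough agent-side sketch of that gating; the guestHooksPath parameter and the directory layout here are assumptions about the proposal, not an existing API:

package agent

import (
	"io/ioutil"
	"path/filepath"
)

// findGuestHooks only does work when the runtime passed a non-empty guest
// hooks path (e.g. via the CreateSandbox request). If the path is empty, the
// agent skips hook discovery entirely, so the default path stays cheap.
func findGuestHooks(guestHooksPath string) ([]string, error) {
	if guestHooksPath == "" {
		return nil, nil
	}

	entries, err := ioutil.ReadDir(guestHooksPath)
	if err != nil {
		return nil, err
	}

	var hooks []string
	for _, entry := range entries {
		// Treat executable regular files in the directory as guest hooks.
		if !entry.IsDir() && entry.Mode()&0111 != 0 {
			hooks = append(hooks, filepath.Join(guestHooksPath, entry.Name()))
		}
	}
	return hooks, nil
}

Only when the returned list is non-empty would the agent go on to write config.json, change cwd, and run the hooks.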

@bergwolf
Member

I agree with @sboeuf that kata-agent is not an OCI-compatible implementation on its own.

@eguzman3 @flx42 Can you do it with a sidecar container? AFAIU, with OCI hooks you need to add whatever is needed to the guest rootfs image. That means that if a user wants to use nvidia-container-runtime-hook, they need to install a customized guest rootfs image rather than the one shipped by Kata Containers. OTOH, if you do it with a sidecar container, it runs on all existing Kata guest rootfs images.

And be aware that whatever you put in the OCI hooks is executed on the host as well. Do you really need the SAME hooks to be run both on the host and in the guest?

@flx42
Contributor

flx42 commented Sep 12, 2018

Can you do it with a sidecar container?

Driver installation is possible with a container, but not really at the container runtime level. And isn't that a Kubernetes construct? It won't work with Docker (or another OCI frontend).

That means if a user wants to use nvidia-container-runtime-hook, they need to install a customized guest rootfs image rather than the one shipped by kata containers. OTOH, if you do it with a sidecar container, it runs on all existing kata guest rootfs images.

Yes, we are fine with that, since the rootfs will also bake our user-space driver libraries, and precompiled objects for the device driver.

And you need to pay attention that whatever you put in the OCI hooks, they are executed on the host as well. Do you really need the SAME hooks to be run both on the host and in the guest?

We only want to execute OCI hooks on the "translated" OCI spec, by discovering them automatically. There is no way to inject hooks into the spec Docker generates today.

@bergwolf
Member

Driver installation is possible with a container, but not really at the container runtime level. And isn't that a Kubernetes construct? It won't work with Docker (or another OCI frontend).

I mean: whatever initialization you do with the current hooks, do it in a sidecar container -- not by running the same hooks there. And Docker does support sidecars; it is just a matter of constructing proper namespace configurations for the different containers. Maybe it's too much complexity for your users, though.

Yes, we are fine with that, since the rootfs will also bake our user-space driver libraries, and precompiled objects for the device driver.

In that case, how about having a built-in container inside the rootfs image? Then we can make Kata Containers create default sidecars when configured to do so. Such a design would also solve the socat sidecar image issue @gnawux is trying to solve for k8s port forwarding.

We only want to execute OCI hooks on the "translated" OCI spec, by discovering them automatically. There is no way to inject hooks into the spec Docker generates today.

Sorry, I kind of miss the whole picture here. If not Docker, who will translate the OCI spec, and how? Do you expect kata-runtime to do the translation and only put translated hooks in the spec when passing it to kata-agent? Please explain a bit more about the architecture you expect this to work with.

@flx42
Contributor

flx42 commented Sep 12, 2018

I mean whatever initialization you do with current hooks, do it in a sidecar container

It's not possible; it would require changing our whole implementation. Our hook performs bind mounts from the runtime's namespace into the container. For instance, it looks for libcuda.so.390.55 in the runtime's mount namespace and bind-mounts it under the container's rootfs mount point.
With a sidecar container, I guess you would have to share a mount point between the two containers and point the application container to the libraries/binaries in this directory.
But that's not all: we also need to bind-mount devices in /dev, which is not possible with a sidecar container.

Maybe it's too much complexity for your users though.

I agree. And the implementation is possible but would be hacky.

In that case, how about having a builtin container inside the rootfs image?

This would be overkill for our use case, and would not solve the problem as well as what we have today, IMO.

Sorry, I kinda miss the whole picture here. If not docker, who will translate the OCI spec and how?

Docker (or something else) gives you the OCI spec, and my understanding is that, once in the guest, you amend this spec before passing it to libcontainer. @sboeuf mentioned that hooks from the original OCI spec are removed: #346 (comment)

Here, we want to add hooks from the guest rootfs; a possible way of doing that is presented in #347.

@bergwolf
Member

@flx42

But that's not all, we also need to bind-mount devices in /dev, that's not possible with a side-car container.

As long as devices share the same major/minor pair, they are shared among the different containers in the same guest, which means you do not need to bind-mount them. Just pass the same major/minor pair in the OCI spec, and they will be treated as the same device by the kernel.
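
To illustrate, a hedged sketch of what such a device entry looks like in the spec; the concrete numbers are only an example (an NVIDIA GPU node typically uses character major 195):

package agent

import specs "github.com/opencontainers/runtime-spec/specs-go"

// gpuDevice returns a device entry that two containers in the same guest can
// both list in their OCI specs. Because the type/major/minor triple is the
// same, the kernel treats the nodes as the same device - no bind mount needed.
func gpuDevice() specs.LinuxDevice {
	return specs.LinuxDevice{
		Path:  "/dev/nvidia0",
		Type:  "c",
		Major: 195,
		Minor: 0,
	}
}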

OTOH, I agree that in your use case the arbitrary rootfs hooks are much simpler than adding implicit sidecars. Thanks for the explanation. Please go ahead with the modification; I'm fine with saving the spec somewhere in the rootfs.

@@ -579,6 +608,13 @@ func (a *agentGRPC) CreateContainer(ctx context.Context, req *pb.CreateContainer
return emptyResp, err
}

// Change cwd because libcontainer sets the bundle path to cwd
oldcwd, err := pb.ChangeToBundlePath(ociSpec)
Member


These are only useful when there are hooks in the rootfs. So please make it optional by:

  1. scanning for rootfs hooks when the agent starts up
  2. changing cwd and saving the spec in the container bundle only when necessary

@eguzman3
Contributor Author

Consolidating this PR with #347

@eguzman3 eguzman3 closed this Sep 12, 2018
@eguzman3 eguzman3 deleted the write-oci-spec branch September 12, 2018 18:59