
Add pod annotations to container #4368

Merged

Conversation

@haircommander (Collaborator)

We have the SandboxID and SandboxName annotations; let's use them. This also allows Kata containers to be created in pods.

Partial fix for #4353 (we don't need to add the annotations manually, but we still need to disable SELinux labeling manually)

Signed-off-by: Peter Hunt pehunt@redhat.com

@haircommander (Collaborator Author)

PTAL @zer0def

@TomSweeneyRedHat (Member)

LGTM, assuming that's the right spot to retain the info. I'll let others chime in on that part.

@@ -322,7 +322,7 @@ func configurePod(c *GenericCLIResults, runtime *libpod.Runtime, namespaces map[
if (namespaces["uts"] == cc.Pod) || (!c.IsSet("uts") && pod.SharesUTS()) {
namespaces["uts"] = fmt.Sprintf("container:%s", podInfraID)
}
-	return namespaces, nil
+	return namespaces, pod.ID(), nil
@zer0def Oct 29, 2019

This should be either of these, in running state:

  • infra container's ID
  • if we don't have an infra container, the ID of whichever container is running with ContainerType == sandbox (perhaps the first attached container to the pod could become the sandbox, but this may have drawbacks I'm absolutely oblivious to)

Only then does the bundle config under /var/lib/containers/storage/overlay-containers/${INFRA_CONTAINER_ID}/userdata/config.json exist for other containers in the same pod to latch onto. As far as I can tell, any container created without ContainerType=container and an appropriate SandboxID will be created as a separate QEMU VM sandbox, as far as Kata is concerned.
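The two-annotation handshake described above can be sketched in Go (a standalone illustration, not the PR's code; the io.kubernetes.cri-o.* key strings are assumed to match the constants exported by libpod's pkg/annotations):

```go
package main

import "fmt"

// Assumed annotation keys; libpod's pkg/annotations exposes these as
// ann.ContainerType and ann.SandboxID (mirroring CRI-O's annotations).
const (
	annContainerType = "io.kubernetes.cri-o.ContainerType"
	annSandboxID     = "io.kubernetes.cri-o.SandboxID"
)

// joinSandboxAnnotations builds the annotations a container needs so that
// Kata attaches it to an existing VM instead of spawning its own: the
// container is typed "container" and pointed at the sandbox (infra)
// container's ID.
func joinSandboxAnnotations(infraID string) map[string]string {
	return map[string]string{
		annContainerType: "container",
		annSandboxID:     infraID,
	}
}

func main() {
	for k, v := range joinSandboxAnnotations("deadbeef") {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```

Without these annotations, a runtime like Kata treats the container as its own sandbox and spins up a separate VM.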

@haircommander (Collaborator Author)

From the sounds of it @zer0def, it seems like we should only allow this kind of sharing if the sandbox container is an infra container. There are already mechanisms in place to guarantee the infra container isn't deleted before the other containers in the pod, which seems like something that could affect Kata.

For instance, if the first container in the pod is created, another is added (and attached to its VM), and the first is deleted, I'm pretty sure that container's config.json is going to be removed. Would that negatively affect the other containers in the pod?

@zer0def

Indeed, the intended idea would be to have the infra container be the pod's sandbox container, but I wouldn't immediately dismiss other ideas, if there's sufficient motivation for doing so with careful execution.

@haircommander (Collaborator Author)

okay, I'll stick to an infra container for now

@haircommander (Collaborator Author)

/hold
this needs some thinking

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 30, 2019
@haircommander haircommander force-pushed the pod-annotations branch 3 times, most recently from 563fc76 to 1342d71 Compare October 30, 2019 19:41
@zer0def commented Oct 31, 2019

As of 1342d713dc6bd2aca15039f323ae87a99abfda98 along with the patch at the end of this message:

  • flow from Podman pod logic misbehaves when Kata is the runtime #4353 point 2.2. works, even if it causes the sandbox/VM to SIGABRT on pod destruction (which might be a separate issue outside of the scope of this one)
  • flow from 2.1 of the same issue fails without explicitly stating the annotations that would attach the container to a given sandbox, but given how this was used as a workaround to just get any working result, I'd say it's fine to leave it be as-is
  • play kube still fails along the same lines as 2.2 did, but that should be an easy port

I've also removed the complementary condition for podInfraID != "" that explicitly set ContainerType=sandbox, just to err on the safe side in case there's ever a runtime whose behavior would conflict with this annotation in some way. After testing, it worked just fine (though I'm not running SELinux, so that might be something worth looking into). Sorry for the confusing language in the review.

diff --git a/cmd/podman/shared/create.go b/cmd/podman/shared/create.go
index b9c1aa86..283b91a5 100644
--- a/cmd/podman/shared/create.go
+++ b/cmd/podman/shared/create.go
@@ -12,7 +12,7 @@ import (
 	"syscall"
 	"time"
 
-	"github.com/containers/image/v4/manifest"
+	"github.com/containers/image/v5/manifest"
 	"github.com/containers/libpod/cmd/podman/shared/parse"
 	"github.com/containers/libpod/libpod"
 	"github.com/containers/libpod/libpod/image"
@@ -577,17 +577,16 @@ func ParseCreateOpts(ctx context.Context, c *GenericCLIResults, runtime *libpod.
 	}
 
 	// in the event this container is in a pod, and the pod has an infra container
-	// we will want to configure it as a type "container" instead of "sandbox"
-	// For kata containers, "sandbox" is the annotation that denotes the container should
-	// use its own VM, whereas "container" denotes the container should join the VM of the
-	// SandboxID (the infra container).
-	// counterintuatively, if a container is not in a pod, it needs to be "sandbox", so its own
-	// VM is created
+	// we will want to configure it as a type "container" instead defaulting to
+	// the behavior of a "sandbox" container
+	// In Kata containers:
+	// - "sandbox" is the annotation that denotes the container should use its own
+	//   VM, which is the default behavior
+	// - "container" denotes the container should join the VM of the SandboxID
+	//   (the infra container)
 	if podInfraID != "" {
 		annotations[ann.SandboxID] = podInfraID
 		annotations[ann.ContainerType] = ann.ContainerTypeContainer
-	} else {
-		annotations[ann.ContainerType] = ann.ContainerTypeSandbox
 	}
 
 	if data != nil {

@rhatdan (Member) commented Nov 5, 2019

@haircommander Any update on this PR?

@haircommander (Collaborator Author)

@rhatdan @zer0def whoops, this fell through the cracks. Added the changes as suggested. PTAL

@rhatdan (Member) commented Nov 7, 2019

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 7, 2019
@openshift-ci-robot (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 7, 2019
@zer0def commented Nov 8, 2019

This inevitably sets itself up for "why does it work on its own and not through play kube?" remarks, but as far as I'm concerned, it looks good.

@rhatdan (Member) commented Nov 8, 2019

@zer0def could we make it work with play kube?

@rh-atomic-bot (Collaborator)

☔ The latest upstream changes (presumably #4265) made this pull request unmergeable. Please resolve the merge conflicts.

@zer0def commented Nov 8, 2019

@rhatdan I'm not much of a golang coder, but since the current changes revolve around cmd/podman/shared/create.go:ParseCreateOpts, which returns a CreateConfig, my guess would be that similar logic could be introduced to pkg/adapter/pods.go:kubeContainerToCreateConfig. It would require confirmation from someone a bit better versed in the language and codebase than I currently am, though.

@haircommander (Collaborator Author)

I can take care of that later today

@zer0def commented Nov 8, 2019

Something like this does the trick, though I would deem it more of a hack than a proper solution, with code duplication and all.

diff --git a/pkg/adapter/pods.go b/pkg/adapter/pods.go
index 6648edc8..ae118c79 100644
--- a/pkg/adapter/pods.go
+++ b/pkg/adapter/pods.go
@@ -17,6 +17,7 @@ import (
        "github.com/containers/libpod/libpod"
        "github.com/containers/libpod/libpod/image"
        "github.com/containers/libpod/pkg/adapter/shortcuts"
+       ann "github.com/containers/libpod/pkg/annotations"
        ns "github.com/containers/libpod/pkg/namespaces"
        createconfig "github.com/containers/libpod/pkg/spec"
        "github.com/containers/libpod/pkg/util"
@@ -791,6 +792,19 @@ func kubeContainerToCreateConfig(ctx context.Context, containerYAML v1.Container
        }
        containerConfig.Env = envs
 
+       annotations := make(map[string]string)
+       pod, err := runtime.LookupPod(podID)
+       if err != nil {
+               return nil, err
+       }
+       podInfraID, err := pod.InfraContainerID()
+       if err != nil {
+               return nil, err
+       }
+       annotations[ann.SandboxID] = podInfraID
+       annotations[ann.ContainerType] = ann.ContainerTypeContainer
+       containerConfig.Annotations = annotations
+
        for _, volume := range containerYAML.VolumeMounts {
                hostPath, exists := volumes[volume.Name]
                if !exists {

@openshift-ci-robot openshift-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed lgtm Indicates that a PR is ready to be merged. labels Nov 8, 2019
@haircommander (Collaborator Author)

@zer0def thanks, your approach was basically what I was going to do :)

We have the annotations SandboxID, let's use them. This also allows kata containers to be created in pods and share a VM with the infra container. Note: as of now, this sharing only works if the pod has an infra container

Signed-off-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 8, 2019
@zer0def commented Nov 9, 2019

Can confirm both cases of creating a pod with an infra container and play kube run just fine with this PR.

@rhatdan (Member) commented Nov 9, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 9, 2019
@haircommander (Collaborator Author)

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 15, 2019
@openshift-merge-robot openshift-merge-robot merged commit 51c08f3 into containers:master Nov 15, 2019
@haircommander haircommander deleted the pod-annotations branch November 15, 2019 19:03
@amshinde

@haircommander Thanks for the PR.
I verified that, with this PR, the containers for a pod are launched in a single QEMU instance for Kata, provided that the infra container is present.
AFAIK, Kata should not have a dependency on the infra container (we had one earlier for sharing PID namespaces).
So I wanted to see if that would work with podman, but what I am seeing is that without the infra container, containers in a pod have different IP addresses and hostnames, which surprised me.
Aren't containers in a pod supposed to have the same IP address and share the network, UTS, and IPC namespaces?

cc @giuseppe

@haircommander (Collaborator Author)

@amshinde right now, we're kind of using the infra container as a crutch (mostly because I don't really understand the needs of Kata).

Let's imagine pods and Kata work without infra: if a pod is created without an infra container, the first container would become the owner of the VM (annotated as "sandbox"), and the other containers added to the pod would attach to that VM (annotated as "container").

If the first container in the pod is removed (which it can be, since it's not marked as infra), what happens to that VM? Which container should the others reference as SandboxID (to attach to the VM)?

Relying on the infra container here is the clearest way to make sure there is always a running container holding the VM that the other containers can join.

Is my understanding correct for how the Kata VM works?

@amshinde

Let's imagine pods and kata work without infra. if a pod is created without an infra container, the first container would become the owner of the vm (annotated as sandbox) and the other containers that are added to the pod attach to that vm (annotated as container).

You are right there. But Kata also does not rely on the pause container being the last one to exit, IIRC.
So I thought this could work (the SandboxID could be the ID of the first container, or a uniquely generated one).
But given the limitation of pods without an infra container in podman today (#4531), this would definitely not work with Kata. Kata does rely on containers sharing the same namespaces and having a single IP.

@zer0def commented Nov 19, 2019

@amshinde You can still create a pod without an infra container (which is also what would be, as you put it, "a uniquely generated one") by following these instructions, courtesy of @mheon, and substituting images and commands as you see fit. It's just unnecessarily tedious, on top of not providing the namespace guarantees for other containers in the pod to rely on, as @haircommander outlined.

Note that this PR hasn't yet been part of a stable release, so the behavior you're describing is expected as of the current latest stable, 1.6.3: each container is started as a separate VM because it does not receive the appropriate container-type and sandbox-id annotations, unless they are explicitly specified by the end user.
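For the record, the manual workaround on a pre-PR podman looks roughly like the following (a sketch, not verified against every version: the `--annotation` flag and the io.kubernetes.cri-o.* keys are assumed, and the inspect template field name may differ between podman releases):

```shell
# Create a pod; its infra container will own the Kata VM.
podman pod create --name kata-pod

# Find the infra container's ID (this becomes the Kata sandbox ID).
# NOTE: the template field name is an assumption and may vary by version.
INFRA_ID=$(podman pod inspect kata-pod --format '{{.InfraContainerID}}')

# Without this PR, each container defaults to its own VM; pass the
# annotations explicitly so Kata joins the infra container's VM instead.
podman run -d --pod kata-pod \
  --annotation io.kubernetes.cri-o.ContainerType=container \
  --annotation "io.kubernetes.cri-o.SandboxID=${INFRA_ID}" \
  alpine sleep 100
```

With the PR merged, podman sets both annotations itself whenever the pod has an infra container.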
