Skip to content
This repository has been archived by the owner on Apr 3, 2018. It is now read-only.

shim: Start shims inside appropriate namespaces #637

Merged
merged 4 commits into from
Mar 2, 2018
Merged

Conversation

sboeuf
Copy link
Collaborator

@sboeuf sboeuf commented Feb 24, 2018

This PR first creates a new package nsenter so that we can spawn any code into a set of namespaces. Then, the virtcontainers code is modified to rely on this new package. At last, it starts shims from agents within the right PID namespace. In details, a shim representing a container process should enter its own PID namespace so that we can spawn other shims corresponding to exec'ed processes inside the same namespace. This is needed if we don't want some race conditions to happen, due to the exec'ed shims to never return their exit codes if the container process returned first.

Fixes #613

@sboeuf sboeuf force-pushed the shim_enter_ns branch 2 times, most recently from b206a54 to a3b5c9b Compare February 24, 2018 01:47
}

func getNSPathFromPID(pid int, nsType NSType) string {
return fmt.Sprintf("/proc/%d/ns/%s", pid, nsType)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: (file)path.Join() maybe (here and below)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I will replace it.

@@ -0,0 +1,183 @@
//
// Copyright (c) 2018 Intel Corporation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit mentions "Based on the CNI implementation of the
package 'ns' on the plugins repository"
, so do you need to reference some of those details here? Or is this a clean-room re-implementation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really a brand new re-implementation here. I have just got inspiration from what the ns package is doing for CNI.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's anything that was loosely copied from the ns package, we have to include the right copyrights.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I understand, then I'll add them.


// List of namespace types.
const (
NSTypeCGroup NSType = "cgroup"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bonus points for sorting the NSType's and CloneFlagsTable alphabetically 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

;)

return file, nil
}

func sortNSList(nsList []Namespace) []Namespace {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a unit-test for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely !


// NsEnter executes the passed closure under the given namespace,
// restoring the original namespace afterwards.
func NsEnter(unsortedNSList []Namespace, toRun func() error) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the importance and delicacy of this change, we really could do with some unit tests. I think you should be able to write unit tests for these functions that for examle create a new ns, switch to it and record their (hopefully new) namespace, assuming they all start with:

if os.Geteuid() != 0 {
    t.Skip(testDisabledAsNonRoot)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was planning on implementing some testing here. I'll work on that today.

NSTypeCGroup NSType = "cgroup"
NSTypeIPC = "ipc"
NSTypeNet = "net"
NSTypeMount = "mnt"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sboeuf As discussed I had tried doing the exact same thing taking inspiration from the CNI implementation for net namespaces. Unfortunately setns in golang does not work for mount namespaces due to multithreaded nature of golang. We may have to resort to C code to be able to switch to mount namespaces, similar to what libcontainer does.

If we are trying to solve the shim exit bug for exec, we may get away with solving this by just switching to PID namespaces.

@amshinde
Copy link
Collaborator

@sboeuf Did some digging up on the issues filed surrounding the mount namespace issue when I was implementing support for devices. See this:
golang/go#8676

Also, that issue has code that checks if the namespace change was successful, we could do something similar in our tests:
https://storage.googleapis.com/go-attachment/8676/0/checkns.c

@amshinde
Copy link
Collaborator

@sboeuf Doing things the way you are doing currently works (most of the time) and provides for a simple solution (instead of reexecing a new process), but I came across this article which does state that it is not 100% foolproof : https://www.weave.works/blog/linux-namespaces-and-go-don-t-mix

Basically golang may spawn a new thread for a blocking syscall even after LockOsThread()
We have an issue in vishwanda/netlink repo related to the above:
vishvananda/netns#17

The issue seems to have been resolved in golang 1.10, but we may see an unsuccessful setns call in earlier versions.

Doing things in the shim code, may not be such a bad approach after all.

@sboeuf sboeuf force-pushed the shim_enter_ns branch 2 times, most recently from cbf2a11 to acfe46d Compare February 27, 2018 01:02
@sboeuf
Copy link
Collaborator Author

sboeuf commented Feb 27, 2018

@jodh-intel PR updated with almost full coverage for unit tests and more than that a real test that we're entering namespaces.
As @amshinde explained, we have some limitations with Go, even if this code is supposed to be foolproof.

  • About limitations, both mount and user namespaces cannot be entered with the Setns() syscall from a Go program because it's multithreaded. I honestly don't get the "why" of this limitation but after all I have read and my local testing, I can confirm we cannot enter those namespaces if we don't involve some C code, or the use of an external binary such as nsenter.
  • About being foolproof, I think this is safe since we know that we should not pass a callback involving a go routine through this package, otherwise it won't be executed in the expected namespace. Also, the second issue about having some new threads being created in case the current one takes too long should not happen in our case and will be solved with Go 1.10.

Basically, this package does the most that a Go package can do with netns. So my proposal is that we could take it in since it will help spawning our shims into expected PID and network namespaces, and worst case, it won't do any harm in case we hit the issues explained above once every 10000000000 times.

At least this whole PR brings the structure for the shims to be switched to the expected namespaces.

I think we should try to look into a better way to solve this, but this will involve more researches.

@sboeuf
Copy link
Collaborator Author

sboeuf commented Feb 27, 2018

@sameo @jodh-intel @amshinde PR is ready for another review ;)

return nil
}

func getFileFromNS(nsPath string) (*os.File, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this function is not exported, but it is not very intuitive what this function does without looking at the code. Could you add some documentation comments here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

go func() {
defer wg.Done()
runtime.LockOSThread()
innerError = containedCall()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be an UnlockOsThread() call here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure I think we can do that. But it won't change anything since this goroutine ends after that.

assert.Nil(t, err, "Should not fail since closure should return nil: %v", err)
}

func areNSIdentical(nsPath1, nsPath2 string) (bool, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think you are calling this function anywhere in the tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah you're right, that was a temp function that I should have removed !


evalExpectedNSPath, err := filepath.EvalSymlinks(ns.Path)
if err != nil {
evalExpectedNSPath = err.(*os.PathError).Path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be useful to add comments here describing how you are doing the comparison.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point

@@ -314,7 +315,22 @@ func (h *hyper) exec(pod *Pod, c Container, cmd Cmd) (*Process, error) {
Process: *hyperProcess,
}

process, err := prepareAndStartShim(pod, h.shim, c.id, token, h.state.URL, cmd)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer passing additional arguments here rather than duplicating the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough ! I have removed it.

Sebastien Boeuf added 4 commits February 28, 2018 14:59
This package goal is to make very simple the execution of any code
inside any set of namespaces. Based on the CNI implementation of the
package 'ns' on the plugins repository, this new package takes a list
of namespaces as input and a routine that should be called once all
those namespaces have been entered.

This way, this package provide a generic way to spawn any binary into
the expected set of namespaces. It exports only one function NsEnter()
making the usage very simple.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit gives the ability to start a new shim process inside
a set of new namespaces. This will be useful to start the container
process which should be "init" process of its PID namespace.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This new commit adds to the shim the ability to be run within any
existing namespace. This will be very useful in order to start shims
representing exec'ed processes related to an existing container. In
this case, we will need to start those shims in the namespaces of the
container process shim.

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The shim representing the container process needs to be in its own
PID namespace as the "init" process. The reason is that we need to
make sure it will wait for its exec'ed processes in the same PID
namespace to be killed before to return. This happens inside each
agent, but if this is not reproduced on the host, we get some race
conditions leading the shim container process to return before its
exec'ed processes are killed. And this leads those shims to never
return their exit codes.

This commit solves this issue by making sure a new shim representing
a container process enters a new PID namespace, and any exec'ed
process related to this container enters this PID namespace previously
created.

Fixes #613

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
@sboeuf
Copy link
Collaborator Author

sboeuf commented Feb 28, 2018

@amshinde PR updated, PTAL

Copy link
Collaborator

@amshinde amshinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sboeuf
Copy link
Collaborator Author

sboeuf commented Mar 2, 2018

@grahamwhaley @jodh-intel ready to be merged

@grahamwhaley grahamwhaley merged commit 24ee778 into master Mar 2, 2018
@sboeuf sboeuf deleted the shim_enter_ns branch March 2, 2018 18:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants