Support alternate runtime for host privileged operations #367
Conversation
server/container_create.go (outdated)
nsPath := fmt.Sprintf("/proc/%d/ns/%s", podInfraState.Pid, nsFile)
if err := specgen.AddOrReplaceLinuxNamespace((string)(nsType), nsPath); err != nil {
	return nil, err
}
I don't think this is correct. Different containers in a Pod do not share mount, pid or cgroup namespaces. And I'm fairly sure they don't share uts or user namespaces either.
What is the reason for this change?
That nsPath looks like it should use filepath.Join instead? Also, string(nsType) instead of (string)(nsType)? (I didn't realize the latter is even valid Go code.)
(string) -> string is why it works.
But that's not the main point I'm making. I'm saying that the logic behind the change doesn't make sense to me (different containers in a sandbox do not share all of their namespaces).
Looks like the direction is to have containers in a pod share the network, PID and IPC namespaces by default.
Caught me by surprise the other day as well. But it's switchable. Gives pods a higher-level purpose, I suppose.
I'm a bit confused why this logic has to live in
You're correct, the default k8s behavior is to share the PID, IPC and networking namespaces. I'll fix that patch.
Force-pushed from 94408d0 to 09415b9 (compare)
Good points. Solution 1 is something I thought about. However, running privileged containers with any hypervisor-based runtime is not possible (you can't see or change the host network namespace, for example), so this would basically mean calling another runtime (e.g. …)
+1 for the direction.
I think we should only switch to the privileged runtime in these two conditions:
Ok, then we assume that the container security context will always be the same as the pod one. As you commented on k8s issue 41848, kuberuntime currently sets all container security contexts to be the same as their pod's. Right now this PR checks both the container and the pod security contexts; I'll fix the code to follow the assumption that they will always be identical.
After further private discussions, this is incorrect. The security context will not always be the same, only the … And as you described, we should run a privileged runtime for all containers if … With that understanding, I'll fix my PR.
Force-pushed from f1583a9 to 404de17 (compare)
@feiskyer PR updated according to the latest comments.
Force-pushed from 404de17 to 6143f04 (compare)
server/server.go (outdated)
@@ -50,6 +52,64 @@ type Server struct {
	appArmorProfile string
}

func ociPrivileged(spec *rspec.Spec) bool {
Can we add a comment here? Also, shouldn't we also check for uid = 0?
Isn't --privileged different from just running as uid 0?
@mrunalp I added some comments to the routine, hopefully they make sense.
server/container_create.go (outdated)
if err := specgen.AddOrReplaceLinuxNamespace("ipc", ipcNsPath); err != nil {
	return nil, err
for nsType, nsFile := range map[rspec.NamespaceType]string{
	rspec.PIDNamespace: "pid",
We can take this out for now, as k8s is still discussing viability and it may be pushed to the next release.
How do we handle the case where sandbox privileged is not triggered but a container is privileged? Isn't it too late?
@mrunalp Kubelet ensures sandbox privileged is set when at least one of the containers belonging to it is privileged.
Force-pushed from 42bc143 to 5dce9d6 (compare)
@@ -34,24 +34,26 @@ const (
)

// New creates a new Runtime with options provided
-func New(runtimePath string, conmonPath string, conmonEnv []string, cgroupManager string) (*Runtime, error) {
+func New(runtimePath string, runtimeHostPrivilegedPath string, conmonPath string, conmonEnv []string, cgroupManager string) (*Runtime, error) {
Design question: would it make sense to pass in a struct with the configuration instead? Every time a new parameter is added (as today), the method and all its callers have to be adjusted, and it's annoying for optional parameters (because all the call sites would need to be changed to pass in zero values like "").
Have we considered passing the configuration in as a dedicated struct type, like:
type RuntimeConfig struct {
Name string
Path string
...
}
func NewRuntime(config *RuntimeConfig) (*Runtime, error) {
r := &Runtime{
config: config,
}
return r, nil
}
(Side note: what's the point of error as a second return type, if errors can never occur?)
I think so. We should only check sandbox config to decide the privileged runtime.
Not all runtimes are able to handle some of the kubelet security context options, in particular the ones granting host privileges to containers. By adding a host privileged runtime path configuration, we allow ocid to use a different runtime for host privileged operations, e.g. host namespace access. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We add a privileged flag to the container and sandbox structures and can now select the appropriate runtime path for any container operation depending on that flag. Here again, the default runtime will be used for non-privileged containers, and for privileged ones in case there is no privileged runtime defined. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The sandbox privileged flag is set to true only if either the pod configuration privileged flag is set to true or any of the pod namespaces are the host ones. A container inherits its privileged flag from its sandbox, and will be run by the privileged runtime only if that flag is set to true. In other words, the privileged runtime (when defined) will be used when one of the conditions below is true:
- The sandbox will be asked to run at least one privileged container.
- The sandbox requires access to either the host IPC or networking namespaces.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
By factoring out the bind mounts generation code. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Force-pushed from 5dce9d6 to f7eee71 (compare)
@mrunalp Simplified the code through a pod annotation, as discussed during the ocid meeting.
LGTM
1 similar comment
LGTM
Some CRI-O runtimes (like cc-oci-runtime) do not support many host privileged operations like e.g. giving access to the host namespaces or running fully privileged containers (with access to all host devices). Those runtimes usually provide a higher level of container security by e.g. running container workloads within virtual machines and therefore running host privileged containers with them makes little sense.
This pull request tries to overcome that problem by allowing ocid users to define a host privileged capable runtime path in addition to the default runtime path. When ocid gets a request to create a container or a sandbox with either the privileged flag set or access to at least one of the host namespaces (PID, IPC or networking), it will check whether a host privileged runtime is defined and use it if it is. It will obviously use the default runtime if the host privileged one is not defined.
This PR only checks the pod security context, and each container within a given pod will inherit its pod's privileged flags. In other words, we will run the privileged runtime in either one of these two cases:
- The pod is asked to run at least one privileged container.
- The pod requires access to the host IPC or networking namespaces.
Note: I realize this might be fixed with CRI multiple runtimes support. If/when that happens, we should be able to remove part of this PR's code.
cc @feiskyer @mcastelino