Agent K8S workload attestor support for cgroups v2 #4004
Thank you for opening this @tihhoni, and for sharing the links - they're very helpful. I see that kubelet will use cgroups v2 by default any time that it is enabled. Do you have a sense for which or how many distributions are shipping with cgroups v2 enabled currently? Did you bump into this issue directly?
Here's some info that I was able to find regarding which distros are already being shipped with cgroups v2 enabled by default:
Ref: https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/#how-do-you-use-cgroup-v2
Hi @evan2645. From various sources I've got the feeling that cgroup v1 will be deprecated by the end of 2023, so presumably new kernel/OS updates will come exclusively with cgroup v2 support. We've run into this issue in a staging cluster, where an OS upgrade automatically rolled out a new Linux distribution and tests started to fail. I am still running tests to understand what impact that will cause. What I know so far:
So it seems that the PID's cgroup might have container/pod UID info in some cases; I just have not figured out what that condition is. If I get more results I will surely post them here.
After some testing it seems that the issue is not that dire (spire-agent run PID = 302967; both run pods).
So from within the container, if you try to read your own cgroup, it's the root without a container ID. But if you read another process's cgroup, it's fine. As a result the spire agent can attest other k8s PIDs but not itself. This might be different depending on the cluster setup.
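For anyone reproducing this, here is a minimal Go sketch (not SPIRE code, names illustrative) of the reads described above; it assumes the agent-style process can see the target PID through a host-visible /proc:

```go
// Illustrative only: read /proc/<pid>/cgroup for "self" and for another PID.
// Under cgroups v2 with cgroupns=private, the "self" read typically shows
// "0::/", while reading another container's process (via a host /proc mount)
// can still show the kubelet-managed path with pod UID and container ID.
package main

import (
	"fmt"
	"os"
)

func printCgroup(pid string) {
	data, err := os.ReadFile("/proc/" + pid + "/cgroup")
	if err != nil {
		fmt.Fprintf(os.Stderr, "reading cgroup for pid %s: %v\n", pid, err)
		return
	}
	fmt.Printf("pid %s:\n%s\n", pid, data)
}

func main() {
	// Own view: with a private cgroup namespace this is just "0::/".
	printCgroup("self")

	// Another process's view, e.g. a workload PID passed as an argument.
	if len(os.Args) > 1 {
		printCgroup(os.Args[1])
	}
}
```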
Thank you very much @tihhoni for looking into this further ❤️ Can you tell if the problem only affects "self", or if it affects e.g. all pods running as root? I'm marking this as priority/backlog for now, but if it affects all root pods, we may consider upgrading it to urgent. Thanks again!
We've noticed similar changes will be needed for the dockerd-workloadattestor to support cgroups v2. We have a project that will require cgroups v2 support in dockerd-workloadattestor beginning in May 2023. After a brief discussion in the contributor sync today, we reached a consensus that it makes sense to introduce logic to add this support across several PRs:
@evan2645 my impression was that only "self" is the problem. I tested a full ingress/egress Envoy setup and attestation works fine (the spire server can successfully get the Envoy PIDs' container IDs). Forgot to mention: we run on containerd 1.6.* and k8s 1.25.*; I did not check other runtimes.
v1 will also need to be supported for some time. This is probably what you meant, but I want to make sure.
I am again running into this issue, but now from a different perspective; just wanted to share (this is more related to cgroupns, but in all cases running on cgroups v2). It seems that tools are slowly moving away from cgroup v1 and are also using cgroupns=private (e.g. the latest docker daemon and kind releases). This creates a problem for running some integration tests where the cgroupns is hardcoded: if we run tests with dind in k8s (cgroupns=private), they do not work; if we run tests with dind cgroupns=host + --cgroup-parent=/actions_job then they work fine, but then we face issues with kind cluster tests that mess up the main k8s kubelet cgroup. We can patch CGROUP_MATCHERS for our tests, of course (maybe it's good to allow users to set it too, rather than only the default matcher). Just a heads up for now, in case your GA VM switches the docker daemon to cgroupns=private on cgroups v2.
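For illustration only, a hedged Go sketch of the kind of matcher override mentioned above. The regular expression and the sample path are hypothetical and are not SPIRE's actual CGROUP_MATCHERS values:

```go
// Hypothetical example of matching a kubelet-managed cgroup path to pull out
// the pod UID and container ID; real clusters and runtimes vary widely, which
// is exactly why a user-configurable matcher can be useful for tests.
package main

import (
	"fmt"
	"regexp"
)

// Assumed path shape: .../kubepods/<qos>/pod<uid>/<64-hex container id>
var kubepodsMatcher = regexp.MustCompile(
	`/kubepods(?:\.slice)?/.*pod([0-9a-f-]{36})[^/]*/(?:[^/]*-)?([0-9a-f]{64})`)

func main() {
	// Sample path (made up) in the classic cgroupfs layout.
	path := "/kubepods/besteffort/pod2c48913c-b29f-11e7-9350-020968147796/" +
		"9bca8d63d5fa610783847915bcff0ecac1273e5b4bed3f6fa1b07350e0135961"
	if m := kubepodsMatcher.FindStringSubmatch(path); m != nil {
		fmt.Println("pod UID:      " + m[1])
		fmt.Println("container ID: " + m[2])
	}
}
```

Making such a pattern user-configurable would let test environments with unusual cgroup parents (like /actions_job) opt into their own matching rules.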
@nikotih thank you very much for reporting this ❤️ Would you mind filing it as a new issue specifically around cgroupns=private? I think that warrants separate attention and it will be good to triage it independently (but still in relation to this one).
Sure thing, I can. However, I think cgroups v2 and private cgroup namespaces go hand in hand: a cgroups v2 enabled kubelet and docker daemon will automatically start with namespace private (unless explicitly changed). For the original issue that I opened here, the actual root cause is also namespace private.
Disclaimer: non-systemd environment.
In other words, for the spire agent on cgroups v2 with a private cgroup namespace, it cannot deduce its own container. Starting two containers and checking their PIDs shows the same thing for both images: the namespaced root rather than a container ID.
Ok, I appreciate this @nikotih. I don't have the time to dig in right now, but I'll move this issue back to triage so we can reassess the impact.
The docker and k8s workload attestors work backwards from pid to container by inspecting the proc filesystem. Today, this happens by inspecting the cgroup file. Identifying the container ID (and pod UID) from the cgroup file has been a continual arms race. The k8s and docker workload attestors grew different mechanisms for trying to deal with the large variety in the output. Further, with cgroups v2 and private namespaces, the cgroup file might not have the container ID or pod UID information within it.

This PR unifies the container ID (and pod UID) extraction for both the docker and k8s workload attestors. The new implementation searches the mountinfo file first for cgroup mounts. If not found, it will fall back to the cgroup file (typically necessary only when the workload is running in the same container as the agent). The extraction algorithm is the same for both mountinfo and cgroup entries, and is as follows:

1. Iterate over each entry in the file being searched, extracting either the cgroup mount root (mountinfo) or the cgroup group path (cgroup) as the source path.
2. Walk backwards through the segments in the source path looking for the 64 hex-digit container ID.
3. If looking for the pod UID (K8s only), walk backwards through the segments in the path looking for the pod UID pattern used by kubelet, starting with the segment the container ID was found in (truncated to remove the container ID portion).
4. If there are pod UID/container ID conflicts after searching these files, log and abort. Entries that have a pod UID override those that don't.

The container ID is very often contained in the last segment in the path, but there are situations where it isn't.

This new functionality is NOT enabled by default, but opted into using the `use_new_container_locator` configurable in each plugin. In 1.10, we can consider enabling it by default.

The testing for the new code is spread out a little bit. The cgroups fallback functionality is mostly tested by the existing tests in the k8s and docker plugin tests. The mountinfo tests are only in the new containerinfo package. In the long term, I'd like to see all of the container info extraction related tests moved solely to the containerinfo package and removed from the individual plugins.

Resolves spiffe#4004, resolves spiffe#4682, resolves spiffe#4917.

Signed-off-by: Andrew Harding <azdagron@gmail.com>
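A condensed sketch of the segment-walking locator described above, under stated assumptions: the pod UID and container ID patterns below are illustrative approximations, and SPIRE's actual implementation (in its containerinfo package) handles more runtimes and edge cases:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// A container ID here is assumed to be 64 lowercase hex digits, possibly
	// wrapped in a prefix/suffix (e.g. "cri-containerd-<id>.scope").
	containerIDRe = regexp.MustCompile(`[0-9a-f]{64}`)
	// Kubelet pod UID segments look like "pod<uuid>", with dashes sometimes
	// replaced by underscores under the systemd cgroup driver.
	podUIDRe = regexp.MustCompile(`pod([0-9a-f]{8}[_-][0-9a-f]{4}[_-][0-9a-f]{4}[_-][0-9a-f]{4}[_-][0-9a-f]{12})`)
)

// extract walks the path segments from right to left: first it looks for the
// container ID, then from that segment leftward for the pod UID.
func extract(sourcePath string) (podUID, containerID string) {
	segments := strings.Split(sourcePath, "/")
	for i := len(segments) - 1; i >= 0; i-- {
		containerID = containerIDRe.FindString(segments[i])
		if containerID == "" {
			continue
		}
		for j := i; j >= 0; j-- {
			if m := podUIDRe.FindStringSubmatch(segments[j]); m != nil {
				podUID = strings.ReplaceAll(m[1], "_", "-")
				break
			}
		}
		return podUID, containerID
	}
	return "", ""
}

func main() {
	// Made-up path mimicking a systemd-driver layout.
	fmt.Println(extract("/kubepods.slice/kubepods-besteffort.slice/" +
		"kubepods-besteffort-pod095e81fc_6a05_42c9_a2c6_8778f3b3f123.slice/" +
		"cri-containerd-2bca8d63d5fa610783847915bcff0ecac1273e5b4bed3f6fa1b07350e0135961.scope"))
}
```

The sample path in main uses a systemd-driver layout; the same walk also finds the pod UID and container ID in plain cgroupfs paths such as /kubepods/besteffort/pod&lt;uuid&gt;/&lt;id&gt;.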
Currently the k8s workload attestor extracts the pod UID and container ID from /proc/<pid>/cgroup:
https://github.com/spiffe/spire/blob/main/pkg/agent/plugin/workloadattestor/k8s/k8s.go#L206
https://github.com/spiffe/spire/blob/main/pkg/agent/common/cgroups/cgroups.go#L31
This works for cgroups v1, but it does not work with cgroups v2. For v2, /proc/<pid>/cgroup will always be:
0::/
which results in no container ID being extracted, and thus no k8s workload attestation is possible.
Are you planning to migrate it to support cgroups v2? E.g., looking at some existing solutions, it seems it's possible to extract the container ID from /proc/self/mountinfo.
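For reference, a hedged Go sketch of that mountinfo-based idea (assumptions: simplified field handling per proc(5); the cgroupMountRoots helper is made up for illustration). It collects the root field of cgroup/cgroup2 mount entries, which may still carry the kubelet-managed path with the pod UID and container ID even when the cgroup file itself only shows 0::/:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// cgroupMountRoots returns the root field of every cgroup/cgroup2 mount
// entry in the given mountinfo file.
func cgroupMountRoots(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var roots []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Layout: ID parentID major:minor root mountpoint opts ... - fstype source superopts
		sep := -1
		for i, fld := range fields {
			if fld == "-" {
				sep = i
				break
			}
		}
		if len(fields) < 5 || sep < 0 || sep+1 >= len(fields) {
			continue
		}
		fstype := fields[sep+1]
		if fstype == "cgroup" || fstype == "cgroup2" {
			roots = append(roots, fields[3])
		}
	}
	return roots, scanner.Err()
}

func main() {
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	roots, err := cgroupMountRoots("/proc/" + pid + "/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, root := range roots {
		fmt.Println(root)
	}
}
```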
Background: