Agent K8S workload attestor support for cgroups v2 #4004

Closed
tihhoni opened this issue Mar 21, 2023 · 12 comments · Fixed by #5076
Labels: help wanted (Issues with this label are ready to start work but are in need of someone to do it), priority/urgent (Issue is approved and must be completed in the assigned milestone)

Comments

@tihhoni

tihhoni commented Mar 21, 2023

  • Version: 1.5.5+
  • Platform: K8S with amd64 linux kernel using cgroups v2
  • Subsystem: Agent k8s workload attestor

Currently the k8s workload attestor extracts the podUID and containerUID from /proc/<pid>/cgroup:
https://github.com/spiffe/spire/blob/main/pkg/agent/plugin/workloadattestor/k8s/k8s.go#L206
https://github.com/spiffe/spire/blob/main/pkg/agent/common/cgroups/cgroups.go#L31

This works for cgroups v1, but it does not work with cgroups v2.
For v2, /proc/<pid>/cgroup will always be:
0::/
which results in no containerUID being extracted, and thus no k8s workload attestation is possible.
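
For reference, under cgroups v1 the same file typically carries both identifiers in its path. An illustrative example (assuming containerd with the systemd cgroup driver; the exact layout varies by runtime and cgroup driver):

4:memory:/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod02b5aa9b_9013_49b5_9755_efeb06552752.slice/cri-containerd-222a5371beed8c0ad58e23d1cd39e13bc113915a385d3fea4510e1cd404868c2.scope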

Are you planning to migrate it to support cgroups v2? E.g. looking at some solutions, it seems it's possible to extract the containerUID from /proc/self/mountinfo.
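
A minimal sketch of what a mountinfo-based lookup might look like (this is just an illustration, not SPIRE code; the cgroup/cgroup2 filesystem check and the 64-hex-character container ID segment are assumptions based on common docker/containerd layouts):

package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// containerIDRE matches the 64-hex-character container IDs used by docker
// and containerd (assumption: other runtimes may use different formats).
var containerIDRE = regexp.MustCompile(`[0-9a-f]{64}`)

// containerIDFromMountinfo scans a mountinfo file for cgroup/cgroup2 mounts
// and tries to pull a container ID out of the mount root field.
func containerIDFromMountinfo(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// mountinfo fields: id, parent, major:minor, root, mount point, ...,
		// then a "-" separator followed by fstype, source, super options.
		fields := strings.Fields(scanner.Text())
		if len(fields) < 10 {
			continue
		}
		sep := -1
		for i, fld := range fields {
			if fld == "-" {
				sep = i
				break
			}
		}
		if sep < 0 || sep+1 >= len(fields) {
			continue
		}
		if fstype := fields[sep+1]; fstype != "cgroup" && fstype != "cgroup2" {
			continue
		}
		// fields[3] is the root of the mount within the cgroup hierarchy,
		// which is where the container ID tends to show up.
		if id := containerIDRE.FindString(fields[3]); id != "" {
			return id, nil
		}
	}
	return "", scanner.Err()
}

func main() {
	id, err := containerIDFromMountinfo("/proc/self/mountinfo")
	fmt.Println(id, err)
}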

Background:

@MarcosDY added the triage/needed label Mar 21, 2023
@rturner3 added the triage/in-progress label and removed the triage/needed label Mar 21, 2023
@evan2645
Member

Thank you for opening this, @tihhoni, and for sharing the links - they're very helpful.

I see that kubelet will use cgroups v2 by default any time that it is enabled. Do you have a sense for which or how many distributions are shipping with cgroups v2 enabled currently? Did you bump into this issue directly?

@guilhermocc
Contributor

Here's some info that I was able to find regarding which distros already ship with cgroups v2 enabled by default:

  • Container Optimized OS (since M97)
  • Ubuntu (since 21.10)
  • Debian GNU/Linux (since Debian 11 Bullseye)
  • Fedora (since 31)
  • Arch Linux (since April 2021)
  • RHEL and RHEL-like distributions (since 9)

Ref: https://kubernetes.io/blog/2022/08/31/cgroupv2-ga-1-25/#how-do-you-use-cgroup-v2

@tihhoni
Author

tihhoni commented Mar 22, 2023

Hi @evan2645

From various sources I've got the feeling that cgroups v1 will be deprecated by EOY 2023, so presumably new kernel/OS updates will come exclusively with cgroups v2 support.

We've run into this issue in a staging cluster, where an OS upgrade automatically rolled out a new Linux distribution and tests started to fail.

I am still running tests to understand what impact that will cause. What I know so far:

So it seems that the PID's cgroup might have container/pod UID info in some cases; I just did not figure out what that condition is.

If I get more results I will surely post them here.

--
Nikolai

@tihhoni
Author

tihhoni commented Mar 23, 2023

After some testing, it seems the issue is not that dire:

spire-agent run PID = 302967
busybox sleep PID = 329596

Both run as pods with:

hostPID: true
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
k exec -it busybox -- /bin/sh
/ # cat /proc/302967/cgroup
0::/../../kubepods-besteffort-pod02b5aa9b_9013_49b5_9755_efeb06552752.slice/cri-containerd-222a5371beed8c0ad58e23d1cd39e13bc113915a385d3fea4510e1cd404868c2.scope
/ # cat /proc/329596/cgroup
0::/
k exec -it ds/spire-agent -c spire-agent -- /bin/sh
/opt/spire # cat /proc/302967/cgroup
0::/
/opt/spire # cat /proc/329596/cgroup
0::/../../kubepods-besteffort-podb2dce23b_8b3a_4356_b903_56152d488916.slice/cri-containerd-cf6d9de603ab1f665fc84f88c22f761678fcaae86b98a3936e0b9717740146aa.scope

So from within the container, if you try to read your own cgroup, it's the root without a container ID. But if you read another process's cgroup, it's fine.

As a result, the spire agent can attest other k8s PIDs but not itself.

This might be different depending on the cluster setup.

@evan2645
Member

Thank you very much @tihhoni for looking into this further ❤️

Can you tell if the problem only affects "self", or if it affects e.g. all pods running as root? I'm marking this as priority/backlog for now, but if it affects all root pods, we may consider upgrading it to urgent. Thanks again!

@evan2645 added the help wanted and priority/backlog labels and removed the triage/in-progress label Mar 28, 2023
@zmt
Contributor

zmt commented Mar 28, 2023

We've noticed similar changes will be needed for the dockerd-workloadattestor to support cgroups v2. We have a project that will require cgroups v2 support in dockerd-workloadattestor beginning in May 2023. After a brief discussion in the contributor sync today, we reached a consensus that it makes sense to add this support across several PRs:

  • introduce logic to support cgroups v2
  • migrate k8s workload attestor to use the cgroups v2 logic
  • migrate dockerd workload attestor to use the cgroups v2 logic

@tihhoni
Author

tihhoni commented Mar 31, 2023

@evan2645 my impression was that only "self" is the problem. I tested a full ingress/egress envoy setup and attestation works fine (the spire server can successfully get the container IDs for the envoy PIDs).

Forgot to mention: we run things on containerd 1.6.* and k8s 1.25.*; I did not check other runtimes.

@kfox1111
Contributor

migrate k8s workload attestor to use the cgroups v2 logic

cgroups v1 will also need to be supported for some time. This is probably what you meant, but I want to make sure.

@evan2645 removed their assignment Jun 1, 2023
@nikotih
Contributor

nikotih commented Jan 18, 2024

I am again running into this issue, but now from a different perspective. Just wanted to share (this is more related to cgroupns, but in all cases running on cgroups v2).

It seems that tools are slowly moving away from cgroups v1 and are also using cgroupns=private (e.g. the latest docker daemon and kind releases):
https://docs.docker.com/engine/reference/commandline/dockerd/#daemon
https://github.com/kubernetes-sigs/kind/releases/tag/v0.20.0

This creates a problem for running some integration tests where cgroupns is hardcoded:
https://github.com/spiffe/spire/blob/main/test/integration/suites/nested-rotation/00-setup#L32

If we run tests with dind in k8s (cgroupns=private), the docker cgroups that the agent workload attestor sees look like /../<id> (--cgroup-parent=/actions_job on the docker daemon does nothing here).

If we run tests with dind cgroupns=host + --cgroup-parent=/actions_job, they work fine, but then we face issues with kind cluster tests that mess up the main k8s kubelet cgroup.

We can of course patch CGROUP_MATCHERS for our tests (maybe it's good to allow users to set it too; the default matcher /docker/<id> does not work in this case either).

Just a heads up for now, in case your GitHub Actions VM switches the docker daemon to cgroupns=private on cgroups v2.

@evan2645
Member

@nikotih thank you very much for reporting this ❤️ Would you mind filing it as a new issue specifically around cgroupns=private? I think that warrants separate attention, and it will be good to triage it independently (but still in relation to this one).

@nikotih
Contributor

nikotih commented Jan 19, 2024

Sure thing, I can. However, I think cgroups v2 and private cgroup namespaces go hand in hand.

A cgroups v2-enabled kubelet and docker daemon will automatically start with a private cgroup namespace (unless explicitly changed).

For the original issue that I opened here, the actual root cause is also the private namespace:

docker info
Client:
 Version:    24.0.7
 Context:    desktop-linux
...
Server:
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
...

Disclaimer: non-systemd environment

docker run --rm alpine cat /proc/1/cgroup
0::/

docker run --rm --cgroupns=private alpine cat /proc/1/cgroup
0::/

docker run --rm --cgroupns=host alpine cat /proc/1/cgroup
0::/docker/46f093fa43676f4cfda0fe1fbf49a6c959f56e22808e2592e1d1e7bd0366ed0c

I.e., the spire agent on cgroups v2 with a private cgroup namespace cannot deduce its own container.

Starting 2 containers with:

docker run -it --rm --cgroupns=private --pid=host alpine /bin/sh

PIDs:

2809 root      0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 3c217b7f3a40352850d757d3f5f0e263731425b0350
2835 root      0:00 /bin/sh

2977 root      0:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 67b44b40c304559915db0b688f60e83555c40f26e1d
3003 root      0:00 /bin/sh

From the first container:

/ # cat /proc/2835/cgroup
0::/
/ # cat /proc/3003/cgroup
0::/../67b44b40c304559915db0b688f60e83555c40f26e1d4712f0b7dc91dafe3ddd4

From the second container:

/ # cat /proc/2835/cgroup
0::/../3c217b7f3a40352850d757d3f5f0e263731425b03505dd73420d9825ec76d9b2
/ # cat /proc/3003/cgroup
0::/

Same thing:

  • it sees its own container as root
  • it sees other containers not as /docker/<id> but as /../<id>

@evan2645
Member

Ok, I appreciate this, @nikotih. I don't have the time to dig in right now, but I'll move this issue back to triage so we can reassess the impact.

@evan2645 added the triage/in-progress label and removed the help wanted and priority/backlog labels Jan 19, 2024
@rturner3 added the help wanted and priority/urgent labels and removed the triage/in-progress label Feb 1, 2024
@rturner3 added this to the 1.9.1 milestone Feb 1, 2024
@MarcosDY modified the milestones: 1.9.2, 1.9.3 Mar 14, 2024
@amartinezfayo modified the milestones: 1.9.3, 1.9.4, 1.9.5 Apr 4, 2024
azdagron added a commit to azdagron/spire that referenced this issue Apr 18, 2024
The docker and k8s workload attestors work backwards from pid to
container by inspecting the proc filesystem. Today, this happens by
inspecting the cgroup file. Identifying the container ID (and pod UID)
from the cgroup file has been a continual arms race. The k8s and docker
workload attestors grew different mechanisms for trying to deal with the
large variety in the output.

Further, with cgroups v2 and private namespaces, the cgroup file might
not have the container ID or pod UID information within it.

This PR unifies the container ID (and pod UID) extraction for both the
docker and k8s workload attestors. The new implementation searches the
mountinfo file first for cgroups mounts. If not found, it will fall back
to the cgroup file (typically necessary only when the workload is
running in the same container as the agent).

The extraction algorithm is the same for both mountinfo and cgroup
entries, and is as follows:
1. Iterate over each entry in the file being searched, extracting
   either the cgroup mount root (mountinfo) or the cgroup group
   path (cgroup) as the source path.
2. Walk backwards through the segments in the source path looking for
   the 64-hex-digit container ID.
3. If looking for the pod UID (K8s only), then walk backwards through
   the segments in the path looking for the pod UID pattern used by
   kubelet. Start with the segment the container ID was found in
   (truncated to remove the container ID portion).
4. If there are pod UID/container ID conflicts after searching these
   files then log and abort. Entries that have a pod UID override those
   that don't.

The container ID is very often contained in the last segment in the path
but there are situations where it isn't.

This new functionality is NOT enabled by default, but opted in using the
`use_new_container_locator` configurable in each plugin. In 1.10, we can
consider enabling it by default.

The testing for the new code is spread out a little bit. The cgroups
fallback functionality is mostly tested by the existing tests in the
k8s and docker plugin tests. The mountinfo tests are only in the new
containerinfo package.

In the long term, I'd like to see all of the container info extraction
related tests moved solely to the containerinfo package and removed from
the individual plugins.

Resolves spiffe#4004, resolves spiffe#4682, resolves spiffe#4917.

Signed-off-by: Andrew Harding <azdagron@gmail.com>
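
To make steps 2-3 of the walk above concrete, here is a minimal sketch of the idea (a simplification, not the actual containerinfo implementation; the 64-hex-character container ID and the kubelet pod UID naming are assumptions based on the layouts shown earlier in this thread):

package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// 64 hex characters, possibly embedded in a runtime-specific segment
	// such as "cri-containerd-<id>.scope" (assumption).
	containerIDRE = regexp.MustCompile(`[0-9a-f]{64}`)
	// Pod UID as kubelet writes it into slice names, with dashes replaced
	// by underscores, e.g. "kubepods-besteffort-pod<uid>.slice" (assumption).
	podUIDRE = regexp.MustCompile(`pod([0-9a-f]{8}_[0-9a-f]{4}_[0-9a-f]{4}_[0-9a-f]{4}_[0-9a-f]{12})`)
)

// extractIDs walks the path segments backwards: first it looks for the
// container ID, then, starting at the segment that held it, it keeps
// walking backwards looking for the pod UID.
func extractIDs(path string) (podUID, containerID string) {
	segments := strings.Split(path, "/")
	for i := len(segments) - 1; i >= 0; i-- {
		containerID = containerIDRE.FindString(segments[i])
		if containerID == "" {
			continue
		}
		for j := i; j >= 0; j-- {
			if m := podUIDRE.FindStringSubmatch(segments[j]); m != nil {
				podUID = strings.ReplaceAll(m[1], "_", "-")
				break
			}
		}
		break
	}
	return podUID, containerID
}

func main() {
	// Path taken from the cgroups v2 example earlier in this issue.
	fmt.Println(extractIDs("/../../kubepods-besteffort-pod02b5aa9b_9013_49b5_9755_efeb06552752.slice/cri-containerd-222a5371beed8c0ad58e23d1cd39e13bc113915a385d3fea4510e1cd404868c2.scope"))
}
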
azdagron added a commit that referenced this issue Apr 24, 2024
* New container locator for docker/k8s on linux

Resolves #4004, resolves #4682, resolves #4917.

Signed-off-by: Andrew Harding <azdagron@gmail.com>

* missing new arg

Signed-off-by: Andrew Harding <azdagron@gmail.com>

* fix windows tests

Signed-off-by: Andrew Harding <azdagron@gmail.com>

* fix windows tests and lint

Signed-off-by: Andrew Harding <azdagron@gmail.com>

* address pr comments

Signed-off-by: Andrew Harding <azdagron@gmail.com>

* markdown lint

Signed-off-by: Andrew Harding <azdagron@gmail.com>

* add agent full conf

Signed-off-by: Andrew Harding <azdagron@gmail.com>

* fix labels

Signed-off-by: Andrew Harding <azdagron@gmail.com>

* change log to warn

Signed-off-by: Andrew Harding <azdagron@gmail.com>

* use new locator in it

Signed-off-by: Andrew Harding <azdagron@gmail.com>

---------

Signed-off-by: Andrew Harding <azdagron@gmail.com>
@amartinezfayo modified the milestones: 1.9.5, 1.9.6 May 8, 2024