
support autodetection of cgroup mode in docker workloadattestor #4682

Closed
zmt opened this issue Nov 21, 2023 · 0 comments · Fixed by #5076
Labels: help wanted, priority/backlog

zmt (Contributor) commented Nov 21, 2023

Proposal

We should be able to detect the cgroup mode from a few checks on the filesystem types under /sys/fs/cgroup and adjust the cgroup matchers appropriately, since only one mode can operate at a time on a node (or in a container). This would decouple cgroup version migration on a given host from SPIRE Agent reconfiguration/rollout on that host. The detection logic should be effectively identical to systemd's own detection; in other words, Go code roughly equivalent to:

```
% stat -fc %T /sys/fs/cgroup
tmpfs
% stat -fc %T /sys/fs/cgroup/systemd
cgroupfs
% stat -fc %T /sys/fs/cgroup/unified
stat: cannot read file system information for '/sys/fs/cgroup/unified': No such file or directory
```
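
A minimal Go sketch of that detection, assuming golang.org/x/sys/unix for the statfs(2) call and the filesystem magic constants; the mode names here are illustrative, not SPIRE's:

```go
package cgroups

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// Mode is the cgroup mode a node is operating in.
type Mode int

const (
	Legacy  Mode = iota // cgroup v1 only
	Hybrid              // v1 hierarchy plus a cgroup2 mount at /sys/fs/cgroup/unified
	Unified             // cgroup v2 only
)

// DetectMode mirrors the stat -fc %T checks above: the filesystem type at
// /sys/fs/cgroup (and, if needed, /sys/fs/cgroup/unified) identifies the mode.
func DetectMode() (Mode, error) {
	var st unix.Statfs_t
	if err := unix.Statfs("/sys/fs/cgroup", &st); err != nil {
		return 0, err
	}
	switch st.Type {
	case unix.CGROUP2_SUPER_MAGIC:
		// cgroup2 mounted directly at /sys/fs/cgroup: pure unified mode.
		return Unified, nil
	case unix.TMPFS_MAGIC:
		// A tmpfs with per-controller v1 mounts underneath: legacy or hybrid.
		// Hybrid additionally mounts cgroup2 at /sys/fs/cgroup/unified.
		if err := unix.Statfs("/sys/fs/cgroup/unified", &st); err == nil &&
			st.Type == unix.CGROUP2_SUPER_MAGIC {
			return Hybrid, nil
		}
		return Legacy, nil
	}
	return 0, fmt.Errorf("unexpected filesystem type 0x%x at /sys/fs/cgroup", st.Type)
}
```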

Background

There are three modes in which cgroups can operate:

  • cgroup v1, the legacy hierarchy (per-controller cgroupfs mounts under a tmpfs)
  • cgroup v2, the unified hierarchy (a single cgroup2 mount)
  • "hybrid", where the v1 and v2 hierarchies coexist

The hybrid mode was introduced around the systemd v232 release to address incompatibilities that surfaced after cgroup v2 was initially released.

Please see the prior (hacky) proposal, prior research, and more details in #4251.

  • Version: 1.7.4 (and before)
  • Platform: linux
  • Subsystem: docker workloadattestor
MarcosDY added the help wanted and priority/backlog labels and removed the triage/in-progress label on Nov 28, 2023
azdagron added a commit to azdagron/spire that referenced this issue Apr 18, 2024
The docker and k8s workload attestors work backwards from pid to
container by inspecting the proc filesystem. Today, this happens by
inspecting the cgroup file. Identifying the container ID (and pod UID)
from the cgroup file has been a continual arms race. The k8s and docker
workload attestors grew different mechanisms for trying to deal with the
large variety in the output.

Further, with cgroups v2 and private namespaces, the cgroup file might
not have the container ID or pod UID information within it.

This PR unifies the container ID (and pod UID) extraction for both the
docker and k8s workload attestors. The new implementation searches the
mountinfo file first for cgroups mounts. If not found, it will fall back
to the cgroup file (typically necessary only when the workload is
running in the same container as the agent).
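
As a rough illustration of that search order (not SPIRE's actual implementation), collecting candidate cgroup paths from a process's mountinfo file might look like this, with field positions as documented in proc(5):

```go
package containerinfo

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// cgroupMountRoots returns the mount root of every cgroup/cgroup2 entry in
// /proc/<pid>/mountinfo. Per proc(5), the root is the 4th field and the
// filesystem type is the first field after the "-" separator. If no cgroup
// mounts are found, a caller would fall back to parsing /proc/<pid>/cgroup.
func cgroupMountRoots(pid int) ([]string, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/mountinfo", pid))
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var roots []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		for i, field := range fields {
			if field == "-" && i+1 < len(fields) {
				if t := fields[i+1]; t == "cgroup" || t == "cgroup2" {
					roots = append(roots, fields[3]) // the mount root field
				}
				break
			}
		}
	}
	return roots, scanner.Err()
}
```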

The extraction algorithm is the same for both mountinfo and cgroup
entries (see the sketch after this list), and is as follows:
1. Iterate over each entry in the file being searched, extracting
   either the cgroup mount root (mountinfo) or the cgroup group
   path (cgroup) as the source path.
2. Walk backwards through the segments in the source path looking for
   the 64-hex-digit container ID.
3. If looking for the pod UID (K8s only), then walk backwards through
   the segments in the path looking for the pod UID pattern used by
   kubelet. Start with the segment the container ID was found in
   (truncated to remove the container ID portion).
4. If there are pod UID/container ID conflicts after searching these
   files then log and abort. Entries that have a pod UID override those
   that don't.
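
A hedged sketch of steps 2 and 3, with illustrative patterns rather than SPIRE's actual ones: walk the path segments from the end, first for the 64-hex-digit container ID, then for a kubelet-style pod UID.

```go
package containerinfo

import (
	"regexp"
	"strings"
)

// Illustrative patterns: a container ID is 64 hex characters; kubelet pod
// UID segments look like "pod<uuid>", with the UUID's dashes possibly
// escaped to underscores by systemd slice naming.
var (
	containerIDRe = regexp.MustCompile(`[[:xdigit:]]{64}`)
	podUIDRe      = regexp.MustCompile(`pod([[:xdigit:]]{8}[-_][[:xdigit:]]{4}[-_][[:xdigit:]]{4}[-_][[:xdigit:]]{4}[-_][[:xdigit:]]{12})`)
)

// findContainerInfo walks the path segments from the end, looking first for
// the container ID and then, starting at that same segment, for the pod UID.
func findContainerInfo(sourcePath string) (containerID, podUID string) {
	segments := strings.Split(sourcePath, "/")
	for i := len(segments) - 1; i >= 0; i-- {
		if containerID == "" {
			containerID = containerIDRe.FindString(segments[i])
			if containerID == "" {
				continue // no ID yet; don't look for the pod UID here
			}
		}
		if m := podUIDRe.FindStringSubmatch(segments[i]); m != nil {
			podUID = m[1]
			break
		}
	}
	return containerID, podUID
}
```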

The container ID is very often contained in the last segment in the path
but there are situations where it isn't.

This new functionality is NOT enabled by default, but is opted into via the
`use_new_container_locator` configurable in each plugin. In 1.10, we can
consider enabling it by default.
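
For reference, the opt-in might look like this in the agent configuration (a hedged HCL sketch; check the plugin's documentation for the exact shape):

```hcl
WorkloadAttestor "docker" {
    plugin_data {
        # Opt in to the unified container ID extraction described above.
        use_new_container_locator = true
    }
}
```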

The testing for the new code is spread out a little bit. The cgroups
fallback functionality is mostly tested by the existing tests in the
k8s and docker plugin tests. The mountinfo tests are only in the new
containerinfo package.

In the long term, I'd like to see all of the container info extraction
related tests moved solely to the containerinfo package and removed from
the individual plugins.

Resolves spiffe#4004, resolves spiffe#4682, resolves spiffe#4917.

Signed-off-by: Andrew Harding <azdagron@gmail.com>