Merge pull request #45178 from kinvolk/rata/userns-1.30
User namespaces doc changes for 1.30
k8s-ci-robot authored Mar 21, 2024
2 parents 753073b + 69b9e71 commit c7cd6c5
Showing 3 changed files with 108 additions and 27 deletions.
90 changes: 78 additions & 12 deletions content/en/docs/concepts/workloads/pods/user-namespaces.md
@@ -7,7 +7,7 @@ min-kubernetes-server-version: v1.25
---

<!-- overview -->
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
{{< feature-state for_k8s_version="v1.30" state="beta" >}}

This page explains how user namespaces are used in Kubernetes pods. A user
namespace isolates the user running inside the container from the one
@@ -46,7 +46,26 @@ tmpfs, Secrets use a tmpfs, etc.)
Some popular filesystems that support idmap mounts in Linux 6.3 are: btrfs,
ext4, xfs, fat, tmpfs, overlayfs.

In addition, support is needed in the
In addition, the container runtime and its underlying OCI runtime must support
user namespaces. The following OCI runtimes offer support:

* [crun](https://github.com/containers/crun) version 1.9 or greater (version 1.13 or later is recommended).

<!-- ideally, update this if a newer minor release of runc comes out, whether or not it includes the idmap support -->
{{< note >}}
Many OCI runtimes do not include the support needed for using user namespaces in
Linux pods. If you use a managed Kubernetes, or have downloaded it from packages
and set it up, it's likely that nodes in your cluster use a runtime that doesn't
include this support. For example, the most widely used OCI runtime is `runc`,
and version `1.1.z` of runc doesn't support all the features needed by the
Kubernetes implementation of user namespaces.

If there is a newer release of runc than 1.1 available for use, check its
documentation and release notes for compatibility (look for idmap mounts support
in particular, because that is the missing feature).
{{< /note >}}

To use user namespaces with Kubernetes, you also need to use a CRI
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
to use this feature with Kubernetes pods:

@@ -137,20 +156,67 @@ use, see `man 7 user_namespaces`.

## Set up a node to support user namespaces

It is recommended that the host's files and host's processes use UIDs/GIDs in
the range of 0-65535.
By default, the kubelet assigns pods UIDs/GIDs above the range 0-65535, based on
the assumption that the host's files and processes use UIDs/GIDs within this
range, which is standard for most Linux distributions. This approach prevents
any overlap between the UIDs/GIDs of the host and those of the pods.

Avoiding the overlap is important to mitigate the impact of vulnerabilities such
as [CVE-2021-25741][CVE-2021-25741], where a pod can potentially read arbitrary
files in the host. If the UIDs/GIDs of the pod and the host don't overlap, what
the pod can do is limited: the pod UID/GID won't match the host's
file owner/group.

The kubelet can use a custom range for user IDs and group IDs for pods. To
configure a custom range, the node needs to have:

* A user `kubelet` in the system (you cannot use any other username here)
* The binary `getsubids` installed (part of [shadow-utils][shadow-utils]) and
in the `PATH` for the kubelet binary.
* A configuration of subordinate UIDs/GIDs for the `kubelet` user (see
[`man 5 subuid`](https://man7.org/linux/man-pages/man5/subuid.5.html) and
[`man 5 subgid`](https://man7.org/linux/man-pages/man5/subgid.5.html)).

This setting only gathers the UID/GID range configuration and does not change
the user executing the `kubelet`.

You must follow some constraints for the subordinate ID range that you assign
to the `kubelet` user:

* The subordinate user ID that starts the UID range for Pods **must** be a
multiple of 65536 and must also be greater than or equal to 65536. In other
words, you cannot use any ID from the range 0-65535 for Pods; the kubelet
imposes this restriction to make it difficult to create an accidentally insecure
configuration.

* The subordinate ID count must be a multiple of 65536.

* The subordinate ID count must be at least `65536 x <maxPods>` where `<maxPods>`
is the maximum number of pods that can run on the node.

* You must assign the same range for both user IDs and for group IDs. It doesn't
matter if other users have user ID ranges that don't align with the group ID
ranges.

* None of the assigned ranges should overlap with any other assignment.

* The subordinate configuration must be only one line. In other words, you can't
have multiple ranges.
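
The numeric constraints above can be checked before you edit `/etc/subuid` and
`/etc/subgid`. The following is a minimal sketch, not something the kubelet
ships: `check_subid_range` is a hypothetical helper name, and it only covers the
arithmetic rules (start value, multiples of 65536, and the `<maxPods>` minimum).
It does not check for overlaps with other users or for multiple ranges.

```shell
#!/bin/sh
# Sketch: check a subordinate ID range against the numeric constraints above.
# check_subid_range is a hypothetical helper, not part of the kubelet.
# Usage: check_subid_range FIRST_ID COUNT MAX_PODS
check_subid_range() {
    first_id=$1
    count=$2
    max_pods=$3
    block=65536   # number of IDs reserved per pod

    # The range must not use any ID from 0-65535.
    if [ "$first_id" -lt "$block" ]; then
        echo "first ID must be greater than or equal to $block"
        return 1
    fi
    # The first ID must be a multiple of 65536.
    if [ $((first_id % block)) -ne 0 ]; then
        echo "first ID must be a multiple of $block"
        return 1
    fi
    # The count must be a multiple of 65536.
    if [ $((count % block)) -ne 0 ]; then
        echo "count must be a multiple of $block"
        return 1
    fi
    # The count must cover 65536 IDs for every pod the node can run.
    if [ "$count" -lt $((block * max_pods)) ]; then
        echo "count must be at least $block x $max_pods"
        return 1
    fi
    echo "ok"
}

# Example: a range starting at 65536 that is sized for 110 pods.
check_subid_range 65536 7208960 110
```

The function prints the first rule that fails and returns a non-zero status, so
you can use it in a provisioning script before writing the configuration.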

The kubelet will assign UIDs/GIDs higher than that to pods. Therefore, to
guarantee as much isolation as possible, the UIDs/GIDs used by the host's files
and host's processes should be in the range 0-65535.
For example, you could define `/etc/subuid` and `/etc/subgid` to both have
these entries for the `kubelet` user:

Note that this recommendation is important to mitigate the impact of CVEs like
[CVE-2021-25741][CVE-2021-25741], where a pod can potentially read arbitrary
files in the hosts. If the UIDs/GIDs of the pod and the host don't overlap, it
is limited what a pod would be able to do: the pod UID/GID won't match the
host's file owner/group.
```
# The format is
# name:firstID:count of IDs
# where
# - firstID is 65536 (the minimum value possible)
# - count of IDs is 110 (default limit for number of pods) * 65536
kubelet:65536:7208960
```
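
If your nodes use a different pod limit, the count changes accordingly. As a
rough sketch, the entry can be derived from the pod limit like this;
`subid_entry` is a hypothetical helper name, and the default first ID of 65536
is the minimum value described above:

```shell
#!/bin/sh
# Sketch: build the subordinate ID entry for the kubelet user.
# subid_entry is a hypothetical helper, not part of any Kubernetes tool.
# Usage: subid_entry MAX_PODS [FIRST_ID]
subid_entry() {
    max_pods=$1
    first_id=${2:-65536}   # defaults to the minimum allowed first ID
    # Each pod needs 65536 IDs, so the count is MAX_PODS * 65536.
    echo "kubelet:${first_id}:$((max_pods * 65536))"
}

subid_entry 110   # prints kubelet:65536:7208960
```

The same line would go in both `/etc/subuid` and `/etc/subgid`, because the
ranges for user IDs and group IDs must match.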

[CVE-2021-25741]: https://github.com/kubernetes/kubernetes/issues/104980
[shadow-utils]: https://github.com/shadow-maint/shadow

## Integration with Pod security admission checks

@@ -6,8 +6,12 @@ _build:
render: false

stages:
- stage: alpha
- stage: alpha
defaultValue: false
fromVersion: "1.28"
toVersion: "1.29"
- stage: beta
defaultValue: false
fromVersion: "1.30"
---
Enable user namespace support for Pods.
39 changes: 25 additions & 14 deletions content/en/docs/tasks/configure-pod-container/user-namespaces.md
@@ -7,7 +7,7 @@ min-kubernetes-server-version: v1.25
---

<!-- overview -->
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
{{< feature-state for_k8s_version="v1.30" state="beta" >}}

This page shows how to configure a user namespace for pods. This allows you to
isolate the user running inside the container from the one in the host.
@@ -57,10 +57,6 @@ If you have a mixture of nodes and only some of the nodes provide user namespace
Pods, you also need to ensure that the user namespace Pods are
[scheduled](/docs/concepts/scheduling-eviction/assign-pod-node/) to suitable nodes.

Please note that **if your container runtime doesn't support user namespaces, the
`hostUsers` field in the pod spec will be silently ignored and the pod will be
created without user namespaces.**

<!-- steps -->

## Run a Pod that uses a user namespace {#create-pod}
@@ -82,27 +78,42 @@ to `false`. For example:
kubectl attach -it userns bash
```

And run the command. The output is similar to this:
Run this command:

```none
```shell
readlink /proc/self/ns/user
```

The output is similar to:

```shell
user:[4026531837]
```

Also run:

```shell
cat /proc/self/uid_map
0 0 4294967295
```

Then, open a shell in the host and run the same command.
The output is similar to:
```shell
0 833617920 65536
```

Then, open a shell in the host and run the same commands.

The `readlink` command shows the user namespace the process is running in. It
should be different when it is run on the host and inside the container.

The output must be different. This means the host and the pod are using a
different user namespace. When user namespaces are not enabled, the host and the
pod use the same user namespace.
The last number of the `uid_map` file inside the container must be 65536; on the
host, it must be a larger number.
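
To compare the two values without reading them by eye, you could extract the
last field of the mapping. This is a minimal sketch that assumes the file
contains a single mapping line, as in the example outputs on this page;
`uid_map_length` and `looks_like_pod_userns` are hypothetical helper names.

```shell
#!/bin/sh
# Print the length (third field) of the first mapping read from stdin.
# uid_map_length is a hypothetical helper name used only for this sketch.
uid_map_length() {
    # Each uid_map line is: <ID inside namespace> <ID outside namespace> <length>
    read -r _inside _outside length
    echo "$length"
}

# Succeeds when the first mapping covers exactly 65536 IDs,
# which is what a pod running in its own user namespace should show.
looks_like_pod_userns() {
    [ "$(uid_map_length)" -eq 65536 ]
}

# Sample values taken from the example outputs on this page.
echo "0 833617920 65536" | uid_map_length    # value seen inside the pod
echo "0 0 4294967295" | uid_map_length       # value seen on the host
```

On a real system, run `uid_map_length < /proc/self/uid_map` inside the container
and again on the host, then compare the two numbers.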

If you are running the kubelet inside a user namespace, you need to compare the
output from running the command in the pod to the output of running this command in the host:

```none
```shell
readlink /proc/$pid/ns/user
user:[4026534732]
```

replacing `$pid` with the kubelet PID.
