Better support for rootless containers (#636)

# Short summary

Running containers under rootless docker/podman lets us:

- avoid permission issues in mounted volumes without any extra run setting (root in the container == the user on the host)
- avoid granting privileges to host users, enhancing system security
- simplify container images (just run them as root; it's safe because it's actually the user, without extra privileges!)

For backwards compatibility, this pull request assumes users still don't run rootless docker by default, so we require users running rootless containers to set `-e RUNROOTLESS=true`. This is needed until a method is devised to detect, from within the container, whether it is running rootless or not.

- https://docs.docker.com/engine/security/rootless/
- https://github.com/containers/podman/blob/main/docs/tutorials/rootless_tutorial.md

In this pull request, I modify the userconf script so that, if we are running rootless (`-e RUNROOTLESS=true`), we do not create users, change sudoers, or do any of those things. With this change, we can run RStudio Server under an unprivileged user without issues.

```
mkdir "$HOME/work"
podman run -ti -e PASSWORD=helloworld -e RUNROOTLESS=true -v "$HOME/work":/root/work -p 10000:8787 <img-name>
```

Visit localhost:10000 and log in as `root` with the password `helloworld`.

Thanks for considering merging this.

## Alternative

As an alternative solution, we could have different images, something like `rocker/rstudio-rootless` or `rocker-rootless/rstudio`:

```
FROM rocker/rstudio
COPY init_userconf.sh /etc/cont-init.d/02_userconf
```

But I think it is easier to just merge this.

# Historic background

Here is a longer story of why things are the way they are, to give some context to all the user mess that exists in docker images. It may not be fully accurate, but it is good enough.

### Stage 1: We are root

- Docker is a client-server application.
- The docker daemon runs as the root user.
- docker users need to run docker commands as root or using `sudo` (e.g. `sudo docker run...`).
This situation is problematic because:

- docker users have root access to the host
- docker containers may create/modify files as root on the host (even accidentally, if some host directory is mounted!)
- files created by docker are owned by the root user, which easily becomes a permission mess
- there are audit limitations, since all docker users connect to the same docker daemon, typically without authentication. We don't know who does what.

People want to use docker so much that a `docker` user group is created, and users in that `docker` group no longer need to type `sudo` to run docker, although effectively it is as if they did. The risk is still there, only hidden.

### Stage 2: Images drop root privileges

Docker is a complex piece of software that relies on namespaces and control groups, features of the Linux kernel that are under heavy development. Therefore, the fastest and easiest way to address some of these issues comes from the image builders.

The container starts running as root, but image builders following best practices drop those privileges as soon as possible. They read environment variables set when the container is created, so users can choose the user id that will be used to create files, avoiding the file permission mess. This depends on the goodwill of the image builder, but it "works".

This adds a lot of complexity, because images now have to consider multiple users and permissions (allow root inside the container for `apt-get install`, use sudoers...).

### Stage 3: Run with `docker run --user`

Docker allows specifying the user the container will run as. The Docker daemon runs as root, but the container process is started with the given user id, and that user in the container typically does not have root privileges anymore.

Docker here avoids file permission issues, but at some cost. Since the image startup scripts no longer have root access in the container, allowing `apt-get` inside the container becomes far more tricky.
To my knowledge, this option has not seen much adoption in rocker and jupyter notebook images.

### Stage 4: User namespaces (`dockerd --userns-remap`)

The Linux kernel gains support for user namespaces. Basically, we can map user ids in the container to a range of user ids on the host. Depending on how this is used, files created by the container may no longer be owned by root, but by a very high user id.

To my knowledge, this option has not seen much adoption in rocker and jupyter notebook images either.

### Stage 5: Rootless docker

Enough namespace and cgroup support exists in the kernel for the docker daemon to run containers without root permissions. Running the docker daemon as root is a security risk and an auditability issue, so this becomes a genuinely better-designed solution to the permission problem.

Now each user can run their own docker daemon. Since the docker daemon runs as the user, it has no special permissions and can do no more damage to the host than the user could. Therefore, we do not need images to drop privileges or follow any best practices to be responsible, since they cannot do any damage by default anymore. Things can be simple again! However, we must still be backwards compatible with those who have or want to use docker as root.

Hi **podman**! Podman was designed to run rootless by default, and may even have been able to do so before docker (I don't really know). Podman does not even need a daemon, although it supports one to increase compatibility with docker.

How does this work? By using user namespaces in a more transparent way:

- alice, with user id 1000, has a rootless docker daemon running, so she can run containers without any root privilege.
- alice creates a container (`podman run...`, `docker run...`), mounting some directories she has access to (`--volume`) without caring about permissions or user ids.
- docker/podman creates the container and runs the entrypoint.
The entrypoint appears to run as `root` from within the container, but from the host it appears to run as `alice`. If the entrypoint scripts do not do anything unusual with users, the entrypoint and the commands that run afterwards also run as root/alice. Files created on the mounted volume appear on the host to be owned by alice. If alice tries to mount a volume she can't write to (e.g. `--volume /sbin:/somewhere`), the root/alice user in the container won't be able to write to it either, because alice does not have permission to do that.

That's all we want and need!

But... backwards compatibility! Until someone finds a convention to determine whether a container is running under rootless docker, we, image builders, can't tell whether we should drop privileges or simply use them. So, for now, I would like to ask alice to set an environment variable telling me whether she is running rootless, so I can just use the root/alice user without worrying about setting up users, permissions, and sudoers.
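The uid mapping described above can be inspected from inside a container through `/proc/self/uid_map`, where each line reads `ID-inside-namespace ID-outside-namespace range-length` (see user_namespaces(7)). Below is a minimal sketch, assuming bash; the helper names `host_uid_for` and `is_rootless` are hypothetical, and `is_rootless` reuses the `4294967295` check that the diff adopts for `RUNROOTLESS=auto`. That check is a heuristic: it reports `true` for any container whose root is remapped (Docker's `userns-remap` as well), not only for rootless docker/podman. The sample maps (host UID 1000, subordinate UIDs from 100000) are made up for illustration.

```shell
# Hypothetical helpers; each line of a uid_map file reads:
#   <ID inside the namespace> <ID outside the namespace> <length of the range>

# Print the host UID that a container UID maps to, or "unmapped".
# Usage: host_uid_for <container-uid> [uid_map file]
host_uid_for() {
    local uid="$1" map="${2:-/proc/self/uid_map}"
    local inside outside length
    while read -r inside outside length; do
        if [ "$uid" -ge "$inside" ] && [ "$uid" -lt $((inside + length)) ]; then
            echo $((outside + uid - inside))
            return 0
        fi
    done <"$map"
    echo "unmapped"
    return 1
}

# Heuristic used by RUNROOTLESS=auto in the diff: outside any nested user
# namespace the kernel reports the identity map "0 0 4294967295".
# Usage: is_rootless [uid_map file]
is_rootless() {
    local map="${1:-/proc/self/uid_map}"
    if grep -q 4294967295 "$map"; then
        echo "false"
    else
        echo "true"
    fi
}

# Made-up maps: alice is host UID 1000 and owns subordinate UIDs 100000-165535.
rootless_map=$(mktemp)
rootful_map=$(mktemp)
trap 'rm -f "$rootless_map" "$rootful_map"' EXIT   # remove temp files when the shell exits
printf '0 1000 1\n1 100000 65536\n' >"$rootless_map"
printf '0 0 4294967295\n' >"$rootful_map"

host_uid_for 0 "$rootless_map"      # prints 1000: root in the container is alice on the host
host_uid_for 1000 "$rootless_map"   # prints 100999: a very high host UID
is_rootless "$rootless_map"         # prints true
is_rootless "$rootful_map"          # prints false
```

This is why files written by container root on a mounted volume show up as owned by alice, while files written by container UID 1000 show up with a high UID on the host.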
rocker-org · May 13, 2023 · 8ab4e7d
1 parent c5bc84a
Showing 1 changed file with 124 additions and 17 deletions.
diff --git a/scripts/init_userconf.sh b/scripts/init_userconf.sh
@@ -10,6 +10,98 @@ ROOT=${ROOT:=FALSE}
 UMASK=${UMASK:=022}
 LANG=${LANG:=en_US.UTF-8}
 TZ=${TZ:=Etc/UTC}
+RUNROOTLESS=${RUNROOTLESS:=auto}
+
+if [ "${RUNROOTLESS}" = "auto" ]; then
+    RUNROOTLESS=$(grep 4294967295 /proc/self/uid_map >/dev/null && echo "false" || echo "true")
+fi
+
+USERHOME="/home/${USER}"
+
+if [ "${RUNROOTLESS}" = "true" ]; then
+    printf "Assuming the container runs under rootless mode\n"
+    printf "Under rootless mode,\n"
+    printf " - You will log in using 'root' as user\n"
+    printf " - You will have root privileges within the container (e.g. apt)\n"
+    printf " - The files you create as root on mounted volumes will appear at the host as owned by the user who started the container\n"
+    printf " - You can't modify host files you don't have permission to\n"
+    printf " - You should NOT set RUNROOTLESS=true if you are running the container with privileges (e.g. sudo docker run... or sudo podman run...)\n"
+    # The container was started asking to login as the root user.
+    # This is a good approach when running docker or podman rootless
+    # https://docs.docker.com/engine/security/rootless/
+    #
+    # When running docker rootless or podman rootless, the root user in
+    # the container has the capabilities of the actual host user. Nothing else.
+    #
+    # All files modified inside the container by the root user that are mapped
+    # to the host will appear in the host as modified by the user who runs the
+    # container. However from inside the container they appear to be modified by
+    # root.
+    #
+    # So, the user can run apt-get as the root user inside the container. No
+    # need for handling sudoers, since to the container the user is root.
+    #
+    # Higher user ids in the container (e.g. 1000) get mapped to very high user
+    # ids on the host. We don't need that, and it just confuses things.
+    USER="root"
+    USERID=0
+    GROUPID=0
+    USERHOME="/root"
+
+    # Keep all groups that have been set:
+    # When running rootless podman, podman may set the groups of the host user
+    # to the process running in the container with the option
+    # podman run --group-add keep-groups
+    #
+    # This option has the caveat that the GIDs which have not been mapped to
+    # the container in the namespace will appear as the overflow_gid (65534).
+    #
+    # While this process has the GID assigned (and therefore has the
+    # privileges granted by that GID), it cannot internally refer
+    # to that GID, because it is not mapped and appears as nobody/nogroup.
+    #
+    # This lack of mapping becomes a problem when we need to be able
+    # to assign those same groups to the processes created when a user logs
+    # in through the web interface. There, we are not able to setgroups() as
+    # podman did when the initial process in the container was started.
+    #
+    # What can we do?
+    # A solution goes through a sysadmin in the host allowing users to
+    # impersonate the target GID in /etc/subgid.
+    #
+    # For instance, if you have a "university_data" group, that has GID 2000
+    # and you have PhD students "alice" and "bob" who are in that group, you will
+    # need to have an additional entry in /etc/subgid for each of them:
+    # alice:2000:1
+    # bob:2000:1
+    #
+    # That entry reads as
+    # > Grant {alice/bob} the ability to become GID 2000.
+    #
+    # Those entries should be **additional** to the already existing entries that
+    # grant a big number of unused GIDs.
+    #
+    # Then use `podman system migrate` to refresh podman configuration.
+    #
+    # Podman will then be able to see those groups, although unfortunately
+    # the group name in the container will not be "university_data" but it will
+    # instead look like "adm" or "sys" or "bin".
+    #
+    # I'm trying to suggest an improvement to podman to address this a bit better
+    # at:
+    # https://github.com/containers/podman/issues/18333
+    #
+    ROOT_IN_GROUPS="$(id -G)"
+    OVERFLOWGID=$(cat "/proc/sys/kernel/overflowgid")
+    for g in ${ROOT_IN_GROUPS}; do
+        if [ "$g" -eq 0 ] || [ "$g" -eq "${OVERFLOWGID}" ]; then
+            # 0 is already our GID
+            # 65534 is nogroup (the overflow_gid)
+            continue
+        fi
+        usermod -aG "$g" "${USER}"
+    done
+fi
 
 if [[ ${DISABLE_AUTH,,} == "true" ]]; then
     cp /etc/rstudio/disable_auth_rserver.conf /etc/rstudio/rserver.conf
@@ -29,18 +121,28 @@ elif [ -z "$PASSWORD" ]; then
     printf "\n\n"
 fi
 
-if [ "$USERID" -lt 1000 ]; then # Probably a macOS user, https://github.com/rocker-org/rocker/issues/205
+if [ "${RUNROOTLESS}" = "true" ]; then
+    check_user_id=$(grep -F "auth-minimum-user-id" /etc/rstudio/rserver.conf)
+    if [[ -n $check_user_id ]]; then
+        echo "minimum authorised user already exists in /etc/rstudio/rserver.conf: $check_user_id"
+        echo "RUNROOTLESS=true mode requires setting minimum authorised user to 0. Exiting"
+        exit 1
+    else
+        echo "setting minimum authorised user to 0 (RUNROOTLESS=true)"
+        echo auth-minimum-user-id=0 >>/etc/rstudio/rserver.conf
+    fi
+elif [ "$USERID" -lt 1000 ]; then # Probably a macOS user, https://github.com/rocker-org/rocker/issues/205
     echo "$USERID is less than 1000"
     check_user_id=$(grep -F "auth-minimum-user-id" /etc/rstudio/rserver.conf)
     if [[ -n $check_user_id ]]; then
-        echo "minumum authorised user already exists in /etc/rstudio/rserver.conf: $check_user_id"
+        echo "minimum authorised user already exists in /etc/rstudio/rserver.conf: $check_user_id"
     else
-        echo "setting minumum authorised user to 499"
+        echo "setting minimum authorised user to 499"
         echo auth-minimum-user-id=499 >>/etc/rstudio/rserver.conf
     fi
 fi
 
-if [ "$USER" != "$DEFAULT_USER" ]; then
+if [ "${RUNROOTLESS}" != "true" ] && [ "$USER" != "$DEFAULT_USER" ]; then
     printf "\n\n"
     tput bold
     printf "Settings by \e[31m\`-e USER=<new username>\`\e[39m is now deprecated and will be removed in the future.\n"
@@ -49,53 +151,58 @@ if [ "$USER" != "$DEFAULT_USER" ]; then
     printf "\n\n"
 fi
 
-if [ "$USERID" -ne 1000 ]; then ## Configure user with a different USERID if requested.
+if [ "${RUNROOTLESS}" = "true" ]; then
+    echo "deleting the default user ($DEFAULT_USER) since it is not needed."
+    userdel "$DEFAULT_USER"
+elif [ "$USERID" -ne 1000 ]; then ## Configure user with a different USERID if requested.
     echo "deleting the default user"
     userdel "$DEFAULT_USER"
     echo "creating new $USER with UID $USERID"
-    useradd -m "$USER" -u $USERID
-    mkdir -p /home/"$USER"
-    chown -R "$USER" /home/"$USER"
+    useradd -m "$USER" -u "$USERID"
+    mkdir -p "${USERHOME}"
+    chown -R "$USER" "${USERHOME}"
     usermod -a -G staff "$USER"
 elif [ "$USER" != "$DEFAULT_USER" ]; then
     ## cannot move home folder when it's a shared volume, have to copy and change permissions instead
-    cp -r /home/"$DEFAULT_USER" /home/"$USER"
+    cp -r /home/"$DEFAULT_USER" "${USERHOME}"
     ## RENAME the user
     usermod -l "$USER" -d /home/"$USER" "$DEFAULT_USER"
     groupmod -n "$USER" "$DEFAULT_USER"
     usermod -a -G staff "$USER"
-    chown -R "$USER":"$USER" /home/"$USER"
+    chown -R "$USER":"$USER" "${USERHOME}"
     echo "USER is now $USER"
 fi
 
-if [ "$GROUPID" -ne 1000 ]; then ## Configure the primary GID (whether rstudio or $USER) with a different GROUPID if requested.
+if [ "${RUNROOTLESS}" != "true" ] && [ "$GROUPID" -ne 1000 ]; then ## Configure the primary GID (whether rstudio or $USER) with a different GROUPID if requested.
     echo "Modifying primary group $(id "${USER}" -g -n)"
-    groupmod -o -g $GROUPID "$(id "${USER}" -g -n)"
+    groupmod -o -g "$GROUPID" "$(id "${USER}" -g -n)"
     echo "Primary group ID is now custom_group $GROUPID"
 fi
 
 ## Add a password to user
 echo "$USER:$PASSWORD" | chpasswd
 
 # Use Env flag to know if user should be added to sudoers
-if [[ ${ROOT,,} == "true" ]]; then
+if [ "${RUNROOTLESS}" = "true" ]; then
+    echo "No sudoers changes needed when running rootless"
+elif [[ ${ROOT,,} == "true" ]]; then
     adduser "$USER" sudo && echo '%sudo ALL=(ALL) NOPASSWD:ALL' >>/etc/sudoers
     echo "$USER added to sudoers"
 fi
 
 ## Change Umask value if desired
 if [ "$UMASK" -ne 022 ]; then
     echo "server-set-umask=false" >>/etc/rstudio/rserver.conf
-    echo "Sys.umask(mode=$UMASK)" >>/home/"$USER"/.Rprofile
+    echo "Sys.umask(mode=$UMASK)" >>"${USERHOME}"/.Rprofile
 fi
 
 ## Next one for timezone setup
 if [ "$TZ" != "Etc/UTC" ]; then
-    ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ >/etc/timezone
+    ln -snf /usr/share/zoneinfo/"$TZ" /etc/localtime && echo "$TZ" >/etc/timezone
 fi
 
 ## Update Locale if needed
 if [ "$LANG" != "en_US.UTF-8" ]; then
-    /usr/sbin/locale-gen --lang $LANG
-    /usr/sbin/update-locale --reset LANG=$LANG
+    /usr/sbin/locale-gen --lang "$LANG"
+    /usr/sbin/update-locale --reset LANG="$LANG"
 fi
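The rootless branch of the diff mixes the GID filtering with `usermod`, which makes it awkward to try outside a container. The following is a minimal dry-run sketch of only the filtering step, assuming bash and the same inputs the script reads (`id -G` and `/proc/sys/kernel/overflowgid`). `groups_to_keep` is a hypothetical name, not part of `init_userconf.sh`, and the GIDs in the example are made up.

```shell
# Hypothetical dry-run version of the group-preserving loop: print the GIDs
# that would be passed to `usermod -aG`, one per line.
# Usage: groups_to_keep "<space-separated GIDs>" <overflow GID>
groups_to_keep() {
    local groups="$1" overflow="$2" g
    # $groups is left unquoted on purpose so it splits into one GID per word
    for g in $groups; do
        if [ "$g" -eq 0 ] || [ "$g" -eq "$overflow" ]; then
            # 0 is root's own group; the overflow GID stands for unmapped groups
            continue
        fi
        echo "$g"
    done
}

# Inside the container the real inputs would be:
#   groups_to_keep "$(id -G)" "$(cat /proc/sys/kernel/overflowgid)"
# Made-up input: root (0), an unmapped host group shown as nogroup (65534),
# and a group mapped via /etc/subgid that appears in the container as GID 4.
groups_to_keep "0 65534 4" 65534   # prints 4
```

Keeping the filter in a function lets the two skip rules (GID 0 and the overflow GID) be checked without touching `/etc/group`; the real script then calls `usermod -aG` for each remaining GID.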