systemd: switch to unified cgroup hierarchy by default #104094
Conversation
I have this in my config:
boot.kernelParams = (if config.virtualisation.docker.enable then [] else [ "cgroup_no_v1=all" "systemd.unified_cgroup_hierarchy=yes" ]);
maybe we should automatically specify this for the docker services?
(force-pushed from b50247e to 26541b9)
> I have this in my config:
> boot.kernelParams = (if config.virtualisation.docker.enable then [] else [ "cgroup_no_v1=all" "systemd.unified_cgroup_hierarchy=yes" ]);
> maybe we should automatically specify this for the docker services?
Yes, adding "systemd.unified_cgroup_hierarchy=0" if docker is enabled might make sense, at least as long as docker doesn't support cgroupsv2.
We probably should add a link to the upstream tracking issues.
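A minimal sketch of how such an automatic fallback could look as a NixOS module (illustrative only, not the actual PR code; the option name is taken from later in this thread):

```nix
# Hypothetical module fragment: while docker lacks cgroupsv2 support,
# fall back to the legacy hierarchy automatically when it is enabled.
{ config, lib, ... }:
{
  config = lib.mkIf config.virtualisation.docker.enable {
    # mkDefault lets users still override this explicitly.
    systemd.enableUnifiedCgroupHierarchy = lib.mkDefault false;
  };
}
```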
It seems it's supported in runc: opencontainers/runc#2315 (comment) and should be supported in docker 20, which was "not officially released yet, and its ETA still unpublished" in September 2020: moby/moby#40360 (comment)
Docker 20.10.0-rc1 (https://github.com/moby/moby/releases) was released today, which should add cgroupsv2 and apparmor 3.0-beta support. Maybe we can wait until it is released?
@SuperSandro2000 are you aware of any timelines on docker 20.* getting released? Would you be up to packaging their rc, to see how it behaves? ;-) I surely want to add/extend some more tests about this before merging, and see how config options for other container runtimes might need to look like (if it requires special configuration).
This will make user@ services fail when using with the hardened profile. Specifically this line of the hardened profile and this issue marked as "not supported". There are several solutions to this and it definitely should not block this, but I thought it was worth noting. cc @joachifm (because of the hardened profile).
(force-pushed from 26541b9 to aa64c18)
I pushed a new version exposing a `systemd.enableUnifiedCgroupHierarchy` option. I still want to add some more checks before merging.
I also did some more experiments:
I made sure the unified cgroup hierarchy is disabled in these cases and the tests succeed again.
I added some more tests on the accounting parts. I couldn't reliably get IO Accounting tested on individual units. This is ready for review, PTAL.
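For context, the accounting checks in a NixOS test ultimately boil down to parsing `systemctl show` KEY=VALUE output. A small self-contained sketch of that parsing (the sample data is made up, not taken from the actual test):

```python
def parse_systemctl_show(text: str) -> dict:
    """Parse `systemctl show` KEY=VALUE output lines into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without a '=' separator
            props[key] = value
    return props

# Made-up sample resembling `systemctl show -p CPUAccounting ... <unit>`:
sample = "CPUAccounting=yes\nIOAccounting=no\nMemoryAccounting=yes"
props = parse_systemctl_show(sample)
assert props["CPUAccounting"] == "yes"
assert props["IOAccounting"] == "no"
```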
No, but I assume it could take a month.
I can get it to build, but I lack experience with testing it.
Okay. So I propose we keep this PR as-is (disable unified cgroup when you enable docker), and we can remove this once docker has landed and supports it.
KVM does not want to work with nix tests on my machine. Not sure why.
LGTM, in general. Some notes, although they're not too important.
See https://www.redhat.com/sysadmin/fedora-31-control-group-v2 for details on why this is desirable, and how it impacts containers. Users that need to keep using the old cgroup hierarchy can re-enable it by setting `systemd.unifiedCgroupHierarchy` to `false`. Well-known candidates not supporting that hierarchy, like docker and hidepid=… will disable it automatically. Fixes NixOS#73800
This gets automatically disabled by docker if the docker backend is used, but the bundled containerd also doesn't seem to support cgroupsv2, so disable it explicitly here, too.
For now, testing IO Accounting is skipped, as it seems to be either broken, or hard to reproduce in a VM.
(force-pushed from d5d53d6 to f683297)
nixos/tests/podman.nix (outdated)
          podman.succeed(
              su_cmd(
-                 "podman run --runtime=crun -d --name=sleeping -v /nix/store:/nix/store -v /run/current-system/sw/bin:/bin scratchimg /bin/sleep 10"
+                 "podman run -d --name=sleeping -v /nix/store:/nix/store -v /run/current-system/sw/bin:/bin scratchimg /bin/sleep 10"
Can we keep the explicit --runtime=crun in "Run container rootless with crun", please?
Can you elaborate on why? podman seems to default to it with cgroupsv2, and if we view this test from a user perspective, they just want to podman run as non-root, without explicitly configuring a specific runtime.
also cc @adisbladis
I just want to say I agree with @flokli on this one. It's better that we test the user's default config.
Also I seem to recall seeing some intent to switch back to runc by default at some point once it's gained the required cgroupsv2 support (don't quote me on this one).
Can we do this instead?
Edit: just realised that the original diff was incomplete.
diff --git a/nixos/tests/podman.nix b/nixos/tests/podman.nix
index cd8c2b4308c..3b54d793b1b 100644
--- a/nixos/tests/podman.nix
+++ b/nixos/tests/podman.nix
@@ -53,15 +53,24 @@ import ./make-test-python.nix (
podman.succeed("podman stop sleeping")
podman.succeed("podman rm sleeping")
+ with subtest("Run container as root with default runtime"):
+ podman.succeed("tar cv --files-from /dev/null | podman import - scratchimg")
+ podman.succeed(
+ "podman run -d --name=sleeping -v /nix/store:/nix/store -v /run/current-system/sw/bin:/bin scratchimg /bin/sleep 10"
+ )
+ podman.succeed("podman ps | grep sleeping")
+ podman.succeed("podman stop sleeping")
+ podman.succeed("podman rm sleeping")
+
with subtest("Run container rootless with runc"):
podman.succeed(su_cmd("tar cv --files-from /dev/null | podman import - scratchimg"))
- podman.succeed(
+ podman.fail(
su_cmd(
"podman run --runtime=runc -d --name=sleeping -v /nix/store:/nix/store -v /run/current-system/sw/bin:/bin scratchimg /bin/sleep 10"
)
)
- podman.succeed(su_cmd("podman ps | grep sleeping"))
- podman.succeed(su_cmd("podman stop sleeping"))
+ podman.fail(su_cmd("podman ps | grep sleeping"))
+ podman.fail(su_cmd("podman stop sleeping"))
podman.succeed(su_cmd("podman rm sleeping"))
with subtest("Run container rootless with crun"):
@@ -74,6 +83,17 @@ import ./make-test-python.nix (
podman.succeed(su_cmd("podman ps | grep sleeping"))
podman.succeed(su_cmd("podman stop sleeping"))
podman.succeed(su_cmd("podman rm sleeping"))
+
+ with subtest("Run container rootless with default runtime"):
+ podman.succeed(su_cmd("tar cv --files-from /dev/null | podman import - scratchimg"))
+ podman.succeed(
+ su_cmd(
+ "podman run -d --name=sleeping -v /nix/store:/nix/store -v /run/current-system/sw/bin:/bin scratchimg /bin/sleep 10"
+ )
+ )
+ podman.succeed(su_cmd("podman ps | grep sleeping"))
+ podman.succeed(su_cmd("podman stop sleeping"))
+ podman.succeed(su_cmd("podman rm sleeping"))
'';
}
)
I don't think our test should suddenly fail if podman's runc backend starts working in rootless scenarios again.
I'm fine adding a test for the default backend in as-root scenarios, but made the rootless runc subtest a comment, which we can re-add once it gets implemented.
> I don't think our test should suddenly fail if podman's runc backend starts working in rootless scenarios again.
> I'm fine adding a test for the default backend in as-root scenarios, but made the rootless runc subtest a comment, which we can re-add once it gets implemented.
All of these packages have passthru.tests, so the failure will be noticed when it does start working.
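For reference, a sketch of what wiring that up looks like in a package expression (illustrative fragment only, not the actual podman derivation):

```nix
# Illustrative fragment: exposing the NixOS test via passthru.tests, so
# test breakage surfaces in review tooling when the package is bumped.
{ nixosTests, buildGoModule, ... }:

buildGoModule {
  pname = "podman";
  # ... (version, src, and build attributes omitted)

  passthru.tests = {
    inherit (nixosTests) podman;
  };
}
```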
Can you move the default tests after the explicit runtime tests please? If it does fail because of a runtime problem, having it fail on the explicit test would be clearer.
I can move the test order, sure, but I still don't think the test should suddenly fail if more things work. podman is still functional if a backend that was known to be broken starts working again. We shouldn't then fail the test.
I'd expect the maintainers of this package to read the release notes while bumping, and to update/extend the test accordingly once new features are added.
(force-pushed from 16ef7f1 to 8de38fa)
The runc backend doesn't work with unified cgroup hierarchy, and it failing is a known issue. However, the default backends should work in both rootless and as-root scenarios, so make sure we test these.
(force-pushed from 8de38fa to 90d5bdb)
@@ -155,6 +155,9 @@ in
     users.groups.docker.gid = config.ids.gids.docker;
     systemd.packages = [ cfg.package ];

+    # TODO: remove once docker 20.10 is released
+    systemd.enableUnifiedCgroupHierarchy = false;
Would it make sense to make this conditional on the docker version? Users can set the docker package at runtime and defaulting this to the right version automatically might be a nice perk that we can implement rather quickly in NixOS?
We don't know yet if docker 20.10 really works. I'm happy to take a closer look at this once it is in Nixpkgs. There currently is no docker working with enabled unified cgroup hierarchy, so I'm not much in favor of code for the eventuality of it being there ;-)
Let's get this in. If there's other applications that don't really work with cgroupsv2, disabling it for them is pretty trivial by setting `systemd.enableUnifiedCgroupHierarchy = false`.
Since NixOS#104094 (d22b3ed), NixOS is using the unified cgroup hierarchy by default (aka cgroupv2). This means the blkio controller isn't there, so we should test for something else (e.g. the presence of the io controller). Fixes NixOS#105581.
Other distros, like Fedora 32, have already switched to it.
Users that need to keep using the old cgroup hierarchy (namely because they want to run docker, and not one of the alternatives supporting cgroupsv2) can re-enable it by setting
systemd.enableUnifiedCgroupHierarchy = false;
Fixes #73800
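A usage sketch of that opt-out in a user's configuration.nix (assuming the option as introduced by this PR):

```nix
# configuration.nix fragment: stay on the legacy/hybrid cgroup hierarchy,
# e.g. because this machine runs docker.
{
  systemd.enableUnifiedCgroupHierarchy = false;
}
```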