Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable rootless cgroup v2 tests + some fixes #2944

Merged

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented May 7, 2021

TL;DR: this is mostly tests/integration simplifications and improvements for systemd + cgroup v2 + rootless case, addressing #2942. The only exception is the second commit, which fixes the real bug uncovered by tests that are no longer skipped thanks to this PR.

  1. tests/int/helpers: rm old code

    This was added by commit ca4f427, together with set_resources_limit,
    and was needed for the latter, since it used sed to update resources.

    Now when we switched from sed to jq in commit 79fe41d, this kludge
    is no longer needed.

  2. libct/cg/sd/v2: call initPath from Path

    Sometimes Path() is called before m.path is initialized (in particular,
    this happens from (*linuxContainer).newInitConfig), so we do need to
    make sure to call initPath.

    This fixes the following integration tests (for cgroup v2 + systemd case,
    currently not enabled -- to be enabled by further commits):

    • runc run (blkio weight)
    • runc run (cgroupv2 mount inside container)

    Fixes: ff692f2 ("Fix cgroup2 mount for rootless case") aka PR Fix cgroup2 mount for rootless case #2818.

  3. tests/int/update.bats: don't set cpuset in setup

    Setting cpuset.cpus requires cpuset controller, which might
    not be available in case of cgroup v2 + systemd < 244.

    Rather than require cpuset support from every test case, do not set the initial
    cpuset value from setup(), and do not check it.

  4. tests/int/helpers: generalize require cgroups_freezer

    With this, something like cgroups_pids can be used (to be added
    by the next commit).

  5. tests/int: enable/use requires cgroups_<ctrl>

    1. In case of cgroup v2 + systemd + rootless, get the list of controllers from
      the own cgroup, rather than from the root one (as systemd cgroup delegation
      is required in this case, and it might not be set or fully working).

    2. Use requires cgroups_<controller> in tests that need those
      controllers.

    Tested on Fedora 31 (which does not support delegation of cpuset controller).

  6. tests/int/cgroups: don't check for hugetlb

    Systemd is not able to delegate hugetlb controller, and it is needed for
    cgroup v2 + systemd + rootless case (which is currently skipped because
    of "requires rootless_cgroup", but will be enabled by a later commit).

    The failure being fixed looks like this:

    not ok 4 runc create (limits + cgrouppath + permission on the cgroup dir) succeeds
    (from function check_cgroup_value' in file /vagrant/tests/integration/helpers.bash, line 188, in test file /vagrant/tests/integration/cgroups.bats, line 53) check_cgroup_value "cgroup.controllers" "$(cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/cgroup.controllers)"' failed
    <....>
    /sys/fs/cgroup/user.slice/user-2000.slice/user@2000.service/machine.slice/runc-cgroups-integration-test-20341.scope/cgroup.controllers
    current cpuset cpu io memory pids !? cpuset cpu io memory hugetlb pids

  7. tests/int: run rootless_cgroup tests for v2+systemd

    Before this commit, require rootless_cgroup feature check required cgroup
    to be present in $ROOTLESS_FEATURES. The idea of the requirement,
    though, is to ensure that rootless runc can manage cgroups.

    In case of systemd + cgroup v2, rootless runc can manage cgroups,
    thanks to systemd delegation, so modify the feature check accordingly.

    Next, convert (simplify) some of the existing users to the modified check,
    essentially enabling some tests that were disabled for systemd + cgroup v2 +
    rootless case before.

    Fixes: CI seems skipping cgroupv2 with rootless (systemd) #2942.

  8. Vagrantfile.fedora: set Delegate=yes

    Instead of listing all individual controllers we want to be delegated,
    just say "yes" which means "all the controllers that are available".

@kolyshkin
Copy link
Contributor Author

Failure in Fedora is caused by the fact systemd do not support delegation of hugetlb controller.

not ok 4 runc create (limits + cgrouppath + permission on the cgroup dir) succeeds
# (from function `check_cgroup_value' in file /vagrant/tests/integration/helpers.bash, line 188,
#  in test file /vagrant/tests/integration/cgroups.bats, line 53)
#   `check_cgroup_value "cgroup.controllers" "$(cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/cgroup.controllers)"' failed
# runc spec --rootless (status=0):
# 
# runc run -d --console-socket /tmp/bats-run-49422/runc.Jkng1u/tty/sock test_cgroups_permissions (status=0):
# 
# /sys/fs/cgroup/user.slice/user-2000.slice/user@2000.service/machine.slice/runc-cgroups-integration-test-20341.scope/cgroup.controllers
# current cpuset cpu io memory pids !? cpuset cpu io memory hugetlb pids

Will probably disable the check.

@kolyshkin
Copy link
Contributor Author

CentOS failure is a known flake (#2760). I have a half-ready fix but no time to finish it :(

@kolyshkin kolyshkin force-pushed the enable_rootless_cgroup_v2_tests branch 2 times, most recently from f95897e to 589795f Compare May 7, 2021 03:03
# so check the controllers that the current user has, not the top one.
# NOTE: delegation of cpuset requires systemd >= 244 (Fedora >= 32, Ubuntu >= 20.04).
if [[ "$ROOTLESS" -ne 0 && -n "$RUNC_USE_SYSTEMD" ]]; then
controllers="/sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to check cgroup.subtree_control?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically, if systemd did the job of delegation right, it does not matter.

Fixed nevertheless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it seems that was wrong to change cgroup.controllers to cgroup.subtree_control. The first one contains all the controllers that systemd can delegate, while the second one only has a few (pids and memory, if I remember correctly).

Now, as far as I understand, fs2.CreateCgroupPath (called from unifiedManager's Apply) fixes that, enabling all supported controllers and subtree_control, but the tests check it earlier.

Anyway, when I changed it back to cgroups.controllers, it is working, and the tests that require cpuset are now not skipped on cgroup v2 + systemd + rootless.

@AkihiroSuda AkihiroSuda added this to the 1.0.0-rc94 milestone May 7, 2021
@kolyshkin kolyshkin force-pushed the enable_rootless_cgroup_v2_tests branch from 589795f to ad3f47e Compare May 7, 2021 07:15
@kolyshkin kolyshkin requested a review from AkihiroSuda May 7, 2021 07:40
mrunalp
mrunalp previously approved these changes May 7, 2021
@AkihiroSuda
Copy link
Member

AkihiroSuda commented May 7, 2021

We have Fedora 34, but cpuset tests seems skipped?


ok 105 update cgroup v1/v2 common limits # skip test requires cgroups_memory
242
ok 106 update cgroup cpu limits
243
ok 107 set cpu period with no quota
244
ok 108 set cpu quota with no period
245
ok 109 update cpu period with no previous period/quota set
246
ok 110 update cpu quota with no previous period/quota set
247
ok 111 update cgroup v2 resources via unified map
248
ok 112 update cpuset parameters via resources.CPU # skip test requires cgroups_cpuset
249
ok 113 update cpuset parameters via v2 unified map # skip test requires cgroups_cpuset
250
ok 114 update rt period and runtime # skip test requires cgroups_v1
251
ok 115 update devices [minimal transition rules] # skip test requires cgroups_v1
252
ok 116 update paused container # skip test requires cgroups_freezer
253
ok 117 runc version
254
Connection to localhost closed.

@kolyshkin kolyshkin force-pushed the enable_rootless_cgroup_v2_tests branch from ad3f47e to 3ae0981 Compare May 7, 2021 19:37
@kolyshkin
Copy link
Contributor Author

We have Fedora 34, but cpuset tests seems skipped?

That was a typo in tests/int: enable/use requires cgroups_<ctrl> commit 🥵

Thanks for catching this, should be fixed now.

@kolyshkin
Copy link
Contributor Author

CentOS 7 failure:

ok 100 runc exec [tty consolesize]
ok 101 runc create [terminal=false]
not ok 102 runc run [terminal=false]
# (from function `retry' in file tests/integration/helpers.bash, line 399,
#  from function `wait_for_container' in file tests/integration/helpers.bash, line 407,
#  in test file tests/integration/tty.bats, line 238)
#   `wait_for_container 15 1 test_busybox running' failed
# runc spec (status=0):
# 
# time="2021-05-07T19:51:27Z" level=warning msg="unable to get oom kill count" error="no directory specified for memory.oom_control"
# time="2021-05-07T19:51:27Z" level=error msg="container_linux.go:380: starting container process caused: process_linux.go:385: applying cgroup configuration for process caused: Unit runc-test_busybox.scope already exists."
# Command "eval __runc state test_busybox | grep -qw running" failed 15 times. Output: time="2021-05-07T19:51:51Z" level=error msg="container \"test_busybox\" does not exist"

This is probably caused by the fact that some earlier container was not removed, which in turn is probably caused by a very slow VM, a small (1 second) timeout in stopUnit, and the fact that it does not return an error if the timeout is hit. This was fixed before for startUnit (commit 3844789, PR #2614) but not for stopUnit.

In any case look unrelated to the PR. I'll look into it.

CI restarted.

@kolyshkin
Copy link
Contributor Author

In any case look unrelated to the PR. I'll look into it.

Proposed fix (unrelated to this PR): #2946

@kolyshkin
Copy link
Contributor Author

Weird, I still see this in fedora + rootless + systemd:

ok 105 update cgroup v1/v2 common limits # skip test requires cpuset

Looking.

@kolyshkin kolyshkin force-pushed the enable_rootless_cgroup_v2_tests branch from 3ae0981 to 89e648e Compare May 8, 2021 00:47
@kolyshkin
Copy link
Contributor Author

ok 105 update cgroup v1/v2 common limits # skip test requires cpuset

Might be caused by the last commit (Delegate=yes). Removed, let's see...

@kolyshkin kolyshkin marked this pull request as draft May 8, 2021 01:09
@kolyshkin kolyshkin force-pushed the enable_rootless_cgroup_v2_tests branch 2 times, most recently from 7f709da to 2ca3ba9 Compare May 8, 2021 04:48
kolyshkin added 3 commits May 7, 2021 22:05
This was added by commit ca4f427, together with set_resources_limit,
and was needed for the latter, since sed was used to update resources.

Now when we switched to jq in commit 79fe41d, this kludge is no
longer needed.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Sometimes Path() is called before m.path is initialized (in particular,
this happens from (*linuxContainer).newInitConfig), so we do need to
make sure to call initPath.

This fixes the following integration tests (for cgroup v2 + systemd case,
currently not enabled -- to be enabled by further commits):

 *  runc run (blkio weight)
 * runc run (cgroupv2 mount inside container)

Fixes: ff692f2 ("Fix cgroup2 mount for rootless case")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Setting cpuset.cpus requires cpuset controller, which might
not be available in case of cgroup v2 + systemd < 244.

Rather than require cpuset support from every test case, do not set the initial
cpuset value from setup(), and do not check it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added 5 commits May 7, 2021 22:06
With this, something like cgroups_pids can be used (to be added
by the next commit).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. In case of cgroup v2 + systemd + rootless, get the list of controllers from
   the own cgroup, rather than from the root one (as systemd cgroup delegation
   is required in this case, and it might not be set or fully working).

2. Use "requires cgroups_<controller>" in tests that need those
   controllers.

Tested on Fedora 31 (which does not support delegation of cpuset controller).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Systemd is not able to delegate hugetlb controller, and it is needed for
cgroup v2 + systemd + rootless case (which is currently skipped because
of "requires rootless_cgroup", but will be enabled by a later commit).

The failure being fixed looks like this:

> not ok 4 runc create (limits + cgrouppath + permission on the cgroup dir) succeeds
> # (from function `check_cgroup_value' in file /vagrant/tests/integration/helpers.bash, line 188,
> #  in test file /vagrant/tests/integration/cgroups.bats, line 53)
> #   `check_cgroup_value "cgroup.controllers" "$(cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/cgroup.controllers)"' failed
> # <....>
> # /sys/fs/cgroup/user.slice/user-2000.slice/user@2000.service/machine.slice/runc-cgroups-integration-test-20341.scope/cgroup.controllers
> # current cpuset cpu io memory pids !? cpuset cpu io memory hugetlb pids

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Before this commit, "require rootless_cgroup" feature check required "cgroup"
to be present in ROOTLESS_FEATURES. The idea of the requirement, though, is
to ensure that rootless runc can manage cgroups.

In case of systemd + cgroup v2, rootless runc can manage cgroups,
thanks to systemd delegation, so modify the feature check accordingly.

Next, convert (simplify) some of the existing users to the modified check.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Instead of listing all individual controllers we want to be delegated,
just say "yes" which means "all the controllers that are available".

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin force-pushed the enable_rootless_cgroup_v2_tests branch from 2ca3ba9 to 12e9cac Compare May 8, 2021 05:07
@kolyshkin kolyshkin marked this pull request as ready for review May 8, 2021 05:47
@kolyshkin
Copy link
Contributor Author

No longer a draft. The last issue that was fixed is described at #2944 (comment)

@AkihiroSuda AkihiroSuda mentioned this pull request May 8, 2021
7 tasks
Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@cyphar cyphar closed this in 62250c1 May 10, 2021
@cyphar cyphar merged commit 62250c1 into opencontainers:master May 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI seems skipping cgroupv2 with rootless (systemd)
4 participants