Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Makefile: disable kernel memory accounting on RHEL 7 3.10 kernels #2594

Closed
wants to merge 4 commits into from

Conversation

thaJeztah
Copy link
Member

Kernel-memory accounting is known to be broken on RHEL 7 kernels. This patch automatically disables kernel-memory accounting when building on a 3.10 kernel.

The original behavior is preserved if a custom BUILDTAGS is passed;

make BUILDTAGS='seccomp selinux' ...

I recalled we had a similar patch in Moby (moby/moby@8972aa9), and thought I'd might as well upstream this.

Also note that moby/moby#41254 (comment)

Kernel memory limit is not supported on cgroup v2. Even on cgroup v1, kernel memory limit (kmem.limit_in_bytes) has been deprecated since kernel 5.4. torvalds/linux@0158115

So perhaps either add a similar check for kernel 5.4+, or disabling kmem by default, and making it an opt-on could make sense (@AkihiroSuda wdyt?)

@thaJeztah
Copy link
Member Author

@kolyshkin ptal

@cyphar
Copy link
Member

cyphar commented Sep 18, 2020

TIL that kmemcg is now deprecated, neither the kernel containers ML nor the cgroup ML was on the Cc list for that change.

@thaJeztah
Copy link
Member Author

Thanks for Akihiro, who noticed that. I guess it has been somewhat problematic overall, but I was surprised as well.

Given its deprecation, perhaps a decision should be made before runc 1.0.0 wether or not it should be enabled by default (or supported at all; it could be (silently) "ignored" to not break compatibility)

@kolyshkin
Copy link
Contributor

kolyshkin commented Sep 19, 2020

This is very somewhat similar to how I did this initially in #1920. Someone raised the concern that it should be done by packagers rather than during runtime, and we eventually ended up with #1921.

I think we're good, meaning whoever packages runc for RHEL7 should set nokmem.

@kolyshkin
Copy link
Contributor

kolyshkin commented Sep 19, 2020

TIL that kmemcg is now deprecated, neither the kernel containers ML nor the cgroup ML was on the Cc list for that change.

I learned it recently, too. As much as I like it to be a separate limit, I don't think the majority of users are able to figure out their kmem limits. Currently (at least in cgroupv2) kmem is still accounted and [indirectly] limited by cgroup.memory settings, it's just there are no more direct controls.

@kolyshkin
Copy link
Contributor

Back in the day in OpenVZ we used to have 20+ limits (https://wiki.openvz.org/Proc/user_beancounters) and users were confused. Ended up having just two primary knobs: ram and swap (https://wiki.openvz.org/VSwap).

@cyphar
Copy link
Member

cyphar commented Sep 22, 2020

We will need to remove it from the runtime-spec too...

@kolyshkin
Copy link
Contributor

This CI failure (on CentOS 7) needs to be fixed if we want to proceed:

=== RUN   TestMemorySetKernelMemory
1290    memory_test.go:207: kernel memory accounting disabled in this runc build
1291--- FAIL: TestMemorySetKernelMemory (0.00s)

@thaJeztah
Copy link
Member Author

Added a t.Skip() for those tests if kernel-memory is disabled

@kolyshkin
Copy link
Contributor

I'm not sure what to do about this one. RHEL7 is old, most probably no one is using runc installed from source on it, and whoever is packaging runc for RHEL7 is already taking care of adding nokmem build tag (or so I hope).

@thaJeztah
Copy link
Member Author

whoever is packaging runc for RHEL7 is already taking care of adding nokmem build tag (or so I hope).

Yeah, it's a bit tricky. Agreed that packagers should take this into account, OTOH, it's nice to have make XX "do the right thing" out-of-the-box, and this one is easy to miss (things don't fail directly if you missed setting this option).

Even more so if more recent kernels are used (and users may not be aware of it being deprecated);

So perhaps either add a similar check for kernel 5.4+, or disabling kmem by default, and making it an opt-on could make

@kolyshkin
Copy link
Contributor

The only problem I see it's now impossible to build runc on RHEL7 with kmem. But maybe it's for good, and I'm pretty sure kmem accounting won't be fixed there.

@thaJeztah can you please re-base to bring in recent CI?

@cyphar
Copy link
Member

cyphar commented Feb 4, 2021

I think it doesn't hurt to include this, especially since we have other (actual code) workarounds for RHEL7 (the first thing that comes to mind is the last_cap capabilities thing).

@kolyshkin
Copy link
Contributor

kolyshkin commented Mar 4, 2021

I took a liberty to

  • rebase;
  • simplify the kernel version check a bit;
  • fix adding the nokmem tag (it has to be set before GO_BUILD :=, not after).

It was not working before:

[kir@centos70-ne runc]$ make
go build "-mod=vendor" "-buildmode=pie"  -tags "seccomp selinux apparmor" -ldflags "-X main.gitCommit="5ec32f266c90ecb8da36282a63a6340edf718a18" -X main.version=1.0.0-rc92+dev " -o runc .
[kir@centos70-ne runc]$ uname -a
Linux centos70-ne 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[kir@centos70-ne runc]$ if uname -r | grep -q '^3\.10\.0.*\.el7\.'; then echo "yes"; fi
yes

It is working now:

[kir@centos70-ne runc]$ make
go build -trimpath "-mod=vendor" "-buildmode=pie"  -tags "seccomp nokmem" -ldflags "-X main.gitCommit="fd7f880166e3b57393e42a905d953ead07683866" -X main.version=1.0.0-rc93+dev " -o runc .

The changes are

diff --git a/Makefile b/Makefile
index c283b9a6..5ff44f35 100644
--- a/Makefile
+++ b/Makefile
@@ -25,16 +25,17 @@ ifeq ($(shell $(GO) env GOOS),linux)
                endif
        endif
 endif
+
+# Disable kmem if building on RHEL7 (kernel 3.10.0 el7)
+ifneq ($(shell uname -r | grep '^3\.10\.0.*\.el7\.'),)
+       BUILDTAGS += nokmem
+endif
+
 GO_BUILD := $(GO) build -trimpath $(MOD_VENDOR) $(GO_BUILDMODE) $(EXTRA_FLAGS) -tags "$(BUILDTAGS)" \
        -ldflags "-X main.gitCommit=$(COMMIT) -X main.version=$(VERSION) $(EXTRA_LDFLAGS)"
 GO_BUILD_STATIC := CGO_ENABLED=1 $(GO) build -trimpath $(MOD_VENDOR) $(EXTRA_FLAGS) -tags "$(BUILDTAGS) netgo osusergo" \
        -ldflags "-w -extldflags -static -X main.gitCommit=$(COMMIT) -X main.version=$(VERSION) $(EXTRA_LDFLAGS)"
 
-ifeq ($(shell if uname -r | grep -q '^3\.10\.0.*\.el7\.'; then echo "yes"; fi),yes)
-       # Disable kmem accounting/limiting on RHEL7 kernels (3.10.0 el7)
-       BUILDTAGS += nokmem
-endif
-
 .DEFAULT: runc
 
 runc:

@kolyshkin
Copy link
Contributor

kolyshkin commented Mar 4, 2021

Now once it's working...

=== RUN   TestRunWithKernelMemory
    exec_test.go:701: runContainer failed with kernel memory limit: container_linux.go:367: starting container process caused: process_linux.go:520: container init caused: process_linux.go:483: setting cgroup config for procHooks process caused: kernel memory accounting disabled in this runc build
--- FAIL: TestRunWithKernelMemory (0.29s)
=== RUN   TestRunWithKernelMemorySystemd
    exec_test.go:701: runContainer failed with kernel memory limit: container_linux.go:367: starting container process caused: process_linux.go:520: container init caused: process_linux.go:483: setting cgroup config for procHooks process caused: kernel memory accounting disabled in this runc build

--- FAIL: TestRunWithKernelMemorySystemd (0.29s)

..more tests need to be disabled

@kolyshkin
Copy link
Contributor

kolyshkin commented Mar 4, 2021

Added a commit to disable libct/int kmem-related tests if nokmem build tag is set, to fix failures from the previous comment.

I have also changed kmem_disabled to kmemDisabled to follow the Go naming guidelines.

@thaJeztah PTAL

@kolyshkin
Copy link
Contributor

(it has to be set before GO_BUILD :=, not after)

This roots in a difference between make's = and :=. The latter substitutes all variables in place (i.e. right here right now), while the former keeps the $(VARIABLE) as is and they are only expanded when used. For more info, see https://www.gnu.org/software/make/manual/html_node/Flavors.html#Flavors

@kolyshkin
Copy link
Contributor

Finally, I think we need to amend the README to say nokmem is auto-set when compiling on 3.0.0-el7 kernel.

@thaJeztah
Copy link
Member Author

Thanks Kir, sorry, I forgot about this one 🤗

@kolyshkin
Copy link
Contributor

kolyshkin commented Mar 4, 2021

Make is fun -- depending on whether the variable is coming from the environment or CLI arguments, the effect is different:

[kir@centos70-ne runc]$ make
go build -trimpath "-mod=vendor" "-buildmode=pie"  -tags "seccomp nokmem" -ldflags "-X main.gitCommit="0b2de072c311e8095e3810a9efb69171d06974c2" -X main.version=1.0.0-rc93+dev " -o runc .

[kir@centos70-ne runc]$ BUILDTAGS="" make
go build -trimpath "-mod=vendor" "-buildmode=pie"  -tags " nokmem" -ldflags "-X main.gitCommit="0b2de072c311e8095e3810a9efb69171d06974c2" -X main.version=1.0.0-rc93+dev " -o runc .

[kir@centos70-ne runc]$ make BUILDTAGS=""
go build -trimpath "-mod=vendor" "-buildmode=pie"  -tags "" -ldflags "-X main.gitCommit="0b2de072c311e8095e3810a9efb69171d06974c2" -X main.version=1.0.0-rc93+dev " -o runc .

@kolyshkin
Copy link
Contributor

kolyshkin commented Mar 4, 2021

Seeing this for the first time (on CentOS 7):

--- PASS: TestFreeze (0.50s)
=== RUN   TestSystemdFreeze
    utils_test.go:86: exec_test.go:539: unexpected error: unable to freeze

--- FAIL: TestSystemdFreeze (1.03s)

Update this is related to #2774 / #2791 and not related to this change. Yet it seems very strange.

@kolyshkin
Copy link
Contributor

Added a commit with README.md changes.

@kolyshkin
Copy link
Contributor

Yet another test to fix :)

not ok 1 runc update --kernel-memory{,-tcp} (initialized)
# (in test file tests/integration/cgroups.bats, line 24)
#   `[ "$status" -eq 0 ]' failed
# runc spec (status=0):
# 
# runc run -d --console-socket /tmp/bats-run-7663/runc.OeJiKe/tty/sock test_cgroups_kmem (status=1):
# time="2021-03-04T20:20:24Z" level=error msg="container_linux.go:367: starting container process caused: process_linux.go:520: container init caused: process_linux.go:483: setting cgroup config for procHooks process caused: kernel memory accounting disabled in this runc build"

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This makes those tests that require kmem to skip on RHEL7 kernels.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
thaJeztah and others added 2 commits March 4, 2021 16:45
Kernel-memory accounting is known to be broken on RHEL 7 kernels.
This patch automatically disables kernel-memory accounting when
building on a 3.10 kernel.

The original behavior is preserved if a custom BUILDTAGS is passed;

    make BUILDTAGS='seccomp selinux' ...

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Provide a more realistic example of using make BUILDTAGS=

2. Fix nokmem dependencies cell (<none> was not rendered).

3. Add information about auto-setting nokmem tag for EL7 kernel.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor

Rebased, added a skip to int tests.

@kolyshkin
Copy link
Contributor

CI failure

not ok 27 global --debug to --log --log-format 'text'
# (in test file tests/integration/debug.bats, line 58)
#   `[[ "${output}" == *"child process in init()"* ]]' failed

is a well-known flake (see #2756). Restarted.

Comment on lines +64 to +67
e.g. to disable `seccomp` and enable `nokmem`, run:

```bash
make BUILDTAGS='seccomp'
make BUILDTAGS="nokmem"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should encourage keeping seccomp enabled; should we change to (e.g.);

Suggested change
e.g. to disable `seccomp` and enable `nokmem`, run:
```bash
make BUILDTAGS='seccomp'
make BUILDTAGS="nokmem"
e.g. to disable kernel memory limiting, but keep `seccomp`, run:
```bash
make BUILDTAGS="nokmem seccomp"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter much to me, it's just a usage example, not a recommendation about how to compile runc, so can be anything (and I sometimes find myself disabling seccomp so I can compile runc without having to install libseccomp-devel).

But yes, maybe yours is a better/fuller example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree it's just an example. People love to copy/paste though, and hate to read 😓

@kolyshkin
Copy link
Contributor

We will need to remove it from the runtime-spec too...

They were already OPTIONAL and are now also NOT RECOMMENDED (opencontainers/runtime-spec#1093)

@hqhq hqhq closed this in #2840 Apr 13, 2021
@thaJeztah thaJeztah deleted the disable_kmem branch April 13, 2021 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants