Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable runc-dmz by default #4174

Merged
merged 3 commits into from
Mar 13, 2024
Merged

Conversation

rata
Copy link
Member

@rata rata commented Jan 18, 2024

There are several issues that we caught at the last minute. Several maintainers said it seems safer to disable it by default, so let's do that.

runc-dmz is marked as experimental and besides the build tag you need to opt-in with RUNC_DMZ=true to use it.

Future PRs can go further and maybe even remove the buildtag, simplifying the CI (and the time it takes to run).


cc @cyphar @kolyshkin @lifubang

@rata rata force-pushed the rata/runc-dmz-disable branch 2 times, most recently from 79fd1ba to 6c47543 Compare January 18, 2024 17:20
@rata
Copy link
Member Author

rata commented Jan 18, 2024

Grr, there are some github actions that compile with no_dmz, that of course are of no use now. I'll be on PTO starting tomorrow EOD, can someone please pick this up and carry it to the merge line? :)

See dac4171 for what was added by DMZ.

@cyphar
Copy link
Member

cyphar commented Jan 18, 2024

I was about to write a similar PR 😅. I'll close this once I send that one. I'll also be on PTO from next week.

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To properly switch to making runc-dmz opt-in, more work is needed to remove some of the checks we've added as well as changing the logic of the RUNC_DMZ environment variable.

I will send a patch for this tomorrow.

@rata
Copy link
Member Author

rata commented Jan 26, 2024

@cyphar as you didn't open a PR, I've changed this one to do that. Let me know what you think, and feel free to open a PR that replaces this.

@rata rata force-pushed the rata/runc-dmz-disable branch 2 times, most recently from c956a0a to d1ca579 Compare January 26, 2024 17:07
@@ -127,20 +127,20 @@ function teardown() {
[ "${lines[0]}" = "410" ]
}

@test "runc run [runc-dmz]" {
runc --debug run test_hello
@test "RUNC_DMZ=true runc run [runc-dmz]" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@test "RUNC_DMZ=true runc run [runc-dmz]" {
@test "[RUNC_DMZ=true] runc run" {

@rata rata force-pushed the rata/runc-dmz-disable branch 3 times, most recently from e07265a to 5073630 Compare February 1, 2024 11:35
@rata
Copy link
Member Author

rata commented Feb 1, 2024

Suse mirrors are breaking the github actions, but it was passing all the tests before

@rata
Copy link
Member Author

rata commented Feb 1, 2024

@kolyshkin for some reason I can't answer to your comment, but I've fixed it. PTAL :)

@kolyshkin
Copy link
Contributor

@rata @cyphar To me, using both build tag and an env var seems to be an overkill. I think we can drop the build tag entirely (and maybe also as well as selinux compat feature).

@rata
Copy link
Member Author

rata commented Feb 2, 2024

@kolyshkin I agree, that is why I mentioned it in the PR description. But I wouldn't do it in 1.2, as we are introducing the machinery to cross-compile C, just for runc-dmz. I would have the build-tag in case there si an issue (cross-compiling C is not as easy as cross-compiling go) and we haven't ever released with this machinery.

I would keep it, if it fails people can easily avoid it and build anyways (with the same behavior as runc-dmz is disabled by default), we can see the bug reports and fix it without a huge impact as they have other trivial alternatives.

In 1.3 we can remove the build-tag (even if we don't remove runc-dmz) and that is all.

What do you think?

@kolyshkin
Copy link
Contributor

we are introducing the machinery to cross-compile C, just for runc-dmz

Maybe I do not understand, what do you mean by "cross-compile"? We compile runc-dmz for the same architecture as the main runc (and we already have some C code in runc, see libcontainer/nsenter). IOW, we do not cross-compile runc-dmz (unless we also cross-compile runc).

Perhaps we can leave runc_nodmz as is, meaning dmz is compiled in and the resulting binary is is dmz-capable by default. If someone is having trouble with compilation because of runc-dmz, they can add runc_nodmz build tag.

The next step is what to do about enabling dmz during runtime. We have agreed that we should have it disabled by default. I'm not sure if using an environment variable is the best way to do that -- I remember @crosbymichael being very against using environment to affect how a command works (but I don't remember why exactly, maybe Michael himself or @thaJeztah can shed some light). But for now, let's say, we want RUNC_DMZ=yes or somesuch to be present in the environment for runc to invoke dmz.

@kolyshkin
Copy link
Contributor

Maybe I do not understand, what do you mean by "cross-compile"? We compile runc-dmz for the same architecture as the main runc (and we already have some C code in runc, see libcontainer/nsenter). IOW, we do not cross-compile runc-dmz (unless we also cross-compile runc).

Having said that, yes, we do cross-compile runc ourselves (that includes C stuff in runc-dmz and libct/nsenter) as we want to provide release binaries for multiple architectures. That doesn't mean that most users do it -- I suspect they all compile natively (that's what I'd do).

@rata
Copy link
Member Author

rata commented Feb 5, 2024

But for now, let's say, we want RUNC_DMZ=yes or somesuch to be present in the environment for runc to invoke dmz

@kolyshkin I've updated it to do just that.

@rata
Copy link
Member Author

rata commented Feb 9, 2024

Rebased now that tests should be fixed in main. PTAL

@rata rata force-pushed the rata/runc-dmz-disable branch 2 times, most recently from ac795bd to e1477d6 Compare February 9, 2024 13:09
if os.Getenv("RUNC_DMZ") == "legacy" {
logrus.Debugf("RUNC_DMZ=legacy set -- switching back to classic /proc/self/exe cloning")
// Only RUNC_DMZ=true enables runc_dmz
if os.Getenv("RUNC_DMZ") != "true" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use strconv.ParseBool

Copy link
Member Author

@rata rata Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AkihiroSuda it returns an error which we need to handle and also errors out when parsing "". So, the code becomes a mess for, IMHO, no good reason:

-       // Setting RUNC_DMZ=legacy disables this dmz method.
-       if os.Getenv("RUNC_DMZ") == "legacy" {
-               logrus.Debugf("RUNC_DMZ=legacy set -- switching back to classic /proc/self/exe cloning")
+       // Only RUNC_DMZ=true enables runc_dmz
+       runcDmz := os.Getenv("RUNC_DMZ")
+       if runcDmz == "" {
+               logrus.Debugf("RUNC_DMZ is not set -- switching back to classic /proc/self/exe cloning")
                return nil, ErrNoDmzBinary
        }
+       if dmzEnabled, err := strconv.ParseBool(runcDmz); err == nil && !dmzEnabled {
+               logrus.Debugf("RUNC_DMZ is not true -- switching back to classic /proc/self/exe cloning")
+               return nil, ErrNoDmzBinary
+       } else if err != nil {
+               return nil, fmt.Errorf("parsing RUNC_DMZ: %w", err)
+       }
+

Do you really think it is worth it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, to allow RUNC_DMZ=1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to do that!

@@ -323,14 +323,14 @@ function check_exec_debug() {
[ "$status" -eq 0 ]
}

@test "RUNC_DMZ=legacy runc exec [execve error]" {
@test "RUNC_DMZ=false runc exec [execve error]" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, why setting RUNC_DMZ=false here if this is the default?

Copy link
Member Author

@rata rata Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it seems better to be explicit when the test needs that. If we switch the default later to enabled, it will be painful to realize which tests we need to add the RUNC_DMZ=false.

Not because of failing tests, those are easy fix (although pointless overhead, imho), but cases exactly like this one are the tricky ones, that we want to test it with and without runc_dmz. When we switch the default, we need to find these cases where now we need to disable it. Otherwise test will just pass and we won't be testing that case (we will be testing twice with it enabled, the default and the test requesting it enabled).

But that is my personal preference. I've updated to remove it, to match yours :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That mistake is very simple to make, I was doing it here 🙈

I've fixed it, but if you decide you prefer the explicit false/true, I have it ready in another branch locally.

Let me know what you think

@rata
Copy link
Member Author

rata commented Feb 14, 2024

@kolyshkin PTAL

@rata rata force-pushed the rata/runc-dmz-disable branch 2 times, most recently from 34cb9c6 to 7630252 Compare February 15, 2024 13:28
@rata rata requested a review from kolyshkin February 15, 2024 13:29
@AkihiroSuda
Copy link
Member

ping @cyphar @kolyshkin PTAL

@lifubang
Copy link
Member

This PR changes the buildtag from runc_nodmz to runc_dmz, the build-tag is enabled by default but runc-dmz is only used when RUNC_DMZ=true is set.

I see this sentence in your PR description, but I think this is not implemented in your PR. @rata

And there is another small issue in the main branch: https://github.com/opencontainers/runc/blob/main/Makefile#L82 , could you please fix it in this PR as well? It should be:

rm -f runc runc-* libcontainer/dmz/binary/runc-dmz

@rata
Copy link
Member Author

rata commented Feb 28, 2024

I see this sentence in your PR description, but I think this is not implemented in your PR. @rata

Yes, I changed it as @kolyshkin asked. I've updated the PR description to reflect the current state.

And there is another small issue in the main branch: https://github.com/opencontainers/runc/blob/main/Makefile#L82 , could you please fix it in this PR as well? It should be:

Fixed, thanks!

If it is compiled, the user needs to opt-in with this env variable to
use it.

While we are there, remove the RUNC_DMZ=legacy as that is now the
default.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigo@sdfg.com.ar>
@rata
Copy link
Member Author

rata commented Feb 29, 2024

PTAL @lifubang

@rata
Copy link
Member Author

rata commented Mar 6, 2024

@lifubang @kolyshkin friendly ping?

@cyphar
Copy link
Member

cyphar commented Mar 7, 2024

Since RUNC_DMZ is now opt-in, IMHO we can remove the logic in libcontainer/dmz/selinux*.go. We must keep isDmzBinarySafe because we don't want an environment variable to make containers unsafe, but the compatibility logic isn't necessary if the user is explicitly opting in to the feature IMHO (for the same reason we are not going to add all of the capability-related compatibility code once we make runc-dmz opt-in).

WDYT @AkihiroSuda @kolyshkin?

Aside from that, looks good.

@AkihiroSuda
Copy link
Member

Can we just merge this, release v1.2.0 RC, and work on refactoring in a follow-up PR?

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'll submit a separate patch removing the selinux logic...

@cyphar cyphar merged commit 1950892 into opencontainers:main Mar 13, 2024
45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants