Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix runc-dmz error printing #4172

Merged
merged 2 commits into from
Jan 22, 2024
Merged

Conversation

rata
Copy link
Member

@rata rata commented Jan 12, 2024

This error code is using functions that are present in nolibc too.

When using nolibc, the error is printed like:

exec /runc.armel: errno=8

When using libc, as its perror() implementation translates the errno to
a message, it is printed like:

exec /runc.armel: exec format error

Note that when using libc, the error is printed in the same way as
before.


We can try to do things to expand errno to a string with nolibc (like using the system's errno definitions and reimplement strerror() with those, ignoring locales; or try to send the errno back to go and using go unix package to print it), but it's not clear runc-dmz is a good idea in the first place, so let's go the easy path for now.

We can later revisit this if we want runc-dmz and we want to really reduce the size with nolibc.

Updates: #4170 #4158

@rata
Copy link
Member Author

rata commented Jan 12, 2024

cc @lifubang

@rata rata force-pushed the rata/runc-dmz branch 3 times, most recently from 3b64fce to 9f667ed Compare January 12, 2024 11:59
@rata
Copy link
Member Author

rata commented Jan 12, 2024

I'm unsure if a simpler option might be to handle this on go. We have lot of wrappers already, it seems like some error printing is missing maybe Nah, brain fart, exec works to the runc-dmz (the Fexecve in go works, it is the execve in runc-dmz that fails), so nothing we can catch looking at Fexecve failures in go.

The other tricks I mentioned might be possible, but for now this seems like the best path. The runc-dmz binary is still less than 1MB with libc and the runc binary is 14MB, it is still a big difference.

@cyphar
Copy link
Member

cyphar commented Jan 12, 2024

Personally, I don't care about error messages or locales with runc-dmz. If we had an assembly version I would also use that.

@rata
Copy link
Member Author

rata commented Jan 12, 2024

@cyphar but any objection to do this, so we print an error when there is an error on exec?

@cyphar
Copy link
Member

cyphar commented Jan 15, 2024

I think the purpose of runc-dmz is unclear if we remove nolibc.

@rata
Copy link
Member Author

rata commented Jan 15, 2024

@cyphar as I said here, it's still 21x smaller (on my host runc is 14100, runc-dmz with libc is 668.0).

@lifubang lifubang mentioned this pull request Jan 16, 2024
21 tasks
@kolyshkin
Copy link
Contributor

Since the error is printed by the Go code which can do errno -> string translation, why don't we do that (i.e. add a little postprocessing)?

@rata
Copy link
Member Author

rata commented Jan 16, 2024

@kolyshkin the error is not printed by the go code, that is the issue. I mentioned it here.

In the case that the container binary is compiled for another architechture (just one possible fail to exec), the go code execs into runc-dmz just fine, but then runc-dmz fails to exec into the binary. Therefore, this patch is just adding the error printing to exec in the runc-dmz code.

@cyphar
Copy link
Member

cyphar commented Jan 16, 2024

I think we should just remove runc-dmz. While it was a cute idea, it's clearly far more trouble than it is worth.

@lifubang
Copy link
Member

Yes, I agree, I think we can announce that what is the minimal memory that runc required. Then when k8s(or other projects) bumps new runc version, they should meet the required minimal memory.

@rata
Copy link
Member Author

rata commented Jan 16, 2024

But it is still 21x smaller than the runc binary (600K), the mem overhead on startup without it can be a 1 or more gigas if you have some churn in a k8s environment without this. With this it can be ~60MB in a similar environment.

I think its worth a try, it's very easy to remove it later if we want to too.

@rata
Copy link
Member Author

rata commented Jan 16, 2024

IMHO, even if we are going to not use runc-dmz, we should just switch the default to be without it. But I'd keep it in case the overhead in some cases is huge, we can easily activate it and solve those issues (with the small adjustments needed).

And we can better use our time for other solutions for runc 1.3, if we want to.

@kolyshkin
Copy link
Contributor

@kolyshkin the error is not printed by the go code, that is the issue. I mentioned it here.

Right. I guess we can just print the errno then (as the probability of an error here is low). Maybe add a XXX comment to the source code saying there's no strerror in nolibc. The error message could also suggest re-running runc with runc-dmz disabled to get a better error, but maybe I'm overreaching.

I also agree with @rata that we should make runc-dmz non-default (and experimental) because of all these issues (we maintain a list of those in #4158). Depending on how it goes, we will either improve it further or remove it in future releases.

@cyphar
Copy link
Member

cyphar commented Jan 17, 2024

If printing the errno is an option and doesn't inflate the binary size, we should do that. The only purpose of runc-dmz is to be small -- increasing the binary size 250x to get slightly prettier errors doesn't make sense.

@rata rata force-pushed the rata/runc-dmz branch 2 times, most recently from 363b5cf to 00acc36 Compare January 18, 2024 10:35
@rata rata changed the title Fix runc-dmz error printing and disable nolibc Fix runc-dmz error printing Jan 18, 2024
@rata
Copy link
Member Author

rata commented Jan 18, 2024

Sure, I've updated the PR to do just that, then. PTAL :)

libcontainer/dmz/_dmz.c Outdated Show resolved Hide resolved
Comment on lines 18 to 26
char *prefix = "exec ";
int err_len = strlen(prefix) + strlen(argv[0]) + 1;
char err[err_len];

strcpy(err, prefix);
strcpy(err + strlen(prefix), argv[0]);
err[err_len - 1] = '\0';

perror(err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't something like this be simpler?

Suggested change
char *prefix = "exec ";
int err_len = strlen(prefix) + strlen(argv[0]) + 1;
char err[err_len];
strcpy(err, prefix);
strcpy(err + strlen(prefix), argv[0]);
err[err_len - 1] = '\0';
perror(err);
char err[5 + PATH_MAX] = "exec "; // "exec " + argv[0]
strlcat(err, argv[0], sizeof(err));
err[sizeof(err) - 1] = '\0';
perror(err);

No dynamic memory management, no VFAs, and no error-prone pointer maths.

Unfortunately, for some reason strlcat has issues even though it is defined in nolibc (as is strlen)...

make[1]: Entering directory '/home/cyphar/src/github.com/opencontainers/runc/libcontainer/dmz'
gcc -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib -lgcc -static -o binary/runc-dmz _dmz.c
/usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: /tmp/ccIfhSHD.o: in function `main':
_dmz.c:(.text.startup+0x5e): undefined reference to `strlen'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:18: binary/runc-dmz] Error 1

Copy link
Member Author

@rata rata Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, weird. This might fun to investigate. Just removing the __attribute__((unused)) from the function definition in nolibc, makes it work. So, for some reason it is removing a function that is used at link-time.

But for our use case, we can just do that with strcpy() and be happy. So I've updated it to that :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, with strncat() it's simpler, updated to that :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, I've chased down the bug to use strlcpy() and fixed it upstream, the series was applied a few months ago: https://lore.kernel.org/all/20240219204821.GA9819@1wt.eu/

The issue was gcc being too smart and replacing things that implement strlen/friends with a call to its builtin implementation, that when we compile without stdlib it of course fails.

Needless to say, we don't need to do anything in runc, as we are not using strlcpy().

libcontainer/dmz/_dmz.c Outdated Show resolved Hide resolved
@lifubang
Copy link
Member

@rata Like you mentioned in here: #4173 (comment)
I think we need to add test here too.

@rata rata force-pushed the rata/runc-dmz branch 3 times, most recently from d1dda61 to 12c7025 Compare January 18, 2024 16:50
@rata
Copy link
Member Author

rata commented Jan 18, 2024

@lifubang Now that there is agreement with this, I added one. I copied from yours, it is a nice way to force a failure at that point, thanks :)

@rata
Copy link
Member Author

rata commented Jan 19, 2024

I'll be afk next week. If this is not ready to merge as-is, can someone please pick it up so we merge this? :)

libcontainer/dmz/_dmz.c Outdated Show resolved Hide resolved
tests/integration/run.bats Outdated Show resolved Hide resolved
libcontainer/dmz/_dmz.c Outdated Show resolved Hide resolved
@rata rata requested a review from cyphar January 21, 2024 14:14
@rata
Copy link
Member Author

rata commented Jan 21, 2024

PTAL

@rata rata force-pushed the rata/runc-dmz branch 2 times, most recently from 3e417b4 to 90dba15 Compare January 21, 2024 14:31
rata added 2 commits January 21, 2024 15:35
This error code is using functions that are present in nolibc too.

When using nolibc, the error is printed like:

	exec /runc.armel: errno=8

When using libc, as its perror() implementation translates the errno to
a message, it is printed like:

	exec /runc.armel: exec format error

Note that when using libc, the error is printed in the same way as
before.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
@lifubang lifubang merged commit 4baaf18 into opencontainers:main Jan 22, 2024
45 checks passed
@rata rata deleted the rata/runc-dmz branch January 26, 2024 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants