Fix runc-dmz error printing #4172
cc @lifubang
Force-pushed: 3b64fce to 9f667ed
The other tricks I mentioned might be possible, but for now this seems like the best path. The runc-dmz binary is still less than 1MB with libc, while the runc binary is 14MB, so the difference is still big.
Personally, I don't care about error messages or locales with runc-dmz. If we had an assembly version I would also use that.
@cyphar, any objection to doing this, so that we print an error when exec fails?
I think the purpose of |
Since the error is printed by the Go code, which can do errno -> string translation, why don't we do that (i.e. add a little postprocessing)?
@kolyshkin the error is not printed by the Go code; that is the issue. I mentioned it here. If the container binary is compiled for another architecture (just one possible way for exec to fail), the Go code execs into runc-dmz just fine, but then runc-dmz fails to exec into the binary. Therefore, this patch just adds error printing to the exec in the runc-dmz code.
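To make the failure mode above concrete, here is a minimal sketch of the pattern being discussed: call execve() and, only if it returns, report why. The helper name `exec_or_report` is hypothetical and not taken from `_dmz.c`; the sketch assumes a hosted libc, so perror() prints a translated message.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the runc-dmz code): try to exec `path`.
 * On success execve() never returns. On failure, print the reason to
 * stderr and return the errno value that execve() set.
 */
static int exec_or_report(const char *path, char *const argv[], char *const envp[])
{
	int saved;

	execve(path, argv, envp);

	/* Only reached when execve() failed. */
	saved = errno;   /* save it: perror() itself may change errno */
	perror(path);
	return saved;
}
```

A caller would typically exit with a non-zero status after this returns. A binary built for another architecture would be expected to fail here with ENOEXEC, which is the case described in the comment above.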
I think we should just remove |
Yes, I agree. I think we can announce the minimal memory that runc requires. Then, when k8s (or other projects) bump to a new runc version, they should meet that minimum.
But it is still 21x smaller than the runc binary (600K). Without it, the memory overhead on startup can be 1GB or more if you have some churn in a k8s environment. With it, the overhead can be ~60MB in a similar environment. I think it's worth a try, and it's very easy to remove it later if we want to.
IMHO, even if we are not going to use runc-dmz, we should just switch the default to be without it. But I'd keep it: in case the overhead is huge in some cases, we can easily activate it and solve those issues (with the small adjustments needed). And we can better use our time on other solutions for runc 1.3, if we want to.
Right. I guess we can just print the errno then (as the probability of an error here is low). Maybe add an XXX comment to the source code saying there's no strerror in nolibc. The error message could also suggest re-running runc with runc-dmz disabled to get a better error, but maybe I'm overreaching. I also agree with @rata that we should make runc-dmz non-default (and experimental) because of all these issues (we maintain a list of those in #4158). Depending on how it goes, we will either improve it further or remove it in future releases.
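As an illustration of "just print the errno", the sketch below builds a message in the `exec <path>: errno=<n>` shape without strerror() or printf(). The names `put` and `format_exec_error` are hypothetical; this is not how nolibc implements perror(), only a standalone example of formatting the number by hand.

```c
#include <stddef.h>

/* Copy s to p without writing at or past end; returns the new write position. */
static char *put(char *p, const char *end, const char *s)
{
	while (*s != '\0' && p < end)
		*p++ = *s++;
	return p;
}

/*
 * Hypothetical helper: write "exec <path>: errno=<err>" into buf,
 * truncating if needed and always NUL-terminating when size > 0.
 */
static void format_exec_error(char *buf, size_t size, const char *path, int err)
{
	char digits[12];                 /* sign + 10 digits + NUL for a 32-bit int */
	size_t i = sizeof(digits) - 1;
	unsigned int n = err < 0 ? 0u - (unsigned int)err : (unsigned int)err;
	const char *end;
	char *p = buf;

	if (size == 0)
		return;
	end = buf + size - 1;            /* reserve one byte for the NUL */

	/* Convert the number to decimal, filling digits[] from the right. */
	digits[i] = '\0';
	do {
		digits[--i] = (char)('0' + n % 10);
		n /= 10;
	} while (n != 0);
	if (err < 0)
		digits[--i] = '-';

	p = put(p, end, "exec ");
	p = put(p, end, path);
	p = put(p, end, ": errno=");
	p = put(p, end, &digits[i]);
	*p = '\0';
}
```

The resulting buffer could then be handed to write(2) on stderr, which keeps the code free of any dependency on a full libc.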
If printing the errno is an option and doesn't inflate the binary size, we should do that. The only purpose of |
Force-pushed: 363b5cf to 00acc36
Sure, I've updated the PR to do just that, then. PTAL :)
libcontainer/dmz/_dmz.c (outdated)
    char *prefix = "exec ";
    int err_len = strlen(prefix) + strlen(argv[0]) + 1;
    char err[err_len];

    strcpy(err, prefix);
    strcpy(err + strlen(prefix), argv[0]);
    err[err_len - 1] = '\0';

    perror(err);
Wouldn't something like this be simpler?
-   char *prefix = "exec ";
-   int err_len = strlen(prefix) + strlen(argv[0]) + 1;
-   char err[err_len];
-   strcpy(err, prefix);
-   strcpy(err + strlen(prefix), argv[0]);
-   err[err_len - 1] = '\0';
-   perror(err);
+   char err[5 + PATH_MAX] = "exec "; // "exec " + argv[0]
+   strlcat(err, argv[0], sizeof(err));
+   err[sizeof(err) - 1] = '\0';
+   perror(err);
No dynamic memory management, no VLAs, and no error-prone pointer maths.
Unfortunately, for some reason strlcat has issues even though it is defined in nolibc (as is strlen)...
make[1]: Entering directory '/home/cyphar/src/github.com/opencontainers/runc/libcontainer/dmz'
gcc -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib -lgcc -static -o binary/runc-dmz _dmz.c
/usr/lib64/gcc/x86_64-suse-linux/13/../../../../x86_64-suse-linux/bin/ld: /tmp/ccIfhSHD.o: in function `main':
_dmz.c:(.text.startup+0x5e): undefined reference to `strlen'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:18: binary/runc-dmz] Error 1
Yes, weird. This might be fun to investigate. Just removing the __attribute__((unused)) from the function definition in nolibc makes it work. So, for some reason, a function that is still needed at link time is being removed.
But for our use case, we can just do that with strcpy() and be happy. So I've updated it to that :)
Heh, with strncat() it's simpler, updated to that :)
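For readers following along, a strncat()-based version of the prefix construction could look roughly like the sketch below. It is not the code that was merged; the helper name `build_perror_prefix` and the `EXEC_PREFIX` macro are made up for this example, and it assumes the caller supplies a fixed-size buffer.

```c
#include <stddef.h>
#include <string.h>

#define EXEC_PREFIX "exec "   /* hypothetical macro for this sketch */

/*
 * Hypothetical helper: build "exec <argv0>" in a fixed-size buffer.
 * strncat() always NUL-terminates, so the remaining-space argument must
 * leave one byte for the terminator.
 */
static void build_perror_prefix(char *err, size_t size, const char *argv0)
{
	if (size == 0)
		return;
	err[0] = '\0';
	strncat(err, EXEC_PREFIX, size - 1);
	strncat(err, argv0, size - strlen(err) - 1);
}
```

The caller would then pass the buffer to perror(). Sizing the buffer from PATH_MAX, as in the earlier suggestion, avoids both the VLA and the manual pointer arithmetic.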
btw, I've chased down the bug that prevented using strlcpy() and fixed it upstream. The series was applied a few months ago: https://lore.kernel.org/all/20240219204821.GA9819@1wt.eu/
The issue was gcc being too smart and replacing code that implements strlen and friends with a call to its builtin implementation, which of course fails to link when we compile without the standard library.
Needless to say, we don't need to do anything in runc, as we are not using strlcpy().
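The kind of loop the optimizer can recognize is easy to show. The sketch below is only an illustration, not the upstream fix: `plain_strlen` is a hypothetical name, and the empty asm statement is one commonly used way (an assumption here, not something stated in this thread) to keep GCC or Clang from rewriting the loop into a call to strlen().

```c
#include <stddef.h>

/*
 * Hypothetical hand-written strlen. Without the empty asm statement, an
 * optimizing compiler may recognize the loop as "strlen" and emit a call
 * to strlen(), which does not exist as a symbol when linking with -nostdlib.
 */
static size_t plain_strlen(const char *s)
{
	size_t len;

	for (len = 0; s[len] != '\0'; len++)
		__asm__ volatile("");   /* opaque to the optimizer; emits no instructions */
	return len;
}
```

With a function like this, the result is the same as strlen(), but the loop body stays a loop in the generated code.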
@rata Like you mentioned in here: #4173 (comment)
Force-pushed: d1dda61 to 12c7025
@lifubang Now that there is agreement on this, I added one. I copied it from yours; it is a nice way to force a failure at that point, thanks :)
I'll be afk next week. If this is not ready to merge as-is, can someone please pick it up so we merge this? :)
PTAL
Force-pushed: 3e417b4 to 90dba15
This error code is using functions that are present in nolibc too.

When using nolibc, the error is printed like:

    exec /runc.armel: errno=8

When using libc, as its perror() implementation translates the errno to a message, it is printed like:

    exec /runc.armel: exec format error

Note that when using libc, the error is printed in the same way as before.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
We can try to do things to expand errno to a string with nolibc (like using the system's errno definitions and reimplementing strerror() with those, ignoring locales; or sending the errno back to Go and using the Go unix package to print it), but it's not clear runc-dmz is a good idea in the first place, so let's take the easy path for now. We can revisit this later if we want runc-dmz and really want to reduce the size with nolibc.
Updates: #4170 #4158