[RFC] Implement a fallback when clone3 returns ENOSYS #2030
Comments
Thanks for the great survey, I honestly hadn't considered seccomp. Sorry. Can you briefly tell me what the security issues are?
No worries. It has been interesting reading up on all the email threads about The main issue is that some choose to block
An example of this is how Firefox and Chrome choose to block
There are more links you can dig into from the Mozilla bug I referenced above. As a result, platforms may choose to block With that being said,
Thanks for the detailed survey, excellent. Would the solution be to fall back to clone(2)?
Fall back to
I agree with @yihuaf. This fallback is needed. However, the minimum supported Linux kernel version will not be lowered.
Wait a second, I think we made a mistake here. We are not supposed to bypass libc to call the raw syscall, for clone3 or clone... We need to go through libc for this. There is internal bookkeeping that we bypass if we call the raw syscalls. In fact,
What problems do we have if we bypass it?
Through my research on clone and clone3, it is my understanding that we should not bypass libc to create a process (through the kernel clone or clone3 syscall). Libc internally does bookkeeping for both processes and threads. I am not familiar with the internal libc implementation details, but there are a number of places (glibc email threads and GitHub issues on other projects) where libc authors (both musl and glibc) stated that people should not bypass libc. Specifically, if people call the kernel clone/clone3 syscall to create a process, they may not call libc functions again. It may work, but it may also fail in subtle ways that they will not support. In other words, in their design, this use case falls under undefined behavior. With that being said, there is currently no In conclusion, we should keep the existing clone3 logic and implement the fallback. In case clone3 becomes an issue, we can easily switch to the fallback. In the future when libc supports the
This is done via #2121
During the recent research on seccomp while trying to resolve #2022, I learned that clone3 returning ENOSYS or EPERM is actually an issue in the security context, not as simple as whether the kernel meets a minimum version. Specifically, this reminded me of #1861. In that issue, clone3 returned ENOSYS, but the kernel version is 5.15, which should definitely support clone3. I suspect that there are rules on the host system that block clone3 but return ENOSYS instead. Normally, these blocked calls would return EPERM, but returning EPERM for clone3 would break glibc and therefore almost all applications. Therefore, it is very likely that clone3 is blocked with ENOSYS and the application should just fall back to fork or clone.

However, we chose not to implement a fallback. I believe we were naive at the time, thinking that as long as we mandate a minimum kernel version, this should not be an issue. However, since there is now real evidence that real systems behave this way, we may want to reconsider the assumptions we had at the time. Falling back to clone is not hard from an implementation perspective. We already have the change, twice, LOL.

On the other hand, clone3 is the only way to use new features such as the new time namespace. I am not sure about the implications of this just yet.

@containers/youki-maintainers comments?

A few notes for reference. This is PR #1610 where we decided to use clone3. We also have to use clone because fork doesn't let us create a sibling process.
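
For illustration, here is a minimal sketch of the fallback control flow discussed in this issue. It is written in Python only for brevity (youki itself is Rust), and it is not the actual implementation. The names create_process, try_clone3 and try_clone are hypothetical, and the syscalls are replaced with stubs so the snippet runs anywhere. It assumes that only ENOSYS triggers the fallback and that every other error, including EPERM, is passed through to the caller.

```python
import errno
from typing import Callable, Tuple


def create_process(
    try_clone3: Callable[[], int],
    try_clone: Callable[[], int],
) -> Tuple[int, str]:
    """Try clone3 first; fall back to clone only when clone3 reports ENOSYS.

    Both arguments are hypothetical callables that return a child pid or
    raise OSError. Returns (pid, name of the path that succeeded).
    """
    try:
        return try_clone3(), "clone3"
    except OSError as err:
        # ENOSYS can mean "kernel too old" or "blocked by a seccomp rule
        # that reports ENOSYS". Either way the caller cannot use clone3.
        if err.errno != errno.ENOSYS:
            raise  # any other errno is a real failure, do not mask it
    return try_clone(), "clone"


# Stubs standing in for the real syscalls (assumptions for this sketch).
def blocked_clone3() -> int:
    raise OSError(errno.ENOSYS, "clone3 filtered by host policy")


def working_clone() -> int:
    return 4242  # made-up pid


pid, path = create_process(blocked_clone3, working_clone)
print(pid, path)  # prints "4242 clone"
```

Keeping the clone3 attempt first means features that need clone3 stay available wherever the host allows it, while hosts that filter it still get a working process through clone.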