Fd reuse race causes runc init can't start due to a go stdlib bug #4294

Closed
Tracked by #4114
kolyshkin opened this issue May 28, 2024 · 6 comments · Fixed by #4452

@kolyshkin
Contributor

kolyshkin commented May 28, 2024

Description

From https://cirrus-ci.com/task/6471857094787072:

=== RUN TestInitJoinPID
exec_test.go:1471: unexpected error: unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied
--- FAIL: TestInitJoinPID (0.29s)

I've only seen it happen once. Filing for visibility.

EDIT:
It's not a flaky test but a real bug in practice.

@kolyshkin kolyshkin changed the title from "flake in" to "flake in TestInitJoinPID on CentOS 7" May 28, 2024
@kolyshkin
Contributor Author

CentOS 7 is gone (see #4333), so this one can be closed I guess.

@kolyshkin
Contributor Author

I saw this once on Ubuntu 24.04 now:

https://github.com/opencontainers/runc/actions/runs/10823144914/job/30028174423?pr=4358

...
=== RUN   TestSharedPidnsInitKill
--- PASS: TestSharedPidnsInitKill (0.18s)
=== RUN   TestInitJoinPID
    exec_test.go:1444: unexpected error: unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied
--- FAIL: TestInitJoinPID (0.14s)
=== RUN   TestInitJoinNetworkAndUser
--- PASS: TestInitJoinNetworkAndUser (0.33s)
=== RUN   TestTmpfsCopyUp
...

So it might be a genuine issue with the test case.

@kolyshkin kolyshkin reopened this Sep 12, 2024
@kolyshkin
Contributor Author

One more, in test (ubuntu-20.04, 1.23.x, criu-dev), but in a different test. From the logs:

=== RUN   TestSeccompPermitWriteMultipleConditions
    seccomp_test.go:251: |: unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied
--- FAIL: TestSeccompPermitWriteMultipleConditions (0.13s)

@kolyshkin kolyshkin changed the title from "flake in TestInitJoinPID on CentOS 7" to "flaky unit tests (unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied)" Sep 13, 2024
@kolyshkin
Contributor Author

Another failure in test (ubuntu-20.04, 1.23.x, -race). From the logs:

=== RUN   TestSeccompDenyWriteConditional
    seccomp_test.go:205: unexpected error: unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied
--- FAIL: TestSeccompDenyWriteConditional (0.14s)

@kolyshkin
Contributor Author

From https://github.com/opencontainers/runc/actions/runs/11300641540/job/31433805394?pr=4441 (cross-i386):

=== RUN   TestRootfsPropagationSharedMount
    exec_test.go:1288: unexpected error: unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied
--- FAIL: TestRootfsPropagationSharedMount (0.13s)

lifubang added a commit to lifubang/runc that referenced this issue Oct 17, 2024
In opencontainers#3987 (0e9a335), we may use a memfd copy of runc to start runc init.
Due to a Go stdlib bug, we need to add safeExe to the set of
ExtraFiles, otherwise it is possible for the stdlib to clobber the fd
during forkAndExecInChild1 and replace it with some other file that
might be malicious. This is less than ideal (because the descriptor
will be non-O_CLOEXEC), however we have protections in "runc init" to
stop us from leaking extra file descriptors.
See <golang/go#61751>.

But because we have added safeExe to the set of ExtraFiles, if the
fd of safeExe is too small, the Go stdlib will dup3 it to another fd,
which causes the original fd to be closed. (opencontainers#4294)

Signed-off-by: lfbzhm <lifubang@acmcoder.com>
lifubang added a commit to lifubang/runc that referenced this issue Oct 17, 2024
In opencontainers#3987 (0e9a335), we may use a memfd copy of runc to start runc init.
Due to a Go stdlib bug, we need to add safeExe to the set of
ExtraFiles, otherwise it is possible for the stdlib to clobber the fd
during forkAndExecInChild1 and replace it with some other file that
might be malicious. This is less than ideal (because the descriptor
will be non-O_CLOEXEC), however we have protections in "runc init" to
stop us from leaking extra file descriptors.
See <golang/go#61751>.

There is a race when we are opening this memfd: if fd 5 or 6 was
closed at that time, it may be reused by the memfd.

But because we have added safeExe to the set of ExtraFiles, if the
fd of safeExe is smaller than the stdio fd count + ExtraFiles count, the
Go stdlib will dup3 it to another fd, which causes the original fd to be
closed. (opencontainers#4294)

Signed-off-by: lfbzhm <lifubang@acmcoder.com>
lifubang added a commit to lifubang/runc that referenced this issue Oct 17, 2024
In opencontainers#3987 (0e9a335), we may use a memfd copy of runc to start runc init.
Due to a Go stdlib bug, we need to add safeExe to the set of
ExtraFiles, otherwise it is possible for the stdlib to clobber the fd
during forkAndExecInChild1 and replace it with some other file that
might be malicious. This is less than ideal (because the descriptor
will be non-O_CLOEXEC), however we have protections in "runc init" to
stop us from leaking extra file descriptors.
See <golang/go#61751>.

There is a race when we are opening this memfd: if fd 6 or 7 was
closed at that time, it may be reused by the memfd.

But because we have added safeExe to the set of ExtraFiles, if the
fd of safeExe is not bigger than the stdio fd count + ExtraFiles count,
the Go stdlib will dup3 it to another fd, which causes the original fd
to be closed. (opencontainers#4294)

Signed-off-by: lfbzhm <lifubang@acmcoder.com>
lifubang added a commit to lifubang/runc that referenced this issue Oct 18, 2024
In opencontainers#3987 (0e9a335), we may use a memfd copy of runc to start runc init.
Due to a Go stdlib bug, we need to add safeExe to the set of
ExtraFiles, otherwise it is possible for the stdlib to clobber the fd
during forkAndExecInChild1 and replace it with some other file that
might be malicious. This is less than ideal (because the descriptor
will be non-O_CLOEXEC), however we have protections in "runc init" to
stop us from leaking extra file descriptors.
See <golang/go#61751>.

There is a race when we are opening this memfd: if fd 6 or 7 was
closed at that time, it may be reused by the memfd.

Because we want to add safeExe to the set of ExtraFiles, if the fd of
safeExe is too small, the Go stdlib will dup3 it to another fd, or dup3
another fd onto this fd, which causes the fd-based cmd.Path to refer to
a random path. (issue: opencontainers#4294)

Signed-off-by: lfbzhm <lifubang@acmcoder.com>
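
The clobbering described in the commit message above can be illustrated with a small model of the two-pass fd remapping that the Go stdlib performs in the child before exec. This is only a sketch under stated assumptions: `shuffle` is a hypothetical helper that works on a map instead of real descriptors, it ignores the stdlib's internal error pipe, and the file names ("init pipe", "log pipe", "fifo") and fd numbers are made up for illustration.

```go
package main

import "fmt"

// shuffle models, on a copy of the parent's descriptor table, the two-pass
// remapping done in the child before exec. files[i] is the parent fd that
// must end up as fd i in the child (stdio first, then ExtraFiles).
// Simplification: the stdlib's internal error pipe is ignored.
func shuffle(parent map[int]string, files []int) map[int]string {
	child := make(map[int]string, len(parent))
	for fd, name := range parent {
		child[fd] = name
	}
	fds := append([]int(nil), files...)

	// nextfd starts above both len(fds) and every fd in the list.
	nextfd := len(fds)
	for _, fd := range fds {
		if fd > nextfd {
			nextfd = fd
		}
	}
	nextfd++

	// Pass 1: move any fds[i] < i out of the way so pass 2 cannot stomp on it.
	for i := range fds {
		if fds[i] >= 0 && fds[i] < i {
			child[nextfd] = child[fds[i]]
			fds[i] = nextfd
			nextfd++
		}
	}
	// Pass 2: dup fds[i] down onto i, overwriting whatever was open at i.
	for i := range fds {
		if fds[i] == -1 {
			delete(child, i)
		} else if fds[i] != i {
			child[i] = child[fds[i]]
		}
	}
	return child
}

func main() {
	parent := map[int]string{
		0: "stdin", 1: "stdout", 2: "stderr",
		5: "runc binary", // assumed: safeExe landed on a reused small fd
		8: "init pipe", 9: "log pipe", 10: "fifo",
	}
	files := []int{0, 1, 2, 8, 9, 10} // stdio + three ExtraFiles
	child := shuffle(parent, files)
	fmt.Println("/proc/self/fd/5 in the child refers to:", child[5])
}
```

In this model the child's fd 5 ends up pointing at "fifo", so a cmd.Path of /proc/self/fd/5 would no longer name the runc binary. With the binary on an fd above the remapped range (for example 7 here), the entry is left untouched.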
@lifubang
Member

This is really a bug in practice; I can reproduce it locally.
If we start 1000 containers quickly, we have a chance to reproduce it.
It happens especially with PR #4448 applied. I don't know why; maybe because that PR makes runc start more quickly than before.

@lifubang lifubang added this to the 1.2.0 milestone Oct 18, 2024
lifubang added a commit to lifubang/runc that referenced this issue Oct 21, 2024
Due to a Go stdlib bug, it is possible for the stdlib to clobber the fd
during forkAndExecInChild1 and replace it with some other file that
might be malicious. See <golang/go#61751>.
This can prevent the runc init process from starting. (opencontainers#4294)
It only occurs when we use an fd-based path, for example
/proc/self/fd/7, as the cmd path to start runc init. Because there is an
fd reuse race, if some small fd is closed, the kernel may reuse that fd
number to refer to the runc binary. If this fd number is smaller than the
length of `cmd.ExtraFiles`, it will hit this Go stdlib bug. If we detect
this situation, we can dup it to a new, bigger fd number to avoid the bug.

Signed-off-by: lfbzhm <lifubang@acmcoder.com>
@lifubang lifubang changed the title from "flaky unit tests (unable to start container process: unable to start init: fork/exec /proc/self/fd/7: permission denied)" to "Fd reuse race causes runc init can't start due to a go stdlib bug" Oct 21, 2024
@lifubang lifubang mentioned this issue Oct 21, 2024
21 tasks
lifubang added a commit to lifubang/runc that referenced this issue Oct 21, 2024
There is a race when we are opening a file: if a small fd was closed
at that time, it may be reused by safeExe.
Because of the Go stdlib fd shuffling bug, if the fd of safeExe is too
small, the Go stdlib will dup3 it to another fd, or dup3 another fd onto
this fd, which causes the fd-based cmd.Path to refer to a random path,
and it can lead to a "permission denied" error when starting the process.
Please see opencontainers#4294 and <golang/go#61751>.
So we should not use the original fd of safeExe, but the fd after it has
been shuffled by the Go stdlib, because the Go stdlib guarantees that
this fd refers to the correct file.

Signed-off-by: lfbzhm <lifubang@acmcoder.com>