Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shim cleanup on exit #1316

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

shim cleanup on exit #1316

wants to merge 2 commits into from

Conversation

helsaawy
Copy link
Contributor

@helsaawy helsaawy commented Mar 7, 2022

Fixed bug introduced by PR #1289 where graceful shim shutdown prevents containerd from deleting the sandbox bundle since
the shim is still using it and containerd errors, because anotherprocess is using that directory and the files within it.

containerd waits for the shim to close its ttrpc server before marking it as closed and cleaning up resources, such as the sandbox bundle. However, now the shim shuts down the ttrpc server and then exits gracefully, rather than executing an os.Exit(0), causing a timing error, where the shim is still alive, with the sandbox bundle as its working directory and the panic.log file within the bundle as its stderr, and contaienrd is attempting to delete the sandbox bundle directory that another process (the shim) is using.

The PR changes the working directory and closes the stderr of the shim during exit, before the ttrpc server is closed, so the bundle can be deleted while the shim finishes shutting down.

Signed-off-by: Hamza El-Saawy hamzaelsaawy@microsoft.com

@helsaawy helsaawy requested a review from a team as a code owner March 7, 2022 18:16
@jterry75
Copy link
Contributor

jterry75 commented Mar 7, 2022

Sad day. Can logrus even write a message after this point? I assume anything that happens in the shim post this call now has no way of being logged anywhere? Maybe thats ok since its such a small window now?

@helsaawy
Copy link
Contributor Author

helsaawy commented Mar 7, 2022

It should be able to, since its normally set up with ETW or a logging binary (via a pipe). Local testing works fine too.
(I think logrus also writes warning to stderr, so I dont know how go handles writing to that ...)

I figured closing errors could safely be ignored, but I can log them if you think they're worthwhile

Fixed bug introduced by earlier PR where graceful shim shutdown
prevents containerd from deleting the sandbox bundle since
the shim is still using it and containerd errors, because another
process is using that directory and the files within it.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
@jterry75
Copy link
Contributor

jterry75 commented Mar 7, 2022

Nah, this seems fine.

// alive.
// Change the directory to the parent and close stderr (panic.log) to
// allow bundle deletion to succeed.
if err := os.Chdir(".."); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something in me is screaming that we shouldn't do this, but I don't know why.

@helsaawy helsaawy marked this pull request as draft March 11, 2022 19:09
@helsaawy helsaawy force-pushed the he/close branch 2 times, most recently from 8101d00 to b9508da Compare March 23, 2022 16:04
Rather than changing the directory, have the ttrpc wait until cleanup is
done

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants