Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix with runc container lifetime management #1272

Merged
merged 3 commits into from
Feb 3, 2022

Conversation

helsaawy
Copy link
Contributor

@helsaawy helsaawy commented Jan 10, 2022

Fixed bug where container is cast as a process, which then causes the
container to be deleted prematurely before when the container finishes
executing.
Currently, whenever an LCOW container is stopped, the logs show multiple
errors being raised that runc cannot find the container, which cause the
Kill command issued by containerd to exit unsuccessfully.

Added conversion of runc log file error strings into error types that
wrap HResult error types.
Wrapped runc errors from log file, which is more informative that error
returned from cmd execution.

Added traces to guest container operations, to trace low level container
operations.

Signed-off-by: Hamza El-Saawy hamzaelsaawy@microsoft.com

Comment on lines 147 to 148
entity := log.G(ctx).WithField(logfields.ContainerID, c.id)
entity.Debug("opengcs::Container::Delete")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the logs here be something other than debug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ill change them to Info

@@ -101,6 +110,34 @@ type standardLogEntry struct {
Err error `json:"error,omitempty"`
}

func (l *standardLogEntry) asError() (err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not the biggest fan of this function, trying to see if I can think of another way of doing things

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also am not a fan, but I couldn't find an example elsewhere of something similar, and simply wrapping the string an errors.New meant the string operations would be pushed elsewhere in the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is wrapping the string an issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not directly reference the errors in https://github.com/opencontainers/runc/blob/master/libcontainer/error.go instead of re-hardcoding the strings here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be because of version issues (I think I was usuing v0.0.115), but the runc errors I was getting were container with id exists, and not what runc exports of container with given ID already exists. Similarly, container does not exist has the container id in the middle of it

I do want to revisit this in the future if we're confident those runc errors will hold steady and use those errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the string wrapping, that means elsewhere in the code, if we want to match against the error returned, we'd have to use string search, and that would leak into the runc handling code

Copy link
Contributor

@anmaxvl anmaxvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Fixed bug where container is cast as a process, which then causes the
container to be deleted prematurely before when the container finishes
executing.

Added conversion of runc log file error strings into `error` types that
wrap HResult error types.
Wrapped runc errors from log file, which is more informative that error
returned from cmd execution.

Added traces to guest container operations, to trace low level container
operations.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Copy link
Contributor

@ambarve ambarve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@helsaawy helsaawy merged commit 71baff4 into microsoft:master Feb 3, 2022
@helsaawy helsaawy deleted the he/runcstate branch February 3, 2022 19:12
princepereira pushed a commit to princepereira/hcsshim that referenced this pull request Aug 29, 2024
Fixed bug where container is cast as a process, which then causes the
container to be deleted prematurely before when the container finishes
executing.
Currently, whenever an LCOW container is stopped, the logs show multiple
errors being raised that runc cannot find the container, which cause the
Kill command issued by containerd to exit unsuccessfully.

Added conversion of runc log file error strings into error types that
wrap HResult error types.
Wrapped runc errors from log file, which is more informative that error
returned from cmd execution.

Added traces to guest container operations, to trace low level container
operations.

Signed-off-by: Hamza El-Saawy hamzaelsaawy@microsoft.com
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants