runc-facade squashes rare runc errors #12365

Closed
utam0k opened this issue Aug 25, 2022 · 4 comments · Fixed by #14259
Labels
team: workspace Issue belongs to the Workspace team

Comments

@utam0k
Contributor

utam0k commented Aug 25, 2022

Is your feature request related to a problem? Please describe

This issue came from #9247 (comment)

On rare occasions runc fails with an error whose cause is unclear (probably seccomp notify). These errors are very likely difficult to resolve at the root. Therefore, runc-facade is used to retry and squash these errors.

This problem still happens; see this recent GCP log:
[image: GCP log]

Solution

Please retry when runc returns an error around here:

err = syscall.Exec(runcPath, os.Args, os.Environ())
if err != nil {
	return xerrors.Errorf("exec %s: %w", runcPath, err)
}
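
One thing to keep in mind for the retry: syscall.Exec replaces the current process when it succeeds, so the only error the snippet above can see is a failure to exec at all. To retry when runc itself fails, the facade would probably have to run runc as a child process and check how it exited. Below is a minimal sketch of that idea, not the actual runc-facade code. The names retry, runRunc and errNoAttempts, the two attempts, and the 100ms delay are all assumptions made for illustration, and it uses fmt.Errorf instead of xerrors to stay within the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// errNoAttempts is a hypothetical sentinel returned when retry is asked
// to run fewer than one attempt.
var errNoAttempts = errors.New("retry: attempts must be at least 1")

// retry calls fn up to attempts times and sleeps for delay between failed
// tries. It returns nil on the first success; otherwise it returns the
// last error, wrapped with the number of attempts made.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		return errNoAttempts
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// runRunc runs runc as a child process (instead of replacing the current
// process) so that a failure can be observed and retried.
// The attempt count and delay are assumed values, not tuned ones.
// args must not include argv[0]; exec.Command adds it itself.
func runRunc(runcPath string, args []string) error {
	return retry(2, 100*time.Millisecond, func() error {
		cmd := exec.Command(runcPath, args...)
		cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("run %s: %w", runcPath, err)
		}
		return nil
	})
}
```

In the facade this would be called as runRunc(runcPath, os.Args[1:]). Note that this sketch retries on any non-zero exit, including failures caused by the user application rather than by runc.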

How to reproduce

Open this repository
https://github.com/spearki/gitpod-runc-issue-repro

Describe the behaviour you'd like

Described above.

NOTE

@kylos101
Contributor

Thanks @utam0k!

runc-facade is used to retry and squash these errors.

How long, or how many times should runc-facade do retry before it gives up?

Are there any scenarios where we would NOT want to retry, because retrying would be disruptive to the user experience?

@utam0k
Contributor Author

utam0k commented Aug 25, 2022

How long, or how many times should runc-facade do retry before it gives up?

At least once. Even a single retry should be effective enough.

Are there any scenarios where we would NOT want to retry, because retrying would be disruptive to the user experience?

Unfortunately, runc-facade will also retry when the user application fails while starting a container. Ideally we would distinguish errors from runc itself from errors from the user application, but we can't do that right now, so when the user application fails we will retry in vain.
Fortunately, creating a container with runc is very fast, so I recommend not worrying about the retry when the user application fails.
https://github.com/containers/youki#motivation

@kylos101
Contributor

@utam0k I removed this from breakdown, but since you shared that it will only take 30m, please feel free to assign yourself and set the status to In-Progress.

If it ends up taking more time, please move back to our inbox (no status).

@utam0k utam0k self-assigned this Oct 28, 2022
@utam0k utam0k moved this to In Progress in 🌌 Workspace Team Oct 28, 2022
Repository owner moved this from In Progress to Awaiting Deployment in 🌌 Workspace Team Oct 28, 2022
@jenting
Contributor

jenting commented Oct 31, 2022

Therefore, runc-facade is used to retry and squash these errors.

@utam0k
Could you please point me to where the runc-facade retry code is?

@jenting jenting moved this from Awaiting Deployment to In Validation in 🌌 Workspace Team Nov 4, 2022
@utam0k utam0k moved this from In Validation to Done in 🌌 Workspace Team Nov 7, 2022