bugfix: avoid the deadlock when failed to remove invalid sandbox #2073
cri/v1alpha1/cri_utils.go
Outdated
// Kubelet will use list to get the sandboxes, but will not get the status of the failed pod
// whose meta data has not been put into the Sandbox Store. And Kubelet will keep trying to
// get the status of the failed pod and won't create a new one to replace it. It's a DEAD LOCK.
// Actually Kubelet should not know the existent of invalid pod whose meta data won't be in the
%s/existent/existence ?
cri/v1alpha1/cri_utils.go
Outdated
// it is still running?
if status != apitypes.StatusRunning && status != apitypes.StatusCreated {
	// Remove invalid sandbox.
	c.ContainerMgr.Remove(ctx, sandbox.ID, &apitypes.ContainerRemoveOptions{Volumes: true, Force: true})
Remove is a dangerous action. Should we just print a warning log and keep the invalid containers?
Maybe some containers are invalid for CRI but valid for pouchd, WDYT?
The containers that belong to pouchd have already been filtered out.
The containers here are CRI sandboxes.
If we don't remove the invalid sandbox containers here, they will never be removed :)
ping @YaoZengzeng
I think adding a check for invalid sandboxes is necessary, but we should try to find out why removing the container failed.
@rudyfly Yes, maybe we should print the error message when removing the invalid sandbox fails.
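A minimal sketch of that suggestion, not the code in this PR: each removal failure is logged and the loop carries on, so one stubborn sandbox does not block cleanup of the others. `removeFunc`, `removeInvalidSandboxes`, and `failFor` are hypothetical names, and the standard `log` package is assumed here in place of whatever logger pouchd actually uses.

```go
package main

import (
	"errors"
	"log"
)

// removeFunc is a hypothetical stand-in for the container manager's Remove call.
type removeFunc func(id string) error

// removeInvalidSandboxes tries to remove every given sandbox ID.
// A failure is logged with its error message and does not stop the loop.
// The IDs that could not be removed are returned to the caller.
func removeInvalidSandboxes(ids []string, remove removeFunc) []string {
	var failed []string
	for _, id := range ids {
		if err := remove(id); err != nil {
			log.Printf("failed to remove invalid sandbox %q: %v", id, err)
			failed = append(failed, id)
		}
	}
	return failed
}

// failFor returns a fake remover (for illustration only) that fails for one ID.
func failFor(bad string) removeFunc {
	return func(id string) error {
		if id == bad {
			return errors.New("container is busy")
		}
		return nil
	}
}

func main() {
	failed := removeInvalidSandboxes([]string{"sandbox-1", "sandbox-2"}, failFor("sandbox-2"))
	log.Printf("%d sandbox(es) could not be removed", len(failed))
}
```

Returning the failed IDs instead of an error keeps the list call usable even when cleanup is incomplete; whether that trade-off fits pouchd is a design decision for the maintainers.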
Codecov Report
@@ Coverage Diff @@
## master #2073 +/- ##
==========================================
- Coverage 65% 64.87% -0.14%
==========================================
Files 209 209
Lines 16227 16275 +48
==========================================
+ Hits 10548 10558 +10
- Misses 4370 4399 +29
- Partials 1309 1318 +9
if err != nil {
	return nil, fmt.Errorf("failed to filter invalid sandboxes: %v", err)
}

sandboxes := make([]*runtime.PodSandbox, 0, len(sandboxList))
for _, s := range sandboxList {
	sandbox, err := toCriSandbox(s)
How about putting the outermost loop of filterInvalidSandboxes here, WDYT?
Except for that, LGTM.
Hmm... It's a code organization issue. I want to encapsulate all the logic in this function.
For now, I think it's OK.
As the code evolves, it may be better to move the loop outside :)
74b449b to 0fa378e
Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>
LGTM @rudyfly please check it again. Thanks
Signed-off-by: YaoZengzeng <yaozengzeng@zju.edu.cn>
Ⅰ. Describe what this PR did
A sandbox may fail to start and never be cleaned up.
Kubelet uses the list call to get the sandboxes, but it cannot get the status of a failed pod whose metadata has not been put into the Sandbox Store.
Kubelet then keeps trying to get the status of the failed pod and never creates a new one to replace it.
This is a deadlock.
Kubelet should not know about the existence of an invalid pod whose metadata is not in the Sandbox Store.
Hiding such sandboxes from the list result avoids the deadlock described above.
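The idea can be sketched with a toy model, which is an illustration under stated assumptions and not the PR's implementation. `sandboxStore`, `listSandboxes`, and `sandboxStatus` are hypothetical names; the point is only that every ID the list call returns must also resolve in the status call, so Kubelet never retries a lookup that cannot succeed.

```go
package main

import "fmt"

// sandboxStore is a hypothetical stand-in for the Sandbox Store: sandbox ID -> metadata.
type sandboxStore map[string]string

// listSandboxes returns only the container IDs that have metadata in the store.
// Containers without metadata (for example, sandboxes that failed to start) are hidden.
func listSandboxes(containerIDs []string, store sandboxStore) []string {
	visible := make([]string, 0, len(containerIDs))
	for _, id := range containerIDs {
		if _, ok := store[id]; ok {
			visible = append(visible, id)
		}
	}
	return visible
}

// sandboxStatus fails for any ID without metadata. If such an ID were ever
// listed, a caller that retries on error would loop forever.
func sandboxStatus(id string, store sandboxStore) (string, error) {
	meta, ok := store[id]
	if !ok {
		return "", fmt.Errorf("sandbox %q not found in store", id)
	}
	return meta, nil
}

func main() {
	store := sandboxStore{"sandbox-ok": "metadata"}
	containers := []string{"sandbox-ok", "sandbox-failed"}
	for _, id := range listSandboxes(containers, store) {
		meta, err := sandboxStatus(id, store)
		fmt.Println(id, meta, err)
	}
}
```

With the filter in place, `sandbox-failed` never reaches the status call, so the caller is free to create a replacement.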
Ⅱ. Does this pull request fix one issue?
Ⅲ. Describe how you did it
Ⅳ. Describe how to verify it
Ⅴ. Special notes for reviews