
pod stay in Error status after node shutdown #2365

Closed · yylt opened this issue Aug 30, 2023 · 1 comment · Fixed by #2361

Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

yylt (Contributor) commented Aug 30, 2023

What happened:

When the admission webhook pod is stopped by the kubelet, it always exits with code 255, which leaves an error in the containerStatus. On node shutdown, pods running on that node are killed by the kubelet; the container is stopped but not removed, so the pod becomes an orphan. After the node restarts, the pod stays in Error status unless it is deleted with kubectl.

The errored pod is left behind:

$ kubectl get po |grep gateway-api
gateway-api-admission-server-5d555f679-dm27b    1/1     Running     0             32s   10.232.2.177   node-9    <none>           <none>
gateway-api-admission-server-6ccfb998f9-sj7z8   0/1     Error       1             18h   10.232.2.116   node-9    <none>           <none>

The container status is:

  containerStatuses:
  - containerID: containerd://cd80a5c87756a408a7232b8370881e021bd7a40ded699abd36e2c62647fa343f
    image: localhost/library/gateway-api-admission-server:v0.7
    lastState: {}
    name: kiali
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://cd80a5c87756a408a7232b8370881e021bd7a40ded699abd36e2c62647fa343f
        exitCode: 255
        finishedAt: "2023-08-24T12:21:57Z"
        reason: Error
        startedAt: "2023-08-22T06:41:16Z"

Deleting the pod with kubectl while watching containerd events:

$ kubectl delete po gateway-api-admission-server-6ccfb998f9-sj7z8
pod "gateway-api-admission-server-6ccfb998f9-sj7z8" deleted

# ctr event
2023-08-29 03:34:08.896318113 +0000 UTC k8s.io /tasks/exit {"container_id":"29e3311c6448aa99fea57344d228d2cd036dadf001d879609956bd52e2cd26fa","id":"29e3311c6448aa99fea57344d228d2cd036dadf001d879609956bd52e2cd26fa","pid":23798,"exit_status":255,"exited_at":"2023-08-29T03:34:08.896194993Z"}
2023-08-29 03:34:08.918506354 +0000 UTC k8s.io /tasks/delete {"container_id":"29e3311c6448aa99fea57344d228d2cd036dadf001d879609956bd52e2cd26fa","pid":23798,"exit_status":255,"exited_at":"2023-08-29T03:34:08.896194993Z"}

The pod log shows:

$ kubectl logs gateway-api-admission-server-6ccfb998f9-r4rfb -f
gateway-api-admission-webhook version: v0.7.1 (ab03a594e7db13b8d2579929b204d8d10990fd2b)
I0828 03:16:37.519872       1 main.go:90] admission webhook server started and listening on :8443
I0828 03:34:08.893252       1 main.go:97] admission webhook received kill signal
F0828 03:34:08.893547       1 main.go:87] admission-webhook-server stopped: http: Server closed

When the process exits after a shutdown signal, the message should be logged at error level, not fatal level: klog's Fatal functions log the message and then call os.Exit(255), which is what produces the non-zero exit code above.
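
A minimal sketch of the shutdown path, assuming the server uses net/http and klog — this is not the webhook's actual main.go, and the listen address and TLS paths are placeholders. The idea is to treat http.ErrServerClosed as a clean shutdown and reserve klog.Fatal for real failures:

package main

import (
	"context"
	"errors"
	"net/http"
	"os"
	"os/signal"
	"syscall"

	"k8s.io/klog/v2"
)

func main() {
	// Hypothetical sketch; address and TLS paths are illustrative.
	srv := &http.Server{Addr: ":8443"}

	go func() {
		// Shut down cleanly when the kubelet sends SIGTERM during pod
		// deletion or node shutdown.
		sigs := make(chan os.Signal, 1)
		signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)
		<-sigs
		klog.Info("admission webhook received kill signal")
		if err := srv.Shutdown(context.Background()); err != nil {
			klog.Errorf("shutdown: %v", err)
		}
	}()

	err := srv.ListenAndServeTLS("/certs/tls.crt", "/certs/tls.key")
	if errors.Is(err, http.ErrServerClosed) {
		// Graceful shutdown: log at a non-fatal level and return, so the
		// process exits 0 instead of 255.
		klog.Info("admission-webhook-server stopped cleanly")
		return
	}
	// Anything else is a real failure; klog.Fatalf logs and calls os.Exit(255).
	klog.Fatalf("admission-webhook-server stopped: %v", err)
}

With this pattern, the SIGTERM delivered during kubectl delete should produce exit_status 0 in the ctr events stream above instead of 255.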

What you expected to happen:

The admission-server should exit with code 0 on a clean shutdown.

How to reproduce it (as minimally and precisely as possible):

1. Deploy the admission-server pod.
2. Run ctr event on the node hosting the admission-server pod to watch container events.
3. Delete the admission-server pod with kubectl.
4. ctr event will print an exit_status that is not 0 or null.

Anything else we need to know?:

The ctr command can be downloaded from https://github.com/containerd/containerd/releases/tag/v1.7.5

@yylt yylt added the kind/bug Categorizes issue or PR as related to a bug. label Aug 30, 2023
@shaneutt shaneutt added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 30, 2023
youngnick (Contributor) commented:

Thanks for this issue @yylt. As we're going to be deprecating the webhook (as seen in #2319), I'm not sure we should fix this.

I think maybe a better way to say this is: "PRs welcome, but this won't block anything going forward".
