NGF Pod cannot recover if NGINX master process fails without cleaning up #1108
Comments
Another way to reproduce a similar error (majority of the time) is to
Is this being worked on? Would it help if you'd take the fix from #1532 into the helm chart temporarily, with a flag in the values.yaml, until it's resolved in a better way?
Hey @AlexEndris, we actually haven't prioritized this because the only way we could cause it to happen is to kill the nginx process in the pod, which would be a highly unusual case. I assume you are running into this issue yourself? Can you describe under what circumstances this occurs for you?
Yes. But I realise it might be a sort of edge case scenario that might not even happen. Essentially, we package a small k8s cluster using k3s. We shut it down and ship it. Upon restarting, nginx doesn't recover and we would need to manually kill the pod/restart the deployment to get it running again. The issue is, I don't have access to that, when it's shipped, and they want an out of the box experience.
Thanks for the details @AlexEndris. I added this issue to our community triage meeting agenda scheduled for Monday. We will discuss it then. If you'd like to join, the meeting info is here.
@AlexEndris We discussed this during our community meeting and we think we can take a look at it in our next release. We'd like to first look at the fix in #1532 to see if we can solve the problem in the code, which shouldn't be too bad. Once we do fix it, you can pull the edge release so you can get the fix before we do another full release if you're looking for something soon. Thanks for letting us know!
Thank you very much! It's highly appreciated!
When completed, should remove the
Describe the bug
When the NGINX master process fails without cleaning up (kill -9 <nginx-master-pid>), the NGF Pod cannot recover because the new NGINX container cannot start.
To Reproduce
Steps to reproduce the behavior:
1. Change runAsNonRoot from true to false in deploy/manifests/nginx-gateway.yaml.
2. Run: kubectl debug -it -n nginx-gateway <NGF_POD> --image=busybox:1.28 --target=nginx-gateway
3. In the ephemeral container, run: kill -9 <nginx-master-PID>
4. Check the nginx container logs: kubectl logs -f -n nginx-gateway <NGF_POD> -c nginx
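The kill -9 step above matters because SIGKILL cannot be caught, so the killed process never gets a chance to remove anything it created while running. Here is a minimal sketch of that difference, using a hypothetical stand-in worker and state file rather than NGINX itself. The names run_and_kill, worker, and state are illustrative assumptions and do not come from NGF.

```shell
#!/bin/sh
# Illustrative only: "worker" is a stand-in process, not NGINX.
# It creates a state file and removes it when stopped with SIGTERM.
run_and_kill() {
    sig=$1
    dir=$(mktemp -d)
    # Start the worker in the background with its output detached.
    sh -c 'trap "rm -f \"$1/state\"; exit 0" TERM
           : > "$1/state"
           while :; do sleep 1; done' worker "$dir" >/dev/null 2>&1 &
    pid=$!
    # Wait until the worker has installed its trap and created the file.
    while [ ! -e "$dir/state" ]; do sleep 1; done
    kill "-$sig" "$pid"
    wait "$pid" 2>/dev/null
    # SIGTERM lets the trap run; SIGKILL ends the worker before any cleanup.
    if [ -e "$dir/state" ]; then
        echo "state file left behind"
    else
        echo "state file cleaned up"
    fi
    rm -rf "$dir"
}

run_and_kill TERM
run_and_kill KILL
```

Run with sh, the first call should report the file as cleaned up and the second should report it as left behind. The sketch does not identify which leftover state blocks the new NGINX container; the nginx container log is the place to confirm that.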
Expected behavior
The NGINX container should restart and the NGF Pod should recover.
Your environment
"version":"edge","commit":"72b6c6ef8915c697626eeab88fdb6a3ce15b8da0"
Additional context
Log file of nginx container showing error: