Controller freezes and kube fails to restart based on liveness. #167
Comments
We've encountered this too, and noticed that when it happens, the logs have not been updated. What if you changed
That's a pretty good idea. We don't currently have a log file set up, but I imagine it wouldn't be too hard, and it should be an appropriate way to catch this error path. If it is still freezing after that change, we know something else is wrong with kube that is hanging us up and stopping kube from checking our liveness.
Describe the bug
There are rare instances where we have seen our controller stop doing any work, with no logs for several days, yet kube hasn't restarted the pod. We are not quite sure why the liveness check doesn't cause our pod to be restarted, but one theory is that we are still touching the liveness file while not actually receiving events from kube, for some unknown reason.
ref:
To Reproduce
It is intermittent, rare, and hard to reproduce.
Expected behavior
Kube should restart us based on our liveness probe, or we should restart ourselves.
Possible Solution
We know that our watch gets recreated on an interval defined by timeoutSeconds, so we should track when we start watching in watchman.js. If we haven't received any data and the connection hasn't closed within (timeoutSeconds + a few buffer minutes), we should either restart the watch within the code or just exit the process and get restarted in a new container.
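The proposed watchdog could be sketched roughly as follows. This is not the actual watchman.js code; it assumes a hypothetical `startWatch()` that returns an emitter with `data` and `close` events, and the specific numbers are placeholders:

```javascript
// Hypothetical watchdog sketch; startWatch and all constants are assumptions.
const timeoutSeconds = 300;       // interval on which the watch is recreated
const bufferMs = 3 * 60 * 1000;   // "a few buffer minutes"
const deadlineMs = timeoutSeconds * 1000 + bufferMs;

function watchWithDeadline(startWatch) {
  let lastActivity = Date.now();

  // Periodically check whether the watch has gone silent past the deadline.
  const timer = setInterval(() => {
    if (Date.now() - lastActivity > deadlineMs) {
      console.error('watch stalled; exiting so the pod is recreated');
      process.exit(1); // let kube start a fresh container
    }
  }, 30 * 1000);

  const watch = startWatch(); // assumed to emit 'data' and 'close'
  watch.on('data', () => { lastActivity = Date.now(); });
  watch.on('close', () => clearInterval(timer)); // normal recreation path
  return watch;
}
```

Exiting the process is the simpler of the two options in the issue, since it does not require carefully tearing down and rebuilding the watch in place; kube's restart policy handles the recovery.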