
Controller freezes and kube fails to restart based on liveness. #167

Closed
alewitt2 opened this issue Mar 11, 2021 · 3 comments · Fixed by #236
Labels
bug Something isn't working

Comments

alewitt2 (Member) commented Mar 11, 2021

Describe the bug
There are rare instances where our controller stops doing any work and produces no logs for several days, yet kube has not restarted the pod. We are not sure why the liveness check does not cause the pod to be restarted, but one theory is that we are still touching the liveness file while not actually receiving events from kube, for some unknown reason.

ref:

To Reproduce
It is intermittent, rare, and hard to reproduce.

Expected behavior
Kube should restart the pod based on the liveness probe, or the controller should restart itself.

Possible Solution
We know that our watch gets recreated on an interval defined by timeoutSeconds, so we should track when we start watching in watchman.js. If we have not received any data and the connection has not closed within (timeoutSeconds + a few buffer minutes), we should either restart the watch within the code or just exit the process and get restarted in a new container.
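The watchdog described above could be sketched roughly as follows. This is only a sketch under stated assumptions: `timeoutSeconds`, `noteActivity`, `isHung`, and the interval values are illustrative names and numbers, not the actual watchman.js API.

```javascript
// Sketch of a watch watchdog; all names and values are illustrative
// assumptions, not the real razee watchman.js implementation.
const timeoutSeconds = 300;        // assumed watch-recreation interval
const bufferMs = 2 * 60 * 1000;    // extra grace period before declaring a hang

let lastActivity = Date.now();

// Call this whenever the watch emits data or the connection closes.
function noteActivity() {
  lastActivity = Date.now();
}

// Pure check, kept separate so the hang decision is easy to reason about.
function isHung(lastActivityMs, nowMs) {
  return nowMs - lastActivityMs > timeoutSeconds * 1000 + bufferMs;
}

// Periodically verify the watch is still delivering events; if it is not,
// exit so kube reschedules the pod in a fresh container.
function startWatchdog() {
  return setInterval(() => {
    if (isHung(lastActivity, Date.now())) {
      console.error('no watch activity within timeout; exiting');
      process.exit(1);
    }
  }, 30 * 1000);
}
```

Exiting the process (rather than restarting the watch in-process) is the simpler of the two options from the issue, since it lets kube's normal restart machinery take over.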

@alewitt2 alewitt2 added the bug Something isn't working label Mar 11, 2021
alewitt2 (Member, Author) commented:

razee-io/Razee#135

charlesthomas commented:

We've encountered this too, and noticed that when it happens, the logs have not been updated. What if you changed sh/liveness.sh to watch a log file instead of a separate file?
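A log-mtime liveness check along those lines might look like the sketch below. The file paths, the threshold, and the use of GNU/busybox `stat -c %Y` are assumptions; the real sh/liveness.sh may differ.

```shell
#!/bin/sh
# Sketch: pass the liveness probe only if the log file has been written
# to recently. Paths and threshold are illustrative assumptions.

# check_liveness LOG_FILE [MAX_AGE_SECONDS] -> exit status 0 if fresh
check_liveness() {
  log_file="$1"
  max_age="${2:-600}"

  # A missing log file counts as not alive.
  [ -f "$log_file" ] || return 1

  now=$(date +%s)
  mtime=$(stat -c %Y "$log_file")   # GNU/busybox stat; BSD stat uses -f %m
  age=$((now - mtime))

  # Fail if the log has been silent longer than the threshold.
  [ "$age" -le "$max_age" ]
}
```

Wired into the probe, this replaces touching a separate liveness file, so a controller that freezes and stops logging also stops passing the probe.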

alewitt2 (Member, Author) commented Aug 4, 2021

That's a pretty good idea. We don't currently have a log file set up, but I imagine it wouldn't be too hard, and it should be an appropriate way to catch this error path. If the controller is still freezing after that change, we will know something else is wrong with kube that is hanging us and stopping kube from checking our liveness.


2 participants