[BUG] Statefulset deployment leaves zombie lsp #4738
Comments
Hey @patriziobassi, I'm here to help with any bugs, questions, or contributions you have. Let's work together to tackle this issue! To address the issue of dead logical switch ports (LSPs) when redeploying or restarting a StatefulSet in Kube-OVN version 1.12.6, you can use the
Integrating this function into your controller logic will help ensure that dead LSPs are cleaned up automatically, preventing issues such as pods staying in a Pending state due to leftover LSPs [1][2].
Hi, I'm wondering if #4304 is pretty much the same.
Hi @patriziobassi, what did you mean by
I meant just a normal "apt upgrade" of the machine in order to patch the kernel, followed by a reboot. I'm not sure what the root cause is at the moment, but we have experienced this several times. I wonder whether a sort of garbage collector for dead LSPs could be implemented. In issue #4304 you mention a controller restart; when we do a rolling reboot of the nodes the situation may be pretty similar.
It seems to be the same problem; if the kube-ovn-controller stops during a Pod recreate, this issue will occur.
Yes, I agree, and I think there are two cases:
We experienced both; the fix was always to manually find and delete the orphaned LSPs.
@patriziobassi do you have any detailed logs and error messages? The reason might be that kube-ovn-controller doesn't GC LSPs when it starts. We hit a very long GC (over half an hour) in a large-scale cluster incident, where no workload could reach Running status because kube-ovn-controller was busy deleting zombie LSPs, so we moved the GC to after startup. We have only seen this conflict for static-IP pods with different names; for randomly allocated IPs and StatefulSets, the IP CRs should have enough information to avoid conflicts. I am afraid there are unknown reasons.
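As a starting point for gathering those details, here is a minimal diagnostic sketch; the harbor-prod namespace is taken from the report below, and the ips.kubeovn.io resource name and the app=kube-ovn-controller label selector are assumptions that may need adjusting for your cluster:

# Hypothetical diagnostic sketch: compare what Kubernetes, the Kube-OVN IP CRs,
# and the OVN NB database each believe exists for one namespace.
NS=harbor-prod                        # assumption: namespace from the report below

# Pods Kubernetes currently knows about
kubectl get pods -n "$NS" -o name

# Kube-OVN IP CRs, named <pod-name>.<namespace> (resource name is an assumption)
kubectl get ips.kubeovn.io | grep "\.$NS"

# Logical switch ports in the OVN NB database, also named <pod-name>.<namespace>
kubectl ko nbctl show | grep "$NS"

# Recent kube-ovn-controller logs around the StatefulSet recreate
# (label selector is an assumption)
kubectl -n kube-system logs -l app=kube-ovn-controller --tail=200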
Kube-OVN Version
1.12.6
Kubernetes Version
1.30
Operation-system/Kernel Version
Ubuntu 22.04
Description
Hi,
When deploying some StatefulSets (it has happened several times with StatefulSets, and maybe with other resources as well), it is pretty easy to end up with dead LSPs when trying to redeploy.
This happens when, for instance, you create/delete/recreate the same StatefulSet, or you create a StatefulSet and then do a rolling update of the worker nodes (for instance in order to patch them).
When trying to redeploy or restart the StatefulSet, the pod stays in Pending status and "kubectl describe" shows an error 500.
With
kubectl ko nbctl show | grep <namespace>
I can get all the ports for the pods of the affected namespace, and then with the lsp-del command I can manually clean up. For instance, it happened with a Harbor Helm deploy and redeploy:
kubectl ko nbctl lsp-del harbor-database-0.harbor-prod
kubectl ko nbctl lsp-del harbor-redis-0.harbor-prod
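For reference, a minimal sketch of how the same cleanup could be scripted instead of deleting ports one by one. It relies on the <pod-name>.<namespace> naming visible in the commands above; the harbor-prod namespace is only an example, and the lsp-del line is left commented out so the list can be reviewed before deleting anything:

# Sketch: list the LSPs of one namespace and flag those whose pod no longer exists.
NS=harbor-prod                        # assumption: namespace from the example above
kubectl ko nbctl show | awk '/^ *port /{print $2}' | grep "\.$NS$" | while read -r lsp; do
  pod="${lsp%.$NS}"
  if ! kubectl get pod -n "$NS" "$pod" >/dev/null 2>&1; then
    echo "orphaned LSP: $lsp"
    # kubectl ko nbctl lsp-del "$lsp"   # uncomment after reviewing the output
  fi
done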
Thank you
Steps To Reproduce
Already described above: create/delete/recreate the same StatefulSet, or do a rolling reboot of the worker nodes while a StatefulSet is running.
Current Behavior
Leftover LSPs remain after the pods are deleted.
Expected Behavior
Dead LSPs are cleaned up automatically.