Node driver registrar says "Lost connection" and does nothing #139
Comments
a1e11275 Merge pull request #139 from pohly/kind-for-kubernetes-latest 1c0fb096 prow.sh: use KinD main for latest Kubernetes 1d77cfcb Merge pull request #138 from pohly/kind-update-0.10 bff2fb7e prow.sh: KinD 0.10.0 git-subtree-dir: release-tools git-subtree-split: a1e11275b5a4febd6ad21beeac730e22c579825b
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community.
/remove-lifecycle stale
To restart the container automatically, #152 would help you.
I am experiencing the same behavior. Given all 3 containers (node-driver-registrar, secrets-store and liveness-probe) are running, if secrets-store gets restarted, node-driver-registrar loses connection to unix:///csi/csi.sock.
Node-driver-registrar logs:
I1012 11:51:22.299487 1 main.go:164] Version: v2.3.0
Secret store logs:
I1012 11:53:06.149183 1 exporter.go:33] metrics backend: prometheus
Liveness probe logs:
I1012 11:51:22.555987 1 main.go:149] calling CSI driver to discover driver name
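For anyone trying to reproduce the scenario above outside a cluster, here is a minimal sketch in Python of what happens to a client of a Unix domain socket when the process serving it restarts. It is only an illustration of the socket behavior: the real components are written in Go and talk gRPC, and the names `listen_on` and `simulate_driver_restart` are hypothetical, not part of any project mentioned in this thread.

```python
import os
import socket


def listen_on(path):
    """Create a listening Unix domain socket at the given path."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    return srv


def simulate_driver_restart(path):
    """Return (old_connection_lost, new_connection_ok) after a simulated restart."""
    server = listen_on(path)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(1.0)
    client.connect(path)
    conn, _ = server.accept()

    # Simulate the driver container restarting: drop the connection,
    # remove the socket file, and listen again on the same path.
    conn.close()
    server.close()
    os.unlink(path)
    server = listen_on(path)

    # The old client connection now reads EOF (empty bytes): it is lost.
    old_connection_lost = client.recv(1) == b""
    client.close()

    # A fresh connection to the same path succeeds.
    fresh = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    fresh.settimeout(1.0)
    try:
        fresh.connect(path)
        new_connection_ok = True
    except OSError:
        new_connection_ok = False
    finally:
        fresh.close()
        server.close()
        os.unlink(path)
    return old_connection_lost, new_connection_ok
```

Calling `simulate_driver_restart` with a path in a temporary directory shows that the existing connection does not survive the restart, while a client that dials the same path again gets a working connection.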
That does not help, already have this:
This helps during the startup of node-driver-registrar if the kubelet plugin registration fails, but these logs show that the kubelet plugin registration succeeded, so the workaround in #152 can't help here:
@yatanasov-hs thanks for the logs, I see that liveness probe attempted to connect after node-driver-registrar logged
The error Both node-driver-registrar and livenessprobe should be able to reestablish the connection to
I'm having the same problem after upgrading OKD from 4.7 to 4.8.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi, I happened to see the same behaviour on k8s 1.24 and
/reopen
@mauriciopoppe: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I am also observing this on our AWS EKS
It should help, yes.
Thanks @gnufied for the confirmation.
Thanks @gnufied, I wanted to expand on the effect of #322 (released in https://github.com/kubernetes-csi/node-driver-registrar/releases/tag/v2.9.0) and this issue. The possible scenario where we might see errors like #139 (comment) or #139 (comment):
Is if the #322 is adding a thorough check on the
@mauriciopoppe sorry for the late response, but the way I have seen this issue happen is:
So #322 mainly fixes the issue with stale registration sockets, assuming the health check is enabled on the registration socket. Now, I know that this issue has bugs linked where the csi-driver socket is stale, but in the current iteration of node-registrar, we only connect to the csi-driver socket once, on startup. We are in fact closing the connection after retrieving the driver name, and if for some reason node-registrar can't connect to the csi-driver socket, it will simply exit on startup and restart itself. But as such, #322 does not fix stale csi-driver socket issues.
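To make the "connect once on startup, then close" behavior described above concrete, here is a rough Python sketch of a one-shot probe of a Unix domain socket. It is not the node-driver-registrar code (which is Go and issues a gRPC call to retrieve the driver name); `probe_unix_socket` is a hypothetical helper that only checks whether the socket accepts a connection and then closes it.

```python
import socket


def probe_unix_socket(path: str, timeout: float = 1.0) -> bool:
    """Connect once to a Unix domain socket, then close the connection.

    Returns True if the connection was accepted, False if the socket file
    is missing or nothing is listening on it (a stale socket).
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        return False
    finally:
        # The connection is never kept open after the check.
        sock.close()
```

A process that runs such a probe only at startup learns nothing about later restarts of the peer: once the connection is closed, a stale socket would only be noticed if the probe were repeated, for example from a periodic health check.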
Log:
After restarting the container, it works.