Cluster operator network Degraded is True with RolloutHung: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-7j854 is in CrashLoopBackOff State #7265
Comments
Hi, any update on this issue?
Error: ovs-ctl[1473]: id: 'openvswitch': no such user
I was able to finish the installation by following https://access.redhat.com/solutions/3494661. I had to do this on the masters and workers. I also had to override the MCP config annotations on the three masters in order to finish the installation.
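For reference, a minimal sketch of what creating the missing account by hand on each node could look like (the UID/GID of 977 is taken from the systemd-sysusers log quoted later in this thread and is an assumption; verify it on your own nodes, and note the MCP annotation changes mentioned above are not covered here):
# Create the group/user that ovs-ctl expects, skipping anything that already exists.
sudo groupadd -r -g 977 openvswitch 2>/dev/null || true
sudo useradd -r -u 977 -g openvswitch -s /sbin/nologin openvswitch 2>/dev/null || true
id openvswitch   # ovs-ctl should now be able to resolve the user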
Hi @ArthurVardevanyan, thanks for the comment. Can you let me know the steps you followed to solve this? The link doesn't show any fix (https://access.redhat.com/solutions/3494661). And did you face this in 4.13? Also, my worker node didn't come up at all.
I have this problem deploying RHCOS 4.14-9.2 on libvirt. The root cause seems to be the systemd-sysusers service configuration:
Aug 30 08:59:12 clust-hxhnn-master-0 systemd-sysusers[735]: Creating group 'hugetlbfs' with GID 978.
Aug 30 08:59:12 clust-hxhnn-master-0 systemd-sysusers[735]: Creating group 'openvswitch' with GID 977.
Aug 30 08:59:12 clust-hxhnn-master-0 systemd-sysusers[735]: Creating group 'unbound' with GID 976.
Aug 30 08:59:12 clust-hxhnn-master-0 systemd-sysusers[735]: Creating user 'openvswitch' (Open vSwitch Daemons) with UID 977 and GID 977.
Aug 30 08:59:12 clust-hxhnn-master-0 systemd-sysusers[735]: Creating user 'unbound' (Unbound DNS resolver) with UID 976 and GID 976.
Aug 30 08:59:12 clust-hxhnn-master-0 systemd-sysusers[735]: /etc/gshadow: Group "unbound" already exists.
Aug 30 08:59:12 clust-hxhnn-master-0 systemd[1]: systemd-sysusers.service: Main process exited, code=exited, status=1/FAILURE
Aug 30 08:59:12 clust-hxhnn-master-0 systemd[1]: systemd-sysusers.service: Failed with result 'exit-code'.
Aug 30 08:59:12 clust-hxhnn-master-0 systemd[1]: Failed to start systemd-sysusers.service - Create System Users.
There are duplicate entries in the sysusers configuration. I'm guessing this is a regression from RHCOS moving to the current Fedora 38 base: both the RHCOS rpm-ostree configuration and the Fedora 38 openvswitch and unbound package configurations are clashing.
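A quick way to confirm the clash on an affected node might look like the following (a diagnostic sketch; the paths are just the standard sysusers.d locations, nothing specific to this report):
# List every sysusers fragment that declares these accounts; more than one hit
# per name points at the duplicate definitions suspected above.
grep -rE 'openvswitch|unbound|hugetlbfs' /usr/lib/sysusers.d/ /etc/sysusers.d/ 2>/dev/null
# Show the merged configuration systemd-sysusers would apply on the next boot.
systemd-sysusers --cat-config | grep -E 'openvswitch|unbound|hugetlbfs'
# Compare with what is already present in the account databases.
sudo grep -E 'openvswitch|unbound|hugetlbfs' /etc/passwd /etc/group /etc/gshadow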
I'm hitting the same error.
I saw your comment here: #7265 (comment)
@zhengxiaomei123 OpenShift libvirt IPI seems to be more or less unmaintained. This unfortunately isn't reflected in the documentation within this repo. I ended up installing OpenShift following the bare-metal UPI instructions inside my libvirt environment.
Thanks very much. I will try the bare-metal UPI way too.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Version: release 4.13 branch
Env:
Azure VM
Error from Console:
[root@sriniedgeonokdbootstrap bin]# ./openshift-install create cluster
? Platform azure
INFO Credentials loaded from file "/root/.azure/osServicePrincipal.json"
? Region westeurope
? Base Domain edgeaiokd.clusters.openshiftcorp.com
? Cluster Name edgeclusterokd
? Pull Secret [? for help] ***************************************************************************************************************************************************************************************
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 1:52AM) for the Kubernetes API at https://api.edgeclusterokd.edgeaiokd.clusters.openshiftcorp.com:6443...
INFO API v1.26.4-2835+7d221229dc9796-dirty up
INFO Waiting up to 30m0s (until 2:09AM) for bootstrapping to complete...
INFO Pulling VM console logs
INFO Pulling debug logs from the bootstrap machine
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
ERROR Cluster operator network Degraded is True with RolloutHung: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-7j854 is in CrashLoopBackOff State
ERROR DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-kllcn is in CrashLoopBackOff State
ERROR DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - pod ovnkube-node-xw7hg is in CrashLoopBackOff State
ERROR DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2023-06-21T05:39:21Z
INFO Cluster operator network ManagementStateDegraded is False with :
INFO Cluster operator network Progressing is True with Deploying: DaemonSet "/openshift-multus/network-metrics-daemon" is waiting for other operators to become ready
INFO DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 3 nodes)
INFO DaemonSet "/openshift-network-diagnostics/network-check-target" is waiting for other operators to become ready
INFO Deployment "/openshift-network-diagnostics/network-check-source" is waiting for other operators to become ready
INFO Deployment "/openshift-cloud-network-config-controller/cloud-network-config-controller" is waiting for other operators to become ready
INFO Deployment "/openshift-multus/multus-admission-controller" is waiting for other operators to become ready
ERROR Bootstrap failed to complete: timed out waiting for the condition
ERROR Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.
WARNING The bootstrap machine is unable to resolve API and/or API-Int Server URLs
INFO Error: error while checking pod status: timed out waiting for the condition
INFO Using /opt/openshift/auth/kubeconfig as KUBECONFIG
INFO Gathering cluster resources ...
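For anyone debugging the same timeout, a rough sketch of the follow-up checks I would run (the kubeconfig path is the one the installer prints above; the pod name is taken from the Degraded message and will differ per install, and the install directory is a placeholder):
export KUBECONFIG=/opt/openshift/auth/kubeconfig
oc get clusteroperators network -o yaml          # full text of the Degraded condition
oc -n openshift-ovn-kubernetes get pods -o wide  # which nodes host the CrashLoopBackOff pods
oc -n openshift-ovn-kubernetes describe pod ovnkube-node-7j854
# Re-collect the bootstrap log bundle if the installer did not already gather it.
./openshift-install gather bootstrap --dir <install-dir>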
OVN Pod Logs
[root@sriniedgeonokdbootstrap auth]# kubectl logs ovnkube-node-7j854 -n openshift-ovn-kubernetes -f
Defaulted container "ovn-controller" out of: ovn-controller, ovn-acl-logging, kube-rbac-proxy, kube-rbac-proxy-ovn-metrics, ovnkube-node, drop-icmp
2023-06-21T05:39:34+00:00 - starting ovn-controller
2023-06-21T05:39:34Z|00001|vlog|INFO|opened log file /var/log/ovn/acl-audit-log.log
2023-06-21T05:39:34.525Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2023-06-21T05:39:34.525Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2023-06-21T05:39:35.527Z|00004|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2023-06-21T05:39:35.527Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2023-06-21T05:39:35.527Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 2 seconds before reconnect
2023-06-21T05:39:37.529Z|00007|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2023-06-21T05:39:37.529Z|00008|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2023-06-21T05:39:37.529Z|00009|reconnect|INFO|unix:/var/run/openvswitch/db.sock: waiting 4 seconds before reconnect
2023-06-21T05:39:41.531Z|00010|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2023-06-21T05:39:41.531Z|00011|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection attempt failed (No such file or directory)
2023-06-21T05:39:41.531Z|00012|reconnect|INFO|unix:/var/run/openvswitch/db.sock: continuing to reconnect in the background but suppressing further logging
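The repeated "No such file or directory" on /var/run/openvswitch/db.sock suggests ovn-controller never finds a running ovsdb-server on the host. A hedged sketch for checking the host-level Open vSwitch services from a working API (the node name is illustrative):
# Check whether the host OVS daemons and the systemd-sysusers unit are healthy.
oc debug node/<master-or-worker-node> -- chroot /host systemctl status ovsdb-server ovs-vswitchd systemd-sysusers
# The socket ovn-controller is waiting for should exist once ovsdb-server is up.
oc debug node/<master-or-worker-node> -- chroot /host ls -l /var/run/openvswitch/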
Please find attached the tar file with the bootstrap logs:
log-bundle-20230620151915.tar.gz