
openshift-ingress stuck in 'Pending' with "won't update DNS record for load balancer service openshift-ingress/router-default because status contains no ingresses" #973

Closed
thomasmckay opened this issue Dec 27, 2018 · 10 comments


@thomasmckay

Version

$ openshift-install version
bin/openshift-install v0.8.0-master-2-g5e7b36d6351c9cc773f1dadc64abf9d7041151b1-dirty

Platform (aws|libvirt|openstack):

libvirt

What happened?

Cluster create seemed to work (oc login works, and there are lots of Running and Completed pods), but the openshift-ingress router pod is stuck in Pending while the openshift-ingress-operator pod is Running. Deleting the openshift-ingress pod succeeds, but the replacement pod is again stuck in Pending.

$ oc --config=auth/kubeconfig --namespace=openshift-ingress-operator logs "ingress-operator-694bd9bf8d-8j6wj"
...
time="2018-12-27T15:46:54Z" level=info msg="reconciling clusteringress v1alpha1.ClusterIngress{TypeMeta:v1.TypeMeta{Kind:\"ClusterIngress\", APIVersion:\"ingress.openshift.io/v1alpha1\"}, ObjectMeta:v1.ObjectMeta{Name:\"default\", GenerateName:\"\", Namespace:\"openshift-ingress-operator\", SelfLink:\"/apis/ingress.openshift.io/v1alpha1/namespaces/openshift-ingress-operator/clusteringresses/default\", UID:\"ce6d384b-09e4-11e9-94ac-52fdfc072182\", ResourceVersion:\"9848\", Generation:1, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63681518193, loc:(*time.Location)(0x1d9f4c0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string{\"ingress.openshift.io/default-cluster-ingress\"}, ClusterName:\"\"}, Spec:v1alpha1.ClusterIngressSpec{IngressDomain:(*string)(0xc42076c810), NodePlacement:(*v1alpha1.NodePlacement)(0xc42077a9a0), DefaultCertificateSecret:(*string)(nil), NamespaceSelector:(*v1.LabelSelector)(nil), RouteSelector:(*v1.LabelSelector)(nil), HighAvailability:(*v1alpha1.ClusterIngressHighAvailability)(0xc42076c800), UnsupportedExtensions:(*[]string)(nil)}, Status:v1alpha1.ClusterIngressStatus{}}"
time="2018-12-27T15:46:54Z" level=info msg="won't update DNS record for load balancer service openshift-ingress/router-default because status contains no ingresses"
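
(For reference, the state described above can be inspected along these lines; the router pod name is a placeholder and will differ per cluster.)

$ oc --config=auth/kubeconfig -n openshift-ingress get pods
$ oc --config=auth/kubeconfig -n openshift-ingress get svc router-default    # the load balancer service named in the log above
$ oc --config=auth/kubeconfig -n openshift-ingress delete pod router-default-<hash>    # a replacement pod comes back, still Pending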

What you expected to happen?

The openshift-ingress router pod not to be stuck in Pending.

@thomasmckay
Author

I see this note in the libvirt docs, so perhaps openshift-install needs a change for libvirt: https://github.com/openshift/installer/blob/master/docs/dev/libvirt-howto.md#libvirt-vs-aws

"There isn't a load balancer on libvirt"

@wking
Member

wking commented Jan 2, 2019

What does the API say about why the pod is stuck in pending?

@thomasmckay
Author

$ oc describe -n openshift-ingress pod/router-default-6b779fb468-r7tpc
Name:           router-default-6b779fb468-r7tpc
Namespace:      openshift-ingress
Node:           <none>
Labels:         app=router
                pod-template-hash=2633596024
                router=router-default
Annotations:    <none>
Status:         Pending
IP:             
Controlled By:  ReplicaSet/router-default-6b779fb468
Containers:
  router:
    Image:      registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-01-225811@sha256:7a32d6d2d8477afab8cea56ee629b777a6c6bf54bd861b326b2e1861a616bd8c
    Ports:      80/TCP, 443/TCP, 1936/TCP
    Liveness:   http-get http://:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      STATS_PORT:                 1936
      ROUTER_SERVICE_NAMESPACE:   openshift-ingress
      DEFAULT_CERTIFICATE_DIR:    /etc/pki/tls/private
      ROUTER_SERVICE_NAME:        default
      ROUTER_CANONICAL_HOSTNAME:  apps.quay.tt.testing
    Mounts:
      /etc/pki/tls/private from default-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-tr4mh (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-certs-default
    Optional:    false
  router-token-tr4mh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-token-tr4mh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/worker=
Tolerations:     <none>
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  22s (x2747 over 7h)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.

@wking
Member

wking commented Jan 5, 2019

0/1 nodes are available: 1 node(s) didn't match node selector

This sounds like "you don't have any worker nodes" to me. Try working through this.
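
(To confirm, one can check whether any node carries the worker role required by the router's node selector, node-role.kubernetes.io/worker=; commands are illustrative:)

$ oc --config=auth/kubeconfig get nodes --show-labels
$ oc --config=auth/kubeconfig get nodes -l node-role.kubernetes.io/worker
# An empty result from the second command would confirm there is no worker node for the router pod to land on.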

@thomasmckay
Author

I'll work through that and follow up here, but there was some talk on Slack that "worker nodes come later", as if it were normal not to have a worker. I have never seen a worker VM created in any of my runs: the bootstrap and master VMs come up immediately, then bootstrap completes and is removed, leaving just the master.
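
(For completeness, the worker Machine objects and the libvirt domains can be inspected roughly like this; the machine-API namespace varies between installer versions, hence --all-namespaces:)

$ oc --config=auth/kubeconfig get machinesets --all-namespaces
$ oc --config=auth/kubeconfig get machines --all-namespaces
$ sudo virsh list --all    # on the libvirt host, to see which VMs were actually created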

@wking
Member

wking commented Jan 5, 2019

The bootstrap and master VMs come up immediately, then bootstrap completes and is removed leaving just the master.

I never get workers on libvirt because I haven't worked around this or openshift/cluster-api-provider-libvirt#45 for my non-standard default pool location. Nevertheless, the cluster comes up fine, the bootstrap node gets torn down, etc. There are a handful of pods that aren't scheduled because they have the worker selector like you have, and obviously the functionality provided by those pods will be missing. My understanding is that we're moving towards having everything installed by the installer tolerate master nodes, which would make missing workers even less of an issue, but we're not there yet. You can probably file issues with any repositories that don't tolerate masters; I dunno if anyone's gotten around to that yet.
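
(To see which pods are affected by the missing workers, something along these lines works; the jsonpath output shows the worker selector mentioned above:)

$ oc --config=auth/kubeconfig get pods --all-namespaces | grep Pending
$ oc --config=auth/kubeconfig -n openshift-ingress get deployment router-default \
    -o jsonpath='{.spec.template.spec.nodeSelector}'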

@thomasmckay
Author

For the record, I also get an SELinux error, if relevant.

SELinux is preventing qemu-system-x86 from getattr access on the file /home/images/quay-bootstrap.ign.

*****  Plugin restorecon (99.5 confidence) suggests   ************************

If you want to fix the label. 
/home/images/quay-bootstrap.ign default label should be user_home_t.
Then you can run restorecon. The access attempt may have been stopped due to insufficient permissions to access a parent directory in which case try to change the following command accordingly.
Do
# /sbin/restorecon -v /home/images/quay-bootstrap.ign

*****  Plugin catchall (1.49 confidence) suggests   **************************

If you believe that qemu-system-x86 should be allowed getattr access on the quay-bootstrap.ign file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'qemu-system-x86' --raw | audit2allow -M my-qemusystemx86
# semodule -X 300 -i my-qemusystemx86.pp

Additional Information:
Source Context                system_u:system_r:svirt_t:s0:c58,c725
Target Context                system_u:object_r:home_root_t:s0
Target Objects                /home/images/quay-bootstrap.ign [ file ]
Source                        qemu-system-x86
Source Path                   qemu-system-x86
Port                          <Unknown>
Host                          thomasmckay-desktop.usersys.redhat.com
Source RPM Packages           
Target RPM Packages           
Policy RPM                    selinux-policy-3.14.1-48.fc28.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Permissive
Host Name                     thomasmckay-desktop.usersys.redhat.com
Platform                      Linux thomasmckay-desktop.usersys.redhat.com
                              4.18.18-200.fc28.x86_64 #1 SMP Mon Nov 12 03:17:32
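
(Since the reported mode is Permissive, this denial is only logged rather than enforced. Beyond the suggested restorecon, a common general fix, not specific to this installer, is to give the images directory a label qemu is allowed to read, roughly:)

$ sudo semanage fcontext -a -t virt_image_t "/home/images(/.*)?"
$ sudo restorecon -Rv /home/images
# Alternatively, keep the volume images and Ignition files under /var/lib/libvirt/images, which already carries a suitable label.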

@zeenix
Contributor

zeenix commented Jun 24, 2019

@thomasmckay Is this still an issue?

@zeenix
Contributor

zeenix commented Jun 28, 2019

Assuming not reproducible anymore. @thomasmckay please reopen if that's not the case. Thanks.

/close

@openshift-ci-robot
Contributor

@zeenix: Closing this issue.

In response to this:

Assuming not reproducible anymore. @thomasmckay please reopen if that's not the case. Thanks.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
