
openshift-ingress stuck in 'Pending' with "won't update DNS record for load balancer service openshift-ingress/router-default because status contains no ingresses" #973

Closed
thomasmckay opened this issue Dec 27, 2018 · 10 comments


@thomasmckay

Version

$ openshift-install version
bin/openshift-install v0.8.0-master-2-g5e7b36d6351c9cc773f1dadc64abf9d7041151b1-dirty

Platform (aws|libvirt|openstack):

libvirt

What happened?

Cluster create seemed to work (oc login works, and there are lots of Running and Completed pods), but the openshift-ingress router pod is stuck in Pending while the openshift-ingress-operator pod is Running. Deleting the openshift-ingress pod succeeds, but the replacement pod is again stuck in Pending.

$ oc --config=auth/kubeconfig --namespace=openshift-ingress-operator logs "ingress-operator-694bd9bf8d-8j6wj"
...
time="2018-12-27T15:46:54Z" level=info msg="reconciling clusteringress v1alpha1.ClusterIngress{TypeMeta:v1.TypeMeta{Kind:\"ClusterIngress\", APIVersion:\"ingress.openshift.io/v1alpha1\"}, ObjectMeta:v1.ObjectMeta{Name:\"default\", GenerateName:\"\", Namespace:\"openshift-ingress-operator\", SelfLink:\"/apis/ingress.openshift.io/v1alpha1/namespaces/openshift-ingress-operator/clusteringresses/default\", UID:\"ce6d384b-09e4-11e9-94ac-52fdfc072182\", ResourceVersion:\"9848\", Generation:1, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63681518193, loc:(*time.Location)(0x1d9f4c0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string{\"ingress.openshift.io/default-cluster-ingress\"}, ClusterName:\"\"}, Spec:v1alpha1.ClusterIngressSpec{IngressDomain:(*string)(0xc42076c810), NodePlacement:(*v1alpha1.NodePlacement)(0xc42077a9a0), DefaultCertificateSecret:(*string)(nil), NamespaceSelector:(*v1.LabelSelector)(nil), RouteSelector:(*v1.LabelSelector)(nil), HighAvailability:(*v1alpha1.ClusterIngressHighAvailability)(0xc42076c800), UnsupportedExtensions:(*[]string)(nil)}, Status:v1alpha1.ClusterIngressStatus{}}"
time="2018-12-27T15:46:54Z" level=info msg="won't update DNS record for load balancer service openshift-ingress/router-default because status contains no ingresses"
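
(For reference, the state described above can be inspected along these lines; the router pod name is a placeholder and will differ per cluster.)

$ oc --config=auth/kubeconfig -n openshift-ingress get pods
$ oc --config=auth/kubeconfig -n openshift-ingress get svc router-default    # the load balancer service named in the log above
$ oc --config=auth/kubeconfig -n openshift-ingress delete pod router-default-<hash>    # a replacement pod comes back, still Pending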

What you expected to happen?

The openshift-ingress router pod not to be stuck in Pending.

@thomasmckay
Author

I see this note in the libvirt docs, so perhaps openshift-install needs a change for libvirt: https://github.com/openshift/installer/blob/master/docs/dev/libvirt-howto.md#libvirt-vs-aws

"There isn't a load balancer on libvirt"

@wking
Member

wking commented Jan 2, 2019

What does the API say about why the pod is stuck in pending?

@thomasmckay
Author

$ oc describe -n openshift-ingress pod/router-default-6b779fb468-r7tpc
Name:           router-default-6b779fb468-r7tpc
Namespace:      openshift-ingress
Node:           <none>
Labels:         app=router
                pod-template-hash=2633596024
                router=router-default
Annotations:    <none>
Status:         Pending
IP:             
Controlled By:  ReplicaSet/router-default-6b779fb468
Containers:
  router:
    Image:      registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-01-225811@sha256:7a32d6d2d8477afab8cea56ee629b777a6c6bf54bd861b326b2e1861a616bd8c
    Ports:      80/TCP, 443/TCP, 1936/TCP
    Liveness:   http-get http://:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      STATS_PORT:                 1936
      ROUTER_SERVICE_NAMESPACE:   openshift-ingress
      DEFAULT_CERTIFICATE_DIR:    /etc/pki/tls/private
      ROUTER_SERVICE_NAME:        default
      ROUTER_CANONICAL_HOSTNAME:  apps.quay.tt.testing
    Mounts:
      /etc/pki/tls/private from default-certificate (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from router-token-tr4mh (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-certs-default
    Optional:    false
  router-token-tr4mh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-token-tr4mh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/worker=
Tolerations:     <none>
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  22s (x2747 over 7h)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match node selector.

@wking
Member

wking commented Jan 5, 2019

0/1 nodes are available: 1 node(s) didn't match node selector

This sounds like "you don't have any worker nodes" to me. Try working through this.
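
(To confirm, one can check whether any node carries the worker role required by the router's node selector, node-role.kubernetes.io/worker=; commands are illustrative:)

$ oc --config=auth/kubeconfig get nodes --show-labels
$ oc --config=auth/kubeconfig get nodes -l node-role.kubernetes.io/worker
# An empty result from the second command would confirm there is no worker node for the router pod to land on.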

@thomasmckay
Author

I'll work through that and follow up here, but there was some talk on Slack that "worker nodes come later", as if it were normal not to have a worker. I have never seen a worker VM created in any of my runs: the bootstrap and master VMs come up immediately, then bootstrap completes and is removed, leaving just the master.
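
(For completeness, the worker Machine objects and the libvirt domains can be inspected roughly like this; the machine-API namespace varies between installer versions, hence --all-namespaces:)

$ oc --config=auth/kubeconfig get machinesets --all-namespaces
$ oc --config=auth/kubeconfig get machines --all-namespaces
$ sudo virsh list --all    # on the libvirt host, to see which VMs were actually created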

@wking
Member

wking commented Jan 5, 2019

The bootstrap and master VMs come up immediately, then bootstrap completes and is removed leaving just the master.

I never get workers on libvirt because I haven't worked around this or openshift/cluster-api-provider-libvirt#45 for my non-standard default pool location. Nevertheless, the cluster comes up fine, the bootstrap node gets torn down, etc. There are a handful of pods that aren't scheduled because they have the worker selector like you have, and obviously the functionality provided by those pods will be missing. My understanding is that we're moving towards having everything installed by the installer tolerate master nodes, which would make missing workers even less of an issue, but we're not there yet. You can probably file issues with any repositories that don't tolerate masters; I dunno if anyone's gotten around to that yet.
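
(To see which pods are affected by the missing workers, something along these lines works; the jsonpath output shows the worker selector mentioned above:)

$ oc --config=auth/kubeconfig get pods --all-namespaces | grep Pending
$ oc --config=auth/kubeconfig -n openshift-ingress get deployment router-default \
    -o jsonpath='{.spec.template.spec.nodeSelector}'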

@thomasmckay
Author

For the record, I also get an SELinux error, if relevant.

SELinux is preventing qemu-system-x86 from getattr access on the file /home/images/quay-bootstrap.ign.

*****  Plugin restorecon (99.5 confidence) suggests   ************************

If you want to fix the label. 
/home/images/quay-bootstrap.ign default label should be user_home_t.
Then you can run restorecon. The access attempt may have been stopped due to insufficient permissions to access a parent directory in which case try to change the following command accordingly.
Do
# /sbin/restorecon -v /home/images/quay-bootstrap.ign

*****  Plugin catchall (1.49 confidence) suggests   **************************

If you believe that qemu-system-x86 should be allowed getattr access on the quay-bootstrap.ign file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'qemu-system-x86' --raw | audit2allow -M my-qemusystemx86
# semodule -X 300 -i my-qemusystemx86.pp

Additional Information:
Source Context                system_u:system_r:svirt_t:s0:c58,c725
Target Context                system_u:object_r:home_root_t:s0
Target Objects                /home/images/quay-bootstrap.ign [ file ]
Source                        qemu-system-x86
Source Path                   qemu-system-x86
Port                          <Unknown>
Host                          thomasmckay-desktop.usersys.redhat.com
Source RPM Packages           
Target RPM Packages           
Policy RPM                    selinux-policy-3.14.1-48.fc28.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Permissive
Host Name                     thomasmckay-desktop.usersys.redhat.com
Platform                      Linux thomasmckay-desktop.usersys.redhat.com
                              4.18.18-200.fc28.x86_64 #1 SMP Mon Nov 12 03:17:32
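
(Since the reported mode is Permissive, this denial is only logged rather than enforced. Beyond the suggested restorecon, a common general fix, not specific to this installer, is to give the images directory a label qemu is allowed to read, roughly:)

$ sudo semanage fcontext -a -t virt_image_t "/home/images(/.*)?"
$ sudo restorecon -Rv /home/images
# Alternatively, keep the volume images and Ignition files under /var/lib/libvirt/images, which already carries a suitable label.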

@zeenix
Contributor

zeenix commented Jun 24, 2019

@thomasmckay Is this still an issue?

@zeenix
Contributor

zeenix commented Jun 28, 2019

Assuming not reproducible anymore. @thomasmckay please reopen if that's not the case. Thanks.

/close

@openshift-ci-robot
Contributor

@zeenix: Closing this issue.

In response to this:

Assuming not reproducible anymore. @thomasmckay please reopen if that's not the case. Thanks.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
