
libvirt installer - openshift-console pods never start, installation fails #1443

Closed · abradshaw opened this issue Mar 20, 2019 · 8 comments
Labels: lifecycle/stale, platform/libvirt

abradshaw commented Mar 20, 2019

Version

$ openshift-install version
bin/openshift-install unreleased-master-583-gee26337a50fc67db480292a07e0a6b0fc8ea17fa
built from commit ee26337a50fc67db480292a07e0a6b0fc8ea17fa

Platform (aws|libvirt):

libvirt

What happened?

Compiled the Go installer with the libvirt flag and ran the install following the instructions. It gets nearly all the way there: the bootstrap node is removed, leaving one master and one worker, but the final part of the install times out:

bin/openshift-install create cluster

? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain adrians.laptop
? Cluster Name lab
? Pull Secret [? for help] ********
INFO Fetching OS image: rhcos-maipo-400.7.20190306.0-qemu.qcow2.gz
INFO Creating infrastructure resources...
INFO Waiting up to 30m0s for the Kubernetes API at https://api.lab.adrians.laptop:6443...
INFO API v1.12.4+915ac9d up
INFO Waiting up to 30m0s for the bootstrap-complete event...
INFO Destroying the bootstrap resources...
INFO Waiting up to 30m0s for the cluster at https://api.lab.adrians.laptop:6443 to initialize...
FATAL failed to initialize the cluster: Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (303 of 310): the server does not recognize this resource, check extension API servers

The endpoint it complains about (https://api.lab.adrians.laptop:6443) is reachable from my web browser.
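
Even though the installer gives up, the cluster can still be inspected with the kubeconfig it writes into the asset directory. A minimal sketch for narrowing down which operator is stuck (assuming the default ./auth/kubeconfig location):

$ export KUBECONFIG=$PWD/auth/kubeconfig
$ # overall install progress as reported by the cluster-version operator
$ oc get clusterversion
$ # per-operator status; anything not Available, or stuck Progressing, is the blocker
$ oc get clusteroperators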

What you expected to happen?

To have a working cluster

How to reproduce it (as minimally and precisely as possible)?

$ bin/openshift-install create cluster
? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain adrians.laptop
? Cluster Name lab
? Pull Secret 

Anything else we need to know?

While the installer errors out and no web UI is available, I was able to run kubectl to list pods - it looks like it's my openshift-console pods that have the issue
(see attachment)
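
To confirm it really is the console pods and see why they fail, that namespace can be queried directly; a minimal sketch, where <console-pod> is a placeholder for a name from the first command:

$ oc get pods -n openshift-console
$ # events explaining why a pod is Pending or crash-looping
$ oc describe pod <console-pod> -n openshift-console
$ # container logs, if the pod got far enough to start
$ oc logs <console-pod> -n openshift-console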

References

Maybe #1397 is related

oc-get-all-pods.txt

@abradshaw (Author)

I also see that only 1 out of 2 routers is running:

oc get events -n openshift-ingress
LAST SEEN   TYPE      REASON             OBJECT                                 MESSAGE
3m44s       Warning   FailedScheduling   pod/router-default-77949cf64d-kpfqp    0/2 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) didn't match node selector.
165m        Warning   FailedScheduling   pod/router-default-77949cf64d-kpfqp    0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taints that the pod didn't tolerate.
175m        Warning   FailedScheduling   pod/router-default-77949cf64d-npt7n    0/1 nodes are available: 1 node(s) didn't match node selector.
173m        Normal    Scheduled          pod/router-default-77949cf64d-npt7n    Successfully assigned openshift-ingress/router-default-77949cf64d-npt7n to lab-dst4j-worker-0-9d65g
166m        Normal    Pulling            pod/router-default-77949cf64d-npt7n    Pulling image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-20-081027@sha256:43ad3327ac1aec2c5c33eaa3bc4517ca475999b2c882c366b08e5716b8d15c59"
165m        Normal    Pulled             pod/router-default-77949cf64d-npt7n    Successfully pulled image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-20-081027@sha256:43ad3327ac1aec2c5c33eaa3bc4517ca475999b2c882c366b08e5716b8d15c59"
165m        Warning   Failed             pod/router-default-77949cf64d-npt7n    Error: object "openshift-ingress"/"router-stats-default" not registered
175m        Warning   FailedScheduling   pod/router-default-77949cf64d-tt4sw    0/1 nodes are available: 1 node(s) didn't match node selector.
170m        Warning   FailedScheduling   pod/router-default-77949cf64d-tt4sw    0/2 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) didn't match node selector.
165m        Warning   FailedScheduling   pod/router-default-77949cf64d-tt4sw    0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taints that the pod didn't tolerate.
163m        Normal    Pulled             pod/router-default-77949cf64d-tt4sw    Container image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-20-081027@sha256:43ad3327ac1aec2c5c33eaa3bc4517ca475999b2c882c366b08e5716b8d15c59" already present on machine
163m        Normal    Created            pod/router-default-77949cf64d-tt4sw    Created container router
163m        Normal    Started            pod/router-default-77949cf64d-tt4sw    Started container router
168m        Normal    SuccessfulCreate   replicaset/router-default-77949cf64d   Created pod: router-default-77949cf64d-kpfqp
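
Those FailedScheduling messages can be decoded by comparing the router's node selector with the labels and taints on the two nodes. A minimal sketch (router-default is the default deployment name; <node> is a placeholder):

$ # the selector the router pods must satisfy (normally the worker role label)
$ oc get deployment router-default -n openshift-ingress -o jsonpath='{.spec.template.spec.nodeSelector}'
$ # node labels - only nodes carrying that label are candidates
$ oc get nodes --show-labels
$ # masters are tainted NoSchedule by default, which explains the taint message
$ oc describe node <node> | grep -i taints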

kxr commented Mar 29, 2019

Having the same exact problem.

@chrisu001

The solution was quite easy in my case:
openshift-install 086a885
built from commit 086a885

The current ingress deployment is configured with 2 replica pods of the router:

$ oc get deployment.apps/router-default -n openshift-ingress -o yaml | grep  -A6 -B2 replica
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      ingress.operator.openshift.io/ingress-controller-deployment: default
  strategy:
...

So you have to spin up at least two compute nodes (aka workers) to fulfill the requirement, e.g. in install-config.yaml:

$ head install-config.yaml 
apiVersion: v1beta4
baseDomain: testcluster.testdomain
compute:
  - name: worker
    platform: {}
    replicas: 3
controlPlane:
  name: master
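
If the cluster is already installed with only one worker, a second one can also be added afterwards by scaling the worker MachineSet instead of reinstalling. A sketch; <cluster-id>-worker-0 is a placeholder and the real name should be taken from the first command:

$ oc get machinesets -n openshift-machine-api
$ oc scale machineset <cluster-id>-worker-0 --replicas=2 -n openshift-machine-api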

zeenix (Contributor) commented Jun 12, 2019

/label platform/libvirt

zeenix (Contributor) commented Jun 17, 2019

I'm guessing this is no longer reproducible.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label Jun 17, 2019
zeenix (Contributor) commented Jun 17, 2019

The workaroundable console issue is tracked in #1007.

/close

@openshift-ci-robot (Contributor)

@zeenix: Closing this issue.

In response to this:

The workaroundable console issue is tracked in #1007.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
