
libvirt installer - openshift-console pods never start, installation fails #1443

Closed · abradshaw opened this issue Mar 20, 2019 · 8 comments
Labels: lifecycle/stale, platform/libvirt

abradshaw commented Mar 20, 2019

Version

$ openshift-install version
bin/openshift-install unreleased-master-583-gee26337a50fc67db480292a07e0a6b0fc8ea17fa
built from commit ee26337a50fc67db480292a07e0a6b0fc8ea17fa

Platform (aws|libvirt):

libvirt

What happened?

Compiled the Go installer with the libvirt flag and ran the install following the instructions. It gets nearly all the way there: the bootstrap node is removed, leaving one master and one worker, but the final part of the install times out:

bin/openshift-install create cluster

? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain adrians.laptop
? Cluster Name lab
? Pull Secret [? for help] ********
INFO Fetching OS image: rhcos-maipo-400.7.20190306.0-qemu.qcow2.gz
INFO Creating infrastructure resources...
INFO Waiting up to 30m0s for the Kubernetes API at https://api.lab.adrians.laptop:6443...
INFO API v1.12.4+915ac9d up
INFO Waiting up to 30m0s for the bootstrap-complete event...
INFO Destroying the bootstrap resources...
INFO Waiting up to 30m0s for the cluster at https://api.lab.adrians.laptop:6443 to initialize...
FATAL failed to initialize the cluster: Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (303 of 310): the server does not recognize this resource, check extension API servers

The endpoint it complains about (https://api.lab.adrians.laptop:6443) is reachable from my web browser.
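
Even though the installer gives up, the cluster can still be inspected with the kubeconfig it writes into the asset directory. A minimal sketch for narrowing down which operator is stuck (assuming the default ./auth/kubeconfig location):

$ export KUBECONFIG=$PWD/auth/kubeconfig
$ # overall install progress as reported by the cluster-version operator
$ oc get clusterversion
$ # per-operator status; anything not Available, or stuck Progressing, is the blocker
$ oc get clusteroperators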

What you expected to happen?

To have a working cluster

How to reproduce it (as minimally and precisely as possible)?

$ bin/openshift-install create cluster
? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain adrians.laptop
? Cluster Name lab
? Pull Secret 

Anything else we need to know?

While the installer errors out and no web UI is available, I was able to run kubectl to list pods - it looks like it's my openshift-console pods that have the issue
(see attachment)
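
To confirm it really is the console pods and see why they fail, that namespace can be queried directly; a minimal sketch, where <console-pod> is a placeholder for a name from the first command:

$ oc get pods -n openshift-console
$ # events explaining why a pod is Pending or crash-looping
$ oc describe pod <console-pod> -n openshift-console
$ # container logs, if the pod got far enough to start
$ oc logs <console-pod> -n openshift-console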

References

Maybe #1397 is related

oc-get-all-pods.txt

@abradshaw (Author)

I also see that only 1 out of 2 routers is running:

oc get events -n openshift-ingress
LAST SEEN   TYPE      REASON             OBJECT                                 MESSAGE
3m44s       Warning   FailedScheduling   pod/router-default-77949cf64d-kpfqp    0/2 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) didn't match node selector.
165m        Warning   FailedScheduling   pod/router-default-77949cf64d-kpfqp    0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taints that the pod didn't tolerate.
175m        Warning   FailedScheduling   pod/router-default-77949cf64d-npt7n    0/1 nodes are available: 1 node(s) didn't match node selector.
173m        Normal    Scheduled          pod/router-default-77949cf64d-npt7n    Successfully assigned openshift-ingress/router-default-77949cf64d-npt7n to lab-dst4j-worker-0-9d65g
166m        Normal    Pulling            pod/router-default-77949cf64d-npt7n    Pulling image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-20-081027@sha256:43ad3327ac1aec2c5c33eaa3bc4517ca475999b2c882c366b08e5716b8d15c59"
165m        Normal    Pulled             pod/router-default-77949cf64d-npt7n    Successfully pulled image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-20-081027@sha256:43ad3327ac1aec2c5c33eaa3bc4517ca475999b2c882c366b08e5716b8d15c59"
165m        Warning   Failed             pod/router-default-77949cf64d-npt7n    Error: object "openshift-ingress"/"router-stats-default" not registered
175m        Warning   FailedScheduling   pod/router-default-77949cf64d-tt4sw    0/1 nodes are available: 1 node(s) didn't match node selector.
170m        Warning   FailedScheduling   pod/router-default-77949cf64d-tt4sw    0/2 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 1 node(s) didn't match node selector.
165m        Warning   FailedScheduling   pod/router-default-77949cf64d-tt4sw    0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taints that the pod didn't tolerate.
163m        Normal    Pulled             pod/router-default-77949cf64d-tt4sw    Container image "registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-03-20-081027@sha256:43ad3327ac1aec2c5c33eaa3bc4517ca475999b2c882c366b08e5716b8d15c59" already present on machine
163m        Normal    Created            pod/router-default-77949cf64d-tt4sw    Created container router
163m        Normal    Started            pod/router-default-77949cf64d-tt4sw    Started container router
168m        Normal    SuccessfulCreate   replicaset/router-default-77949cf64d   Created pod: router-default-77949cf64d-kpfqp
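
Those FailedScheduling messages can be decoded by comparing the router's node selector with the labels and taints on the two nodes. A minimal sketch (router-default is the default deployment name; <node> is a placeholder):

$ # the selector the router pods must satisfy (normally the worker role label)
$ oc get deployment router-default -n openshift-ingress -o jsonpath='{.spec.template.spec.nodeSelector}'
$ # node labels - only nodes carrying that label are candidates
$ oc get nodes --show-labels
$ # masters are tainted NoSchedule by default, which explains the taint message
$ oc describe node <node> | grep -i taints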

kxr commented Mar 29, 2019

Having the same exact problem.

@chrisu001

The solution was quite easy in my case:
openshift-install 086a885
built from commit 086a885

The current ingress deployment is configured with 2 replica pods of the router:

$ oc get deployment.apps/router-default -n openshift-ingress -o yaml | grep  -A6 -B2 replica
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      ingress.operator.openshift.io/ingress-controller-deployment: default
  strategy:
...

So you have to spin up at least two compute nodes (aka workers) to fulfill the requirement, e.g. in install-config.yaml:

$ head install-config.yaml 
apiVersion: v1beta4
baseDomain: testcluster.testdomain
compute:
  - name: worker
    platform: {}
    replicas: 3
controlPlane:
  name: master
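
If the cluster is already installed with only one worker, a second one can also be added afterwards by scaling the worker MachineSet instead of reinstalling. A sketch; <cluster-id>-worker-0 is a placeholder and the real name should be taken from the first command:

$ oc get machinesets -n openshift-machine-api
$ oc scale machineset <cluster-id>-worker-0 --replicas=2 -n openshift-machine-api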

zeenix (Contributor) commented Jun 12, 2019

/label platform/libvirt

zeenix (Contributor) commented Jun 17, 2019

I'm guessing this is no longer reproducible.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label Jun 17, 2019
zeenix (Contributor) commented Jun 17, 2019

The workaroundable console issue is tracked in #1007.

/close

@openshift-ci-robot (Contributor)

@zeenix: Closing this issue.

In response to this:

The workaroundable console issue is tracked in #1007.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
