libvirt: Unable to access web console #1007

Closed
rhopp opened this issue Jan 7, 2019 · 47 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. platform/libvirt priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@rhopp

rhopp commented Jan 7, 2019

Version

$ openshift-install version
v0.9.0-master

(compiled from master)

Platform (aws|libvirt|openstack):

libvirt

What happened?

I'm trying to install OpenShift 4 using this installer. Everything seemed to go fine: I followed all the steps described here, the installation succeeded, and I was able to log in using oc with the credentials from the installation output. However, I'm not able to access the web console.

Looking at the openshift-console project, everything seems OK:

OUTPUT
╭─rhopp@dhcp-10-40-4-106 ~/go/src/github.com/openshift/installer  ‹master*› 
╰─$ oc project openshift-console
Already on project "openshift-console" on server "https://test1-api.tt.testing:6443".
╭─rhopp@dhcp-10-40-4-106 ~/go/src/github.com/openshift/installer  ‹master*› 
╰─$ oc get all
NAME                                     READY     STATUS    RESTARTS   AGE
pod/console-operator-79b8b8cb8d-cgpfn    1/1       Running   1          1h
pod/openshift-console-6ddfcc76b5-2kmpx   1/1       Running   0          1h
pod/openshift-console-6ddfcc76b5-sp5zm   1/1       Running   0          1h
pod/openshift-console-6ddfcc76b5-z52hq   1/1       Running   0          1h

NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/console   ClusterIP   172.30.198.57   <none>        443/TCP   1h

NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/console-operator    1         1         1            1           1h
deployment.apps/openshift-console   3         3         3            3           1h

NAME                                           DESIRED   CURRENT   READY     AGE
replicaset.apps/console-operator-79b8b8cb8d    1         1         1         1h
replicaset.apps/openshift-console-6ddfcc76b5   3         3         3         1h

NAME                               HOST/PORT                                         PATH      SERVICES   PORT      TERMINATION          WILDCARD
route.route.openshift.io/console   console-openshift-console.apps.test1.tt.testing             console    https     reencrypt/Redirect   None

The pods are running and the service and route are up, but opening https://console-openshift-console.apps.test1.tt.testing in a browser fails because the hostname cannot be resolved.

As part of the setup I configured dnsmasq as described in the libvirt guide.
For example, ping test1-api.tt.testing works as expected, but ping console-openshift-console.apps.test1.tt.testing fails with:

ping: console-openshift-console.apps.test1.tt.testing: Name or service not known
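
The same comparison can be scripted. The following is a minimal sketch, not part of the installer: `resolve` and `check_names` are hypothetical helper names, and the resolver is injectable so the check can be exercised without a live DNS setup.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve(hostname: str) -> Optional[str]:
    """Return the first IPv4 address for hostname, or None if the lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        # Same condition that ping reports as "Name or service not known".
        return None
    return infos[0][4][0]


def check_names(
    hostnames: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve,
) -> Dict[str, Optional[str]]:
    """Map each hostname to its resolved address (None when it does not resolve)."""
    return {name: resolver(name) for name in hostnames}
```

Calling `check_names(["test1-api.tt.testing", "console-openshift-console.apps.test1.tt.testing"])` on the host should show which of the two names the host resolver can answer.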

What you expected to happen?

Web console to be accessible.

How to reproduce it (as minimally and precisely as possible)?

Follow https://github.com/openshift/installer/blob/master/docs/dev/libvirt-howto.md (my host machine is Fedora 29)

INSTALLATION OUTPUT
╭─rhopp@localhost ~/go/src/github.com/openshift/installer/bin  ‹master*› 
╰─$ ./openshift-install create cluster
? SSH Public Key  [Use arrows to move, type to filter, ? for more help]
  /home/rhopp/.ssh/gitlab.cee.key.pub
> <none>
? SSH Public Key  [Use arrows to move, type to filter, ? for more help]
> /home/rhopp/.ssh/gitlab.cee.key.pub
  <none>
? SSH Public Key /home/rhopp/.ssh/gitlab.cee.key.pub
? Platform  [Use arrows to move, type to filter]
> aws
  libvirt
  openstack
? Platform  [Use arrows to move, type to filter]
  aws
> libvirt
  openstack
? Platform libvirt
? Libvirt Connection URI [? for help] (qemu+tcp://192.168.122.1/system) 
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain [? for help] tt.testing
? Base Domain tt.testing
? Cluster Name [? for help] test1
? Cluster Name test1
? Pull Secret [? for help] *************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************                           INFO Fetching OS image: redhat-coreos-maipo-47.247-qemu.qcow2.gz 
INFO Creating cluster...                          
INFO Waiting up to 30m0s for the Kubernetes API... 
INFO API v1.11.0+e3fa228 up                       
INFO Waiting up to 30m0s for the bootstrap-complete event... 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 10m0s for the openshift-console route to be created... 
INFO Install complete!                            
INFO Run 'export KUBECONFIG=/home/rhopp/go/src/github.com/openshift/installer/bin/auth/kubeconfig' to manage the cluster with 'oc', the OpenShift CLI. 
INFO The cluster is ready when 'oc login -u kubeadmin -p 5tQwM-fXfkC-MIeAH-BmLeN' succeeds (wait a few minutes). 
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.test1.tt.testing 
INFO Login to the console with user: kubeadmin, password: 5tQwM-fXfkC-MIeAH-BmLeN
@crawford
Contributor

crawford commented Jan 7, 2019

Duplicate of #411.

@openshift-ci-robot
Contributor

@crawford: Closing this issue.


@wking
Member

wking commented Jan 8, 2019

#411 was closed, since AWS works. Reopening for libvirt.

@wking
Member

wking commented Mar 7, 2019

Docs in flight with #1371

@ghost

ghost commented Mar 10, 2019

Hi,

Does this work? (#1371)
Does it respond for all wildcard names?

Best Regards,
Fábio Sbano

@zeenix
Contributor

zeenix commented May 22, 2019

90b0d45 only documents a workaround, unfortunately.

/reopen

@openshift-ci-robot
Contributor

@zeenix: Reopened this issue.


@sgreene570

Has anyone had luck with the workaround posted in 90b0d45 recently? My libvirt cluster does not bring up the console operator with or without the documented workaround.

@sgreene570

I tried setting the oauth hostname statically, without wildcards, in my dnsmasq config and I'm still getting oauth console errors.
See below.

dnsmasq config

~$ cat /etc/NetworkManager/dnsmasq.d/openshift.conf 
server=/tt.testing/192.168.126.1
address=/.apps.tt.testing/192.168.126.51
address=/oauth-openshift.apps.test1.tt.testing/192.168.126.51

Sanity check that hostname is resolving to proper node IP

~$ ping oauth-openshift.apps.test1.tt.testing
PING oauth-openshift.apps.test1.tt.testing (192.168.126.51) 56(84) bytes of data.
64 bytes from 192.168.126.51 (192.168.126.51): icmp_seq=1 ttl=64 time=0.114 ms
64 bytes from 192.168.126.51 (192.168.126.51): icmp_seq=2 ttl=64 time=0.136 ms

Output of openshift-console crashed pod logs

~$ oc logs -f console-67dbf7f789-k4gqg  
2019/05/30 22:51:45 cmd/main: cookies are secure!
2019/05/30 22:51:45 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host

Am I missing something?
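
Two things stand out in the output above. First, the wildcard line names `.apps.tt.testing`, while the cluster's hostnames sit under `apps.test1.tt.testing`, so only the explicit oauth line applies on the host. Second, the lookup in the pod log went to 172.30.0.10:53 rather than to the host's NetworkManager dnsmasq, so a host-side entry alone may not affect it. The first point can be checked with a rough sketch of how a dnsmasq `address=/domain/ip` entry selects names. This assumes the pattern is matched as a domain suffix on label boundaries with a leading dot ignored, and `matches_domain` is a hypothetical helper, not dnsmasq code.

```python
def matches_domain(hostname: str, pattern: str) -> bool:
    """Return True if hostname equals pattern or is a subdomain of it.

    Assumption: matching is by whole DNS labels, compared from the right,
    and leading/trailing dots in the pattern are not significant.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().strip(".").split(".")
    if len(host_labels) < len(pattern_labels):
        return False
    return host_labels[-len(pattern_labels):] == pattern_labels
```

Under that assumption, `matches_domain("oauth-openshift.apps.test1.tt.testing", ".apps.tt.testing")` is false, while a pattern of `.apps.test1.tt.testing` would match.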

@zeenix
Contributor

zeenix commented May 31, 2019

Has anyone had luck with the workaround posted in 90b0d45 recently?

I just did and, except for the usual timeout issue, the cluster came up fine as far as I can tell.

@zeenix
Contributor

zeenix commented Jun 28, 2019

/priority important-longterm

@openshift-ci-robot openshift-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jun 28, 2019
@zeenix
Contributor

zeenix commented Jun 28, 2019

@cfergeau You said you had a WIP patch to fix this at the libvirt level. Do you think you'd be able to get it merged in the near future?

/assign @cfergeau

@openshift-ci-robot
Contributor

@zeenix: GitHub didn't allow me to assign the following users: cfergeau.

Note that only openshift members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide


@TuranTimur

Hi. I did the same, but the error still persists.
Do I need to debug the installer, or are there any other pointers?

tail -f setup/.openshift_install.log
time="2019-08-10T04:47:10+08:00" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (417 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (382 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (6 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (421 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-image-registry/image-registry" (388 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (398 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (402 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (406 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (144 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-machine-api/machine-api-operator" (408 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (411 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (391 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (394 of 422): the server does not recognize this resource, check extension API servers"
time="2019-08-10T04:54:14+08:00" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (417 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (382 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-cluster-version/cluster-version-operator" (6 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (421 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-image-registry/image-registry" (388 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (398 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (402 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (406 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (144 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-machine-api/machine-api-operator" (408 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (411 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (391 of 422): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (394 of 422): the server does not recognize this resource, check extension API servers"
time="2019-08-10T04:56:51+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209"
time="2019-08-10T04:56:51+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209: downloading update"
time="2019-08-10T04:56:56+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209"
time="2019-08-10T04:57:11+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209: 19% complete"
time="2019-08-10T04:57:22+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209: 82% complete"
time="2019-08-10T04:57:38+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209: 95% complete"
time="2019-08-10T05:00:27+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-191209: 95% complete"
time="2019-08-10T05:01:40+08:00" level=fatal msg="failed to initialize the cluster: Working towards 4.2.0-0.okd-2019-08-09-191209: 95% complete"
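
To see where an install stalled without reading the whole file, the log can be reduced to its last reported progress. This is a minimal sketch that assumes every entry is a single line in the `time="..." level=... msg="..."` shape shown above; `last_progress` is a hypothetical helper, not an installer feature.

```python
import re
from typing import Iterable, Optional, Tuple

# One log entry: time="<timestamp>" level=<level> msg="<message>"
LINE_RE = re.compile(r'^time="(?P<time>[^"]+)" level=(?P<level>\w+) msg="(?P<msg>.*)"$')
PERCENT_RE = re.compile(r"(\d+)% complete")


def last_progress(lines: Iterable[str]) -> Optional[Tuple[str, str, int]]:
    """Return (timestamp, level, percent) for the last line reporting progress."""
    result = None
    for line in lines:
        entry = LINE_RE.match(line.strip())
        if entry is None:
            continue  # skip anything that is not a well-formed log entry
        percent = PERCENT_RE.search(entry.group("msg"))
        if percent:
            result = (entry.group("time"), entry.group("level"), int(percent.group(1)))
    return result
```

Passing the open file object for `setup/.openshift_install.log` as `lines` is enough, since iterating a text file yields one line at a time.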

@rthallisey

rthallisey commented Jul 3, 2020

Here's a workaround: #1648 (comment)

@clnperez
Contributor

clnperez commented Aug 6, 2020

Last week, while trying to do some basic verification, I ran into an issue where the workaround listed in the installer troubleshooting doc wasn't working. We figured out it was because I had spun up a cluster with three workers, but the ingress controller has 2 replicas set in its replicaset. So neither of those pods landed on the .51 worker, and we saw the same symptoms as if no workaround had been applied. It doesn't look like there's a way to use wildcards and have multiple IPs for a host entry. dnsmasq seems to take the last entry in a file as the IP instead of doing any kind of round-robin. Any suggestions? Or do we just need to edit the manifest for the ingress operator to create 3 replicas?
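
The effect of the replica count can be illustrated by enumerating where the router pods could land. This is a toy sketch under two loud assumptions: each router pod runs on a distinct worker, and every such placement is possible. The worker labels and `placements_missing` are hypothetical, and this is not how the scheduler actually decides.

```python
from itertools import combinations
from typing import Sequence, Tuple


def placements_missing(workers: Sequence[str], replicas: int, pinned: str) -> Tuple[int, int]:
    """Count placements that leave the DNS-pinned worker without a router pod.

    Returns (placements without a router on `pinned`, total placements).
    Assumption: one router pod per worker, all combinations possible.
    """
    all_placements = list(combinations(workers, replicas))
    missing = [p for p in all_placements if pinned not in p]
    return len(missing), len(all_placements)
```

With three workers and two replicas, one of the three possible placements leaves the pinned worker without a router; with three replicas none does, which matches the fix of adding a third replica.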

@marshallford

@clnperez I'm running into the same issue. Did you manage to find a solution?

@clnperez
Contributor

@marshallford no, nothing other than spinning up that 3rd replica for the ingress.

Prashanth684 added a commit to Prashanth684/installer that referenced this issue Jan 11, 2021
…config

Since libvirt 5.6.0, there is an option to pass in dnsmasq options through the libvirt network [1]. This addresses the following problems:

- eliminate the need for hacking routes in the cluster (the workaround mentioned in [3]) so that libvirt's dnsmasq does not manage the domain (and so the requests from inside the cluster will go up the chain to the host itself).
- eliminate the hacky workaround used in the multi-arch CI automation to inject `*.apps` entries in the libvirt network that point to a single worker node [2]. Instead of waiting for the libvirt networks to come up and update entries, we can set this before the installation itself through the install config.
- another issue this solves - with the above mentioned workaround, having multiple worker nodes becomes problematic when running upgrade tests. Having the route to just one worker node would fail the upgrade when that worker node is down. With this change, we could now point to the .1 address and have a load balancer forward traffic to any worker node.

With this change, the option can be specified through the install config yaml in the network section as pairs of option name and values. An example:
```
platform:
  libvirt:
    network:
      dnsmasqOptions:
      - name: "address"
        value: "/.apps.tt.testing/192.168.126.51"
      if: tt0
```
The terraform provider supports rendering these options through a datasource and injecting them into the network xml.
Since this config is optional, not specifying it will continue to work as before without issues.

[1] https://libvirt.org/formatnetwork.html#elementsNamespaces
[2] https://github.com/openshift/release/blob/master/ci-operator/templates/openshift/installer/cluster-launch-installer-remote-libvirt-e2e.yaml#L532-L554
[3] openshift#1007
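
For illustration, the name/value pairs from the install config could map onto the libvirt network XML roughly as follows. This is a sketch only: it uses the dnsmasq namespace documented by libvirt in [1], but `render_dnsmasq_options` is a hypothetical helper and not the terraform provider's actual code. libvirt's documentation declares the namespace on the `<network>` element; here it is emitted on the fragment itself so that the output is self-contained.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace libvirt documents for passing options through to dnsmasq.
DNSMASQ_NS = "http://libvirt.org/schemas/network/dnsmasq/1.0"


def render_dnsmasq_options(options: List[Dict[str, str]]) -> str:
    """Render name/value pairs as a <dnsmasq:options> XML fragment."""
    ET.register_namespace("dnsmasq", DNSMASQ_NS)
    root = ET.Element(f"{{{DNSMASQ_NS}}}options")
    for opt in options:
        # Each pair becomes one dnsmasq option line, e.g. "address=/domain/ip".
        ET.SubElement(
            root,
            f"{{{DNSMASQ_NS}}}option",
            {"value": f"{opt['name']}={opt['value']}"},
        )
    return ET.tostring(root, encoding="unicode")
```

Calling it with `[{"name": "address", "value": "/.apps.tt.testing/192.168.126.51"}]` yields a single `dnsmasq:option` element whose `value` attribute is `address=/.apps.tt.testing/192.168.126.51`.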
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 28, 2021
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 27, 2021
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Contributor

@openshift-bot: Closing this issue.


dale-fu pushed a commit to dale-fu/installer that referenced this issue Aug 30, 2021
…config

Since libvirt 5.6.0, there is an option to pass in dnsmasq options through the libvirt network [1]. This addresses the following problems:

- eliminate the need for hacking routes in the cluster (the workaround mentioned in [3]) so that libvirt's dnsmasq does not manage the domain (and so the requests from inside the cluster will go up the chain to the host itself).
- eliminate the hacky workaround used in the multi-arch CI automation to inject `*.apps` entries in the libvirt network that point to a single worker node [2]. Instead of waiting for the libvirt networks to come up and update entries, we can set this before the installation itself through the install config.
- another issue this solves - with the above mentioned workaround, having multiple worker nodes becomes problematic when running upgrade tests. Having the route to just one worker node would fail the upgrade when that worker node is down. With this change, we could now point to the .1 address and have a load balancer forward traffic to any worker node.

With this change, the option can be specified through the install config yaml in the network section as pairs of option name and values. An example:
```
platform:
  libvirt:
    network:
      dnsmasqOptions:
      - name: "address"
        value: "/.apps.tt.testing/192.168.126.51"
      if: tt0
```
The terraform provider supports rendering these options through a datasource and injecting them into the network xml.
Since this config is optional, not specifying it will continue to work as before without issues.

[1] https://libvirt.org/formatnetwork.html#elementsNamespaces
[2] https://github.com/openshift/release/blob/master/ci-operator/templates/openshift/installer/cluster-launch-installer-remote-libvirt-e2e.yaml#L532-L554
[2] openshift#1007