[libvirt] Failed to get console route for v0.10.0 tag and ImagePullBackoff for clusterapi-manager-controllers #1078
I have the same issue; it fails to pull the image. Running `oc describe pod` shows the following:

Normal  Scheduled  46m  default-scheduler  Successfully assigned openshift-cluster-api/clusterapi-manager-controllers-db4fbd5fc-bmlhw to ocp-master-0
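A quick way to confirm which image reference is actually failing is to inspect the pod's events and pull spec. This is a hedged sketch: the namespace and pod name below are the ones from this thread's output and will differ on any other install.

```shell
# Inspect the failing pod's events (shows the ImagePullBackOff reason)
oc -n openshift-cluster-api describe pod clusterapi-manager-controllers-db4fbd5fc-bmlhw

# Print just the image references the pod is trying to pull
oc -n openshift-cluster-api get pod clusterapi-manager-controllers-db4fbd5fc-bmlhw \
  -o jsonpath='{.spec.containers[*].image}'
```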
I tried today with the master branch and didn't see this issue, but with the 0.10.0 tag it still occurs, so it might be something to do with the way we tag the payload for this release?
I also rebuilt from the 0.9.1 tag and got it working with that.
0.10.0 is a special release. It is the beta1 build, which means that it targets a different set of content than 0.9.1. I also noticed that the libvirt container isn't pushed to Quay (unlike its AWS counterpart), so I think it was just missed in the release process.
@crawford thanks, that explains why it is happening with only
I just wanted to add that my team is hitting this issue as well and is stuck on 0.9.1 for now, until we find a way to run 0.10.x locally with libvirt.
Same issue with 0.10.1, console is not deployed because there are no workers available... because the clusterapi-manager-controllers is not up... because it is trying to pull the image from an internal registry which I cannot access:
This is a release issue; the installer just pins the update payloads the release folks push to quay.io. It's being tracked here.
@wking any update on this one? I am using the latest master and have exactly the same issue.
You must have a pull secret to api.ci in order to access libvirt, because the installer team has chosen not to build libvirt for OCP.
@smarterclayton thanks. How do I get one?
If you're not in the openshift GitHub organization, you can't get one. libvirt isn't supported in the official installer. You need to use the origin variant or not use libvirt.
@smarterclayton what's the origin variant? The flavor isn't that important to me; I need a local 4.0 cluster :)
git clone openshift/installer, run hack/build-go.sh, and that's origin.
For libvirt, you need to set
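As a sketch, building from source with the libvirt provider compiled in might look like the following. The `TAGS=libvirt` build tag is my assumption based on the installer's build scripts of that era, so verify it against `hack/build.sh` in your checkout:

```shell
# Hedged sketch: compile the origin installer with libvirt support.
# TAGS=libvirt is an assumed build tag; confirm in hack/build.sh.
git clone https://github.com/openshift/installer.git
cd installer
TAGS=libvirt hack/build.sh      # binary lands in bin/openshift-install
bin/openshift-install version
```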
Unless you take steps to preserve the public (I think?) OKD builds at registry.svc.ci.openshift.org/openshift/origin-release, they're going to get garbage-collected after a few days. Master installer builds (currently the only way to get libvirt compiled in) point there by default, so your cluster should run fine for a few days and then probably start to die as the backing images get garbage-collected. Should be fine for dev work (the libvirt target), but it's not going to work for long-running tasks out of the box.
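One way to control which payload a master build installs is to point it at a specific release image explicitly. This is a hedged sketch: the `OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE` environment variable name is an assumption taken from the installer's development workflow of that period, and the image pullspec below is illustrative only.

```shell
# Hedged sketch: pin the release payload instead of relying on the
# default CI location; env var name and pullspec are assumptions.
export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/openshift/origin-release:v4.0
openshift-install create cluster --dir ./mycluster --log-level debug
```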
@smarterclayton @wking but this is exactly what I am doing:
Events:

Is this expected with 0.10.0? It works with 0.9.1 for me on Fedora 28 with libvirt. Unfortunately, 0.9.1 has a bug (already fixed in openshift/console#1112), and that version does not work for me, since I need to work with OperatorHub and the integration of the Eclipse Che operator. What would be the best way to proceed? Give up on a local install with libvirt and look for AWS resources?
@eivantsov this is where it is tracked: https://bugzilla.redhat.com/show_bug.cgi?id=1666561. Do put your comments there.
Building from tagged releases gets update payloads from quay.io; see the Bugzilla bug linked above (twice now ;). Building from master should work better, but comes with its own caveats.
We will sell AWS support, so yeah, I expect that is the best route if you want fewer quirks at this stage.
@praveenkumar I don't have access to this issue. @wking, would it be fair to say that 0.10.0+ libvirt installation is broken now?
@eivantsov if you log in using your Red Hat account then you will be able to access this atm.
@eivantsov this is only broken for tagged releases, which have released payloads; it does work from master, as @wking said.
@praveenkumar I have the same problem with master too, and I am logged in with my RH email.
@wking @praveenkumar
Last lines from install log:
There are a couple of failed pods with connection refused errors:
Events:
Current master (commit id d3ff3af):
Findings:
I think the issues can be worked around by adding more CPU/RAM to the master node (so: create the manifests, modify the CPU/RAM specs for the masters, and create the cluster), but I will need to find somewhere else to test it; my laptop is not capable of doing that. Just in case, in order to make this 'work' on my laptop (T480s, 16 GB RAM) I need to:
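One way to apply the extra CPU/RAM before the cluster is created is through the libvirt Terraform variables rather than hand-editing manifests. This is a hedged sketch: the `TF_VAR_libvirt_master_memory` and `TF_VAR_libvirt_master_vcpu` names are assumptions taken from the installer's libvirt HOWTO and may differ between versions.

```shell
# Hedged sketch: give the libvirt master VM more memory and vCPUs.
# The TF_VAR_* variable names are assumptions; check your installer docs.
export TF_VAR_libvirt_master_memory=8192   # MiB
export TF_VAR_libvirt_master_vcpu=4
openshift-install create cluster --dir ./mycluster
```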
@e-minguez I have 24 GB, and while installing I do not see all of my RAM being used.
I do have 16, and unless I add 4 GB more of swap (goodbye NVMe!) the installer is OOM-killed :)
It seems that I stumbled upon the same issue using the 0.11.0 version and AWS as the target infrastructure.
INFO Creating cluster...
INFO Waiting up to 30m0s for the Kubernetes API...
INFO API v1.11.0+8868a98a7b up
INFO Waiting up to 30m0s for the bootstrap-complete event...
ERROR: logging before flag.Parse: E0130 12:44:05.888016 25969 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 297
INFO Destroying the bootstrap resources...
INFO Waiting up to 10m0s for the openshift-console route to be created...
FATAL waiting for openshift-console URL: context deadline exceeded
Here's the
I am in completely the same situation as @eivantsov. I need to test the operator in the marketplace and I am also hitting this issue in
and I can confirm that it's almost impossible to run the installer on a T480s with 16 GB of RAM.
@wking Since now this payload is available on
I can still see that issue on master.
I've been having similar issues, which I put down to memory sizing: #1041
Hi guys, the minimum required is: 1 x master; the router needs two workers. Best,
Is there a router pull or docs I can link for that? I guess we need to bump our libvirt default to catch up.
@wking, I'll create a howto. See it running at https://youtu.be/ZOZPmwUwWj8. Best Regards,
@wking, are you using the latest version of the installer? Regards,
I haven't run it on libvirt in a while, but if the router for some reason needs 2+ compute nodes now, we'd want to update the default and some validation. Or is the issue total compute memory constraints or similar, and not actually compute replica count?
@wking, with replica count = 2 you cannot have two listeners on 443 and 80 on the same physical or virtual host. Best Regards,
I needed to change some things in my setup configuration file, adjust the memory, and solve the problem with dnsmasq. Regards,
.tf and wildcard "*.apps.test1.tt.testing" on bind, listening on IP 192.168.126.1. Regards,
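For setups using NetworkManager's dnsmasq instead of bind, a minimal wildcard entry might look like the fragment below. This is a sketch only: the file path follows the convention in the installer's libvirt HOWTO, the domain is the one from this thread, and the 192.168.126.51 ingress address is an assumption that must match whichever node runs the router on your network.

```
# /etc/NetworkManager/dnsmasq.d/openshift.conf (hypothetical path and values)
server=/tt.testing/192.168.126.1
address=/.apps.test1.tt.testing/192.168.126.51
```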
Hi,

[fsbano@voyager-1 ~]$ oc get deployment
[fsbano@voyager-1 ~]$ oc describe deployment/router-default
Type    Reason             Age  From                   Message
Normal  ScalingReplicaSet  8m   deployment-controller  Scaled up replica set router-default-779745f684 to 2
[fsbano@voyager-1 ~]$ oc edit deployments/router-default

Best Regards,
Hi,

My install-config.yaml file:

[root@voyager-1 ~]# more install-config.yaml

$ ./openshift-install create cluster --dir . --log-level debug

Best Regards,
Hey,

Step-by-step:

[fsbano@voyager-1 ~]$ oc get pod --all-namespaces | egrep -v '(Running|Completed)'
[fsbano@voyager-1 ~]$ host prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io
[fsbano@voyager-1 ~]$ sudo service named restart
[fsbano@voyager-1 ~]$ host prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io
[fsbano@voyager-1 ~]$ oc scale deployment.apps/console --replicas=0
[fsbano@voyager-1 ~]$ oc get pod
[fsbano@voyager-1 ~]$ oc get pod
[fsbano@voyager-1 ~]$ oc get pod

Best Regards,
Can I send a pull request? Best Regards,
@ssbano From which commit (component) is it required to have 2 workers to make it work on the libvirt platform? If this is a hard requirement then it would be problematic for us (the CodeReady Containers team); we are trying a single-node cluster (with no worker). I tested the 0.14.0 tag with a single worker and everything worked as expected, but today when I try master I get the following error (is this because of that limitation?)
I was using master until yesterday. This morning I downgraded to 0.14 and it is also running perfectly. Could you describe your setup? PS: I saw that they updated the image to 20190310. Regards,
With two workers it works; with only one worker the second router replica will always be "Pending". Please, oc project openshift-ingress; see #1395. Regards,
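On a single-worker libvirt cluster, the second router replica can never bind ports 80/443 (they are already taken on the only worker) and sits in Pending. A hedged sketch of a workaround is to scale the router deployment down; note this is temporary, since the ingress operator may reconcile the replica count back.

```shell
# Hedged sketch: drop router-default to one replica so the duplicate
# pod stops sitting in Pending on a single-worker cluster.
oc -n openshift-ingress scale deployment/router-default --replicas=1
oc -n openshift-ingress get pods
```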
I think this is likely a duplicate of #1007
Already fixed.
Version

Platform (aws|libvirt|openstack): libvirt

What happened?

The installer failed to get the console route and exited after 10 minutes due to context deadline exceeded.

What you expected to happen?

The installer should be able to create the cluster without any issue.

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

Looks like the issue is with the clusterapi-manager-controllers-db4fbd5fc-f7x6x pod, since it is in ImagePullBackOff state and the logs show that it is not able to identify the master node.

References