[4.6, 4.7] installer bootstrap never completes although cluster install completes #502

Closed
fortinj66 opened this issue Feb 4, 2021 · 6 comments

Comments

@fortinj66
Contributor

fortinj66 commented Feb 4, 2021

Describe the bug
When installing newer 4.6 and 4.7 clusters (vSphere IPI), the bootstrap never completes. However, the cluster install continues, and if I run wait-for install-complete it will finish.

There is another issue which may be related: when the api-int IP fails over from the bootstrap node, the API becomes unavailable and does not come back. If I reboot the master node holding the api-int IP, the cluster API comes back online and installation continues. This started happening within the last week or so. I can open another bug for this if needed.

Version
vSphere IPI

./openshift-install 4.7.0-0.okd-2021-02-04-113520
built from commit 8fd219508096f711ba0eb9e73416be19ddf90bb9
release image registry.ci.openshift.org/origin/release@sha256:872acc227406870ea535300bc2e985474640de09549a6d21052d142ec8aa09fd

How reproducible
100%

Log bundle
openshift_install.log
Must-gather

qa-c1v4-kqlzb-master-1 is the master node that had to be rebooted manually.

@fortinj66
Contributor Author

Similar issue at openshift/installer#4643

@fortinj66
Contributor Author

This is also affecting newer 4.6 nightlies

fortinj66 changed the title from "[4.7] installer bootstrap never completes although cluster install completes" to "[4.6, 4.7] installer bootstrap never completes although cluster install completes" on Feb 14, 2021

@vrutkovs
Member

Seems to be a regression from https://bugzilla.redhat.com/show_bug.cgi?id=1918281

@ronnessim

We've been experiencing a similar issue since 4.40 (that's the last known working version that installs properly with IPI on VMware). We see this in the output of the installer for 4.60, which was released today:

INFO API v1.19.2-1049+f173eb4a83e557-dirty up
INFO Waiting up to 30m0s for bootstrapping to complete...
E0214 18:52:40.894830 255812 reflector.go:307] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ConfigMap: Get "https://api.:6443/api/v1/namespaces/kube-system/configmaps?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dbootstrap&resourceVersion=10395&timeoutSeconds=345&watch=true": dial tcp <API_IP>:6443: connect: connection refused

This repeats until we see:

FATAL Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition

I believe our logs are similar to those already attached to this issue, but if they would be useful, we are happy to share them.

@fortinj66
Contributor Author

For OKD, this is actually an issue with /etc/resolv.conf and how it interacts with systemd-resolved. We fixed the prepender code for the masters and workers, and now the same fix is needed for the bootstrap nodes.

On OKD, /etc/resolv.conf should be a symlink.

OCP doesn't use systemd-resolved, so there it should be a static file.

The best way to check is to ssh into the nodes and look at /etc/resolv.conf.
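
For anyone who wants to script that check, here is a rough sketch in Python. It only reports whether the path is a symlink or a plain file and, for a symlink, where it resolves to. The function name `describe_resolv_conf` is made up for illustration, and it assumes Python 3 is available wherever you run it, which may not be the case on the nodes themselves.

```python
import os


def describe_resolv_conf(path="/etc/resolv.conf"):
    """Classify `path` as a symlink, a static file, or missing.

    Hypothetical helper for illustration; not part of the installer.
    """
    if os.path.islink(path):
        # Resolve the link so the final target is visible in the report.
        return {"kind": "symlink", "target": os.path.realpath(path)}
    if os.path.isfile(path):
        return {"kind": "static file", "target": None}
    return {"kind": "missing", "target": None}


if __name__ == "__main__":
    info = describe_resolv_conf()
    if info["kind"] == "symlink":
        print("/etc/resolv.conf is a symlink to", info["target"])
    else:
        print("/etc/resolv.conf is a", info["kind"])
```

Going by the comment above, a "symlink" result is what you would expect on OKD and a "static file" result on OCP.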

@fortinj66
Contributor Author

The fix is in the upstream installer: openshift/installer#4654
