
FATAL failed waiting for Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable #4643

Closed
fazlk28 opened this issue Feb 11, 2021 · 22 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@fazlk28

fazlk28 commented Feb 11, 2021

Version

[root@milan-installer ocinstall]# openshift-install version
openshift-install 4.6.16
built from commit 8a1ec01353e68cb6ebb1dd890d684f885c33145a
release image quay.io/openshift-release-dev/ocp-release@sha256:3e855ad88f46ad1b7f56c312f078ca6adaba623c5d4b360143f9f82d2f349741
[root@milan-installer ocinstall]#

Platform: bare metal (UPI)

What happened?

Trying to create an OpenShift 4.6 cluster on bare metal using the UPI method, with:
1 load balancer
1 bootstrap machine
3 master nodes
2 worker nodes

The bootstrap process completed successfully, but for some reason the Kubernetes API service is unavailable.
I am not able to figure out why it is throwing this error, since everything looks fine.
If the proxy/DNS settings were misconfigured, the bootstrap would not have reached the completed state; here the proxy/DNS is configured properly.
I would be glad to get some help, as I have been stuck on this for a while.
Thanks in advance.

Wait for bootstrap logs:

[root@milan-installer ocinstall]# ./openshift-install wait-for bootstrap-complete --log-level=debug --dir=ign_mani_files
DEBUG OpenShift Installer 4.6.16
DEBUG Built from commit 8a1ec01353e68cb6ebb1dd890d684f885c33145a
INFO Waiting up to 20m0s for the Kubernetes API at https://api.milan46.conlab.ocp:6443...
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
ERROR Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get "https://api.milan46.conlab.ocp:6443/apis/config.openshift.io/v1/clusteroperators": Service Unavailable
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help
FATAL failed waiting for Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
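
(As an aside: the gather invocation hinted at above usually looks something like the sketch below; the node IPs are the ones that appear in the haproxy configuration later in this thread, and the --key path is an assumption about where the matching SSH private key lives.)

./openshift-install gather bootstrap --dir=ign_mani_files \
    --bootstrap 10.157.66.242 \
    --master 10.157.66.243 --master 10.157.66.244 --master 10.157.66.245 \
    --key ~/.ssh/id_rsa   # private key matching the sshKey from install-config.yaml (path assumed)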

Bootstrap machine logs:

Feb 11 09:04:38 milan-bootstrap bootkube.sh[18370]: Skipped "secret-initial-kube-controller-manager-service-account-private-key.yaml" secrets.v1./initial-service-account-private-key -n openshift-config as it already exists
Feb 11 09:04:38 milan-bootstrap bootkube.sh[18370]: Skipped "secret-kube-apiserver-to-kubelet-signer.yaml" secrets.v1./kube-apiserver-to-kubelet-signer -n openshift-kube-apiserver-operator as it already exists
Feb 11 09:04:38 milan-bootstrap bootkube.sh[18370]: Skipped "secret-loadbalancer-serving-signer.yaml" secrets.v1./loadbalancer-serving-signer -n openshift-kube-apiserver-operator as it already exists
Feb 11 09:04:39 milan-bootstrap bootkube.sh[18370]: Skipped "secret-localhost-serving-signer.yaml" secrets.v1./localhost-serving-signer -n openshift-kube-apiserver-operator as it already exists
Feb 11 09:04:39 milan-bootstrap bootkube.sh[18370]: Skipped "secret-service-network-serving-signer.yaml" secrets.v1./service-network-serving-signer -n openshift-kube-apiserver-operator as it already exists
Feb 11 09:04:39 milan-bootstrap bootkube.sh[18370]: Sending bootstrap-finished event.Tearing down temporary bootstrap control plane...
Feb 11 09:04:39 milan-bootstrap bootkube.sh[18370]: Waiting for CEO to finish...
Feb 11 09:04:40 milan-bootstrap bootkube.sh[18370]: W0211 09:04:40.607883       1 etcd_env.go:298] cipher is not supported for use with etcd: "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256"
Feb 11 09:04:40 milan-bootstrap bootkube.sh[18370]: W0211 09:04:40.608310       1 etcd_env.go:298] cipher is not supported for use with etcd: "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"
Feb 11 09:04:40 milan-bootstrap bootkube.sh[18370]: I0211 09:04:40.634884       1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Feb 11 09:06:56 milan-bootstrap bootkube.sh[18370]: I0211 09:06:56.210792       1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Feb 11 09:06:59 milan-bootstrap bootkube.sh[18370]: I0211 09:06:59.610326       1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Feb 11 09:10:56 milan-bootstrap bootkube.sh[18370]: I0211 09:10:56.885588       1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Feb 11 09:11:00 milan-bootstrap bootkube.sh[18370]: I0211 09:11:00.088490       1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Feb 11 09:11:00 milan-bootstrap bootkube.sh[18370]: I0211 09:11:00.138906       1 waitforceo.go:64] Cluster etcd operator bootstrapped successfully
Feb 11 09:11:00 milan-bootstrap bootkube.sh[18370]: I0211 09:11:00.139073       1 waitforceo.go:58] cluster-etcd-operator bootstrap etcd
Feb 11 09:11:00 milan-bootstrap bootkube.sh[18370]: bootkube.service complete

What you expected to happen?

The Kubernetes API service should be available and the bootstrap-complete wait should proceed.

How to reproduce it (as minimally and precisely as possible)?

Followed the bare-metal installation steps: https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal/installing-bare-metal.html

With the following test machines:
1 load balancer
1 bootstrap machine
3 master nodes
2 worker nodes

@staebler
Contributor

staebler commented Feb 15, 2021

Your issue is either with your DNS configuration for the API URL or with your load balancer. The cluster does not need the API URL in order to install successfully, which is why the cluster installation succeeds despite your not being able to access the cluster via the API URL.
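
As a quick way to tell the two apart, a minimal sketch of the corresponding checks (the hostname is this cluster's API URL; the expected results are assumptions about a healthy setup):

# 1. Does the API URL resolve, and does it resolve to the load balancer's IP?
dig +short api.milan46.conlab.ocp

# 2. Does the load balancer forward to a working apiserver?
#    /version should be readable without credentials on a default install.
curl -kv https://api.milan46.conlab.ocp:6443/version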

@fazlk28
Author

fazlk28 commented Feb 15, 2021 via email

@staebler
Contributor

Also you had mentioned this: "whatever ${CLUSTER_ETCD_OPERATOR_IMAGE} points to, it probably needs to be updated to use different ciphers". Could you please be more specific on how I check the image and where I need to change it? Also, how do I update the ciphers?

I don't know where this advice is coming from. You do not need to do anything with the etcd operator image.

Could you please help me out here on what changes I need to do on my end?

  1. Do dig api.milan46.conlab.ocp to verify that your API URL is indeed resolving to your load balancer.
  2. Check that your load balancer is forwarding https communication to your 3 control plane nodes via https to port 6443.

@fortinj66
Contributor

fortinj66 commented Feb 15, 2021

Also you had mentioned this: "whatever ${CLUSTER_ETCD_OPERATOR_IMAGE} points to, it probably needs to be updated to use different ciphers". Could you please be more specific on how I check the image and where I need to change it? Also, how do I update the ciphers?

This is my fault. I was seeing the same issue and went down a rabbit hole because of the warnings about ciphers when connecting to the etcd cluster in bootkube.sh.

I deleted the earlier comment about the ciphers.

The issue I experienced was due to problems setting up /etc/resolv.conf in a systemd-resolved configuration (#4654).
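
To rule out the same resolv.conf problem elsewhere, a minimal check on a host running systemd-resolved might look like this (it assumes systemd-resolved is actually in use on that host):

cat /etc/resolv.conf                    # should point at the expected resolvers, not be empty or stale
resolvectl status                       # shows which DNS servers systemd-resolved is really using
cat /run/systemd/resolve/resolv.conf    # the non-stub file that systemd-resolved maintains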

@fazlk28
Author

fazlk28 commented Feb 16, 2021

@staebler,

I have tried the dig command on all the nodes, and below is the response:

[core@milan-m1 ~]$ dig api.milan46.conlab.ocp

; <<>> DiG 9.11.13-RedHat-9.11.13-6.el8_2.1 <<>> api.milan46.conlab.ocp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 26490
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: ae87454e32df5e02 (echoed)
;; QUESTION SECTION:
;api.milan46.conlab.ocp.                IN      A

;; Query time: 0 msec
;; SERVER: 10.157.65.38#53(10.157.65.38)
;; WHEN: Mon Feb 15 15:47:18 UTC 2021
;; MSG SIZE  rcvd: 63

[core@milan-m1 ~]$ ping api.milan46.conlab.ocp
PING api.milan46.conlab.ocp (10.157.66.241) 56(84) bytes of data.
64 bytes from 10.157.66.241 (10.157.66.241): icmp_seq=1 ttl=64 time=0.236 ms
64 bytes from 10.157.66.241 (10.157.66.241): icmp_seq=2 ttl=64 time=0.211 ms
64 bytes from 10.157.66.241 (10.157.66.241): icmp_seq=3 ttl=64 time=0.166 ms
^C
--- api.milan46.conlab.ocp ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 37ms
rtt min/avg/max/mdev = 0.166/0.204/0.236/0.031 ms
[core@milan-m1 ~]$

Regarding your comment, "Check that your load balancer is forwarding https communication to your 3 control plane nodes via https to port 6443":

The ports are open:

[root@milan-proxy ~]# netstat -nltupe | grep -E ':80|:443|:6443|:22623'
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      0          18318      1041/haproxy
tcp        0      0 0.0.0.0:22623           0.0.0.0:*               LISTEN      0          18316      1041/haproxy
tcp        0      0 0.0.0.0:6443            0.0.0.0:*               LISTEN      0          18315      1041/haproxy
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      0          18317      1041/haproxy
[root@milan-proxy ~]# ss -nltupe | grep -E ':80|:443|:6443|:22623'
tcp    LISTEN     0      128       *:443                   *:*                   users:(("haproxy",pid=1041,fd=8)) ino:18318 sk:ffff918077581f00 <->
tcp    LISTEN     0      128       *:22623                 *:*                   users:(("haproxy",pid=1041,fd=6)) ino:18316 sk:ffff918077580f80 <->
tcp    LISTEN     0      128       *:6443                  *:*                   users:(("haproxy",pid=1041,fd=5)) ino:18315 sk:ffff9180775807c0 <->
tcp    LISTEN     0      128       *:80                    *:*                   users:(("haproxy",pid=1041,fd=7)) ino:18317 sk:ffff918077581740 <->

Could you please let me know how to check that the forwarding is actually working? I would appreciate it.
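
One aside on the dig output above: a FORMERR status together with an echoed COOKIE often points at an upstream resolver that mishandles EDNS options. Retrying with those options disabled is a quick way to narrow that down (a troubleshooting sketch, not a confirmed diagnosis for this cluster):

dig +nocookie api.milan46.conlab.ocp    # retry without the EDNS COOKIE option
dig +noedns api.milan46.conlab.ocp      # retry with EDNS disabled entirely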

@fazlk28
Author

fazlk28 commented Feb 16, 2021

@staebler, uploading the log bundle here:
log-bundle-20210211045227.tar.gz

@staebler
Contributor

  1. Is it safe to assume that the IP address of your proxy is 10.157.66.241?
  2. It is suspect that the dig query had no answers.
  3. It looks like you are using haproxy for your load balancer. Refer to the bare-metal installation docs (https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal/installing-bare-metal.html#network-connectivity_installing-bare-metal) for information on what the load balancer requirements are.

@fazlk28
Author

fazlk28 commented Feb 16, 2021

Yes, that's right. It is the proxy IP.
I wonder why the dig had no answers; I was able to ping it.
I have been following this page to create OpenShift clusters since 4.3, but I am not sure why I am facing an issue now.
I believe I have configured all the prerequisites for the load balancer.
(https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal/installing-bare-metal.html#network-connectivity_installing-bare-metal)

I just wanted some help to overcome this. I will dig more into why the dig had no answers when trying to resolve api.milan46.conlab.ocp.

@staebler
Contributor

Can you run curl -kv api.milan46.conlab.ocp:6443?
Can you bypass the load balancer and curl one of your control plane nodes directly on port 6443? Presumably that should work since the cluster installed successfully.
Can you provide your haproxy config?
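
For the second point, a sketch of bypassing the load balancer would be to hit one of the control plane IPs directly (the IP below is the first master from the haproxy config that follows; /version should answer without credentials):

curl -kv https://10.157.66.243:6443/version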

@fazlk28
Author

fazlk28 commented Feb 17, 2021

Here is the output of the curl command from one of the control plane nodes:

[core@milan-m0 ~]$ curl -kv api.milan46.conlab.ocp:6443
* Rebuilt URL to: api.milan46.conlab.ocp:6443/
*   Trying 10.157.66.241...
* TCP_NODELAY set
* Connected to api.milan46.conlab.ocp (10.157.66.241) port 6443 (#0)
> GET / HTTP/1.1
> Host: api.milan46.conlab.ocp:6443
> User-Agent: curl/7.61.1
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 400 Bad Request
<
Client sent an HTTP request to an HTTPS server.
* Closing connection 0
[core@milan-m0 ~]$

=================
haproxy config:

[root@milan-proxy ~]# cat /etc/haproxy/haproxy.cfg
defaults
 mode  http
 log global
 option  httplog
 option  dontlognull
 option forwardfor except 127.0.0.0/8
 option  redispatch
 retries 3
 timeout http-request  10s
 timeout queue 1m
 timeout connect 10s
 timeout client  300s
 timeout server  300s
 timeout http-keep-alive 10s
 timeout check 10s
 maxconn 20000
listen stats
 bind :9000
 mode http
 stats enable
 stats uri /
frontend openshift-api-server
 bind *:6443
 default_backend openshift-api-server
 mode tcp
 option tcplog
backend openshift-api-server
 balance source
 mode tcp
 server  milan-bootstrap 10.157.66.242:6443 check
 server  milan-m0 10.157.66.243:6443 check
 server  milan-m1 10.157.66.244:6443 check
 server  milan-m2 10.157.66.245:6443 check
frontend machine-config-server
 bind *:22623
 default_backend machine-config-server
 mode tcp
 option tcplog
backend machine-config-server
 balance source
 mode tcp
 server  milan-bootstrap 10.157.66.242:22623 check
 server  milan-m0 10.157.66.243:22623 check
 server  milan-m1 10.157.66.244:22623 check
 server  milan-m2 10.157.66.245:22623 check
frontend ingress-http
 bind *:80
 default_backend ingress-http
 mode tcp
 option tcplog
backend ingress-http
 balance source
 mode tcp
 server milan-w0 10.157.66.246:80 check
 server milan-w1 10.157.66.247:80 check
frontend ingress-https
 bind *:443
 default_backend ingress-https
 mode tcp
 option tcplog
backend ingress-https
 balance source
 mode tcp
 server milan-w0 10.157.66.246:443 check
 server milan-w1 10.157.66.247:443 check
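
Since this config exposes a stats listener on :9000 with stats uri /, one way to see which openshift-api-server backends haproxy currently considers UP might be the following (the ;csv suffix asks the stats page for machine-readable output; the proxy IP is the one used earlier in this thread):

curl -s 'http://10.157.66.241:9000/;csv' | grep openshift-api-server
# each matching line includes a status column such as UP, DOWN or "no check"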

@staebler
Contributor

Sorry, wrong curl command. It should have been curl -kv https://api.milan46.conlab.ocp:6443.

@fazlk28
Author

fazlk28 commented Feb 17, 2021

Here is the output of the curl command:

[core@milan-m0 ~]$ curl -kv https://api.milan46.conlab.ocp:6443
* Rebuilt URL to: https://api.milan46.conlab.ocp:6443/
*   Trying 10.157.66.241...
* TCP_NODELAY set
* Connected to api.milan46.conlab.ocp (10.157.66.241) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=api.milan46.conlab.ocp
*  start date: Feb 11 08:46:43 2021 GMT
*  expire date: Mar 13 08:46:44 2021 GMT
*  issuer: OU=openshift; CN=kube-apiserver-lb-signer
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* Using Stream ID: 1 (easy handle 0x563838816740)
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/2
> Host: api.milan46.conlab.ocp:6443
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* Connection state changed (MAX_CONCURRENT_STREAMS == 2000)!
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/2 403
< audit-id: 48765dc5-61e8-41dd-8e65-e574a56ed51f
< cache-control: no-cache, private
< content-type: application/json
< x-content-type-options: nosniff
< x-kubernetes-pf-flowschema-uid: 2bb8c895-c824-45f9-9aff-b0044183bf2d
< x-kubernetes-pf-prioritylevel-uid: 5abba096-30ed-499d-a5fa-3d0f58d2a249
< content-length: 233
< date: Wed, 17 Feb 2021 18:30:22 GMT
<
* TLSv1.3 (IN), TLS app data, [no content] (0):
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
* Connection #0 to host api.milan46.conlab.ocp left intact
}[core@milan-m0 ~]$

@staebler
Contributor

OK. The curl command made it all the way to the api server and back. Can you run oc get clusterversion --kubeconfig=auth/kubeconfig? If that fails, then add -v 9 to get logging info.

@fazlk28
Author

fazlk28 commented Feb 18, 2021

[root@milan-installer ign_mani_files]# oc get clusterversion --kubeconfig=auth/kubeconfig -v 9
I0218 06:06:43.254663   10773 loader.go:375] Config loaded from file:  auth/kubeconfig
I0218 06:06:43.255573   10773 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.6.0 (linux/amd64) kubernete                                                                            s/18d7461" 'https://api.milan46.conlab.ocp:6443/api?timeout=32s'
I0218 06:06:48.334429   10773 round_trippers.go:443] GET https://api.milan46.conlab.ocp:6443/api?timeout=32s  in 5078 milliseconds
I0218 06:06:48.334480   10773 round_trippers.go:449] Response Headers:
I0218 06:06:48.334623   10773 cached_discovery.go:121] skipped caching discovery info due to Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:06:48.335439   10773 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.6.0 (linux/amd64) kubernetes/18d7461" 'https://api.milan46.conlab.ocp:6443/api?timeout=32s'
I0218 06:06:53.416962   10773 round_trippers.go:443] GET https://api.milan46.conlab.ocp:6443/api?timeout=32s  in 5081 milliseconds
I0218 06:06:53.417021   10773 round_trippers.go:449] Response Headers:
I0218 06:06:53.417107   10773 cached_discovery.go:121] skipped caching discovery info due to Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:06:53.417196   10773 shortcut.go:89] Error loading discovery information: Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:06:53.417376   10773 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.6.0 (linux/amd64) kubernetes/18d7461" 'https://api.milan46.conlab.ocp:6443/api?timeout=32s'
I0218 06:06:58.497452   10773 round_trippers.go:443] GET https://api.milan46.conlab.ocp:6443/api?timeout=32s  in 5080 milliseconds
I0218 06:06:58.497516   10773 round_trippers.go:449] Response Headers:
I0218 06:06:58.497659   10773 cached_discovery.go:121] skipped caching discovery info due to Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:06:58.497969   10773 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.6.0 (linux/amd64) kubernetes/18d7461" 'https://api.milan46.conlab.ocp:6443/api?timeout=32s'
I0218 06:07:03.577216   10773 round_trippers.go:443] GET https://api.milan46.conlab.ocp:6443/api?timeout=32s  in 5079 milliseconds
I0218 06:07:03.577262   10773 round_trippers.go:449] Response Headers:
I0218 06:07:03.577353   10773 cached_discovery.go:121] skipped caching discovery info due to Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:07:03.577535   10773 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.6.0 (linux/amd64) kubernetes/18d7461" 'https://api.milan46.conlab.ocp:6443/api?timeout=32s'
I0218 06:07:08.657093   10773 round_trippers.go:443] GET https://api.milan46.conlab.ocp:6443/api?timeout=32s  in 5079 milliseconds
I0218 06:07:08.657140   10773 round_trippers.go:449] Response Headers:
I0218 06:07:08.657239   10773 cached_discovery.go:121] skipped caching discovery info due to Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:07:08.657464   10773 helpers.go:234] Connection error: Get https://api.milan46.conlab.ocp:6443/api?timeout=32s: Service Unavailable
F0218 06:07:08.657529   10773 helpers.go:115] Unable to connect to the server: Service Unavailable
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc000010001, 0xc001737e00, 0x63, 0x1f9)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x4d37f20, 0xc000000003, 0x0, 0x0, 0xc001090af0, 0x48b6d30, 0xa, 0x73, 0x41d400)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:945 +0x191
k8s.io/klog/v2.(*loggingT).printDepth(0x4d37f20, 0x3, 0x0, 0x0, 0x2, 0xc00179f9e8, 0x1, 0x1)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:718 +0x165
k8s.io/klog/v2.FatalDepth(...)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:1449
k8s.io/kubectl/pkg/cmd/util.fatal(0xc0010a5080, 0x34, 0x1)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:93 +0x1f0
k8s.io/kubectl/pkg/cmd/util.checkErr(0x348a0e0, 0xc00111ff80, 0x31cf1f0)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:188 +0x945
k8s.io/kubectl/pkg/cmd/util.CheckErr(...)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:115
k8s.io/kubectl/pkg/cmd/get.NewCmdGet.func1(0xc001571340, 0xc000934000, 0x1, 0x4)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/get/get.go:167 +0x159
github.com/spf13/cobra.(*Command).execute(0xc001571340, 0xc0016fdfc0, 0x4, 0x4, 0xc001571340, 0xc0016fdfc0)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:846 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc0012f1340, 0x2, 0xc0012f1340, 0x2)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:950 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:887
main.main()
        /go/src/github.com/openshift/oc/cmd/oc/oc.go:110 +0x885

goroutine 6 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x4d37f20)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:1131 +0x8b
created by k8s.io/klog/v2.init.0
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:416 +0xd8

goroutine 9 [chan receive]:
k8s.io/klog.(*loggingT).flushDaemon(0x4d37e40)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/klog.go:1010 +0x8b
created by k8s.io/klog.init.0
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/klog.go:411 +0xd8

goroutine 20 [select]:

@staebler
Contributor

Try the following where you connect directly to one of the masters instead of going through your load balancer.

oc get clusterversion --kubeconfig=auth/kubeconfig --server=10.157.66.243:6443

Your load balancer is the one that is returning the 503 Service Unavailable. It has not found a backend server that it thinks is available. You are missing the configuration in your haproxy openshift-api-server backend to perform the health check against the /readyz endpoint. Instead you have the load balancer configured to do an HTTP check against the / endpoint, and that endpoint does not accept HTTP connections. Add the following to your openshift-api-server backend configuration.

 option httpchk GET /readyz
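
To see what that health check will observe, /readyz can also be queried by hand from the load balancer host against one of the control plane nodes (a sketch using an IP from the config above; it should return ok with HTTP 200 once that apiserver instance is ready):

curl -k https://10.157.66.243:6443/readyz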

@fazlk28
Author

fazlk28 commented Feb 18, 2021

Thanks for looking into it. It is surprising that this is missing from the haproxy configuration.
I used this same haproxy config to set up an OpenShift 4.6 cluster on another network and it worked fine.

Let me update the haproxy config and try again.
I have updated the config; it would be good if you could verify it once:

[root@milan-proxy ~]# cat /etc/haproxy/haproxy.cfg
defaults
 mode  http
 log global
 option  httplog
 option  dontlognull
 option forwardfor except 127.0.0.0/8
 option  redispatch
 retries 3
 timeout http-request  10s
 timeout queue 1m
 timeout connect 10s
 timeout client  300s
 timeout server  300s
 timeout http-keep-alive 10s
 timeout check 10s
 maxconn 20000
listen stats
 bind :9000
 mode http
 stats enable
 stats uri /
frontend openshift-api-server
 bind *:6443
 default_backend openshift-api-server
 mode tcp
 option tcplog
backend openshift-api-server
 balance source
 mode tcp
 option httpchk GET /readyz
 server  milan-bootstrap 10.157.66.242:6443 check
 server  milan-m0 10.157.66.243:6443 check
 server  milan-m1 10.157.66.244:6443 check
 server  milan-m2 10.157.66.245:6443 check
frontend machine-config-server
 bind *:22623
 default_backend machine-config-server
 mode tcp
 option tcplog
backend machine-config-server
 balance source
 mode tcp
 server  milan-bootstrap 10.157.66.242:22623 check
 server  milan-m0 10.157.66.243:22623 check
 server  milan-m1 10.157.66.244:22623 check
 server  milan-m2 10.157.66.245:22623 check
frontend ingress-http
 bind *:80
 default_backend ingress-http
 mode tcp
 option tcplog
backend ingress-http
 balance source
 mode tcp
 server milan-w0 10.157.66.246:80 check
 server milan-w1 10.157.66.247:80 check
frontend ingress-https
 bind *:443
 default_backend ingress-https
 mode tcp
 option tcplog
backend ingress-https
 balance source
 mode tcp
 server milan-w0 10.157.66.246:443 check
 server milan-w1 10.157.66.247:443 check


@staebler
Contributor

I am not terribly versed in haproxy, but that seems right on a cursory glance.
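
Beyond eyeballing it, haproxy can also validate its own configuration before a reload, for example:

haproxy -c -f /etc/haproxy/haproxy.cfg   # syntax/semantic check only, does not touch the running service
systemctl reload haproxy                 # apply the change once the check passes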

@fazlk28
Author

fazlk28 commented Feb 18, 2021

Sure, thanks! I will give it a try and get back to you.
I appreciate your help looking into this!

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 19, 2021
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 18, 2021
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci
Contributor

openshift-ci bot commented Jul 19, 2021

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this as completed Jul 19, 2021