
FATAL failed waiting for Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable #4643

Closed
fazlk28 opened this issue Feb 11, 2021 · 22 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@fazlk28

fazlk28 commented Feb 11, 2021

Version

[root@milan-installer ocinstall]# openshift-install version
openshift-install 4.6.16
built from commit 8a1ec01353e68cb6ebb1dd890d684f885c33145a
release image quay.io/openshift-release-dev/ocp-release@sha256:3e855ad88f46ad1b7f56c312f078ca6adaba623c5d4b360143f9f82d2f349741
[root@milan-installer ocinstall]#

Platform: bare metal (UPI)

What happened?

Trying to create an OpenShift 4.6 cluster on bare metal using the UPI method, with:
1 load balancer
1 bootstrap machine
3 master nodes
2 worker nodes

The bootstrap process completed successfully, but for some reason the Kubernetes API service is unavailable.
I am not able to figure out why it is throwing this error, since everything looks fine.
If the proxy/DNS settings were misconfigured, the bootstrap would not have reached the completed state; here the proxy/DNS is configured properly.
I would be glad to get some help, as I have been stuck on this for a while.
Thanks in advance.

Wait for bootstrap logs:

[root@milan-installer ocinstall]# ./openshift-install wait-for bootstrap-complete --log-level=debug --dir=ign_mani_files
DEBUG OpenShift Installer 4.6.16
DEBUG Built from commit 8a1ec01353e68cb6ebb1dd890d684f885c33145a
INFO Waiting up to 20m0s for the Kubernetes API at https://api.milan46.conlab.ocp:6443...
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
DEBUG Still waiting for the Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
ERROR Attempted to gather ClusterOperator status after wait failure: listing ClusterOperator objects: Get "https://api.milan46.conlab.ocp:6443/apis/config.openshift.io/v1/clusteroperators": Service Unavailable
INFO Use the following commands to gather logs from the cluster
INFO openshift-install gather bootstrap --help
FATAL failed waiting for Kubernetes API: Get "https://api.milan46.conlab.ocp:6443/version?timeout=32s": Service Unavailable
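
(As an aside: the gather invocation hinted at above usually looks something like the sketch below; the node IPs are the ones that appear in the haproxy configuration later in this thread, and the --key path is an assumption about where the matching SSH private key lives.)

./openshift-install gather bootstrap --dir=ign_mani_files \
    --bootstrap 10.157.66.242 \
    --master 10.157.66.243 --master 10.157.66.244 --master 10.157.66.245 \
    --key ~/.ssh/id_rsa   # private key matching the sshKey from install-config.yaml (path assumed)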

Bootstrap machine logs:

Feb 11 09:04:38 milan-bootstrap bootkube.sh[18370]: Skipped "secret-initial-kube-controller-manager-service-account-private-key.yaml" secrets.v1./initial-service-account-private-key -n openshift-config as it already exists
Feb 11 09:04:38 milan-bootstrap bootkube.sh[18370]: Skipped "secret-kube-apiserver-to-kubelet-signer.yaml" secrets.v1./kube-apiserver-to-kubelet-signer -n openshift-kube-apiserver-operator as it already exists
Feb 11 09:04:38 milan-bootstrap bootkube.sh[18370]: Skipped "secret-loadbalancer-serving-signer.yaml" secrets.v1./loadbalancer-serving-signer -n openshift-kube-apiserver-operator as it already exists
Feb 11 09:04:39 milan-bootstrap bootkube.sh[18370]: Skipped "secret-localhost-serving-signer.yaml" secrets.v1./localhost-serving-signer -n openshift-kube-apiserver-operator as it already exists
Feb 11 09:04:39 milan-bootstrap bootkube.sh[18370]: Skipped "secret-service-network-serving-signer.yaml" secrets.v1./service-network-serving-signer -n openshift-kube-apiserver-operator as it already exists
Feb 11 09:04:39 milan-bootstrap bootkube.sh[18370]: Sending bootstrap-finished event.Tearing down temporary bootstrap control plane...
Feb 11 09:04:39 milan-bootstrap bootkube.sh[18370]: Waiting for CEO to finish...
Feb 11 09:04:40 milan-bootstrap bootkube.sh[18370]: W0211 09:04:40.607883       1 etcd_env.go:298] cipher is not supported for use with etcd: "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256"
Feb 11 09:04:40 milan-bootstrap bootkube.sh[18370]: W0211 09:04:40.608310       1 etcd_env.go:298] cipher is not supported for use with etcd: "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256"
Feb 11 09:04:40 milan-bootstrap bootkube.sh[18370]: I0211 09:04:40.634884       1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Feb 11 09:06:56 milan-bootstrap bootkube.sh[18370]: I0211 09:06:56.210792       1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Feb 11 09:06:59 milan-bootstrap bootkube.sh[18370]: I0211 09:06:59.610326       1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Feb 11 09:10:56 milan-bootstrap bootkube.sh[18370]: I0211 09:10:56.885588       1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Feb 11 09:11:00 milan-bootstrap bootkube.sh[18370]: I0211 09:11:00.088490       1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Feb 11 09:11:00 milan-bootstrap bootkube.sh[18370]: I0211 09:11:00.138906       1 waitforceo.go:64] Cluster etcd operator bootstrapped successfully
Feb 11 09:11:00 milan-bootstrap bootkube.sh[18370]: I0211 09:11:00.139073       1 waitforceo.go:58] cluster-etcd-operator bootstrap etcd
Feb 11 09:11:00 milan-bootstrap bootkube.sh[18370]: bootkube.service complete

What you expected to happen?

The Kubernetes API service should be available and the bootstrap-complete wait should proceed.

How to reproduce it (as minimally and precisely as possible)?

Followed the bare-metal installation steps: https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal/installing-bare-metal.html

With the following test machines:
1 load balancer
1 bootstrap machine
3 master nodes
2 worker nodes

@staebler
Contributor

staebler commented Feb 15, 2021

Your issue is either with your DNS configuration for the API URL or with your load balancer. The cluster does not need the API URL in order to install successfully, which is why the cluster installation succeeds despite your not being able to access the cluster via the API URL.
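
As a quick way to tell the two apart, a minimal sketch of the corresponding checks (the hostname is this cluster's API URL; the expected results are assumptions about a healthy setup):

# 1. Does the API URL resolve, and does it resolve to the load balancer's IP?
dig +short api.milan46.conlab.ocp

# 2. Does the load balancer forward to a working apiserver?
#    /version should be readable without credentials on a default install.
curl -kv https://api.milan46.conlab.ocp:6443/version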

@fazlk28
Author

fazlk28 commented Feb 15, 2021 via email

@staebler
Contributor

Also you had mentioned this: "whatever ${CLUSTER_ETCD_OPERATOR_IMAGE} points to, it probably needs to be updated to use different ciphers". Could you please be more specific on how I check the image and where I need to change it? Also, how do I update the ciphers?

I don't know where this advice is coming from. You do not need to do anything with the etcd operator image.

Could you please help me out here on what changes I need to do on my end?

  1. Do dig api.milan46.conlab.ocp to verify that your API URL is indeed resolving to your load balancer.
  2. Check that your load balancer is forwarding https communication to your 3 control plane nodes via https to port 6443.

@fortinj66
Contributor

fortinj66 commented Feb 15, 2021

Also you had mentioned this: "whatever ${CLUSTER_ETCD_OPERATOR_IMAGE} points to, it probably needs to be updated to use different ciphers". Could you please be more specific on how I check the image and where I need to change it? Also, how do I update the ciphers?

This is my fault. I was seeing the same issue and went down a rabbit hole because of the warnings about ciphers when connecting to the etcd cluster in bootkube.sh.

I deleted the earlier comment about the ciphers.

The issue I experienced was due to problems setting up /etc/resolv.conf in a systemd-resolved configuration (#4654).
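
To rule out the same resolv.conf problem elsewhere, a minimal check on a host running systemd-resolved might look like this (it assumes systemd-resolved is actually in use on that host):

cat /etc/resolv.conf                    # should point at the expected resolvers, not be empty or stale
resolvectl status                       # shows which DNS servers systemd-resolved is really using
cat /run/systemd/resolve/resolv.conf    # the non-stub file that systemd-resolved maintains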

@fazlk28
Author

fazlk28 commented Feb 16, 2021

@staebler,

I have tried the dig command on all the nodes, and below is the response:

[core@milan-m1 ~]$ dig api.milan46.conlab.ocp

; <<>> DiG 9.11.13-RedHat-9.11.13-6.el8_2.1 <<>> api.milan46.conlab.ocp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 26490
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: ae87454e32df5e02 (echoed)
;; QUESTION SECTION:
;api.milan46.conlab.ocp.                IN      A

;; Query time: 0 msec
;; SERVER: 10.157.65.38#53(10.157.65.38)
;; WHEN: Mon Feb 15 15:47:18 UTC 2021
;; MSG SIZE  rcvd: 63

[core@milan-m1 ~]$ ping api.milan46.conlab.ocp
PING api.milan46.conlab.ocp (10.157.66.241) 56(84) bytes of data.
64 bytes from 10.157.66.241 (10.157.66.241): icmp_seq=1 ttl=64 time=0.236 ms
64 bytes from 10.157.66.241 (10.157.66.241): icmp_seq=2 ttl=64 time=0.211 ms
64 bytes from 10.157.66.241 (10.157.66.241): icmp_seq=3 ttl=64 time=0.166 ms
^C
--- api.milan46.conlab.ocp ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 37ms
rtt min/avg/max/mdev = 0.166/0.204/0.236/0.031 ms
[core@milan-m1 ~]$

Regarding your comment, "Check that your load balancer is forwarding https communication to your 3 control plane nodes via https to port 6443":

The ports are open:

[root@milan-proxy ~]# netstat -nltupe | grep -E ':80|:443|:6443|:22623'
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      0          18318      1041/haproxy
tcp        0      0 0.0.0.0:22623           0.0.0.0:*               LISTEN      0          18316      1041/haproxy
tcp        0      0 0.0.0.0:6443            0.0.0.0:*               LISTEN      0          18315      1041/haproxy
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      0          18317      1041/haproxy
[root@milan-proxy ~]# ss -nltupe | grep -E ':80|:443|:6443|:22623'
tcp    LISTEN     0      128       *:443                   *:*                   users:(("haproxy",pid=1041,fd=8)) ino:18318 sk:ffff918077581f00 <->
tcp    LISTEN     0      128       *:22623                 *:*                   users:(("haproxy",pid=1041,fd=6)) ino:18316 sk:ffff918077580f80 <->
tcp    LISTEN     0      128       *:6443                  *:*                   users:(("haproxy",pid=1041,fd=5)) ino:18315 sk:ffff9180775807c0 <->
tcp    LISTEN     0      128       *:80                    *:*                   users:(("haproxy",pid=1041,fd=7)) ino:18317 sk:ffff918077581740 <->

Could you please let me know how to check that the forwarding is actually working? I would appreciate it.
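
One aside on the dig output above: a FORMERR status together with an echoed COOKIE often points at an upstream resolver that mishandles EDNS options. Retrying with those options disabled is a quick way to narrow that down (a troubleshooting sketch, not a confirmed diagnosis for this cluster):

dig +nocookie api.milan46.conlab.ocp    # retry without the EDNS COOKIE option
dig +noedns api.milan46.conlab.ocp      # retry with EDNS disabled entirely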

@fazlk28
Author

fazlk28 commented Feb 16, 2021

@staebler, uploading the log bundle here:
log-bundle-20210211045227.tar.gz

@staebler
Contributor

  1. Is it safe to assume that the IP address of your proxy is 10.157.66.241?
  2. It is suspect that the dig query had no answers.
  3. It looks like you are using haproxy for your load balancer. Refer to the bare-metal installation docs (https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal/installing-bare-metal.html#network-connectivity_installing-bare-metal) for information on what the load balancer requirements are.

@fazlk28
Author

fazlk28 commented Feb 16, 2021

Yes, that's right. It is the proxy IP.
I wonder why the dig had no answers; I was able to ping it.
I have been following this page to create OpenShift clusters since 4.3, but I am not sure why I am facing an issue now.
I believe I have configured all the prerequisites for the load balancer.
(https://docs.openshift.com/container-platform/4.6/installing/installing_bare_metal/installing-bare-metal.html#network-connectivity_installing-bare-metal)

I just wanted some help to overcome this. I will dig more into why the dig had no answers when trying to resolve api.milan46.conlab.ocp.

@staebler
Contributor

Can you run curl -kv api.milan46.conlab.ocp:6443?
Can you bypass the load balancer and curl one of your control plane nodes directly on port 6443? Presumably that should work since the cluster installed successfully.
Can you provide your haproxy config?
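
For the second point, a sketch of bypassing the load balancer would be to hit one of the control plane IPs directly (the IP below is the first master from the haproxy config that follows; /version should answer without credentials):

curl -kv https://10.157.66.243:6443/version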

@fazlk28
Author

fazlk28 commented Feb 17, 2021

Here is the output of the curl command from one of the control plane nodes:

[core@milan-m0 ~]$ curl -kv api.milan46.conlab.ocp:6443
* Rebuilt URL to: api.milan46.conlab.ocp:6443/
*   Trying 10.157.66.241...
* TCP_NODELAY set
* Connected to api.milan46.conlab.ocp (10.157.66.241) port 6443 (#0)
> GET / HTTP/1.1
> Host: api.milan46.conlab.ocp:6443
> User-Agent: curl/7.61.1
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 400 Bad Request
<
Client sent an HTTP request to an HTTPS server.
* Closing connection 0
[core@milan-m0 ~]$

=================
haproxy config:

[root@milan-proxy ~]# cat /etc/haproxy/haproxy.cfg
defaults
 mode  http
 log global
 option  httplog
 option  dontlognull
 option forwardfor except 127.0.0.0/8
 option  redispatch
 retries 3
 timeout http-request  10s
 timeout queue 1m
 timeout connect 10s
 timeout client  300s
 timeout server  300s
 timeout http-keep-alive 10s
 timeout check 10s
 maxconn 20000
listen stats
 bind :9000
 mode http
 stats enable
 stats uri /
frontend openshift-api-server
 bind *:6443
 default_backend openshift-api-server
 mode tcp
 option tcplog
backend openshift-api-server
 balance source
 mode tcp
 server  milan-bootstrap 10.157.66.242:6443 check
 server  milan-m0 10.157.66.243:6443 check
 server  milan-m1 10.157.66.244:6443 check
 server  milan-m2 10.157.66.245:6443 check
frontend machine-config-server
 bind *:22623
 default_backend machine-config-server
 mode tcp
 option tcplog
backend machine-config-server
 balance source
 mode tcp
 server  milan-bootstrap 10.157.66.242:22623 check
 server  milan-m0 10.157.66.243:22623 check
 server  milan-m1 10.157.66.244:22623 check
 server  milan-m2 10.157.66.245:22623 check
frontend ingress-http
 bind *:80
 default_backend ingress-http
 mode tcp
 option tcplog
backend ingress-http
 balance source
 mode tcp
 server milan-w0 10.157.66.246:80 check
 server milan-w1 10.157.66.247:80 check
frontend ingress-https
 bind *:443
 default_backend ingress-https
 mode tcp
 option tcplog
backend ingress-https
 balance source
 mode tcp
 server milan-w0 10.157.66.246:443 check
 server milan-w1 10.157.66.247:443 check
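
Since this config exposes a stats listener on :9000 with stats uri /, one way to see which openshift-api-server backends haproxy currently considers UP might be the following (the ;csv suffix asks the stats page for machine-readable output; the proxy IP is the one used earlier in this thread):

curl -s 'http://10.157.66.241:9000/;csv' | grep openshift-api-server
# each matching line includes a status column such as UP, DOWN or "no check"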

@staebler
Contributor

Sorry, wrong curl command. It should have been curl -kv https://api.milan46.conlab.ocp:6443.

@fazlk28
Author

fazlk28 commented Feb 17, 2021

Here is the output of the curl command:

[core@milan-m0 ~]$ curl -kv https://api.milan46.conlab.ocp:6443
* Rebuilt URL to: https://api.milan46.conlab.ocp:6443/
*   Trying 10.157.66.241...
* TCP_NODELAY set
* Connected to api.milan46.conlab.ocp (10.157.66.241) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=api.milan46.conlab.ocp
*  start date: Feb 11 08:46:43 2021 GMT
*  expire date: Mar 13 08:46:44 2021 GMT
*  issuer: OU=openshift; CN=kube-apiserver-lb-signer
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* Using Stream ID: 1 (easy handle 0x563838816740)
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET / HTTP/2
> Host: api.milan46.conlab.ocp:6443
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* Connection state changed (MAX_CONCURRENT_STREAMS == 2000)!
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/2 403
< audit-id: 48765dc5-61e8-41dd-8e65-e574a56ed51f
< cache-control: no-cache, private
< content-type: application/json
< x-content-type-options: nosniff
< x-kubernetes-pf-flowschema-uid: 2bb8c895-c824-45f9-9aff-b0044183bf2d
< x-kubernetes-pf-prioritylevel-uid: 5abba096-30ed-499d-a5fa-3d0f58d2a249
< content-length: 233
< date: Wed, 17 Feb 2021 18:30:22 GMT
<
* TLSv1.3 (IN), TLS app data, [no content] (0):
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {

  },
  "code": 403
* Connection #0 to host api.milan46.conlab.ocp left intact
}[core@milan-m0 ~]$

@staebler
Contributor

OK. The curl command made it all the way to the api server and back. Can you run oc get clusterversion --kubeconfig=auth/kubeconfig? If that fails, then add -v 9 to get logging info.

@fazlk28
Author

fazlk28 commented Feb 18, 2021

[root@milan-installer ign_mani_files]# oc get clusterversion --kubeconfig=auth/kubeconfig -v 9
I0218 06:06:43.254663   10773 loader.go:375] Config loaded from file:  auth/kubeconfig
I0218 06:06:43.255573   10773 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.6.0 (linux/amd64) kubernete                                                                            s/18d7461" 'https://api.milan46.conlab.ocp:6443/api?timeout=32s'
I0218 06:06:48.334429   10773 round_trippers.go:443] GET https://api.milan46.conlab.ocp:6443/api?timeout=32s  in 5078 milliseconds
I0218 06:06:48.334480   10773 round_trippers.go:449] Response Headers:
I0218 06:06:48.334623   10773 cached_discovery.go:121] skipped caching discovery info due to Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:06:48.335439   10773 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.6.0 (linux/amd64) kubernetes/18d7461" 'https://api.milan46.conlab.ocp:6443/api?timeout=32s'
I0218 06:06:53.416962   10773 round_trippers.go:443] GET https://api.milan46.conlab.ocp:6443/api?timeout=32s  in 5081 milliseconds
I0218 06:06:53.417021   10773 round_trippers.go:449] Response Headers:
I0218 06:06:53.417107   10773 cached_discovery.go:121] skipped caching discovery info due to Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:06:53.417196   10773 shortcut.go:89] Error loading discovery information: Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:06:53.417376   10773 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.6.0 (linux/amd64) kubernetes/18d7461" 'https://api.milan46.conlab.ocp:6443/api?timeout=32s'
I0218 06:06:58.497452   10773 round_trippers.go:443] GET https://api.milan46.conlab.ocp:6443/api?timeout=32s  in 5080 milliseconds
I0218 06:06:58.497516   10773 round_trippers.go:449] Response Headers:
I0218 06:06:58.497659   10773 cached_discovery.go:121] skipped caching discovery info due to Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:06:58.497969   10773 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.6.0 (linux/amd64) kubernetes/18d7461" 'https://api.milan46.conlab.ocp:6443/api?timeout=32s'
I0218 06:07:03.577216   10773 round_trippers.go:443] GET https://api.milan46.conlab.ocp:6443/api?timeout=32s  in 5079 milliseconds
I0218 06:07:03.577262   10773 round_trippers.go:449] Response Headers:
I0218 06:07:03.577353   10773 cached_discovery.go:121] skipped caching discovery info due to Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:07:03.577535   10773 round_trippers.go:423] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: oc/4.6.0 (linux/amd64) kubernetes/18d7461" 'https://api.milan46.conlab.ocp:6443/api?timeout=32s'
I0218 06:07:08.657093   10773 round_trippers.go:443] GET https://api.milan46.conlab.ocp:6443/api?timeout=32s  in 5079 milliseconds
I0218 06:07:08.657140   10773 round_trippers.go:449] Response Headers:
I0218 06:07:08.657239   10773 cached_discovery.go:121] skipped caching discovery info due to Get "https://api.milan46.conlab.ocp:6443/api?timeout=32s": Service Unavailable
I0218 06:07:08.657464   10773 helpers.go:234] Connection error: Get https://api.milan46.conlab.ocp:6443/api?timeout=32s: Service Unavailable
F0218 06:07:08.657529   10773 helpers.go:115] Unable to connect to the server: Service Unavailable
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc000010001, 0xc001737e00, 0x63, 0x1f9)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x4d37f20, 0xc000000003, 0x0, 0x0, 0xc001090af0, 0x48b6d30, 0xa, 0x73, 0x41d400)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:945 +0x191
k8s.io/klog/v2.(*loggingT).printDepth(0x4d37f20, 0x3, 0x0, 0x0, 0x2, 0xc00179f9e8, 0x1, 0x1)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:718 +0x165
k8s.io/klog/v2.FatalDepth(...)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:1449
k8s.io/kubectl/pkg/cmd/util.fatal(0xc0010a5080, 0x34, 0x1)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:93 +0x1f0
k8s.io/kubectl/pkg/cmd/util.checkErr(0x348a0e0, 0xc00111ff80, 0x31cf1f0)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:188 +0x945
k8s.io/kubectl/pkg/cmd/util.CheckErr(...)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/util/helpers.go:115
k8s.io/kubectl/pkg/cmd/get.NewCmdGet.func1(0xc001571340, 0xc000934000, 0x1, 0x4)
        /go/src/github.com/openshift/oc/vendor/k8s.io/kubectl/pkg/cmd/get/get.go:167 +0x159
github.com/spf13/cobra.(*Command).execute(0xc001571340, 0xc0016fdfc0, 0x4, 0x4, 0xc001571340, 0xc0016fdfc0)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:846 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0xc0012f1340, 0x2, 0xc0012f1340, 0x2)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:950 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/oc/vendor/github.com/spf13/cobra/command.go:887
main.main()
        /go/src/github.com/openshift/oc/cmd/oc/oc.go:110 +0x885

goroutine 6 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x4d37f20)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:1131 +0x8b
created by k8s.io/klog/v2.init.0
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/v2/klog.go:416 +0xd8

goroutine 9 [chan receive]:
k8s.io/klog.(*loggingT).flushDaemon(0x4d37e40)
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/klog.go:1010 +0x8b
created by k8s.io/klog.init.0
        /go/src/github.com/openshift/oc/vendor/k8s.io/klog/klog.go:411 +0xd8

goroutine 20 [select]:

@staebler
Contributor

Try the following where you connect directly to one of the masters instead of going through your load balancer.

oc get clusterversion --kubeconfig=auth/kubeconfig --server=10.157.66.243:6443

Your load balancer is the one that is returning the 503 Service Unavailable. It has not found a backend server that it thinks is available. You are missing the configuration in your haproxy openshift-api-server backend to perform the health check against the /readyz endpoint. Instead you have the load balancer configured to do an HTTP check against the / endpoint, and that endpoint does not accept HTTP connections. Add the following to your openshift-api-server backend configuration.

 option httpchk GET /readyz
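
To see what that health check will observe, /readyz can also be queried by hand from the load balancer host against one of the control plane nodes (a sketch using an IP from the config above; it should return ok with HTTP 200 once that apiserver instance is ready):

curl -k https://10.157.66.243:6443/readyz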

@fazlk28
Author

fazlk28 commented Feb 18, 2021

Thanks for looking into it. It is surprising that this is missing from the haproxy configuration.
I used this same haproxy config to set up an OpenShift 4.6 cluster on another network and it worked fine.

Let me update the haproxy config and try again.
I have updated the config; it would be good if you could verify it once:

[root@milan-proxy ~]# cat /etc/haproxy/haproxy.cfg
defaults
 mode  http
 log global
 option  httplog
 option  dontlognull
 option forwardfor except 127.0.0.0/8
 option  redispatch
 retries 3
 timeout http-request  10s
 timeout queue 1m
 timeout connect 10s
 timeout client  300s
 timeout server  300s
 timeout http-keep-alive 10s
 timeout check 10s
 maxconn 20000
listen stats
 bind :9000
 mode http
 stats enable
 stats uri /
frontend openshift-api-server
 bind *:6443
 default_backend openshift-api-server
 mode tcp
 option tcplog
backend openshift-api-server
 balance source
 mode tcp
 option httpchk GET /readyz
 server  milan-bootstrap 10.157.66.242:6443 check
 server  milan-m0 10.157.66.243:6443 check
 server  milan-m1 10.157.66.244:6443 check
 server  milan-m2 10.157.66.245:6443 check
frontend machine-config-server
 bind *:22623
 default_backend machine-config-server
 mode tcp
 option tcplog
backend machine-config-server
 balance source
 mode tcp
 server  milan-bootstrap 10.157.66.242:22623 check
 server  milan-m0 10.157.66.243:22623 check
 server  milan-m1 10.157.66.244:22623 check
 server  milan-m2 10.157.66.245:22623 check
frontend ingress-http
 bind *:80
 default_backend ingress-http
 mode tcp
 option tcplog
backend ingress-http
 balance source
 mode tcp
 server milan-w0 10.157.66.246:80 check
 server milan-w1 10.157.66.247:80 check
frontend ingress-https
 bind *:443
 default_backend ingress-https
 mode tcp
 option tcplog
backend ingress-https
 balance source
 mode tcp
 server milan-w0 10.157.66.246:443 check
 server milan-w1 10.157.66.247:443 check


@staebler
Contributor

I am not terribly versed in haproxy, but that seems right on a cursory glance.
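
Beyond eyeballing it, haproxy can also validate its own configuration before a reload, for example:

haproxy -c -f /etc/haproxy/haproxy.cfg   # syntax/semantic check only, does not touch the running service
systemctl reload haproxy                 # apply the change once the check passes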

@fazlk28
Author

fazlk28 commented Feb 18, 2021

Sure, thanks! I will give it a try and get back to you.
I appreciate your help looking into this!

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 19, 2021
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 18, 2021
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci
Contributor

openshift-ci bot commented Jul 19, 2021

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this as completed Jul 19, 2021