
cluster api having networking issue while using docker provider #7330

Closed
umesh168 opened this issue Oct 3, 2022 · 18 comments

Labels
kind/support Categorizes issue or PR as a support question. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

umesh168 commented Oct 3, 2022

Steps to reproduce the issue
Follow the Cluster API quick start at https://cluster-api.sigs.k8s.io/user/quick-start.html

What did you expect to happen:
It should show 3 replicas after running the command:
kubectl get kubeadmcontrolplane

Anything else you would like to add:
NAMESPACE   NAME                                          CLUSTER           NODENAME   PROVIDERID                               PHASE         VERSION   AGE
default     capi-quickstart-47d7b-vk7tg                   capi-quickstart              docker:////capi-quickstart-47d7b-vk7tg   Provisioned   v1.25.0   3h6m
default     capi-quickstart-md-0-fj7rr-7d46d6c57f-6gjl4   capi-quickstart                                                       Pending       v1.25.0   3h6m
default     capi-quickstart-md-0-fj7rr-7d46d6c57f-7c5jq   capi-quickstart                                                       Pending       v1.25.0   3h6m
default     capi-quickstart-md-0-fj7rr-7d46d6c57f-p4jb8   capi-quickstart                                                       Pending       v1.25.0   3h6m

Environment:

  • Cluster-api version: HEAD
  • kind version: kind v0.16.0 go1.19.1 darwin/arm64
  • Kubernetes version: (use kubectl version): Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-08-23T17:44:59Z", GoVersion:"go1.19", Compiler:"gc", Platform:"darwin/arm64"}
    Kustomize Version: v4.5.7
    Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-22T05:28:27Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/arm64"}
  • OS (e.g. from /etc/os-release): macOS Monterey 12.5

/bug
When we check the logs of capi-kubeadm-control-plane-controller-manager, we see:
E1003 10:22:18.784244 1 controller.go:182] "Failed to update KubeadmControlPlane Status" err="failed to create remote cluster client: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster \"default/capi-quickstart\": Get \"https://172.18.0.2:6443/api?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" kubeadmControlPlane="default/capi-quickstart-47d7b" namespace="default" name="capi-quickstart-47d7b" reconcileID=e2010afc-dac6-4acd-b60e-dc08944fb296 cluster="capi-quickstart"
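For reference, one way to pull these controller logs is shown below; this is a minimal sketch that assumes the default namespace and deployment names created by clusterctl init, so adjust them if your installation differs.

# Assumed default names from `clusterctl init`; adjust to your installation.
kubectl logs -n capi-kubeadm-control-plane-system \
  deployment/capi-kubeadm-control-plane-controller-manager --tail=100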

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 3, 2022
@k8s-ci-robot (Contributor)

@umesh168: This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@killianmuldoon (Contributor)

This isn't necessarily a networking error - when the client is timing out like that it means the Cluster isn't contactable. That could be a network error, or it could mean that the target cluster isn't actually running.

Can you provide the output of docker ps? There are some additional steps to help debug and find the relevant errors from the quick start at https://cluster-api.sigs.k8s.io/user/troubleshooting.html

It would be great to nail down the issue impacting you and see if there are steps that could be added to the troubleshooting guide to help with future debugging.
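For example, a minimal first-pass debugging sketch along those lines (the container names are taken from the docker ps output later in this thread; substitute your own):

# Inspect the Docker containers backing the workload cluster
docker ps
docker logs capi-quickstart-lb            # haproxy load balancer container
docker logs capi-quickstart-47d7b-vk7tg   # control plane node container

# Check the Cluster API objects on the management (kind) cluster
kubectl get clusters,machines,kubeadmcontrolplane -A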

umesh168 commented Oct 3, 2022

@killianmuldoon here is the output for docker ps

umeshsonawane@UMESHs-MacBook-Pro ~ % docker ps
CONTAINER ID   IMAGE                                COMMAND                  CREATED       STATUS       PORTS                                  NAMES
48897e84aa0d   kindest/node:v1.25.0                 "/usr/local/bin/entr…"   8 hours ago   Up 8 hours   36893/tcp, 127.0.0.1:36893->6443/tcp   capi-quickstart-47d7b-vk7tg
8063eca1ea18   kindest/haproxy:v20210715-a6da3463   "haproxy -sf 7 -W -d…"   8 hours ago   Up 8 hours   41411/tcp, 0.0.0.0:41411->6443/tcp     capi-quickstart-lb
c08b3a3ba427   kindest/node:v1.25.2                 "/usr/local/bin/entr…"   9 hours ago   Up 9 hours   127.0.0.1:55130->6443/tcp              kind-control-plane

umesh168 commented Oct 3, 2022

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
48897e84aa0d kindest/node:v1.25.0 "/usr/local/bin/entr…" 4 hours ago Up 4 hours 36893/tcp, 127.0.0.1:36893->6443/tcp capi-quickstart-47d7b-vk7tg
8063eca1ea18 kindest/haproxy:v20210715-a6da3463 "haproxy -sf 7 -W -d…" 4 hours ago Up 4 hours 41411/tcp, 0.0.0.0:41411->6443/tcp capi-quickstart-lb
c08b3a3ba427 kindest/node:v1.25.2 "/usr/local/bin/entr…" 4 hours ago Up 4 hours 127.0.0.1:55130->6443/tcp kind-control-plane
umeshsonawane@UMESHs-MacBook-Pro ~ % docker logs capi-quickstart-47d7b-vk7tg
INFO: ensuring we can execute mount/umount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: detected cgroup v2
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: setting iptables to detected mode: legacy
INFO: Detected IPv4 address: 172.18.0.3
INFO: Detected IPv6 address: fc00:f853:ccd:e793::3
Failed to find module 'autofs4'
systemd 249.11-0ubuntu3.4 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP -LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization docker.
Detected architecture arm64.

Welcome to Ubuntu 22.04.1 LTS!

Queued start job for default target Graphical Interface.
[ OK ] Created slice slice used to run Kubernetes / Kubelet.
[ OK ] Created slice Slice /system/modprobe.
[ OK ] Started Dispatch Password …ts to Console Directory Watch.
[UNSUPP] Starting of Arbitrary Exec…m Automount Point unsupported.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Reached target Path Units.
[ OK ] Reached target Slice Units.
[ OK ] Reached target Swaps.
[ OK ] Reached target Local Verity Protected Volumes.
[ OK ] Listening on Journal Audit Socket.
[ OK ] Listening on Journal Socket (/dev/log).
[ OK ] Listening on Journal Socket.
[ OK ] Reached target Socket Units.
Mounting Huge Pages File System...
Mounting Kernel Debug File System...
Mounting Kernel Trace File System...
Starting Journal Service...
Starting Create List of Static Device Nodes...
Starting Load Kernel Module configfs...
Starting Load Kernel Module fuse...
Starting Remount Root and Kernel File Systems...
Starting Apply Kernel Variables...
[ OK ] Mounted Huge Pages File System.
[ OK ] Mounted Kernel Debug File System.
[ OK ] Mounted Kernel Trace File System.
[ OK ] Finished Create List of Static Device Nodes.
modprobe@configfs.service: Deactivated successfully.
[ OK ] Finished Load Kernel Module configfs.
modprobe@fuse.service: Deactivated successfully.
[ OK ] Finished Load Kernel Module fuse.
[ OK ] Finished Remount Root and Kernel File Systems.
Mounting FUSE Control File System...
Starting Create System Users...
Starting Record System Boot/Shutdown in UTMP...
[ OK ] Started Journal Service.
[ OK ] Finished Apply Kernel Variables.
[ OK ] Mounted FUSE Control File System.
[ OK ] Finished Create System Users.
Starting Flush Journal to Persistent Storage...
Starting Create Static Device Nodes in /dev...
[ OK ] Finished Record System Boot/Shutdown in UTMP.
[ OK ] Finished Flush Journal to Persistent Storage.
[ OK ] Finished Create Static Device Nodes in /dev.
[ OK ] Reached target Preparation for Local File Systems.
[ OK ] Reached target Local File Systems.
[ OK ] Reached target System Initialization.
[ OK ] Started Daily Cleanup of Temporary Directories.
[ OK ] Reached target Basic System.
[ OK ] Reached target Timer Units.
Starting containerd container runtime...
[ OK ] Started containerd container runtime.
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Starting Record Runlevel Change in UTMP...
[ OK ] Finished Record Runlevel Change in UTMP.

@killianmuldoon (Contributor)

Can you check the status of the Machine objects? Can you see if there are errors in the logs of CAPD?
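As a sketch of both checks (the CAPD namespace and deployment names below are the clusterctl defaults and may differ in your setup):

# Machine status and conditions
kubectl get machines -A
kubectl describe machine <machine-name> -n default

# CAPD (Docker infrastructure provider) controller logs
kubectl logs -n capd-system deployment/capd-controller-manager --tail=200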

umesh168 commented Oct 3, 2022

@killianmuldoon here are some more details from Docker Desktop: the haproxy container is not running.
[Screenshot from Docker Desktop, 2022-10-03 6:43 PM, showing the haproxy container stopped]
Here are the logs for the same container:
[WARNING] 275/073913 (1) : config : missing timeouts for frontend 'controlPlane'.
2022-10-03T07:39:13.813476586Z | not properly invalid, you will certainly encounter various problems
2022-10-03T07:39:13.813483294Z | such a configuration. To fix this, please ensure that all following
2022-10-03T07:39:13.813485586Z | are set to a non-zero value: 'client', 'connect', 'server'.
[NOTICE] 275/073913 (1) : New worker #1 (7) forked
[WARNING] 275/073918 (1) : Reexecuting Master process
[NOTICE] 275/073918 (1) : haproxy version is 2.2.9-2~bpo10+1
[NOTICE] 275/073918 (1) : path to executable is /usr/sbin/haproxy
[ALERT] 275/073918 (1) : sendmsg()/writev() failed in logger #1: No such file or directory (errno=2)
[WARNING] 275/073918 (7) : Stopping frontend controlPlane in 0 ms.
[WARNING] 275/073918 (7) : Stopping backend kube-apiservers in 0 ms.
[WARNING] 275/073918 (7) : Stopping frontend GLOBAL in 0 ms.
[WARNING] 275/073918 (7) : Proxy controlPlane stopped (cumulated conns: FE: 0, BE: 0).
[WARNING] 275/073918 (7) : Proxy kube-apiservers stopped (cumulated conns: FE: 0, BE: 0).
[WARNING] 275/073918 (7) : Proxy GLOBAL stopped (cumulated conns: FE: 0, BE: 0).
[NOTICE] 275/073918 (1) : New worker #1 (25) forked
[WARNING] 275/073918 (1) : Former worker #1 (7) exited with code 0 (Exit)
[WARNING] 275/073918 (25) : Server kube-apiservers/capi-quickstart-47d7b-vk7tg is DOWN, reason: Layer4 connection problem, info: "SSL handshake failure", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 275/073918 (25) : backend 'kube-apiservers' has no server available!
[WARNING] 275/073940 (25) : Server kube-apiservers/capi-quickstart-47d7b-vk7tg is UP, reason: Layer7 check passed, code: 200, check duration: 2ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.

umesh168 commented Oct 3, 2022

containerd.log
Oct 03 07:53:45 kind-control-plane containerd[99]: time="2022-10-03T07:53:45.389292253Z" level=info msg="RemoveContainer for "615835aa60d5a9cdded7cf75e8de9918cce71e37236ac19e6f0dce1a0add000d" returns successfully"
Oct 03 07:53:45 kind-control-plane containerd[99]: time="2022-10-03T07:53:45.389919420Z" level=error msg="ContainerStatus for "615835aa60d5a9cdded7cf75e8de9918cce71e37236ac19e6f0dce1a0add000d" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container "615835aa60d5a9cdded7cf75e8de9918cce71e37236ac19e6f0dce1a0add000d": not found"

@killianmuldoon (Contributor)

I'm not sure what the issue is with the haproxy container - I've seen it fail due to resource exhaustion on Linux systems at times. Can you share what resources you're dedicating to Docker Desktop? Can you see what memory usage is like when you create a new cluster and haproxy fails?

It would also be really helpful if you could see if there's anything relevant in the capd infrastructure provider logs to help debug this issue.
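For the resource questions above, one way to capture that information with the plain Docker CLI (nothing CAPD-specific is assumed here):

# CPUs and memory allocated to the Docker Desktop VM
docker info --format 'CPUs: {{.NCPU}}  Memory: {{.MemTotal}} bytes'

# One-shot snapshot of per-container CPU / memory while the cluster is coming up
docker stats --no-stream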

umesh168 commented Oct 3, 2022

E1003 10:22:18.784244 1 controller.go:182] "Failed to update KubeadmControlPlane Status" err="failed to create remote cluster client: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster \"default/capi-quickstart\": Get \"https://172.18.0.2:6443/api?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" kubeadmControlPlane="default/capi-quickstart-47d7b" namespace="default" name="capi-quickstart-47d7b" reconcileID=e2010afc-dac6-4acd-b60e-dc08944fb296 cluster="capi-quickstart 

The issue is that the IP which is used, 172.18.0.2, is not accessible:

root@kind-control-plane:/# telnet 172.18.0.2 6443 
Trying 172.18.0.2...
telnet: Unable to connect to remote host: Connection timed out
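One way to check which container, if any, actually owns that address (a sketch assuming the default kind Docker network name):

# List containers attached to the "kind" network together with their IPv4 addresses
docker network inspect kind \
  --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'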

@killianmuldoon (Contributor)

I really think the networking issue is a symptom rather than the cause here. The IP isn't reachable because the container isn't up. The first places I'd look for information are resource usage and the CAPD logs, as it really seems as if CAPD or Docker is having a hard time bringing up haproxy.

To drill down on the issue it could be a good idea to run the same version of haproxy using docker run to see if you can start it and have it run correctly using the CLI. I encourage you to keep an eye on resources - in particular memory usage - when you do this.
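As a rough sketch of that experiment (the image tag is taken from the docker ps output above; the container has no CAPD-generated configuration, so this is only a start-up and resource smoke test and haproxy may complain about its stub config):

# Start the same haproxy image CAPD uses and watch its logs and memory
docker run -d --name haproxy-test kindest/haproxy:v20210715-a6da3463
docker logs haproxy-test
docker stats --no-stream haproxy-test

# Clean up afterwards
docker rm -f haproxy-test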

@fabriziopandini (Member)

/kind support

From a quick look, CAPD is failing to update the configuration inside the haproxy image. The CAPD logs should contain an error if this is what is happening, but this is a very uncommon error.

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Oct 4, 2022
@killianmuldoon (Contributor)

@umesh168 did you manage to resolve this?

@umesh168 (Author)

@killianmuldoon the above issue has been resolved.
Thanks for your comments and suggestions.

@fabriziopandini (Member)

Happy to hear the problem has been fixed
/close

@k8s-ci-robot (Contributor)

@fabriziopandini: Closing this issue.

In response to this:

Happy to hear the problem has been fixed
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nilsanderselde

> @killianmuldoon the above issue has been resolved. Thanks for your comments and suggestions.

@umesh168 How did you end up fixing it? I'm encountering something similar.

umesh168 commented Nov 4, 2022

@nilsanderselde try pulling the haproxy image manually.
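For anyone hitting this later, that would look something like the following (image tag taken from the docker ps output earlier in this thread; use whatever tag CAPD references in your environment):

# Pre-pull the load balancer image used by CAPD before creating the workload cluster
docker pull kindest/haproxy:v20210715-a6da3463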

@jingai-db

I hit the same issue of Unable to connect to the server: net/http: TLS handshake timeout, though my 'capi-haproxy-lb' container is running properly.
