
cluster api having networking issue while using docker provider #7330

Closed
umesh168 opened this issue Oct 3, 2022 · 18 comments

Labels
kind/support Categorizes issue or PR as a support question. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

umesh168 commented Oct 3, 2022

Steps to reproduce the issue
Follow the Cluster API quick start at https://cluster-api.sigs.k8s.io/user/quick-start.html

What did you expect to happen:
It should show 3 replicas after running the command:
kubectl get kubeadmcontrolplane

Anything else you would like to add:
NAMESPACE   NAME                                          CLUSTER           NODENAME   PROVIDERID                               PHASE         VERSION   AGE
default     capi-quickstart-47d7b-vk7tg                   capi-quickstart              docker:////capi-quickstart-47d7b-vk7tg   Provisioned   v1.25.0   3h6m
default     capi-quickstart-md-0-fj7rr-7d46d6c57f-6gjl4   capi-quickstart                                                       Pending       v1.25.0   3h6m
default     capi-quickstart-md-0-fj7rr-7d46d6c57f-7c5jq   capi-quickstart                                                       Pending       v1.25.0   3h6m
default     capi-quickstart-md-0-fj7rr-7d46d6c57f-p4jb8   capi-quickstart                                                       Pending       v1.25.0   3h6m

Environment:

  • Cluster-api version: HEAD
  • kind version: kind v0.16.0 go1.19.1 darwin/arm64
  • Kubernetes version: (use kubectl version): Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-08-23T17:44:59Z", GoVersion:"go1.19", Compiler:"gc", Platform:"darwin/arm64"}
    Kustomize Version: v4.5.7
    Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-22T05:28:27Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"linux/arm64"}
  • OS (e.g. from /etc/os-release): macOS Monterey 12.5

/bug
When we check the logs of capi-kubeadm-control-plane-controller-manager, we see:
E1003 10:22:18.784244 1 controller.go:182] "Failed to update KubeadmControlPlane Status" err="failed to create remote cluster client: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster \"default/capi-quickstart\": Get \"https://172.18.0.2:6443/api?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" kubeadmControlPlane="default/capi-quickstart-47d7b" namespace="default" name="capi-quickstart-47d7b" reconcileID=e2010afc-dac6-4acd-b60e-dc08944fb296 cluster="capi-quickstart"
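For reference, one way to pull these controller logs is shown below; this is a minimal sketch that assumes the default namespace and deployment names created by clusterctl init, so adjust them if your installation differs.

# Assumed default names from `clusterctl init`; adjust to your installation.
kubectl logs -n capi-kubeadm-control-plane-system \
  deployment/capi-kubeadm-control-plane-controller-manager --tail=100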

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 3, 2022
@k8s-ci-robot (Contributor)

@umesh168: This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@killianmuldoon (Contributor)

This isn't necessarily a networking error - when the client is timing out like that it means the Cluster isn't contactable. That could be a network error, or it could mean that the target cluster isn't actually running.

Can you provide the output of docker ps? There are some additional steps to help debug and find the relevant errors from the quick start at https://cluster-api.sigs.k8s.io/user/troubleshooting.html

It would be great to nail down the issue impacting you and see if there are steps that could be added to the troubleshooting guide to help with future debugging.
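For example, a minimal first-pass debugging sketch along those lines (the container names are taken from the docker ps output later in this thread; substitute your own):

# Inspect the Docker containers backing the workload cluster
docker ps
docker logs capi-quickstart-lb            # haproxy load balancer container
docker logs capi-quickstart-47d7b-vk7tg   # control plane node container

# Check the Cluster API objects on the management (kind) cluster
kubectl get clusters,machines,kubeadmcontrolplane -A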

umesh168 commented Oct 3, 2022

@killianmuldoon here is the output for docker ps

umeshsonawane@UMESHs-MacBook-Pro ~ % docker ps
CONTAINER ID   IMAGE                                COMMAND                  CREATED       STATUS       PORTS                                  NAMES
48897e84aa0d   kindest/node:v1.25.0                 "/usr/local/bin/entr…"   8 hours ago   Up 8 hours   36893/tcp, 127.0.0.1:36893->6443/tcp   capi-quickstart-47d7b-vk7tg
8063eca1ea18   kindest/haproxy:v20210715-a6da3463   "haproxy -sf 7 -W -d…"   8 hours ago   Up 8 hours   41411/tcp, 0.0.0.0:41411->6443/tcp     capi-quickstart-lb
c08b3a3ba427   kindest/node:v1.25.2                 "/usr/local/bin/entr…"   9 hours ago   Up 9 hours   127.0.0.1:55130->6443/tcp              kind-control-plane

umesh168 commented Oct 3, 2022

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
48897e84aa0d kindest/node:v1.25.0 "/usr/local/bin/entr…" 4 hours ago Up 4 hours 36893/tcp, 127.0.0.1:36893->6443/tcp capi-quickstart-47d7b-vk7tg
8063eca1ea18 kindest/haproxy:v20210715-a6da3463 "haproxy -sf 7 -W -d…" 4 hours ago Up 4 hours 41411/tcp, 0.0.0.0:41411->6443/tcp capi-quickstart-lb
c08b3a3ba427 kindest/node:v1.25.2 "/usr/local/bin/entr…" 4 hours ago Up 4 hours 127.0.0.1:55130->6443/tcp kind-control-plane
umeshsonawane@UMESHs-MacBook-Pro ~ % docker logs capi-quickstart-47d7b-vk7tg
INFO: ensuring we can execute mount/umount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: detected cgroup v2
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: setting iptables to detected mode: legacy
INFO: Detected IPv4 address: 172.18.0.3
INFO: Detected IPv6 address: fc00:f853:ccd:e793::3
Failed to find module 'autofs4'
systemd 249.11-0ubuntu3.4 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP -LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization docker.
Detected architecture arm64.

Welcome to Ubuntu 22.04.1 LTS!

Queued start job for default target Graphical Interface.
[ OK ] Created slice slice used to run Kubernetes / Kubelet.
[ OK ] Created slice Slice /system/modprobe.
[ OK ] Started Dispatch Password …ts to Console Directory Watch.
[UNSUPP] Starting of Arbitrary Exec…m Automount Point unsupported.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Reached target Path Units.
[ OK ] Reached target Slice Units.
[ OK ] Reached target Swaps.
[ OK ] Reached target Local Verity Protected Volumes.
[ OK ] Listening on Journal Audit Socket.
[ OK ] Listening on Journal Socket (/dev/log).
[ OK ] Listening on Journal Socket.
[ OK ] Reached target Socket Units.
Mounting Huge Pages File System...
Mounting Kernel Debug File System...
Mounting Kernel Trace File System...
Starting Journal Service...
Starting Create List of Static Device Nodes...
Starting Load Kernel Module configfs...
Starting Load Kernel Module fuse...
Starting Remount Root and Kernel File Systems...
Starting Apply Kernel Variables...
[ OK ] Mounted Huge Pages File System.
[ OK ] Mounted Kernel Debug File System.
[ OK ] Mounted Kernel Trace File System.
[ OK ] Finished Create List of Static Device Nodes.
modprobe@configfs.service: Deactivated successfully.
[ OK ] Finished Load Kernel Module configfs.
modprobe@fuse.service: Deactivated successfully.
[ OK ] Finished Load Kernel Module fuse.
[ OK ] Finished Remount Root and Kernel File Systems.
Mounting FUSE Control File System...
Starting Create System Users...
Starting Record System Boot/Shutdown in UTMP...
[ OK ] Started Journal Service.
[ OK ] Finished Apply Kernel Variables.
[ OK ] Mounted FUSE Control File System.
[ OK ] Finished Create System Users.
Starting Flush Journal to Persistent Storage...
Starting Create Static Device Nodes in /dev...
[ OK ] Finished Record System Boot/Shutdown in UTMP.
[ OK ] Finished Flush Journal to Persistent Storage.
[ OK ] Finished Create Static Device Nodes in /dev.
[ OK ] Reached target Preparation for Local File Systems.
[ OK ] Reached target Local File Systems.
[ OK ] Reached target System Initialization.
[ OK ] Started Daily Cleanup of Temporary Directories.
[ OK ] Reached target Basic System.
[ OK ] Reached target Timer Units.
Starting containerd container runtime...
[ OK ] Started containerd container runtime.
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Starting Record Runlevel Change in UTMP...
[ OK ] Finished Record Runlevel Change in UTMP.

@killianmuldoon (Contributor)

Can you check the status of the Machine objects? Can you see if there are errors in the logs of CAPD?
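As a sketch of both checks (the CAPD namespace and deployment names below are the clusterctl defaults and may differ in your setup):

# Machine status and conditions
kubectl get machines -A
kubectl describe machine <machine-name> -n default

# CAPD (Docker infrastructure provider) controller logs
kubectl logs -n capd-system deployment/capd-controller-manager --tail=200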

umesh168 commented Oct 3, 2022

@killianmuldoon here are some more details from Docker Desktop: the haproxy container is not running.
[Screenshot from Docker Desktop, 2022-10-03 6:43 PM, showing the haproxy container stopped]
Here are the logs for the same container:
[WARNING] 275/073913 (1) : config : missing timeouts for frontend 'controlPlane'.
2022-10-03T07:39:13.813476586Z | not properly invalid, you will certainly encounter various problems
2022-10-03T07:39:13.813483294Z | such a configuration. To fix this, please ensure that all following
2022-10-03T07:39:13.813485586Z | are set to a non-zero value: 'client', 'connect', 'server'.
[NOTICE] 275/073913 (1) : New worker #1 (7) forked
[WARNING] 275/073918 (1) : Reexecuting Master process
[NOTICE] 275/073918 (1) : haproxy version is 2.2.9-2~bpo10+1
[NOTICE] 275/073918 (1) : path to executable is /usr/sbin/haproxy
[ALERT] 275/073918 (1) : sendmsg()/writev() failed in logger #1: No such file or directory (errno=2)
[WARNING] 275/073918 (7) : Stopping frontend controlPlane in 0 ms.
[WARNING] 275/073918 (7) : Stopping backend kube-apiservers in 0 ms.
[WARNING] 275/073918 (7) : Stopping frontend GLOBAL in 0 ms.
[WARNING] 275/073918 (7) : Proxy controlPlane stopped (cumulated conns: FE: 0, BE: 0).
[WARNING] 275/073918 (7) : Proxy kube-apiservers stopped (cumulated conns: FE: 0, BE: 0).
[WARNING] 275/073918 (7) : Proxy GLOBAL stopped (cumulated conns: FE: 0, BE: 0).
[NOTICE] 275/073918 (1) : New worker #1 (25) forked
[WARNING] 275/073918 (1) : Former worker #1 (7) exited with code 0 (Exit)
[WARNING] 275/073918 (25) : Server kube-apiservers/capi-quickstart-47d7b-vk7tg is DOWN, reason: Layer4 connection problem, info: "SSL handshake failure", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 275/073918 (25) : backend 'kube-apiservers' has no server available!
[WARNING] 275/073940 (25) : Server kube-apiservers/capi-quickstart-47d7b-vk7tg is UP, reason: Layer7 check passed, code: 200, check duration: 2ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.

umesh168 commented Oct 3, 2022

containerd.log
Oct 03 07:53:45 kind-control-plane containerd[99]: time="2022-10-03T07:53:45.389292253Z" level=info msg="RemoveContainer for "615835aa60d5a9cdded7cf75e8de9918cce71e37236ac19e6f0dce1a0add000d" returns successfully"
Oct 03 07:53:45 kind-control-plane containerd[99]: time="2022-10-03T07:53:45.389919420Z" level=error msg="ContainerStatus for "615835aa60d5a9cdded7cf75e8de9918cce71e37236ac19e6f0dce1a0add000d" failed" error="rpc error: code = NotFound desc = an error occurred when try to find container "615835aa60d5a9cdded7cf75e8de9918cce71e37236ac19e6f0dce1a0add000d": not found"

@killianmuldoon (Contributor)

I'm not sure what the issue is with the haproxy container - I've seen it fail due to resource exhaustion on Linux systems at times. Can you share what resources you're dedicating to Docker Desktop? Can you see what memory usage is like when you create a new cluster and haproxy fails?

It would also be really helpful if you could see if there's anything relevant in the capd infrastructure provider logs to help debug this issue.
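For the resource questions above, one way to capture that information with the plain Docker CLI (nothing CAPD-specific is assumed here):

# CPUs and memory allocated to the Docker Desktop VM
docker info --format 'CPUs: {{.NCPU}}  Memory: {{.MemTotal}} bytes'

# One-shot snapshot of per-container CPU / memory while the cluster is coming up
docker stats --no-stream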

umesh168 commented Oct 3, 2022

E1003 10:22:18.784244 1 controller.go:182] "Failed to update KubeadmControlPlane Status" err="failed to create remote cluster client: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster \"default/capi-quickstart\": Get \"https://172.18.0.2:6443/api?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" kubeadmControlPlane="default/capi-quickstart-47d7b" namespace="default" name="capi-quickstart-47d7b" reconcileID=e2010afc-dac6-4acd-b60e-dc08944fb296 cluster="capi-quickstart 

The issue is that the IP which is used, 172.18.0.2, is not accessible:

root@kind-control-plane:/# telnet 172.18.0.2 6443 
Trying 172.18.0.2...
telnet: Unable to connect to remote host: Connection timed out
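One way to check which container, if any, actually owns that address (a sketch assuming the default kind Docker network name):

# List containers attached to the "kind" network together with their IPv4 addresses
docker network inspect kind \
  --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'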

@killianmuldoon (Contributor)

I really think the networking issue is a symptom rather than the cause here. The IP isn't reachable because the container isn't up. The first places I'd look for information are resource usage and the CAPD logs, as it really seems as if CAPD or Docker is having a hard time bringing up haproxy.

To drill down on the issue it could be a good idea to run the same version of haproxy using docker run to see if you can start it and have it run correctly using the CLI. I encourage you to keep an eye on resources - in particular memory usage - when you do this.
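As a rough sketch of that experiment (the image tag is taken from the docker ps output above; the container has no CAPD-generated configuration, so this is only a start-up and resource smoke test and haproxy may complain about its stub config):

# Start the same haproxy image CAPD uses and watch its logs and memory
docker run -d --name haproxy-test kindest/haproxy:v20210715-a6da3463
docker logs haproxy-test
docker stats --no-stream haproxy-test

# Clean up afterwards
docker rm -f haproxy-test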

@fabriziopandini (Member)

/kind support

From a quick look, CAPD is failing to update the configuration inside the haproxy image. The CAPD logs should contain an error if this is what is happening, but this is a very uncommon error.

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Oct 4, 2022
@killianmuldoon (Contributor)

@umesh168 did you manage to resolve this?

@umesh168 (Author)

@killianmuldoon the above issue has been resolved.
Thanks for your comments and suggestions.

@fabriziopandini (Member)

Happy to hear the problem has been fixed
/close

@k8s-ci-robot (Contributor)

@fabriziopandini: Closing this issue.

In response to this:

Happy to hear the problem has been fixed
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nilsanderselde

> @killianmuldoon the above issue has been resolved. Thanks for your comments and suggestions.

@umesh168 How did you end up fixing it? I'm encountering something similar.

umesh168 commented Nov 4, 2022

@nilsanderselde try pulling the haproxy image manually.
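For anyone hitting this later, that would look something like the following (image tag taken from the docker ps output earlier in this thread; use whatever tag CAPD references in your environment):

# Pre-pull the load balancer image used by CAPD before creating the workload cluster
docker pull kindest/haproxy:v20210715-a6da3463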

@jingai-db

I hit the same issue of Unable to connect to the server: net/http: TLS handshake timeout, though my 'capi-haproxy-lb' container is running properly.
