
[BUG] Error: Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-testing-server-1' #315

Closed
Filius-Patris opened this issue Jul 25, 2020 · 38 comments

@Filius-Patris
Contributor

What did you do?

  • How was the cluster created?

    • k3d cluster create testing --servers 2 --agents 3
  • What did you do afterwards?

    • wait & observe

What did you expect to happen?

Some info messages similar to those in the demo video on k3d.io, followed by a successful cluster creation.

Screenshots or terminal output
Recording on Asciinema

tobtobxx@MacBook-Air ~ % k3d cluster create testing --servers 2 --agents 3
INFO[0000] Created network 'k3d-testing'                
INFO[0000] Created volume 'k3d-testing-images'          
INFO[0000] Creating initializing server node            
INFO[0000] Creating node 'k3d-testing-server-0'         
INFO[0012] Creating node 'k3d-testing-server-1'         
INFO[0012] Creating node 'k3d-testing-agent-0'          
INFO[0015] Creating node 'k3d-testing-agent-1'          
INFO[0017] Creating node 'k3d-testing-agent-2'          
INFO[0020] Creating LoadBalancer 'k3d-testing-serverlb' 
ERRO[0021] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-testing-server-1' 
ERRO[0021] Failed to start container                    
ERRO[0021] Failed to create node 'k3d-testing-serverlb' 
ERRO[0021] Failed to create loadbalancer                
ERRO[0021] error during connect: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/01fb21fd8a32b0ae561ea51dfedd44f3a3bb477307ff36fdb62d240fceb32b13/start": context canceled 
ERRO[0021] Failed to create cluster >>> Rolling Back    
INFO[0021] Deleting cluster 'testing'                   
INFO[0022] Deleted k3d-testing-server-0                 
INFO[0022] Deleted k3d-testing-server-1                 
INFO[0022] Deleted k3d-testing-agent-0                  
INFO[0023] Deleted k3d-testing-agent-1                  
INFO[0023] Deleted k3d-testing-agent-2                  
INFO[0023] Deleted k3d-testing-serverlb                 
INFO[0023] Deleting cluster network '84ac3cec99c71cfe2ac12f097bf9f6194bfd20594fa2539b07f0aec6fcede58e' 
FATA[0023] Cluster creation FAILED, all changes have been rolled back! 

Which OS & Architecture?
macOS Catalina (10.15.5)

Which version of k3d?

tobtobxx@MacBook-Air ~ % k3d version
k3d version v3.0.0
k3s version latest (default)

(installed via homebrew)

Which version of docker?

tobtobxx@MacBook-Air ~ % docker version
Client: Docker Engine - Community
 Azure integration  0.1.7
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:41:33 2020
 OS/Arch:           darwin/amd64
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:49:27 2020
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
Filius-Patris added the bug label on Jul 25, 2020
@iwilltry42
Member

Hi @Filius-Patris , thanks for opening this issue.
On a first glance this looks like the server-1 container doesn't start up properly.
As a side note: the embedded dqlite doesn't work well with a less-than-quorum number of server nodes, please use 3 or more 👍

Can you try to run the command again and abort it (e.g. Ctrl+C) after seeing INFO[0012] Creating node 'k3d-testing-server-1' and then follow the logs of the k3d-testing-server-1 container via e.g. docker logs -f k3d-testing-server-1 to see what's going on there?
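
In concrete terms, something like this (untested, using the names from your output above):

k3d cluster create testing --servers 2 --agents 3
# abort with Ctrl+C once "Creating node 'k3d-testing-server-1'" shows up, then:
docker logs -f k3d-testing-server-1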

@Filius-Patris
Contributor Author

  1. Quite a similar failure with three servers, so that's probably not the cause.

  2. Any hints on what to look out for in the logs? There are well over 1k lines in each of them, and searching for 'fail' or similar yields hundreds of results.

For now I can just append them as complete files, if you find them helpful:

tobtobxx@MacBook-Air ~ % k3d cluster create testing --servers 3 --agents 3
INFO[0000] Created network 'k3d-testing'                
INFO[0000] Created volume 'k3d-testing-images'          
INFO[0000] Creating initializing server node            
INFO[0000] Creating node 'k3d-testing-server-0'         
INFO[0012] Creating node 'k3d-testing-server-1'         
INFO[0013] Creating node 'k3d-testing-server-2'         
^C

server-0.log
server-1.log
server-2.log

@iwilltry42
Member

1. Quite a similar failure with three servers, so that's probably not the cause.

What do you mean by "similar"? I can see the log message that k3d is looking for in all three logs you attached 🤔
Also all three servers seem to be running just fine since none of them exited early according to the logs.

@Filius-Patris
Contributor Author

The error with 3 servers and 3 agents:

tobtobxx@MacBook-Air ~ % k3d cluster create testing --servers 3 --agents 3
INFO[0000] Created network 'k3d-testing'                
INFO[0000] Created volume 'k3d-testing-images'          
INFO[0000] Creating initializing server node            
INFO[0000] Creating node 'k3d-testing-server-0'         
INFO[0015] Creating node 'k3d-testing-server-1'         
INFO[0017] Creating node 'k3d-testing-server-2'         
INFO[0020] Creating node 'k3d-testing-agent-0'          
INFO[0024] Creating node 'k3d-testing-agent-1'          
ERRO[0026] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-testing-server-2' 
ERRO[0026] Failed to inspect container '96e15301987ffefe962f000251c13813d8067f11ea28e51bc190cc62f98f6dc9' 
ERRO[0026] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-testing-server-1' 
ERRO[0026] Failed to start container                    
ERRO[0026] Failed to create node 'k3d-testing-agent-1'  
ERRO[0026] Failed to create node                        
ERRO[0026] error during connect: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/6a60a18f6abc57a149083af7a550d25ae2355eeeb47876cda8bb8e132c785239/start": context canceled 
ERRO[0026] Failed to create cluster >>> Rolling Back    
INFO[0026] Deleting cluster 'testing'                   
INFO[0026] Deleted k3d-testing-server-0                 
INFO[0027] Deleted k3d-testing-server-1                 
INFO[0027] Deleted k3d-testing-server-2                 
INFO[0027] Deleted k3d-testing-agent-0                  
INFO[0027] Deleted k3d-testing-agent-1                  
ERRO[0027] Failed to delete container ''                
WARN[0027] Failed to delete node '': Try to delete it manually 
INFO[0027] Deleting cluster network '570f85829fa81a24338848c0ef6bf40c4d94f50d3fefc43d46d2a83190e573f1' 
ERRO[0027] Failed to delete 1 nodes: Try to delete them manually 
FATA[0027] Cluster creation FAILED, also FAILED to rollback changes!

I tried debugging again, but this time I tried to abort just before k3d starts rolling back the cluster. It took a few tries, but then I noticed that docker ps does not list the k3d-testing-server-X containers, while docker ps -a lists them with the status Exited (1) 26 seconds ago.

For some reason, though, k3d-testing-server-0 is nowhere to be seen. Maybe it had already been deleted by the time I aborted.

This time the logs are extremely short:

server-1:

time="2020-08-07T07:37:46.389666000Z" level=info msg="Starting k3s v1.17.9+k3s1 (000a361f)"
time="2020-08-07T07:37:46.914361000Z" level=info msg="Active TLS secret  (ver=) (count 8): map[listener.cattle.io/cn-0.0.0.0:0.0.0.0 listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-172.26.0.3:172.26.0.3 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/hash:9440c7fabd1b3f2b6e7f5770db25885562060ab3c6c7f08bd65cb0639aeaeedb]"
time="2020-08-07T07:37:47.063180500Z" level=info msg="Joining dqlite cluster as address=172.26.0.3:6443, id=1044008"
time="2020-08-07T07:37:54.440749400Z" level=fatal msg="starting kubernetes: preparing server: post join: no connection to remote server available (1)"

server-2:

time="2020-08-07T07:37:52.002513700Z" level=info msg="Starting k3s v1.17.9+k3s1 (000a361f)"
time="2020-08-07T07:37:53.340722700Z" level=info msg="Active TLS secret  (ver=) (count 8): map[listener.cattle.io/cn-0.0.0.0:0.0.0.0 listener.cattle.io/cn-10.43.0.1:10.43.0.1 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-172.26.0.4:172.26.0.4 listener.cattle.io/cn-kubernetes:kubernetes listener.cattle.io/cn-kubernetes.default:kubernetes.default listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local listener.cattle.io/cn-localhost:localhost listener.cattle.io/hash:6c930871bfdaaa80d53cb7601cc08b7cd87ff70a4fea036624eff18ecfb95627]"
time="2020-08-07T07:37:53.825305800Z" level=info msg="Joining dqlite cluster as address=172.26.0.4:6443, id=514386"
time="2020-08-07T07:37:53.849071200Z" level=fatal msg="starting kubernetes: preparing server: post join: a configuration change is already in progress (5)"

I will try to get the logs of server-0, but this might be more difficult and will need some scripting...

@Filius-Patris
Contributor Author

Oh, btw, I just realized you can run the cluster create command with --verbose. Here are the relevant parts:

INFO[0026] Creating LoadBalancer 'k3d-testing-serverlb' 
DEBU[0026] Creating node from spec
&{Name:k3d-testing-serverlb Role:loadbalancer Image:docker.io/rancher/k3d-proxy:v3.0.0 Volumes:[] Env:[SERVERS=k3d-testing-server-0,k3d-testing-server-1,k3d-testing-server-2 PORTS=6443] Cmd:[] Args:[] Ports:[0.0.0.0:52151:6443/tcp] Restart:false Labels:map[app:k3d k3d.cluster:testing] Network:1bab04850309e79f344f7a08246c12395eb95cc8e63ed9dc3371caf53d09e248 ServerOpts:{IsInit:false ExposeAPI:{Host: HostIP: Port:}} AgentOpts:{}} 
DEBU[0026] Volumes: []                                  
DEBU[0027] Creating docker container with translated config
&{ContainerConfig:{Hostname:k3d-testing-serverlb Domainname: User: AttachStdin:false AttachStdout:false AttachStderr:false ExposedPorts:map[6443/tcp:{}] Tty:false OpenStdin:false StdinOnce:false Env:[SERVERS=k3d-testing-server-0,k3d-testing-server-1,k3d-testing-server-2 PORTS=6443 K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml] Cmd:[] Healthcheck:<nil> ArgsEscaped:false Image:docker.io/rancher/k3d-proxy:v3.0.0 Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[app:k3d k3d.cluster:testing k3d.role:loadbalancer] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:{Binds:[] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode: PortBindings:map[6443/tcp:[{HostIP:0.0.0.0 HostPort:52151}]] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] CapAdd:[] CapDrop:[] Capabilities:[] CgroupnsMode: DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:true PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[/run: /var/run:] UTSMode: UsernsMode: ShmSize:0 Sysctls:map[] Runtime: ConsoleSize:[0 0] Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DeviceRequests:[] KernelMemory:0 KernelMemoryTCP:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:<nil> Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] MaskedPaths:[] ReadonlyPaths:[] Init:0xc000580790} NetworkingConfig:{EndpointsConfig:map[1bab04850309e79f344f7a08246c12395eb95cc8e63ed9dc3371caf53d09e248:0xc0003ba240]}} 
ERRO[0027] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-testing-server-2' 
DEBU[0027] FIXME: Got an status-code for which error does not match any expected type!!!: -1  module=api status_code=-1
DEBU[0027] FIXME: Got an status-code for which error does not match any expected type!!!: -1  module=api status_code=-1
ERRO[0027] Failed to pull image 'docker.io/rancher/k3d-proxy:v3.0.0' 
ERRO[0027] Failed to create container 'k3d-testing-serverlb' 
ERRO[0027] Failed to create node 'k3d-testing-serverlb' 
ERRO[0027] Failed to create loadbalancer                
ERRO[0027] Failed to get logs from node 'k3d-testing-server-1' (container '49cdc67b0b25bb7f8795df2f3b10cfc4f6ca57c4a1794dc53d10094b6b6efdaf') 
ERRO[0027] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-testing-server-1' 
ERRO[0027] error during connect: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/images/create?fromImage=rancher%2Fk3d-proxy&tag=v3.0.0": context canceled 
ERRO[0027] Failed to create cluster >>> Rolling Back    
INFO[0027] Deleting cluster 'testing'                   
DEBU[0027] Cluster Details: &{Name:testing Network:{Name:1bab04850309e79f344f7a08246c12395eb95cc8e63ed9dc3371caf53d09e248 External:false} Token:bnCpprfSgmGyeiHthiER Nodes:[0xc000457680 0xc0004577a0 0xc0004578c0 0xc0004579e0 0xc000457b00 0xc000457c20 0xc0007fe6c0] InitNode:0xc000457680 ExternalDatastore:{Endpoint: CAFile: CertFile: KeyFile: Network:} CreateClusterOpts:0xc000375630 ExposeAPI:{Host:0.0.0.0 HostIP:0.0.0.0 Port:52151} ServerLoadBalancer:0xc000457560 ImageVolume:} 
INFO[0028] Deleted k3d-testing-server-0                 
INFO[0028] Deleted k3d-testing-server-1                 
INFO[0028] Deleted k3d-testing-server-2                 
INFO[0029] Deleted k3d-testing-agent-0                  
INFO[0029] Deleted k3d-testing-agent-1                  
INFO[0030] Deleted k3d-testing-agent-2                  
ERRO[0030] Failed to delete container 'k3d-testing-serverlb' 
WARN[0030] Failed to delete node 'k3d-testing-serverlb': Try to delete it manually 
INFO[0030] Deleting cluster network '1bab04850309e79f344f7a08246c12395eb95cc8e63ed9dc3371caf53d09e248' 
ERRO[0030] Failed to delete 1 nodes: Try to delete them manually 
FATA[0030] Cluster creation FAILED, also FAILED to rollback changes! 

@iwilltry42
Member

Something seems to be really off there 🤔
There are multiple issues everywhere...

ERRO[0027] Failed to pull image 'docker.io/rancher/k3d-proxy:v3.0.0'
This line is suspicious... can you pull that image manually?

Additionally, you could try cleaning up your docker environment (if possible):

  • docker rm -f $(docker ps -qa) to remove all running containers
  • docker system prune -a (please check that everything listed in the confirmation prompt can safely be removed)
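
Combined, a full cleanup could look roughly like this (note: this wipes everything on that Docker host, so only do it if nothing else running there is needed):

docker rm -f $(docker ps -qa)    # force-remove all containers, running or stopped
docker system prune -a           # remove unused images, networks and build cache (confirm the prompt)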

@Filius-Patris
Contributor Author

Filius-Patris commented Aug 7, 2020

Pulling the image

Sure, works fine:

tobtobxx@MacBook-Air ~ % docker pull docker.io/rancher/k3d-proxy:v3.0.0
v3.0.0: Pulling from rancher/k3d-proxy
e7c96db7181b: Pull complete 
8985e402e050: Pull complete 
986b35da0011: Pull complete 
7c96bf099c58: Pull complete 
490fe1a32c6d: Pull complete 
5f0121973a8b: Pull complete 
Digest: sha256:c18ec09a5cd9efb4017f48261e9ba8cbaa3c74b012b60c33341bb6ec5ef01286
Status: Downloaded newer image for rancher/k3d-proxy:v3.0.0
docker.io/rancher/k3d-proxy:v3.0.0
tobtobxx@MacBook-Air ~ % echo $?
0

Resetting

I went as far as removing every image and completely uninstalling k3d and Docker Desktop, then reinstalling them. After reinstalling, I ran the command and, sure enough, got the same error. I know that all images were deleted too, because there was the log message about pulling k3s:latest.

Getting more logs

I was never able to get the logs of the aborted server-0 container; k3d just deletes this container too fast. Any tips on how to abort the cluster create command just after the failure, or how to prevent k3d from rolling back failed actions?
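
One crude workaround sketch (assuming the default k3d-testing-server-0 container name) would be to keep snapshotting the logs from a second terminal while the cluster is being created, so the last snapshot survives the rollback:

# wait until the container exists, then keep snapshotting its logs until it is removed again
until docker logs k3d-testing-server-0 >/dev/null 2>&1; do sleep 1; done
while docker logs k3d-testing-server-0 > server-0.log.tmp 2>&1; do
  mv server-0.log.tmp server-0.log
  sleep 1
done
# server-0.log now holds the last snapshot taken before k3d deleted the container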

@Filius-Patris
Contributor Author

Filius-Patris commented Aug 7, 2020

Doing some time math led me to the following possible clues:

  • If k3d is stopped from destroying the cluster, the log message does appear (see logs in first reply)

  • If k3d does destroy the cluster (and thus stops all running containers), the log message is not found (at least on the second and third server; I wasn't able to get the post-mortem logs of the first; see this comment).

  • Servers 1 and 2 (counting from 0) need about 13s from startup to the log message 'Wrote kubeconfig' on my machine (again, see the logs in the first reply). However, when we look at the timestamps in this comment, we get a difference of a mere 9s between startup and the failure to see the log message. Although these are different runs and may have different timings, the difference is probably large enough to justify this line of thought.

That's where we get into source code. I'd like to hack k3d in the following way, but I don't know Go, so help is very much needed:
The error seems to be emitted by the NodeWaitForLogMessage function. The comment mentions that it respects a timeout. How do I modify this timeout?

iwilltry42 added this to the v3.0.1 milestone on Aug 12, 2020
@iwilltry42
Member

So this line in the logs you've posted looks very suspicious:
ERRO[0026] error during connect: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/6a60a18f6abc57a149083af7a550d25ae2355eeeb47876cda8bb8e132c785239/start": context canceled
... to me this looks like something is wrong in your docker setup/environment. So k3d fails to manage the containers because it cannot connect to the docker socket/API. 🤔

The timeout you mention in your last comment is the context timeout that you can set with k3d cluster create --timeout 50s (by default, there is no such timeout).
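
If you did want to set one explicitly, it would look like this:

k3d cluster create testing --servers 3 --agents 3 --timeout 120s    # abort (and roll back) if the cluster isn't up after 2 minutes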

iwilltry42 self-assigned this on Aug 12, 2020
iwilltry42 removed this from the v3.0.1 milestone on Aug 12, 2020
@rr-appadaptive

I have the same problem, although I'm trying to get k3d (version 3.0.1) running on a Linux machine with Ubuntu 20.04.1 LTS. My Docker version is exactly the same as the one mentioned here.

Initially I got k3d working with 3 servers, but after I deleted a cluster once, it didn't work anymore.

@iwilltry42
Member

@rr-appadaptive do you also get the same logs?
As mentioned before, there are severe issues with dqlite and running (only) 3 server nodes.

@rr-appadaptive

Could that mean that more than 3 nodes (e.g. 3 servers and 2 agents) would solve the problem? I'm going to try that...

@Filius-Patris
Contributor Author

Still present in v3.0.1, just FYI.

@iwilltry42
Member

@Filius-Patris there was no related change in v3.0.1 and so far I also couldn't detect any issue within the k3d code.
You can try creating the cluster with --wait=false to skip all the waiting altogether 🤔
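
i.e. something like:

k3d cluster create testing --servers 3 --agents 3 --wait=false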

@franviera92

franviera92 commented Sep 6, 2020

I have the same problem, although I'm trying to get k3d (version 3.0.1) running on a Linux machine with Ubuntu 20.04.1 LTS:

root@fran-X580VD:/home/fran# k3d cluster create multiserver --servers 3
INFO[0000] Created network 'k3d-multiserver'
INFO[0000] Created volume 'k3d-multiserver-images'
INFO[0000] Creating initializing server node
INFO[0000] Creating node 'k3d-multiserver-server-0'
INFO[0040] Creating node 'k3d-multiserver-server-1'
INFO[0059] Creating node 'k3d-multiserver-server-2'
ERRO[0062] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-multiserver-server-1'
ERRO[0062] Failed to create container 'k3d-multiserver-server-2'
ERRO[0062] Failed to create node 'k3d-multiserver-server-2'
ERRO[0062] Failed to create node
ERRO[0062] error during connect: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/create?name=k3d-multiserver-server-2": context canceled
ERRO[0062] Failed to create cluster >>> Rolling Back
INFO[0062] Deleting cluster 'multiserver'
INFO[0065] Deleted k3d-multiserver-server-0
INFO[0068] Deleted k3d-multiserver-server-1
INFO[0070] Deleted k3d-multiserver-server-2
INFO[0070] Deleting cluster network 'd0e5103479c1f504659ac2c00afc0c1f735fb02c4ba5051a8df81073812e2301'
FATA[0070] Cluster creation FAILED, all changes have been rolled back!

@Filius-Patris
Contributor Author

Filius-Patris commented Sep 6, 2020

Ok, now THIS is concerning: I could make it reproducible on a DigitalOcean VPS.

  • Fire up a brand new VPS on DigitalOcean
    • OS: Ubuntu 20.04 (LTS) x64
    • Profile: Basic / $40 per mo. (8 GB RAM, 4 vCPUs, 160 GB SSD)
    • The other options seem irrelevant. Do as you wish.
  • Log into it (ssh, or whatever)
  • Install docker
    • snap install docker
    • Current version is 19.03.11
  • Install k3d
    • curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash (snippet from https://k3d.io)
    • Current version is v3.0.1, default k3s version is v1.18.6-k3s1
  • Launch a cluster
    • k3d cluster create testing --servers 3 --agents 3
    • Wait for the error.

What am I missing? How can this happen???

EDIT: When you reboot the machine, launching a cluster then works. What?

@iwilltry42
Member

iwilltry42 commented Sep 7, 2020

@Filius-Patris I seriously have no clue why you're experiencing this.
I just tried your setup step by step and I don't experience this problem on the first try (without rebooting the droplet).
Here's my full output: https://pastebin.com/BvA8xmXc

EDIT: just installed kubectl afterwards to verify that the cluster is actually up:

root@ubuntu-s-4vcpu-8gb-fra1-01:~# kubectl get nodes
NAME                   STATUS   ROLES    AGE     VERSION
k3d-testing-server-0   Ready    master   3m8s    v1.18.6+k3s1
k3d-testing-agent-0    Ready    <none>   3m1s    v1.18.6+k3s1
k3d-testing-agent-1    Ready    <none>   3m      v1.18.6+k3s1
k3d-testing-agent-2    Ready    <none>   2m58s   v1.18.6+k3s1
k3d-testing-server-1   Ready    master   2m49s   v1.18.6+k3s1
k3d-testing-server-2   Ready    master   2m45s   v1.18.6+k3s1

@zasherif

Is there any update on how to fix this? I'm also experiencing the same issue...

# k3d cluster create multiserver -s 3 --verbose --wait=false
DEBU[0000] Selected runtime is 'docker.Docker'
DEBU[0000] API-Port Mapping didn't specify hostPort, choosing one randomly...
DEBU[0000] Got free port for API: '41801'
DEBU[0000] PortFilterMap: map[]
DEBU[0000] '--update-default-kubeconfig set: enabling wait-for-server
INFO[0000] Created network 'k3d-multiserver'
INFO[0000] Created volume 'k3d-multiserver-images'
INFO[0000] Creating initializing server node
INFO[0000] Creating node 'k3d-multiserver-server-0'
DEBU[0000] Creating node from spec
&{Name:k3d-multiserver-server-0 Role:server Image:docker.io/rancher/k3s:v1.18.6-k3s1 Volumes:[k3d-multiserver-images:/k3d/images] Env:[K3S_TOKEN=NPlYrJMlajHJcWrewdwp] Cmd:[] Args:[--cluster-init] Ports:[] Restart:false Labels:map[k3d.cluster:multiserver k3d.cluster.imageVolume:k3d-multiserver-images k3d.cluster.network:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f k3d.cluster.network.external:false k3d.cluster.token:NPlYrJMlajHJcWrewdwp k3d.cluster.url:https://k3d-multiserver-server-0:6443] Network:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f ServerOpts:{IsInit:true ExposeAPI:{Host:0.0.0.0 HostIP:0.0.0.0 Port:41801}} AgentOpts:{}}
DEBU[0000] Volumes: [k3d-multiserver-images:/k3d/images]
DEBU[0000] Creating docker container with translated config
&{ContainerConfig:{Hostname:k3d-multiserver-server-0 Domainname: User: AttachStdin:false AttachStdout:false AttachStderr:false ExposedPorts:map[] Tty:false OpenStdin:false StdinOnce:false Env:[K3S_TOKEN=NPlYrJMlajHJcWrewdwp K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml] Cmd:[server --cluster-init --tls-san 0.0.0.0] Healthcheck:<nil> ArgsEscaped:false Image:docker.io/rancher/k3s:v1.18.6-k3s1 Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[app:k3d k3d.cluster:multiserver k3d.cluster.imageVolume:k3d-multiserver-images k3d.cluster.network:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f k3d.cluster.network.external:false k3d.cluster.token:NPlYrJMlajHJcWrewdwp k3d.cluster.url:https://k3d-multiserver-server-0:6443 k3d.role:server k3d.server.api.host:0.0.0.0 k3d.server.api.hostIP:0.0.0.0 k3d.server.api.port:41801] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:{Binds:[k3d-multiserver-images:/k3d/images] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode: PortBindings:map[] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] CapAdd:[] CapDrop:[] Capabilities:[] CgroupnsMode: DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:true PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[/run: /var/run:] UTSMode: UsernsMode: ShmSize:0 Sysctls:map[] Runtime: ConsoleSize:[0 0] Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DeviceRequests:[] KernelMemory:0 KernelMemoryTCP:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:<nil> Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] MaskedPaths:[] ReadonlyPaths:[] Init:0xc0003d65f8} NetworkingConfig:{EndpointsConfig:map[727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f:0xc00010c840]}}
DEBU[0000] Created container 2723a68745d032b0ea171f32edab0913a21e41c12d9c41dd7a952fc013420309
DEBU[0001] Created node 'k3d-multiserver-server-0'
DEBU[0001] Waiting for initializing server node...
DEBU[0002] Waiting for initializing server node...
DEBU[0003] Waiting for initializing server node...
DEBU[0004] Waiting for initializing server node...
DEBU[0005] Waiting for initializing server node...
DEBU[0006] Waiting for initializing server node...
DEBU[0007] Waiting for initializing server node...
DEBU[0008] Waiting for initializing server node...
DEBU[0009] Waiting for initializing server node...
DEBU[0010] Waiting for initializing server node...
DEBU[0011] Waiting for initializing server node...
DEBU[0012] Waiting for initializing server node...
DEBU[0012] Initializing server node is up... continuing
INFO[0013] Creating node 'k3d-multiserver-server-1'
DEBU[0013] Creating node from spec
&{Name:k3d-multiserver-server-1 Role:server Image:docker.io/rancher/k3s:v1.18.6-k3s1 Volumes:[k3d-multiserver-images:/k3d/images] Env:[K3S_TOKEN=NPlYrJMlajHJcWrewdwp K3S_URL=https://k3d-multiserver-server-0:6443] Cmd:[] Args:[] Ports:[] Restart:false Labels:map[k3d.cluster:multiserver k3d.cluster.imageVolume:k3d-multiserver-images k3d.cluster.network:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f k3d.cluster.network.external:false k3d.cluster.token:NPlYrJMlajHJcWrewdwp k3d.cluster.url:https://k3d-multiserver-server-0:6443] Network:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f ServerOpts:{IsInit:false ExposeAPI:{Host:0.0.0.0 HostIP:0.0.0.0 Port:41801}} AgentOpts:{}}
DEBU[0013] Volumes: [k3d-multiserver-images:/k3d/images]
DEBU[0013] Creating docker container with translated config
&{ContainerConfig:{Hostname:k3d-multiserver-server-1 Domainname: User: AttachStdin:false AttachStdout:false AttachStderr:false ExposedPorts:map[] Tty:false OpenStdin:false StdinOnce:false Env:[K3S_TOKEN=NPlYrJMlajHJcWrewdwp K3S_URL=https://k3d-multiserver-server-0:6443 K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml] Cmd:[server --tls-san 0.0.0.0] Healthcheck:<nil> ArgsEscaped:false Image:docker.io/rancher/k3s:v1.18.6-k3s1 Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[app:k3d k3d.cluster:multiserver k3d.cluster.imageVolume:k3d-multiserver-images k3d.cluster.network:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f k3d.cluster.network.external:false k3d.cluster.token:NPlYrJMlajHJcWrewdwp k3d.cluster.url:https://k3d-multiserver-server-0:6443 k3d.role:server k3d.server.api.host:0.0.0.0 k3d.server.api.hostIP:0.0.0.0 k3d.server.api.port:41801] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:{Binds:[k3d-multiserver-images:/k3d/images] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode: PortBindings:map[] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] CapAdd:[] CapDrop:[] Capabilities:[] CgroupnsMode: DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:true PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[/run: /var/run:] UTSMode: UsernsMode: ShmSize:0 Sysctls:map[] Runtime: ConsoleSize:[0 0] Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DeviceRequests:[] KernelMemory:0 KernelMemoryTCP:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:<nil> Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] MaskedPaths:[] ReadonlyPaths:[] Init:0xc0001f75a0} NetworkingConfig:{EndpointsConfig:map[727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f:0xc00010d680]}}
DEBU[0013] Created container 42541a5ba5ab0de08857e4bd8d19a5f5e6c677b09a49740b9f2a67a89a43dd97
DEBU[0014] Created node 'k3d-multiserver-server-1'
DEBU[0014] Starting to wait for server node 'k3d-multiserver-server-1'
INFO[0015] Creating node 'k3d-multiserver-server-2'
DEBU[0015] Creating node from spec
&{Name:k3d-multiserver-server-2 Role:server Image:docker.io/rancher/k3s:v1.18.6-k3s1 Volumes:[k3d-multiserver-images:/k3d/images] Env:[K3S_TOKEN=NPlYrJMlajHJcWrewdwp K3S_URL=https://k3d-multiserver-server-0:6443] Cmd:[] Args:[] Ports:[] Restart:false Labels:map[k3d.cluster:multiserver k3d.cluster.imageVolume:k3d-multiserver-images k3d.cluster.network:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f k3d.cluster.network.external:false k3d.cluster.token:NPlYrJMlajHJcWrewdwp k3d.cluster.url:https://k3d-multiserver-server-0:6443] Network:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f ServerOpts:{IsInit:false ExposeAPI:{Host:0.0.0.0 HostIP:0.0.0.0 Port:41801}} AgentOpts:{}}
DEBU[0015] Volumes: [k3d-multiserver-images:/k3d/images]
DEBU[0015] Creating docker container with translated config
&{ContainerConfig:{Hostname:k3d-multiserver-server-2 Domainname: User: AttachStdin:false AttachStdout:false AttachStderr:false ExposedPorts:map[] Tty:false OpenStdin:false StdinOnce:false Env:[K3S_TOKEN=NPlYrJMlajHJcWrewdwp K3S_URL=https://k3d-multiserver-server-0:6443 K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml] Cmd:[server --tls-san 0.0.0.0] Healthcheck:<nil> ArgsEscaped:false Image:docker.io/rancher/k3s:v1.18.6-k3s1 Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[app:k3d k3d.cluster:multiserver k3d.cluster.imageVolume:k3d-multiserver-images k3d.cluster.network:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f k3d.cluster.network.external:false k3d.cluster.token:NPlYrJMlajHJcWrewdwp k3d.cluster.url:https://k3d-multiserver-server-0:6443 k3d.role:server k3d.server.api.host:0.0.0.0 k3d.server.api.hostIP:0.0.0.0 k3d.server.api.port:41801] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:{Binds:[k3d-multiserver-images:/k3d/images] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode: PortBindings:map[] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] CapAdd:[] CapDrop:[] Capabilities:[] CgroupnsMode: DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:true PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[/run: /var/run:] UTSMode: UsernsMode: ShmSize:0 Sysctls:map[] Runtime: ConsoleSize:[0 0] Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DeviceRequests:[] KernelMemory:0 KernelMemoryTCP:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:<nil> Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] MaskedPaths:[] ReadonlyPaths:[] Init:0xc000348390} NetworkingConfig:{EndpointsConfig:map[727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f:0xc00010d080]}}
DEBU[0015] Created container 2999be98ff3961dd64979a402075b4bce6fcd89c1a7a018f6c5e0ed46cc86255
DEBU[0018] Created node 'k3d-multiserver-server-2'
DEBU[0018] Node NAME: k3d-multiserver-server-0
DEBU[0018] Node NAME: k3d-multiserver-server-1
DEBU[0018] Node NAME: k3d-multiserver-server-2
INFO[0018] Creating LoadBalancer 'k3d-multiserver-serverlb'
DEBU[0018] Creating node from spec
&{Name:k3d-multiserver-serverlb Role:loadbalancer Image:docker.io/rancher/k3d-proxy:v3.0.0 Volumes:[] Env:[SERVERS=k3d-multiserver-server-0,k3d-multiserver-server-1,k3d-multiserver-server-2 PORTS=6443] Cmd:[] Args:[] Ports:[0.0.0.0:41801:6443/tcp] Restart:false Labels:map[app:k3d k3d.cluster:multiserver] Network:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f ServerOpts:{IsInit:false ExposeAPI:{Host: HostIP: Port:}} AgentOpts:{}}
DEBU[0018] Volumes: []
DEBU[0018] Starting to wait for server node 'k3d-multiserver-server-2'
DEBU[0018] Creating docker container with translated config
&{ContainerConfig:{Hostname:k3d-multiserver-serverlb Domainname: User: AttachStdin:false AttachStdout:false AttachStderr:false ExposedPorts:map[6443/tcp:{}] Tty:false OpenStdin:false StdinOnce:false Env:[SERVERS=k3d-multiserver-server-0,k3d-multiserver-server-1,k3d-multiserver-server-2 PORTS=6443 K3S_KUBECONFIG_OUTPUT=/output/kubeconfig.yaml] Cmd:[] Healthcheck:<nil> ArgsEscaped:false Image:docker.io/rancher/k3d-proxy:v3.0.0 Volumes:map[] WorkingDir: Entrypoint:[] NetworkDisabled:false MacAddress: OnBuild:[] Labels:map[app:k3d k3d.cluster:multiserver k3d.role:loadbalancer] StopSignal: StopTimeout:<nil> Shell:[]} HostConfig:{Binds:[] ContainerIDFile: LogConfig:{Type: Config:map[]} NetworkMode: PortBindings:map[6443/tcp:[{HostIP:0.0.0.0 HostPort:41801}]] RestartPolicy:{Name: MaximumRetryCount:0} AutoRemove:false VolumeDriver: VolumesFrom:[] CapAdd:[] CapDrop:[] Capabilities:[] CgroupnsMode: DNS:[] DNSOptions:[] DNSSearch:[] ExtraHosts:[] GroupAdd:[] IpcMode: Cgroup: Links:[] OomScoreAdj:0 PidMode: Privileged:true PublishAllPorts:false ReadonlyRootfs:false SecurityOpt:[] StorageOpt:map[] Tmpfs:map[/run: /var/run:] UTSMode: UsernsMode: ShmSize:0 Sysctls:map[] Runtime: ConsoleSize:[0 0] Isolation: Resources:{CPUShares:0 Memory:0 NanoCPUs:0 CgroupParent: BlkioWeight:0 BlkioWeightDevice:[] BlkioDeviceReadBps:[] BlkioDeviceWriteBps:[] BlkioDeviceReadIOps:[] BlkioDeviceWriteIOps:[] CPUPeriod:0 CPUQuota:0 CPURealtimePeriod:0 CPURealtimeRuntime:0 CpusetCpus: CpusetMems: Devices:[] DeviceCgroupRules:[] DeviceRequests:[] KernelMemory:0 KernelMemoryTCP:0 MemoryReservation:0 MemorySwap:0 MemorySwappiness:<nil> OomKillDisable:<nil> PidsLimit:<nil> Ulimits:[] CPUCount:0 CPUPercent:0 IOMaximumIOps:0 IOMaximumBandwidth:0} Mounts:[] MaskedPaths:[] ReadonlyPaths:[] Init:0xc0009fa520} NetworkingConfig:{EndpointsConfig:map[727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f:0xc000978900]}}
INFO[0018] Pulling image 'docker.io/rancher/k3d-proxy:v3.0.0'
{"status":"Pulling from rancher/k3d-proxy","id":"v3.0.0"}
{"status":"Pulling fs layer","progressDetail":{},"id":"e7c96db7181b"}
{"status":"Pulling fs layer","progressDetail":{},"id":"8985e402e050"}
{"status":"Pulling fs layer","progressDetail":{},"id":"986b35da0011"}
{"status":"Pulling fs layer","progressDetail":{},"id":"7c96bf099c58"}
{"status":"Pulling fs layer","progressDetail":{},"id":"490fe1a32c6d"}
{"status":"Pulling fs layer","progressDetail":{},"id":"5f0121973a8b"}
{"status":"Waiting","progressDetail":{},"id":"7c96bf099c58"}
{"status":"Waiting","progressDetail":{},"id":"490fe1a32c6d"}
{"status":"Waiting","progressDetail":{},"id":"5f0121973a8b"}
{"status":"Downloading","progressDetail":{"current":27916,"total":2757034},"progress":"[\u003e
                       ]  27.92kB/2.757MB","id":"e7c96db7181b"}
{"status":"Downloading","progressDetail":{"current":58121,"total":5659470},"progress":"[\u003e
                       ]  58.12kB/5.659MB","id":"8985e402e050"}
{"status":"Downloading","progressDetail":{"current":77587,"total":7653507},"progress":"[\u003e
                       ]  77.59kB/7.654MB","id":"986b35da0011"}
{"status":"Verifying Checksum","progressDetail":{},"id":"e7c96db7181b"}
{"status":"Download complete","progressDetail":{},"id":"e7c96db7181b"}
{"status":"Extracting","progressDetail":{"current":32768,"total":2757034},"progress":"[\u003e
                      ]  32.77kB/2.757MB","id":"e7c96db7181b"}
{"status":"Downloading","progressDetail":{"current":363775,"total":5659470},"progress":"[===\u003e
                        ]  363.8kB/5.659MB","id":"8985e402e050"}
{"status":"Downloading","progressDetail":{"current":650495,"total":7653507},"progress":"[====\u003e
                        ]  650.5kB/7.654MB","id":"986b35da0011"}
{"status":"Extracting","progressDetail":{"current":196608,"total":2757034},"progress":"[===\u003e
                       ]  196.6kB/2.757MB","id":"e7c96db7181b"}
{"status":"Downloading","progressDetail":{"current":1092059,"total":5659470},"progress":"[=========\u003e
                         ]  1.092MB/5.659MB","id":"8985e402e050"}
{"status":"Downloading","progressDetail":{"current":1477079,"total":7653507},"progress":"[=========\u003e
                         ]  1.477MB/7.654MB","id":"986b35da0011"}
{"status":"Downloading","progressDetail":{"current":424,"total":494},"progress":"[==========================================\u003e        ]     424B/494B","id":"7c96bf099c58"}
{"status":"Downloading","progressDetail":{"current":494,"total":494},"progress":"[==================================================\u003e]     494B/494B","id":"7c96bf099c58"}
{"status":"Verifying Checksum","progressDetail":{},"id":"7c96bf099c58"}
{"status":"Download complete","progressDetail":{},"id":"7c96bf099c58"}
{"status":"Downloading","progressDetail":{"current":3054043,"total":5659470},"progress":"[==========================\u003e                        ]  3.054MB/5.659MB","id":"8985e402e050"}
{"status":"Downloading","progressDetail":{"current":2378199,"total":7653507},"progress":"[===============\u003e
                         ]  2.378MB/7.654MB","id":"986b35da0011"}
{"status":"Extracting","progressDetail":{"current":294912,"total":2757034},"progress":"[=====\u003e
                       ]  294.9kB/2.757MB","id":"e7c96db7181b"}
{"status":"Downloading","progressDetail":{"current":5405147,"total":5659470},"progress":"[===============================================\u003e   ]  5.405MB/5.659MB","id":"8985e402e050"}
{"status":"Downloading","progressDetail":{"current":5659470,"total":5659470},"progress":"[==================================================\u003e]  5.659MB/5.659MB","id":"8985e402e050"}
{"status":"Verifying Checksum","progressDetail":{},"id":"8985e402e050"}
{"status":"Download complete","progressDetail":{},"id":"8985e402e050"}
{"status":"Downloading","progressDetail":{"current":4938199,"total":7653507},"progress":"[================================\u003e                  ]  4.938MB/7.654MB","id":"986b35da0011"}
{"status":"Verifying Checksum","progressDetail":{},"id":"986b35da0011"}
{"status":"Download complete","progressDetail":{},"id":"986b35da0011"}
{"status":"Downloading","progressDetail":{"current":281,"total":281},"progress":"[==================================================\u003e]     281B/281B","id":"490fe1a32c6d"}
{"status":"Verifying Checksum","progressDetail":{},"id":"490fe1a32c6d"}
{"status":"Download complete","progressDetail":{},"id":"490fe1a32c6d"}
{"status":"Extracting","progressDetail":{"current":622592,"total":2757034},"progress":"[===========\u003e
                       ]  622.6kB/2.757MB","id":"e7c96db7181b"}
{"status":"Downloading","progressDetail":{"current":293,"total":293},"progress":"[==================================================\u003e]     293B/293B","id":"5f0121973a8b"}
{"status":"Verifying Checksum","progressDetail":{},"id":"5f0121973a8b"}
{"status":"Download complete","progressDetail":{},"id":"5f0121973a8b"}
{"status":"Extracting","progressDetail":{"current":1277952,"total":2757034},"progress":"[=======================\u003e                           ]  1.278MB/2.757MB","id":"e7c96db7181b"}
{"status":"Extracting","progressDetail":{"current":1966080,"total":2757034},"progress":"[===================================\u003e               ]  1.966MB/2.757MB","id":"e7c96db7181b"}
{"status":"Extracting","progressDetail":{"current":2555904,"total":2757034},"progress":"[==============================================\u003e    ]  2.556MB/2.757MB","id":"e7c96db7181b"}
{"status":"Extracting","progressDetail":{"current":2757034,"total":2757034},"progress":"[==================================================\u003e]  2.757MB/2.757MB","id":"e7c96db7181b"}
{"status":"Pull complete","progressDetail":{},"id":"e7c96db7181b"}
{"status":"Extracting","progressDetail":{"current":65536,"total":5659470},"progress":"[\u003e
                      ]  65.54kB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":196608,"total":5659470},"progress":"[=\u003e
                       ]  196.6kB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":917504,"total":5659470},"progress":"[========\u003e
                       ]  917.5kB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":1572864,"total":5659470},"progress":"[=============\u003e
                        ]  1.573MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":2031616,"total":5659470},"progress":"[=================\u003e
                        ]  2.032MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":2359296,"total":5659470},"progress":"[====================\u003e                              ]  2.359MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":3014656,"total":5659470},"progress":"[==========================\u003e                        ]  3.015MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":3211264,"total":5659470},"progress":"[============================\u003e                      ]  3.211MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":3538944,"total":5659470},"progress":"[===============================\u003e                   ]  3.539MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":3735552,"total":5659470},"progress":"[=================================\u003e                 ]  3.736MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":3932160,"total":5659470},"progress":"[==================================\u003e                ]  3.932MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":4784128,"total":5659470},"progress":"[==========================================\u003e        ]  4.784MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":5111808,"total":5659470},"progress":"[=============================================\u003e     ]  5.112MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":5177344,"total":5659470},"progress":"[=============================================\u003e     ]  5.177MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":5242880,"total":5659470},"progress":"[==============================================\u003e    ]  5.243MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":5308416,"total":5659470},"progress":"[==============================================\u003e    ]  5.308MB/5.659MB","id":"8985e402e050"}
{"status":"Extracting","progressDetail":{"current":5373952,"total":5659470},"progress":"[===============================================\u003e   ]  5.374MB/5.659MB","id":"8985e402e050"}
ERRO[0022] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-multiserver-server-2'
WARN[0022] Couldn't get docker output
WARN[0022] context canceled
DEBU[0022] FIXME: Got an status-code for which error does not match any expected type!!!: -1  module=api status_code=-1
ERRO[0022] Failed to create container 'k3d-multiserver-serverlb'
ERRO[0022] Failed to create node 'k3d-multiserver-serverlb'
ERRO[0022] Failed to create loadbalancer
ERRO[0022] error during connect: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/create?name=k3d-multiserver-serverlb": context canceled
ERRO[0022] Failed to create cluster >>> Rolling Back
INFO[0022] Deleting cluster 'multiserver'
DEBU[0022] Cluster Details: &{Name:multiserver Network:{Name:727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f External:false} Token:NPlYrJMlajHJcWrewdwp Nodes:[0xc00034d440 0xc00034d560 0xc00034d680 0xc0009d7440] InitNode:0xc00034d440 ExternalDatastore:{Endpoint: CAFile: CertFile: KeyFile: Network:} CreateClusterOpts:0xc000269680 ExposeAPI:{Host:0.0.0.0 HostIP:0.0.0.0 Port:41801} ServerLoadBalancer:0xc00034d320 ImageVolume:}
DEBU[0022] FIXME: Got an status-code for which error does not match any expected type!!!: -1  module=api status_code=-1
ERRO[0022] Failed to inspect container '42541a5ba5ab0de08857e4bd8d19a5f5e6c677b09a49740b9f2a67a89a43dd97'
ERRO[0022] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-multiserver-server-1'
INFO[0024] Deleted k3d-multiserver-server-0
INFO[0024] Deleted k3d-multiserver-server-1
INFO[0024] Deleted k3d-multiserver-server-2
ERRO[0024] Failed to delete container 'k3d-multiserver-serverlb'
WARN[0024] Failed to delete node 'k3d-multiserver-serverlb': Try to delete it manually
INFO[0024] Deleting cluster network '727933b590549a5d51d3e80d6ed4e5676e55da441b783651af750589127b0e1f'
ERRO[0025] Failed to delete 1 nodes: Try to delete them manually
FATA[0025] Cluster creation FAILED, also FAILED to rollback changes!

@Filius-Patris
Contributor Author

@zasherif Could you

1.) Give some hints about your machine? (VPS/Mac/Ubuntu/whatever...)

2.) Try rebooting the machine?

@zasherif

@Filius-Patris I use Ubuntu 18.04.1, with a 5.4.0 kernel and Docker version 19.03.12, build 48a66213fe.
I did reboot the machine, and the problem is still ongoing...

@Filius-Patris
Contributor Author

Another piece of information btw:

The docker daemon logs (in my case, terminal output, since I started the daemon manually) show an error in the docker engine:

...
INFO[2020-09-10T20:21:46.246721265Z] API listen on /var/run/docker.sock           
INFO[2020-09-10T20:23:20.015412014Z] ignoring event                                module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
ERRO[2020-09-10T20:23:20.359166510Z] Not continuing with pull after error: context canceled 
INFO[2020-09-10T20:23:20.644963854Z] ignoring event                                module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2020-09-10T20:23:20.923477965Z] ignoring event                                module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
...

Can't tell if this is a cause or a symptom, but I figured documenting it doesn't hurt.

@rogeliodh

I also reproduced the issue on two machines:

k3d-v3.0.1 | ubuntu-20.04LTS | docker-ce 5:19.03.123-0ubuntu-focal
k3d-v3.0.1 | ubuntu-20.04LTS | docker-ce 5:18.09.53-0ubuntu-cosmic

by running
k3d cluster create testing --servers 3

@iwilltry42 I can provide you with full root access to one of these machines (it is a VPS at OVH) for as long as you need. Just confirm if you want it.

@iwilltry42
Member

iwilltry42 commented Sep 20, 2020

Hey 👋
I finally found the time to investigate on this a bit again.
I can verify that this issue happens "reliably" for me when using k3s >= v1.19.1-k3s1 (so already using etcd instead of dqlite).

@Filius-Patris I think the docker daemon logs you posted are just a "side effect" of the actual problem: because we're failing to wait for the log message from a secondary master node, we're cancelling the overall context and thus aborting all ongoing operations (like pulling the image for the loadbalancer and creating the other nodes).

So far I couldn't reproduce this with versions lower than 1.19, so @rogeliodh, I'd definitely like to make use of your offer to have an environment where I can reproduce it 👍

Good for debugging this: I just released k3d v3.0.2, which includes a new --no-rollback flag that prevents the automatic rollback upon failure, so you can more easily inspect the docker logs of the failed containers.
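
For example:

k3d cluster create testing --servers 3 --agents 3 --no-rollback
# if creation fails, the node containers are kept around for inspection:
docker ps -a --filter name=k3d-testing
docker logs k3d-testing-server-1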

@iwilltry42
Member

Test run on @rogeliodh's server:

I figured that it will also fail sometimes if you only create a single-server cluster (with the --cluster-init flag).
However, I found that

  • ivps231673:~$ k3d cluster create test --verbose --image rancher/k3s:v1.17.6-k3s1 --no-rollback --k3s-server-arg '--cluster-init'
    worked, while
  • k3d cluster create test --verbose --image rancher/k3s:v1.17.6-k3s1 --no-rollback --servers 3
    fails.

I could imagine that the environment comes into play here, as the CPU maxes out even when starting only a single server. So I could imagine that the leader election fails due to resource limitations, making the whole cluster crash 🤔
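
One way to check whether CPU really is the bottleneck would be to watch the node containers from a second terminal while the cluster is coming up, e.g.:

docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"    # live per-container CPU and memory usage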

iwilltry42 added the help wanted label on Sep 20, 2020
@deromemont

deromemont commented Sep 26, 2020

Hello!

Thanks for your help on this bug.

It's the same for me with multi-server creation (--servers 3).
But I also get the same error with a UDP port parameter on creation:

k3d cluster create mycluster -p 0.0.0.0:32000:32000/udp@loadbalancer

INFO[0000] Created network 'k3d-mycluster'
INFO[0000] Created volume 'k3d-mycluster-images'
INFO[0001] Creating node 'k3d-mycluster-server-0'
INFO[0001] Creating LoadBalancer 'k3d-mycluster-serverlb'
ERRO[0005] Failed waiting for log message 'start worker processes' from node 'k3d-mycluster-serverlb'
ERRO[0005] Failed to inspect container '4f65c893c8407de7b0537cf8cbcc9ffeabc91d24eb3495dc9d908227ece228b8'
ERRO[0005] Failed waiting for log message 'Wrote kubeconfig' from node 'k3d-mycluster-server-0'
ERRO[0005] Failed to bring up all server nodes (and loadbalancer) in time. Check the logs:
ERRO[0005] >>> Node 'k3d-mycluster-serverlb' (container '6d6c25fd5f2c55df3c1663448315411d34d8141a978806d131e5bff5b1bc0412') not running
ERRO[0005] Failed to bring up cluster
ERRO[0005] Failed to create cluster >>> Rolling Back
INFO[0005] Deleting cluster 'mycluster'
INFO[0005] Deleted k3d-mycluster-server-0
INFO[0005] Deleted k3d-mycluster-serverlb
INFO[0005] Deleting cluster network '3a321f383fe51fb6da9cd1617bc2020a24b33cc2a4dbfd171c5b3113e96eb110'
FATA[0005] Cluster creation FAILED, all changes have been rolled back!

I hope it's the same error, and I hope this helps.

@iwilltry42
Member

Hi @deromemont, thanks for adding your comment here :)
However, I think in your case the issue is due to the UDP port, which is currently not supported on the loadbalancer.
The reason is that, with the default TCP ports, we're creating streaming rules in the nginx config, which doesn't really work for UDP right now. You may want to try exposing the UDP port on any other node though.
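
Exposing the UDP port on a node instead of the loadbalancer would be something along the lines of (untested, using the node-filter syntax from the commands above):

k3d cluster create mycluster -p 0.0.0.0:32000:32000/udp@server[0]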
You may want to hook up to this issue: #282

@Filius-Patris
Contributor Author

Filius-Patris commented Sep 29, 2020

I could imagine that the environment comes into play here, as the CPU maxes out even when starting only a single server. So I could imagine that the leader election fails due to resource limitations, making the whole cluster crash 🤔

This is a testable statement, so I went ahead and tested it. I have some spare credits on Vultr, so I used that provider.

I made the following install script:

#!/bin/sh
# Install Docker (via snap) and k3d, then try to create the cluster.
# The machine shuts down only if everything before it succeeded,
# so a powered-off VPS means the cluster creation worked.

snap install docker && \
curl -s https://raw.githubusercontent.com/rancher/k3d/main/install.sh | bash && \
k3d cluster create testing --servers 3 --agents 3 && \
shutdown -h now

Then I created the following VPSes and specified the script above as a startup script:

  • 1CPU, 1024MB -- ❌ (was still up)
  • 1CPU, 2048MB -- ❌
  • 2CPU, 4096MB -- ✅ (shut down)
  • 4CPU, 8192MB -- ✅
  • 6CPU, 16384MB -- ✅

This assertion might be correct, and if I may add another hypothesis on top: one CPU isn't enough.

@iwilltry42
Member

Thanks for your tests @Filius-Patris !
I would've hoped that the resource requirements were a little smaller, but so be it... I guess those are the trade-offs of having a container-inception 😬
I could imagine that, just like the VM in Docker Desktop, Snap on Ubuntu/Linux plays a role here, as I know from other programs on my own machine that when they run as a snap package (like Docker in your examples), they use way more resources than usual 🤔

@mamiu

mamiu commented Oct 7, 2020

I've assigned Docker 4 CPUs and 8GB RAM and I get this error on 3 or more servers (--servers 3).

So I could imagine that the leader election fails due to resource limitations, making the whole cluster crash

@iwilltry42 If you are right with your assumption, this is a bug in k3s, isn't it? 🤔 Because k3s was designed exactly for edge use cases and shouldn't crash when only limited resources are available. Or is it not related to k3s and merely a Docker issue?

Off-topic:
@iwilltry42 It seems you work for @trivago but maintain a project from @rancher. Do you do this voluntarily, or do you or trivago have a contractual agreement with rancher? Greetings from Australia. 😀

@iwilltry42
Member

I've assigned Docker 4 CPUs and 8GB RAM and I get this error on 3 or more servers (--servers 3).

So I could imagine that the leader election fails due to resource limitations, making the whole cluster crash

@iwilltry42 If you are right with your assumption, this is a bug in k3s, isn't it? 🤔 Because k3s was designed exactly for edge use cases and shouldn't crash when only limited resources are available. Or is it not related to k3s and merely a Docker issue?

You're right that k3s shouldn't have an issue with limited resources, so I would assume that it's the additional layer (Docker) that introduces those issues. Though I cannot verify this right now...

Off-topic:
@iwilltry42 It seems you work for @trivago but maintain a project from @rancher. Do you do this voluntarily, or do you or trivago have a contractual agreement with rancher? Greetings from Australia. 😀

That's right 😁
I started k3d as my own project (inspired by a problem at my work at trivago).
It was then taken over on GitHub by Rancher so that it sits closer to its "sister project" k3s and gets more reach. I did this privately for quite a while, but by now Rancher supports me in this role, and going forward I will continue this work as a freelancer. k3d is and remains a community project, though 👍 Greetings back, all the way around the world 😉 Feel free to message me on Twitter or similar if you want to know more, so we don't spam the issues here 😁

@sdghchj

sdghchj commented Nov 4, 2020

Any progress on this issue?

@Filius-Patris
Contributor Author

Filius-Patris commented Nov 4, 2020

@sdghchj It would be helpful if you could run the above tests again, but with Docker installed not via snap but from Docker's own repository (or some other native package), and report what, if anything, turns out differently.
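
A minimal sketch of what that could look like on Ubuntu, assuming Docker was installed via snap and using Docker's convenience script (inspect the script before piping it into a shell):

snap remove docker                                   # remove the snap-packaged Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh                                     # install Docker Engine natively
k3d cluster create testing --servers 3 --agents 3    # repeat the test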

@zengzhengrong

zengzhengrong commented Jan 2, 2021

I just hit the same problem with the command

k3d cluster create demo --volume "/etc/rancher/k3d/registries.yaml:/etc/rancher/k3s/registries.yaml@server[0]" -p "8081:80@loadbalancer" -a 2

It shows an endless stream of output:

INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-demo'                   
INFO[0000] Created volume 'k3d-demo-images'             
INFO[0001] Creating node 'k3d-demo-server-0'            
INFO[0001] Creating node 'k3d-demo-agent-0'             
INFO[0001] Creating node 'k3d-demo-agent-1'             
INFO[0002] Creating LoadBalancer 'k3d-demo-serverlb'    
INFO[0003] Starting cluster 'demo'                      
INFO[0003] Starting Node 'k3d-demo-server-0'            
WARN[0123] Node 'k3d-demo-server-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop. 
WARN[0123] Node 'k3d-demo-server-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop. 
WARN[0124] Node 'k3d-demo-server-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop. 
WARN[0124] Node 'k3d-demo-server-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop. 
WARN[0125] Node 'k3d-demo-server-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop. 
WARN[0125] Node 'k3d-demo-server-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop. 
WARN[0126] Node 'k3d-demo-server-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop. 
WARN[0126] Node 'k3d-demo-server-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop. 
WARN[0127] Node 'k3d-demo-server-0' is restarting for more than a minute now. Possibly it will recover soon (e.g. when it's waiting to join). Consider using a creation timeout to avoid waiting forever in a Restart Loop. 

Docker version (Docker Desktop for Windows):

Client: Docker Engine - Community
 Version:           20.10.1
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        831ebea
 Built:             Tue Dec 15 04:34:48 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.0
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       eeddea2
  Built:            Tue Dec  8 18:58:04 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

But if I don't use the -v or --volume flag, it can create the cluster successfully.

@iwilltry42
Member

Hi @zengzhengrong, it seems like your issue is different (different cause). You may want to use the --no-rollback flag to disable the automatic deletion of containers after cluster creation fails, and then use docker logs k3d-demo-server-0 to get the error logs from the node container.
I assume that something may be wrong with the file that you're trying to mount.
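A rough debugging sequence based on that (using the paths and names from your report):

cat /etc/rancher/k3d/registries.yaml    # check that the file to be mounted exists and is valid YAML
k3d cluster create demo \
  --volume "/etc/rancher/k3d/registries.yaml:/etc/rancher/k3s/registries.yaml@server[0]" \
  -p "8081:80@loadbalancer" -a 2 --no-rollback
docker logs k3d-demo-server-0           # see why the server container keeps restarting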
If you need further help, please open a new issue or discussion 👍

@iwilltry42
Member

Is there any update on this? If not, I think we may want to close this issue as "stale" 🤔

iwilltry42 added this to the Backlog milestone on Feb 5, 2021
@Filius-Patris
Contributor Author

As far as we know, it was due to constrained hardware resources. Because k3d is designed for such environments, it might be sensible to document some minimal hardware requirements (especially because I couldn't run it on my dev machine, a MacBook Air '17).
However, anyone searching for this issue should stumble upon this thread.
IMHO we could close this as stale.

Dear future reader: Upgrade your hardware / rent a server with more beef.
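
As a quick pre-flight sanity check (assuming a Linux host; the numbers below are only the rough observation from this thread, not official requirements):

nproc      # at least 2 CPUs seemed to be needed for a 3-server/3-agent cluster
free -h    # and roughly 4 GiB of RAM
docker info --format '{{.NCPU}} CPUs / {{.MemTotal}} bytes visible to Docker'    # what Docker itself sees (relevant for Docker Desktop VMs)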

@iwilltry42
Member

Good disclaimer 😅
Still don't get it, though, as I have it running on a RasPi 🤔
Anyway...

@Filius-Patris
Contributor Author

Yeah, I ran it on Docker Desktop. I have reinstalled it since, but by default it only allocates 2 GiB of RAM to the VM running Docker. The more recent RasPis satisfy the (supposed) requirement of 2 cores and 4 GiB of RAM.

Btw, I've just prepared a PR for the docs. If further discussions arise, we can have them there.
