
Requesting for resources using kubectl throws an error #47

Closed
rudimk opened this issue Apr 26, 2019 · 5 comments

Comments


rudimk commented Apr 26, 2019

I'm running the latest release of k3OS on VirtualBox 6.0.6 r130049. Running kubectl get nodes throws the following error:

The connection to the server localhost:6443 was refused - did you specify the right host or port?

I tried other resources and got the same result. At first I thought there might be an issue with the VM's networking, but on further reflection I no longer think so. It's probably a configuration issue of some sort.

Attaching a screenshot.



kewinbrand commented Jul 26, 2019

Any update on this? I'm facing exactly the same problem.
Tested on v0.2.1 and v0.2.0.


rudimk commented Jul 27, 2019

I haven't played with it since. I've got some new bare metal servers coming in next week, and I'm thinking I'll try running this on KVM to see if the issue persists.


metahertz commented Jul 30, 2019

+1, same issue, both when live-booted and once installed. Occurs on KVM-based VMs and on bare metal.

It seems containerd is crashing on startup because it's trying to listen on a seemingly random IP that the host doesn't own.

[screenshot: containerd-failing-k3os]

This comes from the log in /var/lib/rancher/k3s/agent/containerd/containerd.log.
Adding this IP to the loopback interface (so that a listening TCP socket can be opened on it) works around the issue; the node comes up and kubectl can be used:

k3os-16394 [/var/lib/rancher/k3s/agent/containerd]# ip addr add 104.18.46.239/32 dev lo

k3os-16394 [/var/lib/rancher/k3s/agent/containerd]# kubectl get nodes
NAME         STATUS   ROLES    AGE     VERSION
k3os-16394   Ready    <none>   6m27s   v1.14.1-k3s.4

Confirmed this in both the live and installed versions of the 2.0 release.

Still hunting down where the IP is coming from; I haven't found a containerd config file as such yet.

containerd appears to be started via the k3s server command, which is called by this init script:

cat /etc/init.d/k3s-service 
#!/sbin/openrc-run

depend() {
    after net-online
    need net
}

start_pre() {
    rm -f /tmp/k3s.*
}

supervisor=supervise-daemon
name="k3s-service"
command="/sbin/k3s"
command_args="server >>/var/log/k3s-service.log 2>&1"
pidfile="/var/run/k3s-service.pid"
respawn_delay=5
--SNIP--

The IP belongs to Cloudflare; I'm wondering if it's an "are we online" kind of check in the startup scripts whose result then gets wrongly used in a config template. I'll keep digging.


metahertz commented Jul 30, 2019

OK, found it.

TL;DR: There's probably something up with your DNS settings, which you're most likely getting via DHCP. Test this by changing /etc/resolv.conf to read just nameserver 8.8.8.8, removing any search statements, and seeing whether everything starts working within a couple of minutes (i.e. kubectl get nodes works).
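
A minimal sketch of that test (run as root; this assumes DHCP won't immediately rewrite /etc/resolv.conf behind your back):

cp /etc/resolv.conf /etc/resolv.conf.bak       # keep the DHCP-supplied config around
echo "nameserver 8.8.8.8" > /etc/resolv.conf   # single resolver, no "search" lines
kubectl get nodes                              # retry after a minute or two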

Issue
The containerd configuration file in /var/lib/rancher/k3s/agent/etc/containerd/config.toml specifies a listen address using the unqualified hostname of the k3os machine.

Containerd therefore tries to resolve this name into an IP, so that it knows which address it needs to listen on when starting the service.
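
A quick way to find the offending setting is to grep the generated config for the machine's hostname (just a sketch; I'm not certain of the exact key name it ends up under, so treat that as an assumption):

# shows where containerd is being told to use the unqualified hostname as a listen address
grep -n "$(hostname)" /var/lib/rancher/k3s/agent/etc/containerd/config.toml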

The issue occurs if you have a search path configured in your DNS settings (or in the ones delivered by your DHCP server), AND the domain in your search path also happens to answer for that name, for example because the search-path domain has a wildcard DNS record, so anything.search.path.domain.com responds with an answer.

This is what was happening to me: I get a Cloudflare IP back because of a wildcard DNS entry I configured years ago, and containerd then tries to use it as the IP to listen on, which obviously fails because my k3os machine doesn't have a local IP matching it.
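
You can reproduce the bad lookup from the k3os box itself; a minimal sketch, assuming nslookup is available on the image:

# with a wildcard record under the DHCP-supplied search domain, this answers with an
# external IP instead of failing and falling through to the local hosts file
nslookup "$(hostname)"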

What should happen is that nothing responds: the DNS query falls through to the local hosts file, without a domain appended, where it matches localhost, resolves to 127.0.0.1, and containerd starts listening properly.

Although why that couldn't just be hard-coded to 127.0.0.1, I don't know.
Hope this helps someone else! I really should clean up my DNS lmao.

[screenshot: containerd-config]

[screenshot: bad-dhcp-dns]


dweomer commented Jan 4, 2020

Fixed via #170

dweomer closed this as completed Jan 4, 2020