Total memory: 1.9 GiB (pass)
Disk space available for /var/lib/k0s: 17.9 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
Linux kernel release: 6.1.0-13-cloud-amd64 (pass)
Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
AppArmor: active (pass)
Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
Executable in PATH: mount: /usr/bin/mount (pass)
Executable in PATH: umount: /usr/bin/umount (pass)
/proc file system: mounted (0x9fa0) (pass)
Control Groups: version 2 (pass)
cgroup controller "cpu": available (is a listed root controller) (pass)
cgroup controller "cpuacct": available (via cpu in version 2) (pass)
cgroup controller "cpuset": available (is a listed root controller) (pass)
cgroup controller "memory": available (is a listed root controller) (pass)
cgroup controller "devices": available (device filters attachable) (pass)
cgroup controller "freezer": available (cgroup.freeze exists) (pass)
cgroup controller "pids": available (is a listed root controller) (pass)
cgroup controller "hugetlb": available (is a listed root controller) (pass)
cgroup controller "blkio": available (via io in version 2) (pass)
CONFIG_CGROUPS: Control Group support: built-in (pass)
CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
CONFIG_CPUSETS: Cpuset support: built-in (pass)
CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
CONFIG_NAMESPACES: Namespaces support: built-in (pass)
CONFIG_UTS_NS: UTS namespace: built-in (pass)
CONFIG_IPC_NS: IPC namespace: built-in (pass)
CONFIG_PID_NS: PID namespace: built-in (pass)
CONFIG_NET_NS: Network namespace: built-in (pass)
CONFIG_NET: Networking support: built-in (pass)
CONFIG_INET: TCP/IP networking: built-in (pass)
CONFIG_IPV6: The IPv6 protocol: built-in (pass)
CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
CONFIG_NETFILTER_NETLINK: module (pass)
CONFIG_NF_NAT: module (pass)
CONFIG_IP_SET: IP set support: module (pass)
CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
CONFIG_IP_VS: IP virtual server support: module (pass)
CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
CONFIG_NF_DEFRAG_IPV4: module (pass)
CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
CONFIG_NF_DEFRAG_IPV6: module (pass)
CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
CONFIG_LLC: module (pass)
CONFIG_STP: module (pass)
CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
CONFIG_PROC_FS: /proc file system support: built-in (pass)
After the log line "initialized join client successfully", the second controller attempts to make an HTTP request to obtain certificates and key material from the first controller. My educated guess is that this request is blocking on network I/O. The load balancer is probably not responding about 50% of the time because it is balancing the load between the two controllers. (I assume the load balancer is using a round-robin or random load balancing strategy). When the load balancer tries to connect to the secondary controller, it fails because the secondary controller cannot respond to a join request from itself. It's not even listening on the join port yet. So I think the LB is trying to reach the secondary controller, keeping the HTTP request open and not responding yet. K0s retries join requests up to ten times. However, it doesn't define any timeouts, so it probably gets stuck on the first request.
Can you maybe check the configuration of your load balancer? Does it have some backend health checks defined? What are the frontend and backend timeouts?
Thank you for taking the time to respond. My hunch is that the issue lies with the load balancer. Yesterday I tore down the LB and backend hosts and recreated them. Since then, I have done many reset/apply actions and haven't seen the issue occur. This time the 'client IP preservation' setting is disabled. I had previously disabled it but still saw the issue occur (or so I thought), so I'm confused about why it seems to be working consistently now...
> However, it doesn't define any timeouts, so it probably gets stuck on the first request.
Would you be willing to add timeouts for read/write? I think this would be good in terms of best practice and to aid in debugging these kinds of unusual scenarios.
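To illustrate what such timeouts could look like, here is a minimal Go sketch of a retry loop in which every attempt has its own deadline. It is not the actual k0s join client: `fetchOnce`, `fetchWithRetry`, `newFlakyServer` and `demo` are hypothetical names, and the test server only simulates a backend that accepts the first request and never answers it.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// fetchOnce performs a single GET that is abandoned after timeout.
func fetchOnce(ctx context.Context, client *http.Client, url string, timeout time.Duration) ([]byte, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// fetchWithRetry tries up to attempts times. Because each attempt is
// bounded, a stalled backend costs at most perAttempt, not forever.
func fetchWithRetry(ctx context.Context, client *http.Client, url string, attempts int, perAttempt time.Duration) ([]byte, error) {
	var lastErr error
	for i := 1; i <= attempts; i++ {
		body, err := fetchOnce(ctx, client, url, perAttempt)
		if err == nil {
			return body, nil
		}
		lastErr = fmt.Errorf("attempt %d/%d: %w", i, attempts, err)
		fmt.Println(lastErr) // log every failure so a stall is visible
	}
	return nil, lastErr
}

// newFlakyServer simulates the suspected behaviour: the first request
// stalls without a response, later requests answer "ok".
func newFlakyServer() *httptest.Server {
	var calls int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) == 1 {
			select {
			case <-r.Context().Done(): // client gave up
			case <-time.After(2 * time.Second): // upper bound for the demo
			}
			return
		}
		io.WriteString(w, "ok")
	}))
}

// demo runs the retry loop against the simulated server.
func demo() (string, error) {
	srv := newFlakyServer()
	defer srv.Close()
	body, err := fetchWithRetry(context.Background(), srv.Client(), srv.URL, 3, 200*time.Millisecond)
	return string(body), err
}

func main() {
	body, err := demo()
	fmt.Printf("body=%q err=%v\n", body, err)
}
```

Without the per-attempt deadline, the first request in this simulation would stall and the remaining retries would not get a chance to run in time. With it, the first attempt fails with a logged timeout error and the second one succeeds, which is also the kind of log line that would have made this issue easier to diagnose.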
Platform
Version
v1.30.1+k0s.0
Sysinfo
`k0s sysinfo`
What happened?
I have three k0s controllers running on AWS EC2 instances behind an AWS NLB (Network Load Balancer), and I'm using k0sctl to deploy k0s to them. I'm finding that a secondary controller will sometimes fail to join the primary controller. k0sctl gives up waiting for the join and fails the job.
I'm finding it hard to diagnose the issue because there is no debug logging telling me what's going on (see full log output below). For example, the secondary controller's last log message is:
If I do `k0sctl reset` and try again, it will often work. Failure vs. success is perhaps 50/50, and I can't see any other pattern to it. I did wonder whether it might be related to the AWS NLB Client IP Preservation feature, which is enabled by default. However, turning this off doesn't help.
Steps to reproduce
Expected behavior
No response
Actual behavior
No response
Screenshots and logs
172.31.47.173 = primary controller
172.31.37.26 = secondary controller
primary k0scontroller log (full log) - https://gist.github.com/ianb-mp/326a61a02927fbf3d112c41e73d40a18#file-k0scontroller-primary-log
secondary k0scontroller log (full log)
k0sctl output (partial log - secondary controller failing to join)
Additional context
No response