
k0s reset hangs #4211

Closed · 4 tasks done
ianb-mp opened this issue Mar 28, 2024 · 14 comments
Labels: bug (Something isn't working), Stale

ianb-mp (Contributor) commented Mar 28, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 5.14.0-362.18.1.el9_3.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jan 24 23:11:18 UTC 2024 x86_64 GNU/Linux
NAME="Rocky Linux"
VERSION="9.3 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.3 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.3"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"

Version

v1.29.2+k0s.0

Sysinfo

`k0s sysinfo`
Machine ID: "03d6d8298235d58e9e2dbf6289372c0785833708115585ace6c868b2eb1bd173" (from machine) (pass)
Total memory: 125.0 GiB (pass)
Disk space available for /var/lib/k0s: 12.1 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.14.0-362.18.1.el9_3.x86_64 (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /sbin/modprobe (pass)
  Executable in PATH: mount: /bin/mount (pass)
  Executable in PATH: umount: /bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

Running k0s reset hangs (I've left it running for over 15 min).

$ k0s reset
W0328 10:55:19.811067    2859 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"

I tried rebooting the host, but I had the same issue when running k0s stop; k0s reset after boot.

k0s happily starts and stops; it just won't uninstall.

$ k0s start
$ k0s status
Version: v1.29.2+k0s.0
Process ID: 4257
Role: controller
Workloads: true
SingleNode: true
Kube-api probing successful: true
Kube-api probing last error:  
$ k0s stop
$

Steps to reproduce

Not sure how to reproduce this reliably.

Expected behavior

Reset should not hang. At the very least it should time out with an error.

Actual behavior

Hangs

Screenshots and logs

No response

Additional context

No response

ianb-mp added the bug label Mar 28, 2024
ianb-mp (Contributor, Author) commented Mar 28, 2024

I tried again with debug enabled: k0s reset -d

DEBU[2024-03-28 13:39:03] Starting debug server                         debug_server=":6060"
INFO[2024-03-28 13:39:03] * containers steps                           
DEBU[2024-03-28 13:39:03] starting containerd                          
DEBU[2024-03-28 13:39:03] started containerd successfully              
DEBU[2024-03-28 13:39:03] trying to list all pods                      
W0328 13:39:03.874524   13933 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "/run/k0s/containerd.sock", ServerName: "localhost", Attributes: {"<%!p(networktype.keyType=grpc.internal.transport.networktype)>": "unix" }, }. Err: connection error: desc = "transport: Error while dialing: dial unix /run/k0s/containerd.sock: connect: no such file or directory"
DEBU[2024-03-28 13:39:04] connected successfully using endpoint: unix:///run/k0s/containerd.sock 
DEBU[2024-03-28 13:39:04] ListPodSandboxRequest: &ListPodSandboxRequest{Filter:nil,} 
DEBU[2024-03-28 13:39:04] ListPodSandboxResponse: &ListPodSandboxResponse{Items:[]*PodSandbox{&PodSandbox{Id:3eb1ba5491da71ce5b0b633e6fa1d9f666978321b6e55d74f74b385671eb53ed,Metadata:&PodSandboxMetadata{Name:rke2-sriov-rancher-nfd-master-7f5f499cc6-l59vs,Uid:303b7f75-e440-4d6d-b35e-f8e55f180a04,Namespace:kube-system,Attempt:2,},State:SANDBOX_NOTREADY,CreatedAt:1711596264411923498,Labels:map[string]string{app.kubernetes.io/instance: rke2-sriov,app.kubernetes.io/name: rancher-nfd,io.kubernetes.pod.name: rke2-sriov-rancher-nfd-master-7f5f499cc6-l59vs,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: 303b7f75-e440-4d6d-b35e-f8e55f180a04,pod-template-hash: 7f5f499cc6,role: master,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:17:23.128205152+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:79c1ab6b5bccc22041a362a88e44521770ebcaed2abd07405a2f0c7465aaf366,Metadata:&PodSandboxMetadata{Name:rke2-sriov-rancher-nfd-master-7f5f499cc6-l59vs,Uid:303b7f75-e440-4d6d-b35e-f8e55f180a04,Namespace:kube-system,Attempt:0,},State:SANDBOX_NOTREADY,CreatedAt:1711595843436442759,Labels:map[string]string{app.kubernetes.io/instance: rke2-sriov,app.kubernetes.io/name: rancher-nfd,io.kubernetes.pod.name: rke2-sriov-rancher-nfd-master-7f5f499cc6-l59vs,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: 303b7f75-e440-4d6d-b35e-f8e55f180a04,pod-template-hash: 7f5f499cc6,role: master,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:17:23.128205152+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:a953151f1b0a4684f35579fcf89e661f9314340045b8d30ab4fe7a822623245f,Metadata:&PodSandboxMetadata{Name:calico-kube-controllers-7b9ffcdcf6-97thn,Uid:f2ce0d4e-d0cd-4187-aac1-4e3800a148c5,Namespace:calico-system,Attempt:0,},State:SANDBOX_NOTREADY,CreatedAt:1711596554669549507,Labels:map[string]string{app.kubernetes.io/name: calico-kube-controllers,io.kubernetes.pod.name: calico-kube-controllers-7b9ffcdcf6-97thn,io.kubernetes.pod.namespace: calico-system,io.kubernetes.pod.uid: f2ce0d4e-d0cd-4187-aac1-4e3800a148c5,k8s-app: calico-kube-controllers,pod-template-hash: 7b9ffcdcf6,},Annotations:map[string]string{hash.operator.tigera.io/system: fdde45054a8ae4f629960ce37570929502e59449,kubernetes.io/config.seen: 2024-03-28T13:27:03.008972924+10:00,kubernetes.io/config.source: api,tigera-operator.hash.operator.tigera.io/tigera-ca-private: 8d04665520ee20212b673f1126569392c3b6887c,},RuntimeHandler:,},&PodSandbox{Id:95584005577124a67c3137a9cdea55a250f1b68407e96fe0758b180ed6b6b2b4,Metadata:&PodSandboxMetadata{Name:rke2-sriov-rancher-nfd-master-7f5f499cc6-l59vs,Uid:303b7f75-e440-4d6d-b35e-f8e55f180a04,Namespace:kube-system,Attempt:3,},State:SANDBOX_NOTREADY,CreatedAt:1711596711683981625,Labels:map[string]string{app.kubernetes.io/instance: rke2-sriov,app.kubernetes.io/name: rancher-nfd,io.kubernetes.pod.name: rke2-sriov-rancher-nfd-master-7f5f499cc6-l59vs,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: 303b7f75-e440-4d6d-b35e-f8e55f180a04,pod-template-hash: 7f5f499cc6,role: master,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:17:23.128205152+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:265fe4477ba3c2eb73d2449510b23fa22171c74735e83a660777ede4d9e2e1be,Metadata:&PodSandboxMetadata{Name:kube-multus-ds-vp4rs,Uid:b41be92b-cdbe-41b2-9a26-6e95c66ee193,Namespace:kube-system,Attempt:0,},State:SANDBOX_READY,CreatedAt:1711595680999852460,Labels:map[string]string{app: 
multus,controller-revision-hash: 6dc58bc595,io.kubernetes.pod.name: kube-multus-ds-vp4rs,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: b41be92b-cdbe-41b2-9a26-6e95c66ee193,name: multus,pod-template-generation: 1,tier: node,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:14:36.657093921+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:48c4ca4108421bf31ca8e71013897e7992753fc68f2e8116acf4f1d76d2293f4,Metadata:&PodSandboxMetadata{Name:rke2-sriov-rancher-nfd-worker-vvlgn,Uid:e5bf2000-c63b-405a-ab44-769538d12b03,Namespace:kube-system,Attempt:1,},State:SANDBOX_NOTREADY,CreatedAt:1711596832448321261,Labels:map[string]string{app.kubernetes.io/instance: rke2-sriov,app.kubernetes.io/name: rancher-nfd,controller-revision-hash: 6b847f477d,io.kubernetes.pod.name: rke2-sriov-rancher-nfd-worker-vvlgn,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: e5bf2000-c63b-405a-ab44-769538d12b03,pod-template-generation: 1,role: worker,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:17:23.125956524+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:9602603411ef0e66dfb42615fdb2b69ce4eef6314c379baa1de27e60f37f332a,Metadata:&PodSandboxMetadata{Name:rke2-sriov-rancher-nfd-worker-vvlgn,Uid:e5bf2000-c63b-405a-ab44-769538d12b03,Namespace:kube-system,Attempt:0,},State:SANDBOX_NOTREADY,CreatedAt:1711595843433512362,Labels:map[string]string{app.kubernetes.io/instance: rke2-sriov,app.kubernetes.io/name: rancher-nfd,controller-revision-hash: 6b847f477d,io.kubernetes.pod.name: rke2-sriov-rancher-nfd-worker-vvlgn,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: e5bf2000-c63b-405a-ab44-769538d12b03,pod-template-generation: 1,role: worker,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:17:23.125956524+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:b8b7654605b2e04128d799d8b311d4ca5457c8cac9559af58892ba057c864e5e,Metadata:&PodSandboxMetadata{Name:rke2-sriov-rancher-nfd-gc-5fb8c4d68b-vmvtv,Uid:d38e4f71-a33e-4a5b-8f6d-6cd8a7c9438a,Namespace:kube-system,Attempt:0,},State:SANDBOX_NOTREADY,CreatedAt:1711595843434712607,Labels:map[string]string{app.kubernetes.io/instance: rke2-sriov,app.kubernetes.io/name: rancher-nfd,io.kubernetes.pod.name: rke2-sriov-rancher-nfd-gc-5fb8c4d68b-vmvtv,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: d38e4f71-a33e-4a5b-8f6d-6cd8a7c9438a,pod-template-hash: 5fb8c4d68b,role: gc,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:17:23.126514187+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:45c7ae67c73abb91f5756087334abce706e18505847486978ce4609bf5867c50,Metadata:&PodSandboxMetadata{Name:coredns-6cd46fb86c-mkn6f,Uid:3b31d34f-102d-453f-b17d-c0bdb0cdf91f,Namespace:kube-system,Attempt:3,},State:SANDBOX_NOTREADY,CreatedAt:1711596515700103349,Labels:map[string]string{io.kubernetes.pod.name: coredns-6cd46fb86c-mkn6f,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: 3b31d34f-102d-453f-b17d-c0bdb0cdf91f,k8s-app: kube-dns,pod-template-hash: 6cd46fb86c,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:14:37.352432717+10:00,kubernetes.io/config.source: api,prometheus.io/port: 9153,prometheus.io/scrape: 
true,},RuntimeHandler:,},&PodSandbox{Id:310517eed58e60632ecf6a5a3343e8cfc48ad3341aad12fb986c8b5141bd027f,Metadata:&PodSandboxMetadata{Name:calico-node-hjlbr,Uid:b20eda08-091f-4e22-ab2e-4810eafc8063,Namespace:calico-system,Attempt:0,},State:SANDBOX_READY,CreatedAt:1711596423196314387,Labels:map[string]string{app.kubernetes.io/name: calico-node,controller-revision-hash: 5dcd899449,io.kubernetes.pod.name: calico-node-hjlbr,io.kubernetes.pod.namespace: calico-system,io.kubernetes.pod.uid: b20eda08-091f-4e22-ab2e-4810eafc8063,k8s-app: calico-node,pod-template-generation: 1,},Annotations:map[string]string{hash.operator.tigera.io/cni-config: 7c3d4f43aed6c9f83376215a5e9a368bef0902a0,hash.operator.tigera.io/system: fdde45054a8ae4f629960ce37570929502e59449,kubernetes.io/config.seen: 2024-03-28T13:27:02.891241493+10:00,kubernetes.io/config.source: api,tigera-operator.hash.operator.tigera.io/tigera-ca-private: 8d04665520ee20212b673f1126569392c3b6887c,},RuntimeHandler:,},&PodSandbox{Id:bbc49bb4f197fdae86a5e2b218b30a66949c7757a394e8f7f8d583bb02341265,Metadata:&PodSandboxMetadata{Name:coredns-6cd46fb86c-mkn6f,Uid:3b31d34f-102d-453f-b17d-c0bdb0cdf91f,Namespace:kube-system,Attempt:0,},State:SANDBOX_NOTREADY,CreatedAt:1711595680996449920,Labels:map[string]string{io.kubernetes.pod.name: coredns-6cd46fb86c-mkn6f,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: 3b31d34f-102d-453f-b17d-c0bdb0cdf91f,k8s-app: kube-dns,pod-template-hash: 6cd46fb86c,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:14:37.352432717+10:00,kubernetes.io/config.source: api,prometheus.io/port: 9153,prometheus.io/scrape: true,},RuntimeHandler:,},&PodSandbox{Id:76487f6dbaddf54b58f1bab72b217a0320ec9e0490d7abbd844a4cdfd0bef98d,Metadata:&PodSandboxMetadata{Name:rke2-sriov-5857dd76c-sq2j4,Uid:061c2aa6-ded0-47fd-b5eb-2fa29b72d0ef,Namespace:kube-system,Attempt:1,},State:SANDBOX_NOTREADY,CreatedAt:1711596024106550898,Labels:map[string]string{io.kubernetes.pod.name: rke2-sriov-5857dd76c-sq2j4,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: 061c2aa6-ded0-47fd-b5eb-2fa29b72d0ef,name: sriov-network-operator,pod-template-hash: 5857dd76c,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:17:23.127139358+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:f747fe4396fe49ed165c57607e66024c130b6c58fc9d859e03594d1457764fe7,Metadata:&PodSandboxMetadata{Name:metrics-server-7556957bb7-4rvs2,Uid:abd20415-25ce-4ee6-b1c3-136f45e7a225,Namespace:kube-system,Attempt:4,},State:SANDBOX_NOTREADY,CreatedAt:1711596711682672993,Labels:map[string]string{io.kubernetes.pod.name: metrics-server-7556957bb7-4rvs2,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: abd20415-25ce-4ee6-b1c3-136f45e7a225,k8s-app: metrics-server,pod-template-hash: 7556957bb7,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:14:37.352555942+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:b94762e9caec53b2cf662a30830b29e9647992355cc88e03bbff66dbb7cce7da,Metadata:&PodSandboxMetadata{Name:kube-proxy-ltxnf,Uid:7958329a-0716-45e1-9c11-2cee747bae87,Namespace:kube-system,Attempt:0,},State:SANDBOX_READY,CreatedAt:1711595680998326136,Labels:map[string]string{controller-revision-hash: 597446b5c4,io.kubernetes.pod.name: kube-proxy-ltxnf,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: 7958329a-0716-45e1-9c11-2cee747bae87,k8s-app: kube-proxy,pod-template-generation: 1,},Annotations:map[string]string{kubernetes.io/config.seen: 
2024-03-28T13:14:36.657100656+10:00,kubernetes.io/config.source: api,prometheus.io/port: 10249,prometheus.io/scrape: true,},RuntimeHandler:,},&PodSandbox{Id:228142b90b4adf5c2fcc1cdaaaad2c423419b5c0073aa2db6d92313e149808ba,Metadata:&PodSandboxMetadata{Name:calico-typha-89844548c-4w8qd,Uid:0e69c94e-81ff-4a04-8f20-f34944433ecc,Namespace:calico-system,Attempt:0,},State:SANDBOX_READY,CreatedAt:1711596423165025426,Labels:map[string]string{app.kubernetes.io/name: calico-typha,io.kubernetes.pod.name: calico-typha-89844548c-4w8qd,io.kubernetes.pod.namespace: calico-system,io.kubernetes.pod.uid: 0e69c94e-81ff-4a04-8f20-f34944433ecc,k8s-app: calico-typha,pod-template-hash: 89844548c,},Annotations:map[string]string{hash.operator.tigera.io/system: fdde45054a8ae4f629960ce37570929502e59449,kubernetes.io/config.seen: 2024-03-28T13:27:02.860310221+10:00,kubernetes.io/config.source: api,tigera-operator.hash.operator.tigera.io/tigera-ca-private: 8d04665520ee20212b673f1126569392c3b6887c,tigera-operator.hash.operator.tigera.io/typha-certs: 77e7cf1a85e6c7bc5fcee2abc517a81d42e75cb1,},RuntimeHandler:,},&PodSandbox{Id:fe66cf214439ad7a0fb74165e52f59a1df36bbd0bf8e4c76a8dae15d2bc3aa8a,Metadata:&PodSandboxMetadata{Name:calico-kube-controllers-7b9ffcdcf6-97thn,Uid:f2ce0d4e-d0cd-4187-aac1-4e3800a148c5,Namespace:calico-system,Attempt:1,},State:SANDBOX_NOTREADY,CreatedAt:1711596711684334275,Labels:map[string]string{app.kubernetes.io/name: calico-kube-controllers,io.kubernetes.pod.name: calico-kube-controllers-7b9ffcdcf6-97thn,io.kubernetes.pod.namespace: calico-system,io.kubernetes.pod.uid: f2ce0d4e-d0cd-4187-aac1-4e3800a148c5,k8s-app: calico-kube-controllers,pod-template-hash: 7b9ffcdcf6,},Annotations:map[string]string{hash.operator.tigera.io/system: fdde45054a8ae4f629960ce37570929502e59449,kubernetes.io/config.seen: 2024-03-28T13:27:03.008972924+10:00,kubernetes.io/config.source: api,tigera-operator.hash.operator.tigera.io/tigera-ca-private: 8d04665520ee20212b673f1126569392c3b6887c,},RuntimeHandler:,},&PodSandbox{Id:37978cf3d71118d7e671ae42613508cc15f78be4ded71901ac90a97252c34776,Metadata:&PodSandboxMetadata{Name:rke2-sriov-5857dd76c-sq2j4,Uid:061c2aa6-ded0-47fd-b5eb-2fa29b72d0ef,Namespace:kube-system,Attempt:2,},State:SANDBOX_NOTREADY,CreatedAt:1711596264412873335,Labels:map[string]string{io.kubernetes.pod.name: rke2-sriov-5857dd76c-sq2j4,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: 061c2aa6-ded0-47fd-b5eb-2fa29b72d0ef,name: sriov-network-operator,pod-template-hash: 5857dd76c,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:17:23.127139358+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:0e6e701fa7378d4b2bbde5fc6084ffd0bc37c4d201c3757f25cf33f5689c1474,Metadata:&PodSandboxMetadata{Name:metrics-server-7556957bb7-4rvs2,Uid:abd20415-25ce-4ee6-b1c3-136f45e7a225,Namespace:kube-system,Attempt:3,},State:SANDBOX_NOTREADY,CreatedAt:1711596515701215124,Labels:map[string]string{io.kubernetes.pod.name: metrics-server-7556957bb7-4rvs2,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: abd20415-25ce-4ee6-b1c3-136f45e7a225,k8s-app: metrics-server,pod-template-hash: 7556957bb7,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:14:37.352555942+10:00,kubernetes.io/config.source: 
api,},RuntimeHandler:,},&PodSandbox{Id:41a9903414c9db3c1c206a7a634ec5d339de744dda20f7339bbcc7d5ac14cde9,Metadata:&PodSandboxMetadata{Name:rke2-sriov-rancher-nfd-gc-5fb8c4d68b-vmvtv,Uid:d38e4f71-a33e-4a5b-8f6d-6cd8a7c9438a,Namespace:kube-system,Attempt:1,},State:SANDBOX_NOTREADY,CreatedAt:1711596772431032021,Labels:map[string]string{app.kubernetes.io/instance: rke2-sriov,app.kubernetes.io/name: rancher-nfd,io.kubernetes.pod.name: rke2-sriov-rancher-nfd-gc-5fb8c4d68b-vmvtv,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: d38e4f71-a33e-4a5b-8f6d-6cd8a7c9438a,pod-template-hash: 5fb8c4d68b,role: gc,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:17:23.126514187+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},&PodSandbox{Id:a1d213e3c67f7e553e551b8a9d6f10fb41a72789ef30d07b78ad3f25f579e163,Metadata:&PodSandboxMetadata{Name:coredns-6cd46fb86c-mkn6f,Uid:3b31d34f-102d-453f-b17d-c0bdb0cdf91f,Namespace:kube-system,Attempt:2,},State:SANDBOX_NOTREADY,CreatedAt:1711596102152979845,Labels:map[string]string{io.kubernetes.pod.name: coredns-6cd46fb86c-mkn6f,io.kubernetes.pod.namespace: kube-system,io.kubernetes.pod.uid: 3b31d34f-102d-453f-b17d-c0bdb0cdf91f,k8s-app: kube-dns,pod-template-hash: 6cd46fb86c,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:14:37.352432717+10:00,kubernetes.io/config.source: api,prometheus.io/port: 9153,prometheus.io/scrape: true,},RuntimeHandler:,},&PodSandbox{Id:a4420032924c70feb3eccf626e6c8477abe912d90c19fc695159794f9a34f7c0,Metadata:&PodSandboxMetadata{Name:tigera-operator-748c69cf45-rmntt,Uid:66513287-e9f4-4f49-8744-0e8c70f39a00,Namespace:tigera-operator,Attempt:0,},State:SANDBOX_READY,CreatedAt:1711596394356166303,Labels:map[string]string{io.kubernetes.pod.name: tigera-operator-748c69cf45-rmntt,io.kubernetes.pod.namespace: tigera-operator,io.kubernetes.pod.uid: 66513287-e9f4-4f49-8744-0e8c70f39a00,k8s-app: tigera-operator,name: tigera-operator,pod-template-hash: 748c69cf45,},Annotations:map[string]string{kubernetes.io/config.seen: 2024-03-28T13:26:34.051738948+10:00,kubernetes.io/config.source: api,},RuntimeHandler:,},},} 
DEBU[2024-03-28 13:39:04] stopping container: 3eb1ba5491da71ce5b0b633e6fa1d9f666978321b6e55d74f74b385671eb53ed 
DEBU[2024-03-28 13:39:04] connected successfully using endpoint: unix:///run/k0s/containerd.sock 
DEBU[2024-03-28 13:39:04] StopPodSandboxRequest: &StopPodSandboxRequest{PodSandboxId:3eb1ba5491da71ce5b0b633e6fa1d9f666978321b6e55d74f74b385671eb53ed,} 
DEBU[2024-03-28 13:40:04] StopPodSandboxResponse: &StopPodSandboxResponse{} 
DEBU[2024-03-28 13:40:04] Stopped pod sandbox 3eb1ba5491da71ce5b0b633e6fa1d9f666978321b6e55d74f74b385671eb53ed 
DEBU[2024-03-28 13:40:04] connected successfully using endpoint: unix:///run/k0s/containerd.sock 
DEBU[2024-03-28 13:40:04] RemovePodSandboxRequest: &RemovePodSandboxRequest{PodSandboxId:3eb1ba5491da71ce5b0b633e6fa1d9f666978321b6e55d74f74b385671eb53ed,} 
DEBU[2024-03-28 13:41:04] RemovePodSandboxResponse: &RemovePodSandboxResponse{} 
DEBU[2024-03-28 13:41:04] Removed pod sandbox 3eb1ba5491da71ce5b0b633e6fa1d9f666978321b6e55d74f74b385671eb53ed 
DEBU[2024-03-28 13:41:04] stopping container: 79c1ab6b5bccc22041a362a88e44521770ebcaed2abd07405a2f0c7465aaf366 
DEBU[2024-03-28 13:41:04] connected successfully using endpoint: unix:///run/k0s/containerd.sock 
DEBU[2024-03-28 13:41:04] StopPodSandboxRequest: &StopPodSandboxRequest{PodSandboxId:79c1ab6b5bccc22041a362a88e44521770ebcaed2abd07405a2f0c7465aaf366,} 
DEBU[2024-03-28 13:42:04] StopPodSandboxResponse: &StopPodSandboxResponse{} 
DEBU[2024-03-28 13:42:04] Stopped pod sandbox 79c1ab6b5bccc22041a362a88e44521770ebcaed2abd07405a2f0c7465aaf366 
DEBU[2024-03-28 13:42:04] connected successfully using endpoint: unix:///run/k0s/containerd.sock 
DEBU[2024-03-28 13:42:04] RemovePodSandboxRequest: &RemovePodSandboxRequest{PodSandboxId:79c1ab6b5bccc22041a362a88e44521770ebcaed2abd07405a2f0c7465aaf366,} 
DEBU[2024-03-28 13:43:04] RemovePodSandboxResponse: &RemovePodSandboxResponse{} 
DEBU[2024-03-28 13:43:04] Removed pod sandbox 79c1ab6b5bccc22041a362a88e44521770ebcaed2abd07405a2f0c7465aaf366 
DEBU[2024-03-28 13:43:04] stopping container: a953151f1b0a4684f35579fcf89e661f9314340045b8d30ab4fe7a822623245f 
DEBU[2024-03-28 13:43:04] connected successfully using endpoint: unix:///run/k0s/containerd.sock 
DEBU[2024-03-28 13:43:04] StopPodSandboxRequest: &StopPodSandboxRequest{PodSandboxId:a953151f1b0a4684f35579fcf89e661f9314340045b8d30ab4fe7a822623245f,} 
DEBU[2024-03-28 13:44:04] StopPodSandboxResponse: &StopPodSandboxResponse{} 
DEBU[2024-03-28 13:44:04] Stopped pod sandbox a953151f1b0a4684f35579fcf89e661f9314340045b8d30ab4fe7a822623245f 
DEBU[2024-03-28 13:44:04] connected successfully using endpoint: unix:///run/k0s/containerd.sock 
DEBU[2024-03-28 13:44:04] RemovePodSandboxRequest: &RemovePodSandboxRequest{PodSandboxId:a953151f1b0a4684f35579fcf89e661f9314340045b8d30ab4fe7a822623245f,} 
DEBU[2024-03-28 13:45:05] RemovePodSandboxResponse: &RemovePodSandboxResponse{} 
DEBU[2024-03-28 13:45:05] Removed pod sandbox a953151f1b0a4684f35579fcf89e661f9314340045b8d30ab4fe7a822623245f 
DEBU[2024-03-28 13:45:05] stopping container: 95584005577124a67c3137a9cdea55a250f1b68407e96fe0758b180ed6b6b2b4 
DEBU[2024-03-28 13:45:05] connected successfully using endpoint: unix:///run/k0s/containerd.sock 
DEBU[2024-03-28 13:45:05] StopPodSandboxRequest: &StopPodSandboxRequest{PodSandboxId:95584005577124a67c3137a9cdea55a250f1b68407e96fe0758b180ed6b6b2b4,}
[...]
DEBU[2024-03-28 14:16:15] ListPodSandboxRequest: &ListPodSandboxRequest{Filter:nil,} 
DEBU[2024-03-28 14:16:15] ListPodSandboxResponse: &ListPodSandboxResponse{Items:[]*PodSandbox{},} 
INFO[2024-03-28 14:16:15] successfully removed k0s containers!         
DEBU[2024-03-28 14:16:15] attempting to stop containerd                
DEBU[2024-03-28 14:16:15] found containerd pid: 15325                  
DEBU[2024-03-28 14:16:15] successfully stopped containerd              
INFO[2024-03-28 14:16:15] * remove k0s users step:                     
DEBU[2024-03-28 14:16:15] deleting user: etcd                          
DEBU[2024-03-28 14:16:15] deleting user: kube-apiserver                
DEBU[2024-03-28 14:16:15] deleting user: konnectivity-server           
DEBU[2024-03-28 14:16:15] deleting user: kube-scheduler                
INFO[2024-03-28 14:16:15] * uninstall service step                     
INFO[2024-03-28 14:16:15] * remove directories step                    
DEBU[2024-03-28 14:16:15] removing k0s generated data-dir (/var/lib/k0s) 
DEBU[2024-03-28 14:16:16] deleting k0s generated run-dir (/run/k0s)    
INFO[2024-03-28 14:16:16] * CNI leftovers cleanup step                 
INFO[2024-03-28 14:16:16] * kube-bridge leftovers cleanup step         
INFO[2024-03-28 14:16:16] k0s cleanup operations done.                 
WARN[2024-03-28 14:16:16] To ensure a full reset, a node reboot is recommended. 

So it is doing something, just very slowly. Are these delays necessary, i.e. could the removal be forced somehow (e.g. rm -rf /var/lib/k0s/containerd)?
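For what it's worth, the timestamps above show each StopPodSandbox call returning after almost exactly 60 seconds and each RemovePodSandbox call after another 60 seconds, so at roughly two minutes per sandbox the twenty or so sandboxes listed above account for most of the 13:39 to 14:16 run.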

twz123 (Member) commented Apr 8, 2024

From the logs, it looks like your worker ran a lot of pods, each of which took a minute to shut down. This could be an indication of a graceful shutdown timeout. K0s needs to stop all running containers before it can clean up the data directory, because running containers usually keep mount points active that prevent certain paths from being deleted. Usually, k0s reset is used on a node that has already been drained, so it's not necessarily optimized to deal with a large number of running pods.

There are some ways this could be improved, such as parallelizing pod removal. Forcibly terminating pods could also speed things up, although this is quite destructive, especially if the node is still part of an active cluster, and the pods don't get time to complete their shutdown tasks.
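To make the parallelization idea concrete, here is a rough sketch of stopping and removing all sandboxes concurrently over the CRI API. This is not k0s's actual reset code: the socket path and request types are taken from the debug log above, while the 60-second grace period and the concurrency limit of 8 are made-up illustration values.

package main

import (
	"context"
	"log"
	"time"

	"golang.org/x/sync/errgroup"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Dial the CRI socket exposed by k0s's embedded containerd.
	conn, err := grpc.Dial("unix:///run/k0s/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	runtime := runtimeapi.NewRuntimeServiceClient(conn)

	// List every pod sandbox, as in the ListPodSandboxRequest above.
	list, err := runtime.ListPodSandbox(context.Background(), &runtimeapi.ListPodSandboxRequest{})
	if err != nil {
		log.Fatal(err)
	}

	// Stop and remove the sandboxes concurrently instead of one at a time.
	g, ctx := errgroup.WithContext(context.Background())
	g.SetLimit(8) // cap the number of in-flight CRI calls (arbitrary)
	for _, sandbox := range list.Items {
		id := sandbox.Id
		g.Go(func() error {
			// Bound how long we wait for a single sandbox to stop.
			stopCtx, cancel := context.WithTimeout(ctx, 60*time.Second)
			defer cancel()
			if _, err := runtime.StopPodSandbox(stopCtx, &runtimeapi.StopPodSandboxRequest{PodSandboxId: id}); err != nil {
				return err
			}
			_, err := runtime.RemovePodSandbox(ctx, &runtimeapi.RemovePodSandboxRequest{PodSandboxId: id})
			return err
		})
	}
	if err := g.Wait(); err != nil {
		log.Fatal(err)
	}
}

Even with concurrency, each sandbox still gets its grace period; whether to additionally force-kill whatever has not stopped by then is exactly the trade-off described above.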

ianb-mp (Contributor, Author) commented Apr 8, 2024

Thanks for the explanation. I'm coming to k0s from k3s and rke2, which uninstall in less than 30 seconds (on a system with the same previously running workloads), so I was curious to understand the difference.

twz123 (Member) commented Apr 9, 2024

I reckon k3s will forcibly kill all the container processes. This is of course faster and arguably a reasonable choice for non-production clusters. However, for a cluster running more valuable workloads, proper pod termination will be a safer bet.

Nevertheless, k0s could definitely do things concurrently during reset in order to speed up the process.

chrischen commented

Mine is also hanging; I have restarted the node and it's still hanging. I don't care about data loss, so what's the easiest way to force a reset without reformatting the drive?

DEBU[2024-04-09 12:23:37] Starting debug server                         debug_server=":6060"
INFO[2024-04-09 12:23:37] * containers steps                           
DEBU[2024-04-09 12:23:37] starting containerd                          
DEBU[2024-04-09 12:23:37] started containerd successfully              
DEBU[2024-04-09 12:23:37] trying to list all pods 

j1m-ryan commented May 6, 2024

In the same boat as those above. I would definitely appreciate an easy way to force-remove all of k0s when it's in this state so I can apply it again. At the moment I can neither apply nor reset, as both fail. To get k0s running again it looks like I will have to format my drives.

Edit:
What worked for me was stopping the k0s worker services, deleting the service, and deleting everything in /etc/k0s manually on each node. I was using k0sctl.

teldredge commented
Same exact issue as chrischen

jnummelin (Member) commented

Hmm, I wonder what makes stopping and killing pods so slow in some cases. I just tested with 106 pods:

[root@ip-172-31-29-67 ~]# k0s kc get pod -A | wc -l
106
[root@ip-172-31-29-67 ~]# time k0s reset
WARN[2024-05-31 16:06:33] To ensure a full reset, a node reboot is recommended. 
real	0m19.518s

Looking at the code, the stop signalling is probably not optimal and, as @twz123 said, we could easily parallelise cleaning up the pods.

One thing I'm thinking is that pods with long-running shutdown sequences (SIGTERM handling) and/or stop hooks might affect this heavily. Currently I believe the code waits 60 seconds for all containers in a pod to stop before SIGKILLing them, hence you probably get ~60 seconds per pod when resetting.
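For scale: 106 pods at 60 seconds each would be roughly an hour and three quarters, so the 19.5 s result above means hardly any of those pods came near the timeout.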

jnummelin (Member) commented

Looking at some optimizations in the code raises a couple of questions.

Should the grace period actually be user-settable, with some sensible 30 s default?

How would the user apply force? Using --grace-period=0?

jnummelin self-assigned this Jun 3, 2024
jnummelin added this to the 1.31 milestone Jun 6, 2024
jnummelin (Member) commented

One thing I'm thinking is that pods with long-running shutdown sequences (SIGTERM handling) and/or stop hooks might affect this heavily. Currently I believe the code waits 60 seconds for all containers in a pod to stop before SIGKILLing them, hence you probably get ~60 seconds per pod when resetting.

Not sure anymore at all. 😄

I've been testing this with pods that deliberately refuse to shut down properly, and I still don't see any major slowness when resetting a node with 100 pods on it.

@ianb-mp Do you have any idea if those pods you've had running have some long shutdown hooks/handling in place?

twz123 (Member) commented Jun 7, 2024

@chrischen @teldredge Are you still experiencing this? If so, would you mind sharing the console output of k0s reset --debug? That might give a clue as to why/where it's hanging.

One reason for k0s reset to hang when trying to stop containers has been fixed in #4434. However, this does not solve the original poster's problem, which is that stopping containers makes progress, but takes a very long time.

ianb-mp (Contributor, Author) commented Jun 7, 2024

Do you have any idea if those pods you've had running have some long shutdown hooks/handling in place?

@jnummelin unfortunately my test environment has changed significantly and I can't (easily) test the exact same scenario I had when I first created the ticket. That said, I've done many resets since then and haven't experienced any significant slowness, so perhaps it's fixed now.

chrischen commented

@chrischen @teldredge Are you still experiencing this? If so, would you mind sharing the console output of k0s reset --debug? That might give a clue as to why/where it's hanging.

One reason for k0s reset to hang when trying to stop containers has been fixed in #4434. However, this does not solve the original poster's problem, which is that stopping containers makes progress, but takes a very long time.

Seems to be fine now.

github-actions (bot) commented

The issue is marked as stale since no activity has been recorded in 30 days

github-actions bot added the Stale label Jul 19, 2024
github-actions bot closed this as not planned Jul 27, 2024