
cAdvisor stops updating metrics after k3s upgrade #3035

Closed
mnorrsken opened this issue Mar 9, 2021 · 25 comments
Labels: kind/bug Something isn't working

@mnorrsken
Contributor

Environmental Info:
K3s Version:
v1.20.4+k3s1

Node(s) CPU architecture, OS, and Version:
Linux mandalore 5.10.16-meson64 #21.02.2 SMP PREEMPT Sun Feb 14 21:50:52 CET 2021 aarch64 GNU/Linux
Linux alderaan 5.10.17-v8+ #1403 SMP PREEMPT Mon Feb 22 11:37:54 GMT 2021 aarch64 GNU/Linux
Linux glados 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux

Cluster Configuration:
Cluster#1: 1 master 3 workers
Cluster#2: 1 master (x86)

Describe the bug:
cAdvisor stops reporting new stats (CPU usage, memory usage) for running containers after doing an in-place upgrade of the system with
curl -sfL https://get.k3s.io | sh -s -
This seems to happen regardless of the k3s version.
The only way to get the stats working again is to restart the containers. Every container started after the upgrade reports stats correctly.

It seems to happen on both my arm64 cluster and my amd64 single master.
Other things I tried without luck:

  • Restarting k3s or k3s-agent
  • Restarting metrics-server
  • Restarting prometheus

Steps To Reproduce:

  • Upgrade k3s "in place": curl -sfL https://get.k3s.io | sh -s -
  • Containers running before the upgrade are now stuck on the same cAdvisor stats until restarted

Expected behavior:

  • cAdvisor stats continue to report correctly after upgrade

Actual behavior:

  • I have to restart all containers, for example via rolling node restarts (see the sketch below)
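
For reference, one way to do such a rolling restart with standard kubectl commands (a sketch only; <node-name> is a placeholder, and it assumes the workloads are managed by controllers that can reschedule them):

kubectl drain <node-name> --ignore-daemonsets
# reboot the node (or otherwise restart its containers), then:
kubectl uncordon <node-name>

Restarting a single workload with kubectl rollout restart deployment/<name> should also bring its stats back, since newly started containers report correctly.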

Additional context / logs:
I reported this earlier in #2895 but at the time I didn't know how to reproduce this issue.

@mnorrsken
Contributor Author

This also seems to happen when no upgrade actually takes place, simply by re-running the command below.
The full command I use for installing is:

      curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=v1.20 sh -s - \
        --write-kubeconfig-mode 640 \
        --disable local-storage \
        --disable servicelb \
        --disable traefik \
        --kubelet-arg=image-gc-high-threshold=85 \
        --kubelet-arg=image-gc-low-threshold=75 \
        --kubelet-arg=container-log-max-files=2 \
        --kubelet-arg=container-log-max-size=5Mi

After that I install nfs-client-provisioner, Traefik 2, and MetalLB.

@brandond
Member

Can you provide a sample command to retrieve the cadvisor metrics that you're seeing go stale?

@mnorrsken
Contributor Author

Before update:

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/prom/pods/prom-prometheus-server-78bcb7cd77-55xs5  | jq
{
  "kind": "PodMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "prom-prometheus-server-78bcb7cd77-55xs5",
    "namespace": "prom",
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/prom/pods/prom-prometheus-server-78bcb7cd77-55xs5",
    "creationTimestamp": "2021-03-10T21:15:40Z"
  },
  "timestamp": "2021-03-10T21:14:31Z",
  "window": "30s",
  "containers": [
    {
      "name": "prometheus-server",
      "usage": {
        "cpu": "13935439n",
        "memory": "512052Ki"
      }
    }
  ]
}

After "update" of k3s cpu usage reports 0 and memory usage seems to report the same value every time:

$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/prom/pods/prom-prometheus-server-78bcb7cd77-55xs5  | jq
{
  "kind": "PodMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "prom-prometheus-server-78bcb7cd77-55xs5",
    "namespace": "prom",
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/prom/pods/prom-prometheus-server-78bcb7cd77-55xs5",
    "creationTimestamp": "2021-03-10T21:18:21Z"
  },
  "timestamp": "2021-03-10T21:17:42Z",
  "window": "30s",
  "containers": [
    {
      "name": "prometheus-server",
      "usage": {
        "cpu": "0",
        "memory": "517012Ki"
      }
    }
  ]
}

@brandond
Member

Do you have any errors in your metrics-server pod logs?

@mnorrsken
Contributor Author

No errors. I just did another "install/update" on my cluster. I checked other logs too, but I don't see anything related to metrics or cgroups.

@mnorrsken
Contributor Author

This is reproducible:
Install Debian 10 minimal on an amd64 VM, then run:

apt install curl jq
curl -sfL https://get.k3s.io | sh -

Create a file loop.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox:latest
    command: [ "/bin/sh", "-ec", "--" ]
    args: [ "while true; do sleep 0.0001; done;" ]

kubectl apply -f loop.yaml

Running the following will now report proper CPU usage:
while true; do kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods/busybox | jq -r '.containers[0].usage.cpu'; done

Start another shell:
curl -sfL https://get.k3s.io | sh -

The CPU metric will soon start to report 0 (zero).

Recreate the pod:
kubectl delete -f loop.yaml ; kubectl apply -f loop.yaml

The CPU metric will start to report proper values again after a few errors.
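
The same repro condensed into a single shell sketch (assumes root on a fresh Debian 10 host with the loop.yaml above; a short sleep is added only to keep the output readable):

# initial install and test pod
curl -sfL https://get.k3s.io | sh -
kubectl apply -f loop.yaml

# terminal 1: watch the reported CPU usage
while true; do
  kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods/busybox \
    | jq -r '.containers[0].usage.cpu'
  sleep 5
done

# terminal 2: re-run the installer; the CPU metric soon drops to 0
curl -sfL https://get.k3s.io | sh -

# recreating the pod restores correct values after a few errors
kubectl delete -f loop.yaml && kubectl apply -f loop.yaml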

@mnorrsken
Contributor Author

mnorrsken commented Mar 13, 2021

I debugged the install script. The cause of the issue seems to be the line
systemctl disable k3s
in the "systemd_disable" function

Commenting out this line in the install script makes the issue disappear. However, I don't know enough about systemd internals to know why this happens.

@brandond
Member

That doesn't sound right - disabling the service doesn't actually stop it, it just prevents it from starting automatically on the next boot.

Does this happen when you upgrade the k3s binary and restart the service manually, using curl and systemctl restart?

Does this happen if you stop/disable/enable/start the service without upgrading the binary?
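
(For reference, a rough sketch of the manual tests being asked about; the binary path /usr/local/bin/k3s is the installer's default, and the release URL and asset name for an amd64 node are assumptions:)

# upgrade the binary by hand, then restart the service
curl -Lo /tmp/k3s https://github.com/k3s-io/k3s/releases/latest/download/k3s
install -m 755 /tmp/k3s /usr/local/bin/k3s
systemctl restart k3s

# exercise the service lifecycle without touching the binary
systemctl stop k3s
systemctl disable k3s
systemctl enable k3s
systemctl start k3s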

@mnorrsken
Contributor Author

mnorrsken commented Mar 13, 2021

Yes, I tested doing just "systemctl disable k3s" without running the install script, and the same thing happens. I suspect systemd/systemctl (at least on Debian Buster) does something more than just disabling the service.

@mnorrsken
Contributor Author

I've tried to read a bit of the systemctl code, and it could be that something is sent to the "cgroup subsystem" when disabling a service, because of the additional cgroup settings in the k3s unit file.

@Oats87
Member

Oats87 commented Mar 16, 2021

@mnorrsken On a Debian 10 system running K3s

k3s version v1.20.4+k3s1 (838a906a)

PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

(specifically the Debian 10 AMI in AWS) I'm unable to reproduce this issue through systemctl disable k3s. Metrics continue to be updated and CPU metrics do not go to zero.

Is there anything else that may be special about your system configuration?

@brandond
Member

@mnorrsken what do you mean when you say:

because of the additional cgroup settings in the k3s unit file.

Have you customized your systemd unit to add additional settings not present in the one we install by default? As far as I know we don't do any cgroup-specific configuration in the unit generated by the install script.

@mnorrsken
Contributor Author

Sorry, I thought these had something to do with cgroups, but they are ulimits.

LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity

Anyway, I can still reproduce this on a pristine Debian 10.8 VM.

After some more testing, it seems this only happens when running systemctl disable k3s after running rm -f /etc/systemd/system/k3s.service.
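
In other words, the failing sequence looks like this (a minimal sketch, assuming a running k3s.service on this Debian 10 / systemd 241 setup):

rm -f /etc/systemd/system/k3s.service   # unit file removed first
systemctl disable k3s                   # disabling the now-missing unit; after this,
                                        # stats for already-running containers go stale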

@mnorrsken
Contributor Author

Even more debugging. What happens is that entries in /sys/fs/cgroup/cpu,cpuacct disappear if the service file is removed before doing systemctl disable:

# ls /sys/fs/cgroup/cpu,cpuacct
apparmor.service
cgroup.clone_children
cgroup.procs
console-setup.service
cpuacct.stat
cpuacct.usage
cpuacct.usage_all
cpuacct.usage_percpu
cpuacct.usage_percpu_sys
cpuacct.usage_percpu_user
cpuacct.usage_sys
cpuacct.usage_user
cpu.cfs_period_us
cpu.cfs_quota_us
cpu.shares
cpu.stat
cron.service
dbus.service
dbus.socket
dev-hugepages.mount
dev-mqueue.mount
ifupdown-pre.service
ifup@ens192.service
k3s.service
keyboard-setup.service
kmod-static-nodes.service
-.mount
networking.service
notify_on_release
rsyslog.service
run-k3s-containerd-io.containerd.grpc.v1.cri-sandboxes-0ca054b089ba04b336662f1affd688687033836e925565c0fbc9dddd5eb2f9a7-shm.mount
run-k3s-containerd-io.containerd.grpc.v1.cri-sandboxes-0d21693f83cda82b3918fc0c93b16231bcfab8dac25a12ee647c6f376a227f9b-shm.mount
run-k3s-containerd-io.containerd.grpc.v1.cri-sandboxes-4b385dfcdea1c3d22f9811219fed515eb5d6d9bc2b84a99477411f9a1d6a82de-shm.mount
run-k3s-containerd-io.containerd.grpc.v1.cri-sandboxes-716aa9c1e01202b5af2ae9e82eb72cfb4bd4fe390aa62b24bdbdef680e81f7e8-shm.mount
run-k3s-containerd-io.containerd.grpc.v1.cri-sandboxes-d849981908e887c2b32e4d6cae062b32caef69558563454e43e3e1d83cf46a3c-shm.mount
run-k3s-containerd-io.containerd.grpc.v1.cri-sandboxes-fcdc40e469182f717812fe4f26544725b2536e4667390901336a8e86fa77d6fa-shm.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-0ca054b089ba04b336662f1affd688687033836e925565c0fbc9dddd5eb2f9a7-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-0d21693f83cda82b3918fc0c93b16231bcfab8dac25a12ee647c6f376a227f9b-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-2ff072d8b74eefe60121fc7f4dc16f07c2e9b7d0d9cad14c8361123505ef4321-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-3c5e5640905cc781a9ee1d015b327f07b8863f6824ee150eb237327919e4ce67-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-4b385dfcdea1c3d22f9811219fed515eb5d6d9bc2b84a99477411f9a1d6a82de-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-53cf2fce914da4e0371d8150cf1c527d29914a0b932a56170c085ad3a7a8bf08-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-716aa9c1e01202b5af2ae9e82eb72cfb4bd4fe390aa62b24bdbdef680e81f7e8-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-9a4e359d4546723bfaec96788b31e92e186ddc1a76bd6c47e81a26d746f3bcde-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-b12c9057f8dbd1ae87b2bcc4839f6fa89a2ecb4b75ca6c5d7557e82cb2aadb83-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-d849981908e887c2b32e4d6cae062b32caef69558563454e43e3e1d83cf46a3c-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-d9ed3b46a5ef038cce81ba3038db45a56bcae2ea95df08e9141a0d6e32df2419-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-f64fc27fc4607ff948bcf8f2619e136e1da637dcc89c6483766e440eba2d5449-rootfs.mount
run-k3s-containerd-io.containerd.runtime.v2.task-k8s.io-fcdc40e469182f717812fe4f26544725b2536e4667390901336a8e86fa77d6fa-rootfs.mount
run-netns-cni\x2d4bbe587c\x2dbafe\x2d664f\x2d20a3\x2d23d9e1b817a7.mount
run-netns-cni\x2d5235a53b\x2d9264\x2d7374\x2db535\x2de019911cfbdf.mount
run-netns-cni\x2d8f89102d\x2dbb79\x2d9d3b\x2de051\x2da178e2ccee4e.mount
run-netns-cni\x2db5598530\x2d7b68\x2da6a6\x2d12b0\x2dc21177fbef9f.mount
run-netns-cni\x2dd57a9532\x2d7f92\x2d4cf5\x2ddcad\x2dd5e788c84027.mount
run-netns-cni\x2deb4057ec\x2d3832\x2d7188\x2deb5e\x2d91ec52f65f61.mount
run-user-1000.mount
ssh.service
sys-kernel-debug.mount
syslog.socket
systemd-fsckd.socket
systemd-initctl.socket
systemd-journald-audit.socket
systemd-journald-dev-log.socket
systemd-journald.service
systemd-journald.socket
systemd-journal-flush.service
systemd-logind.service
systemd-modules-load.service
systemd-random-seed.service
systemd-remount-fs.service
systemd-sysctl.service
systemd-sysusers.service
systemd-timesyncd.service
systemd-tmpfiles-setup-dev.service
systemd-tmpfiles-setup.service
systemd-udevd-control.socket
systemd-udevd-kernel.socket
systemd-udevd.service
systemd-udev-trigger.service
systemd-update-utmp.service
systemd-user-sessions.service
system-getty.slice
tasks
var-lib-kubelet-pods-1f425bdf\x2db33b\x2d485e\x2d80fd\x2d65338d12d7a2-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2d7dhj7.mount
var-lib-kubelet-pods-87d0741f\x2d605e\x2d4265\x2d9f22\x2d5135462238d1-volumes-kubernetes.io\x7esecret-local\x2dpath\x2dprovisioner\x2dservice\x2daccount\x2dtoken\x2dmd29n.mount
var-lib-kubelet-pods-8fe609a5\x2d12a9\x2d43ce\x2db7e7\x2def2c5fe912ae-volumes-kubernetes.io\x7esecret-coredns\x2dtoken\x2d4td8t.mount
var-lib-kubelet-pods-9acc4442\x2d7615\x2d4028\x2d9bf1\x2dce2babfe1450-volumes-kubernetes.io\x7esecret-metrics\x2dserver\x2dtoken\x2d54kln.mount
var-lib-kubelet-pods-c49f2827\x2dac9e\x2d48c1\x2d8e6c\x2d6e2f473798bb-volumes-kubernetes.io\x7esecret-default\x2dtoken\x2d5w6p2.mount
var-lib-kubelet-pods-dba46519\x2ddc94\x2d4665\x2da667\x2d5ac5e9154d2d-volumes-kubernetes.io\x7esecret-ssl.mount
var-lib-kubelet-pods-dba46519\x2ddc94\x2d4665\x2da667\x2d5ac5e9154d2d-volumes-kubernetes.io\x7esecret-traefik\x2dtoken\x2dsnjph.mount
# rm -rf /etc/systemd/system/k3s.service
# systemctl disable k3s
# ls /sys/fs/cgroup/cpu,cpuacct
cgroup.clone_children
cgroup.procs
cpuacct.stat
cpuacct.usage
cpuacct.usage_all
cpuacct.usage_percpu
cpuacct.usage_percpu_sys
cpuacct.usage_percpu_user
cpuacct.usage_sys
cpuacct.usage_user
cpu.cfs_period_us
cpu.cfs_quota_us
cpu.shares
cpu.stat
k3s.service
notify_on_release
systemd-udevd.service
system-getty.slice
tasks

@mnorrsken
Contributor Author

That is, only doing

# systemctl disable k3s

without removing k3s.service does NOT affect the cgroup data in /sys/fs/cgroup/cpu,cpuacct

@mnorrsken
Contributor Author

root@k3stest:~# k3s --version
k3s version v1.20.4+k3s1 (838a906a)
go version go1.15.8
root@k3stest:~# cat /etc/debian_version
10.8
root@k3stest:~# uname -a
Linux k3stest 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
root@k3stest:~# systemd --version
systemd 241 (241)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid

@brandond
Member

brandond commented Mar 16, 2021

That output doesn't look right: the pods should be nested under /kubepods, and k3s should be in /system.slice/k3s.service:

[root@centos03 ~]# ls -l /sys/fs/cgroup/cpu,cpuacct/
total 0
-rw-r--r--.  1 root root 0 Mar 16 15:03 cgroup.clone_children
-rw-r--r--.  1 root root 0 Mar 16 15:00 cgroup.procs
-r--r--r--.  1 root root 0 Mar 16 15:03 cgroup.sane_behavior
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.stat
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_all
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_percpu
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_percpu_sys
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_percpu_user
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_sys
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_user
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpu.cfs_period_us
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpu.cfs_quota_us
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpu.rt_period_us
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpu.rt_runtime_us
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpu.shares
-r--r--r--.  1 root root 0 Mar 16 15:03 cpu.stat
drwxr-xr-x.  4 root root 0 Mar 16 15:03 kubepods
-rw-r--r--.  1 root root 0 Mar 16 15:03 notify_on_release
-rw-r--r--.  1 root root 0 Mar 16 15:03 release_agent
drwxr-xr-x. 58 root root 0 Mar 16 15:00 system.slice
-rw-r--r--.  1 root root 0 Mar 16 15:03 tasks
drwxr-xr-x.  2 root root 0 Mar 16 15:00 user.slice

[root@centos03 ~]# systemd-cgtop -n 1 --depth 3 | cat
/                                                                117      -     1.5G        -        -
/kubepods                                                          -      -    61.3M        -        -
/kubepods/besteffort                                               -      -    46.9M        -        -
/kubepods/besteffort/podb77ee18e-8fc3-4929-968b-94cfa9bf5185       -      -    12.0M        -        -
/kubepods/besteffort/podd35119d3-8de3-4e24-9045-7ab80d445f46       -      -    16.9M        -        -
/kubepods/besteffort/podde5b0247-b65d-45e6-8612-f453bcd1bfee       -      -    10.6M        -        -
/kubepods/besteffort/podfd30f404-b2e1-4126-9f66-c638408ab761       -      -     3.0M        -        -
/kubepods/burstable                                                -      -    14.3M        -        -
/kubepods/burstable/podf270e1f6-2f35-4dc3-9a06-bae79ab3a002        -      -    14.3M        -        -
/system.slice                                                      -      -     1.3G        -        -
/system.slice/NetworkManager.service                               2      -    16.8M        -        -
/system.slice/auditd.service                                       1      -     4.1M        -        -
/system.slice/boot-efi.mount                                       -      -    44.0K        -        -
/system.slice/boot.mount                                           -      -    48.0K        -        -
/system.slice/chronyd.service                                      1      -     2.8M        -        -
/system.slice/crond.service                                        1      -     1.0M        -        -
/system.slice/dbus.service                                         1      -     2.7M        -        -
/system.slice/dev-hugepages.mount                                  -      -    76.0K        -        -
/system.slice/dev-mqueue.mount                                     -      -   128.0K        -        -
/system.slice/gssproxy.service                                     1      -     2.4M        -        -
/system.slice/irqbalance.service                                   1      -   900.0K        -        -
/system.slice/k3s.service                                          7      -     1.0G        -        -
/system.slice/lvm2-lvmetad.service                                 1      -     3.2M        -        -
/system.slice/polkit.service                                       1      -    15.0M        -        -
/system.slice/postfix.service                                      3      -     9.4M        -        -
/system.slice/qemu-guest-agent.service                             1      -   904.0K        -        -
/system.slice/rpcbind.service                                      1      -     2.0M        -        -
/system.slice/rsyslog.service                                      1      -     4.4M        -        -
/system.slice/sshd.service                                         1      -     7.2M        -        -
/system.slice/sys-kernel-debug.mount                               -      -   664.0K        -        -
/system.slice/system-getty.slice                                   1      -   384.0K        -        -
/system.slice/system-getty.slice/getty@tty1.service                1      -        -        -        -
/system.slice/system-lvm2\x2dpvscan.slice                          -      -   392.0K        -        -
/system.slice/systemd-journald.service                             1      -     2.6M        -        -
/system.slice/systemd-logind.service                               1      -     1.6M        -        -
/system.slice/systemd-udevd.service                                1      -    17.2M        -        -
/system.slice/tuned.service                                        1      -    25.6M        -        -
/system.slice/var-lib-nfs-rpc_pipefs.mount                         -      -    12.0K        -        -
/user.slice                                                        4      -   256.4M        -        -
/user.slice/user-0.slice/session-1.scope                           4      -        -        -        -

After deleting, everything is still there:

[root@centos03 ~]# rm -rf /etc/systemd/system/k3s.service

[root@centos03 ~]# ls -l /sys/fs/cgroup/cpu,cpuacct/
total 0
-rw-r--r--.  1 root root 0 Mar 16 15:03 cgroup.clone_children
-rw-r--r--.  1 root root 0 Mar 16 15:00 cgroup.procs
-r--r--r--.  1 root root 0 Mar 16 15:03 cgroup.sane_behavior
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.stat
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_all
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_percpu
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_percpu_sys
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_percpu_user
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_sys
-r--r--r--.  1 root root 0 Mar 16 15:03 cpuacct.usage_user
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpu.cfs_period_us
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpu.cfs_quota_us
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpu.rt_period_us
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpu.rt_runtime_us
-rw-r--r--.  1 root root 0 Mar 16 15:03 cpu.shares
-r--r--r--.  1 root root 0 Mar 16 15:03 cpu.stat
drwxr-xr-x.  4 root root 0 Mar 16 15:03 kubepods
-rw-r--r--.  1 root root 0 Mar 16 15:03 notify_on_release
-rw-r--r--.  1 root root 0 Mar 16 15:03 release_agent
drwxr-xr-x. 58 root root 0 Mar 16 15:00 system.slice
-rw-r--r--.  1 root root 0 Mar 16 15:03 tasks
drwxr-xr-x.  2 root root 0 Mar 16 15:00 user.slice

[root@centos03 ~]# systemd-cgtop -n 1 --depth 3 | cat
/                                                                117      -     1.5G        -        -
/kubepods                                                          -      -    81.8M        -        -
/kubepods/besteffort                                               -      -    60.6M        -        -
/kubepods/besteffort/podb77ee18e-8fc3-4929-968b-94cfa9bf5185       -      -    11.8M        -        -
/kubepods/besteffort/podd35119d3-8de3-4e24-9045-7ab80d445f46       -      -    23.4M        -        -
/kubepods/besteffort/podde5b0247-b65d-45e6-8612-f453bcd1bfee       -      -    18.0M        -        -
/kubepods/besteffort/podfd30f404-b2e1-4126-9f66-c638408ab761       -      -     3.0M        -        -
/kubepods/burstable                                                -      -    21.2M        -        -
/kubepods/burstable/podf270e1f6-2f35-4dc3-9a06-bae79ab3a002        -      -    21.2M        -        -
/system.slice                                                      -      -     1.3G        -        -
/system.slice/NetworkManager.service                               2      -    16.8M        -        -
/system.slice/auditd.service                                       1      -     4.1M        -        -
/system.slice/boot-efi.mount                                       -      -    44.0K        -        -
/system.slice/boot.mount                                           -      -    48.0K        -        -
/system.slice/chronyd.service                                      1      -     2.8M        -        -
/system.slice/crond.service                                        1      -     1.0M        -        -
/system.slice/dbus.service                                         1      -     2.7M        -        -
/system.slice/dev-hugepages.mount                                  -      -    76.0K        -        -
/system.slice/dev-mqueue.mount                                     -      -   128.0K        -        -
/system.slice/gssproxy.service                                     1      -     2.4M        -        -
/system.slice/irqbalance.service                                   1      -   940.0K        -        -
/system.slice/k3s.service                                          7      -     1.0G        -        -
/system.slice/lvm2-lvmetad.service                                 1      -     3.2M        -        -
/system.slice/polkit.service                                       1      -    15.0M        -        -
/system.slice/postfix.service                                      3      -     9.4M        -        -
/system.slice/qemu-guest-agent.service                             1      -   904.0K        -        -
/system.slice/rpcbind.service                                      1      -     2.0M        -        -
/system.slice/rsyslog.service                                      1      -     4.4M        -        -
/system.slice/sshd.service                                         1      -     7.2M        -        -
/system.slice/sys-kernel-debug.mount                               -      -   664.0K        -        -
/system.slice/system-getty.slice                                   1      -   384.0K        -        -
/system.slice/system-getty.slice/getty@tty1.service                1      -        -        -        -
/system.slice/system-lvm2\x2dpvscan.slice                          -      -   392.0K        -        -
/system.slice/systemd-journald.service                             1      -     2.6M        -        -
/system.slice/systemd-logind.service                               1      -     1.6M        -        -
/system.slice/systemd-udevd.service                                1      -    17.2M        -        -
/system.slice/tuned.service                                        1      -    25.6M        -        -
/system.slice/var-lib-nfs-rpc_pipefs.mount                         -      -    12.0K        -        -
/user.slice                                                        4      -   256.4M        -        -
/user.slice/user-0.slice/session-1.scope                           4      -        -        -        -
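
A quick way to check whether the pod cgroups are still present on an affected node (a sketch; paths follow the cgroup v1 layout shown in the listing above):

ls /sys/fs/cgroup/cpu,cpuacct/kubepods
cat /sys/fs/cgroup/cpu,cpuacct/kubepods/cpuacct.usage   # should keep increasing while pods run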

@mnorrsken
Contributor Author

systemd logs when doing it the "wrong" way:

Mar 16 23:13:13 k3stest systemd[1]: Reloading.
Mar 16 23:13:13 k3stest systemd[1]: k3s.service: Current command vanished from the unit file, execution of the command list won't be resumed.

So this can be solved by running systemctl disable before removing the unit files. I just tried that and it works.
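
(For reference, the ordering that avoids the problem, as described above: disable while the unit file still exists, then remove it.)

systemctl disable k3s                    # unit file still on disk, cgroup data stays intact
rm -f /etc/systemd/system/k3s.service    # remove the file afterwards
systemctl daemon-reload                  # optional tidy-up (an assumption, not from this thread)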

@brandond
Member

brandond commented Mar 16, 2021

Are you saying that it is automatically triggering systemctl daemon-reload when you delete the unit file from disk? That is not something I have seen before.

@mnorrsken
Contributor Author

No, what I'm saying is that systemd "bugs out" when trying to disable a unit file that doesn't exist.

@mnorrsken
Contributor Author

You are running CentOS; which systemd version are you using? The issue could be Debian-specific.

@mnorrsken
Contributor Author

Indeed, it seems to be an OS issue. I installed the Debian backports kernel and systemd, and the problem disappears.

root@k3stest:~# uname -a
Linux k3stest 5.10.0-0.bpo.3-amd64 #1 SMP Debian 5.10.13-1~bpo10+1 (2021-02-11) x86_64 GNU/Linux
root@k3stest:~# k3s --version
k3s version v1.20.4+k3s1 (838a906a)
go version go1.15.8
root@k3stest:~# systemd --version
systemd 247 (247.3-1~bpo10+1)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified

However, Debian Buster is still the stable release, and in any case the order I suggest, disabling before removing the unit files, is more logical (the reverse of the order used when installing a service).

@brandond brandond added this to the v1.20.5+k3s1 milestone Mar 18, 2021
@brandond brandond self-assigned this Mar 18, 2021
@brandond brandond added the kind/bug Something isn't working label Mar 18, 2021
@brandond
Member

Note for QA - this appears to only affect whatever version of systemd Debian Buster is currently shipping.

@mnorrsken
Contributor Author

I've run my install/upgrade Ansible script several times now against get.k3s.io without this issue appearing on any of my Debian servers, so as far as I'm concerned this issue is solved.

@ShylajaDevadiga
Contributor

Reproduced using k3s version v1.20.4+k3s1, following the instructions in #3035 (comment). The CPU metric reports brief 0s before reporting actual values again.
$ cat /etc/debian_version
10.8

Error from server (NotFound): podmetrics.metrics.k8s.io "default/busybox" not found
Error from server (NotFound): podmetrics.metrics.k8s.io "default/busybox" not found
Error from server (NotFound): podmetrics.metrics.k8s.io "default/busybox" not found
551626208n
551626208n
551626208n
551626208n
551626208n
551626208n
551626208n
551626208n
0
0
0
0
0
0
0
0
0
0
0
551992545n
551992545n
551992545n
551992545n
551992545n
551992545n

Validated the fix in k3s version v1.20.5-rc1+k3s1, following the same instructions; the CPU metric does not report 0.
