
containerd does not work with cgroups v2 and systemd as cgroup driver #15633

Closed
prezha opened this issue Jan 12, 2023 · 3 comments
Labels
co/cgroup co/runtime/containerd kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@prezha
Contributor

prezha commented Jan 12, 2023

What Happened?

if the underlying os, docker driver, containerd container runtime, and kubernetes/kubelet are all configured to use systemd as the cgroup v2 driver, the cluster fails with, eg:

k8s v1.25.3

W0112 19:59:05.855012   14114 out.go:239] 💢  initialization failed, will try again: apply cni: cni apply: cmd: sudo /var/lib/minikube/binaries/v1.25.3/kubectl apply --kubeconfig=/var/lib/minikube/kubeconfig -f /var/tmp/minikube/cni.yaml output: 
** stderr ** 
error when retrieving current configuration of:
Resource: "rbac.authorization.k8s.io/v1, Resource=clusterroles", GroupVersionKind: "rbac.authorization.k8s.io/v1, Kind=ClusterRole"
Name: "kindnet", Namespace: ""
from server for: "/var/tmp/minikube/cni.yaml": Get "https://localhost:8443/apis/rbac.authorization.k8s.io/v1/clusterroles/kindnet": dial tcp 127.0.0.1:8443: connect: connection refused - error from a previous attempt: http2: server sent GOAWAY and closed the connection; LastStreamID=55, ErrCode=NO_ERROR, debug=""
error when retrieving current configuration of:
Resource: "rbac.authorization.k8s.io/v1, Resource=clusterrolebindings", GroupVersionKind: "rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding"
Name: "kindnet", Namespace: ""
from server for: "/var/tmp/minikube/cni.yaml": Get "https://localhost:8443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/kindnet": dial tcp 127.0.0.1:8443: connect: connection refused
error when retrieving current configuration of:
Resource: "/v1, Resource=serviceaccounts", GroupVersionKind: "/v1, Kind=ServiceAccount"
Name: "kindnet", Namespace: "kube-system"
from server for: "/var/tmp/minikube/cni.yaml": Get "https://localhost:8443/api/v1/namespaces/kube-system/serviceaccounts/kindnet": dial tcp 127.0.0.1:8443: connect: connection refused
error when retrieving current configuration of:
Resource: "apps/v1, Resource=daemonsets", GroupVersionKind: "apps/v1, Kind=DaemonSet"
Name: "kindnet", Namespace: "kube-system"
from server for: "/var/tmp/minikube/cni.yaml": Get "https://localhost:8443/apis/apps/v1/namespaces/kube-system/daemonsets/kindnet": dial tcp 127.0.0.1:8443: connect: connection refused

** /stderr **: sudo /var/lib/minikube/binaries/v1.25.3/kubectl apply --kubeconfig=/var/lib/minikube/kubeconfig -f /var/tmp/minikube/cni.yaml: Process exited with status 1

or k8s v1.26.0

W0112 20:24:09.932071   37610 out.go:239] 💢  initialization failed, will try again: apply cni: cni apply: cmd: sudo /var/lib/minikube/binaries/v1.26.0/kubectl apply --kubeconfig=/var/lib/minikube/kubeconfig -f /var/tmp/minikube/cni.yaml output: 
** stderr ** 
error when retrieving current configuration of:
Resource: "rbac.authorization.k8s.io/v1, Resource=clusterroles", GroupVersionKind: "rbac.authorization.k8s.io/v1, Kind=ClusterRole"
Name: "kindnet", Namespace: ""
from server for: "/var/tmp/minikube/cni.yaml": Get "https://localhost:8443/apis/rbac.authorization.k8s.io/v1/clusterroles/kindnet": dial tcp 127.0.0.1:8443: connect: connection refused - error from a previous attempt: http2: server sent GOAWAY and closed the connection; LastStreamID=53, ErrCode=NO_ERROR, debug=""
error when retrieving current configuration of:
Resource: "rbac.authorization.k8s.io/v1, Resource=clusterrolebindings", GroupVersionKind: "rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding"
Name: "kindnet", Namespace: ""
from server for: "/var/tmp/minikube/cni.yaml": Get "https://localhost:8443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/kindnet": dial tcp 127.0.0.1:8443: connect: connection refused
error when retrieving current configuration of:
Resource: "/v1, Resource=serviceaccounts", GroupVersionKind: "/v1, Kind=ServiceAccount"
Name: "kindnet", Namespace: "kube-system"
from server for: "/var/tmp/minikube/cni.yaml": Get "https://localhost:8443/api/v1/namespaces/kube-system/serviceaccounts/kindnet": dial tcp 127.0.0.1:8443: connect: connection refused
error when retrieving current configuration of:
Resource: "apps/v1, Resource=daemonsets", GroupVersionKind: "apps/v1, Kind=DaemonSet"
Name: "kindnet", Namespace: "kube-system"
from server for: "/var/tmp/minikube/cni.yaml": Get "https://localhost:8443/apis/apps/v1/namespaces/kube-system/daemonsets/kindnet": dial tcp 127.0.0.1:8443: connect: connection refused

** /stderr **: sudo /var/lib/minikube/binaries/v1.26.0/kubectl apply --kubeconfig=/var/lib/minikube/kubeconfig -f /var/tmp/minikube/cni.yaml: Process exited with status 1

in all other combinations, the cluster works, eg:

  • (interestingly!) if the underlying os, docker driver and containerd container runtime are configured to use systemd as the cgroup v2 driver but kubernetes/kubelet is configured to use cgroupfs as its cgroup driver, the cluster works (this is currently used as a workaround for the issue, in pr improve how CRs and k8s work with CNI plugins and cgroup drivers #15463)

  • if the underlying os, docker driver, containerd container runtime and kubernetes/kubelet are all configured to use cgroupfs as the cgroup v1 driver, the cluster works

  • if docker is used as both driver and container runtime, and the whole "stack" (ie, os, driver, container runtime and k8s/kubelet) is all configured to use either systemd (as the cgroup v2 driver) or cgroupfs (as the cgroup v1 driver), the cluster works

note: if we use, eg, the kvm driver, we currently use an iso that boots into cgroupfs by default, so everything uses the same driver and hence the cluster works
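for context, "configured to use systemd" above boils down to settings along these lines - a sketch of the typical knobs, not minikube's exact generated config:

```toml
# containerd (/etc/containerd/config.toml):
# tell the runc shim to drive cgroups through systemd instead of cgroupfs
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```

the matching setting on the kubelet side is `cgroupDriver: systemd` in its KubeletConfiguration (or `cgroupfs` for the workaround case below); the failure above shows up when both ends are set to `systemd` on a cgroup v2 host.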

test env:

Attach the log file

systemd - docker_containerd-1.6.15_k8s-1.25.3@ubuntu-22.04 - fail.log
systemd - docker_containerd-1.6.15_k8s-1.26.0@ubuntu-22.04 - fail.log
cgroupfs - docker_containerd-1.6.15_k8s-1.25.3@ubuntu-20.04 - pass.log
systemd - docker_docker-1.6.15_k8s-1.26.0@ubuntu-22.04 - pass.log
systemd - docker_docker-1.6.15_k8s-1.25.3@ubuntu-22.04 - pass.log

Operating System

Ubuntu

Driver

Docker

@spowelljr spowelljr added kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. co/runtime/containerd co/cgroup labels Jan 12, 2023
@prezha
Contributor Author

prezha commented Jan 23, 2023

i've re-run the TestForceSystemd{Flag,Env} tests after #15463 was merged into master, and the situation looks a bit clearer in the context of this issue - namely, using force-systemd in different combinations now gives more consistent results:

  • docker driver with containerd container runtime on cgroupfs host: works with and without the "workaround"(*)
  • kvm2 driver with containerd container runtime on systemd host: works with and without the "workaround"(*)
  • docker driver with containerd container runtime on systemd host: works with the "workaround"(*), but still fails without it

(*)workaround refers to this:

// TODO: investigate why containerd (v1.6.15) does not work with k8s (v1.25.3) when both are set to use systemd cgroup driver
// issue: https://github.com/kubernetes/minikube/issues/15633
// until this is fixed, the workaround is to configure kubelet to use cgroupfs when containerd is using systemd
// note: pkg/minikube/bootstrapper/bsutil/kubeadm_test.go::TestGenerateKubeadmYAML also expects this override (for now)
if cc.KubernetesConfig.ContainerRuntime == constants.Containerd && cgroupDriver == constants.SystemdCgroupDriver {
	cgroupDriver = constants.CgroupfsCgroupDriver
}
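as a self-contained sketch of what that override does (using hypothetical stand-in constants, not minikube's actual constants package):

```go
package main

import "fmt"

// hypothetical stand-ins for minikube's constants package, for illustration only
const (
	containerdRuntime    = "containerd"
	systemdCgroupDriver  = "systemd"
	cgroupfsCgroupDriver = "cgroupfs"
)

// effectiveCgroupDriver applies the workaround: when the container runtime is
// containerd and the detected cgroup driver is systemd, fall back to cgroupfs
// for kubelet; every other combination passes through unchanged.
func effectiveCgroupDriver(runtime, cgroupDriver string) string {
	if runtime == containerdRuntime && cgroupDriver == systemdCgroupDriver {
		return cgroupfsCgroupDriver
	}
	return cgroupDriver
}

func main() {
	fmt.Println(effectiveCgroupDriver("containerd", "systemd")) // cgroupfs (overridden)
	fmt.Println(effectiveCgroupDriver("docker", "systemd"))     // systemd (unchanged)
}
```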

logs:

docker_containerd - cgroupfs.log
kvm2_containerd - systemd.log
docker_containerd - systemd.log

@prezha
Contributor Author

prezha commented Jan 23, 2023

brief update 2:

i've re-tested locally - this time with @spowelljr's pr #15541 (which "embeds" updated binaries and actually applies the right container runtime configs), and also with the above workaround removed:
it all worked nicely with both docker and kvm2 drivers, on an ubuntu 20.04 vm (cgroupfs), an ubuntu 22.04 vm (systemd) and opensuse (systemd) - ie, including the third case (docker driver with containerd container runtime on a systemd host) that previously failed without the workaround

if we confirm that with jenkins, we can close this ticket

@prezha
Contributor Author

prezha commented Jan 24, 2023

indeed, Docker_Linux_containerd and KVM_Linux_containerd no longer have this issue, so i'll close this ticket: the containerd container runtime now fully works with cgroups v2 and systemd as the cgroup driver across the whole "stack", and the workaround has been removed

@prezha prezha closed this as completed Jan 24, 2023