Podman|CRI-O: unable to restart minikube once stopped #8033

Closed
elegos opened this issue May 7, 2020 · 21 comments
Labels
co/podman-driver podman driver issues co/runtime/crio CRIO related issues kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.


@elegos
Contributor

elegos commented May 7, 2020

Steps to reproduce the issue:

  1. Ensure SELinux is disabled or in permissive mode (see Linux|Podman|Docker: permission denied (selinux?) for /preloaded.tar #8016): getenforce (expected: "Permissive")
  2. Ensure cgroup v1 is in use (boot the kernel with the argument systemd.unified_cgroup_hierarchy=0), otherwise docker won't be able to run inside podman (see the sketch after this list).
  3. minikube start --driver=podman --container-runtime=cri-o (expected OK)
  4. minikube stop (expected OK)
  5. minikube start (expected error)
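
For convenience, a sketch of how steps 1 and 2 can be checked and applied on Fedora (assuming the grubby tool, which Fedora ships; a reboot is needed for the kernel argument to take effect):

getenforce                      # expected: Permissive (or Disabled)
sudo setenforce 0               # permissive until the next reboot

# boot all installed kernels with cgroup v1 (hybrid hierarchy):
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
sudo reboot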

Full output of failed command:

~ » minikube start --alsologtostderr                                                                               elegos@localhost
I0507 11:54:48.671960   16143 start.go:99] hostinfo: {"hostname":"localhost.localdomain","uptime":858,"bootTime":1588844430,"procs":469,"os":"linux","platform":"fedora","platformFamily":"fedora","platformVersion":"32","kernelVersion":"5.6.8-300.fc32.x86_64","virtualizationSystem":"kvm","virtualizationRole":"host","hostid":"5a3e2727-374c-4665-8b07-e67a1fc66448"}
I0507 11:54:48.672982   16143 start.go:109] virtualization: kvm host
😄  minikube v1.10.0-beta.2 on Fedora 32
I0507 11:54:48.673198   16143 notify.go:125] Checking for updates...
I0507 11:54:48.673882   16143 driver.go:253] Setting default libvirt URI to qemu:///system
I0507 11:54:48.719012   16143 podman.go:95] podman version: 1.9.1
✨  Using the podman (experimental) driver based on existing profile
I0507 11:54:48.719098   16143 start.go:215] selected driver: podman
I0507 11:54:48.719108   16143 start.go:594] validating driver "podman" against &{Name:minikube KeepContext:false EmbedCerts:false MinikubeISO: KicBaseImage:gcr.io/k8s-minikube/kicbase:v0.0.10@sha256:f58e0c4662bac8a9b5dda7984b185bad8502ade5d9fa364bf2755d636ab51438 Memory:3900 CPUs:2 DiskSize:20000 Driver:podman HyperkitVpnKitSock: HyperkitVSockPorts:[] DockerEnv:[] InsecureRegistry:[] RegistryMirror:[] HostOnlyCIDR:192.168.99.1/24 HypervVirtualSwitch: HypervUseExternalSwitch:false HypervExternalAdapter: KVMNetwork:default KVMQemuURI:qemu:///system KVMGPU:false KVMHidden:false DockerOpt:[] DisableDriverMounts:false NFSShare:[] NFSSharesRoot:/nfsshares UUID: NoVTXCheck:false DNSProxy:false HostDNSResolver:true HostOnlyNicType:virtio NatNicType:virtio KubernetesConfig:{KubernetesVersion:v1.18.1 ClusterName:minikube APIServerName:minikubeCA APIServerNames:[] APIServerIPs:[] DNSDomain:cluster.local ContainerRuntime:cri-o CRISocket: NetworkPlugin:cni FeatureGates: ServiceCIDR:10.96.0.0/12 ImageRepository: LoadBalancerStartIP: LoadBalancerEndIP: ExtraOptions:[] ShouldLoadCachedImages:true EnableDefaultCNI:true NodeIP: NodePort:8443 NodeName:} Nodes:[{Name: IP:10.88.0.176 Port:8443 KubernetesVersion:v1.18.1 ControlPlane:true Worker:true}] Addons:map[] VerifyComponents:map[apiserver:true system_pods:true]}
I0507 11:54:48.719190   16143 start.go:600] status for podman: {Installed:true Healthy:true Error:<nil> Fix: Doc:}
👍  Starting control plane node minikube in cluster minikube
I0507 11:54:48.719326   16143 cache.go:104] Beginning downloading kic artifacts for podman with cri-o
I0507 11:54:48.719335   16143 cache.go:122] Driver isn't docker, skipping base-image download
I0507 11:54:48.719347   16143 preload.go:81] Checking if preload exists for k8s version v1.18.1 and runtime cri-o
W0507 11:54:53.949529   16143 preload.go:109] https://storage.googleapis.com/minikube-preloaded-volume-tarballs/preloaded-images-k8s-v3-v1.18.1-cri-o-overlay2-amd64.tar.lz4 status code: 404
I0507 11:54:53.949874   16143 profile.go:156] Saving config to /home/elegos/.minikube/profiles/minikube/config.json ...
I0507 11:54:53.949895   16143 cache.go:92] acquiring lock: {Name:mk4dd954adff8fa0d60314aad615dda579b1396d Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 11:54:53.949910   16143 cache.go:92] acquiring lock: {Name:mkb7be2f4ecca70480f28a14a4b699a589e40f5c Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 11:54:53.949918   16143 cache.go:92] acquiring lock: {Name:mk7f6cd531fe586d47c6c22ebea058eeae9b58c2 Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 11:54:53.949964   16143 cache.go:92] acquiring lock: {Name:mk0327cc4045dd112d4082debbde24e24b7272fd Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 11:54:53.949966   16143 cache.go:92] acquiring lock: {Name:mk8af40c1031f03b61cb0bb04e274c93034489c6 Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 11:54:53.950021   16143 cache.go:92] acquiring lock: {Name:mk5337d2759832364a20d7a798d7d258a3bb4931 Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 11:54:53.950035   16143 cache.go:92] acquiring lock: {Name:mk72127098ac52ad30658338364f8f84892ebed4 Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 11:54:53.949952   16143 cache.go:92] acquiring lock: {Name:mk4483b2d2945ee4358756130ea3cb53d5ef4261 Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 11:54:53.950130   16143 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/coredns_1.6.7 exists
I0507 11:54:53.949973   16143 cache.go:92] acquiring lock: {Name:mk8bd4a41f455a37ccd9a1b763eab2040dc6018d Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 11:54:53.950151   16143 cache.go:81] cache image "k8s.gcr.io/coredns:1.6.7" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/coredns_1.6.7" took 265.691µs
I0507 11:54:53.950164   16143 cache.go:66] save to tar file k8s.gcr.io/coredns:1.6.7 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/coredns_1.6.7 succeeded
I0507 11:54:53.949924   16143 cache.go:92] acquiring lock: {Name:mk08a6384a79351ab1784a2c479de7b38add038a Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 11:54:53.950164   16143 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/etcd_3.4.3-0 exists
I0507 11:54:53.950226   16143 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.18.1 exists
I0507 11:54:53.950235   16143 cache.go:100] /home/elegos/.minikube/cache/images/gcr.io/k8s-minikube/storage-provisioner_v1.8.1 exists
I0507 11:54:53.950254   16143 cache.go:81] cache image "gcr.io/k8s-minikube/storage-provisioner:v1.8.1" -> "/home/elegos/.minikube/cache/images/gcr.io/k8s-minikube/storage-provisioner_v1.8.1" took 306.612µs
I0507 11:54:53.950240   16143 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.18.1 exists
I0507 11:54:53.950270   16143 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.18.1 exists
I0507 11:54:53.950296   16143 cache.go:81] cache image "k8s.gcr.io/kube-apiserver:v1.18.1" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.18.1" took 318.032µs
I0507 11:54:53.950306   16143 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/pause_3.2 exists
I0507 11:54:53.950327   16143 cache.go:81] cache image "k8s.gcr.io/pause:3.2" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/pause_3.2" took 427.102µs
I0507 11:54:53.950305   16143 cache.go:81] cache image "k8s.gcr.io/kube-controller-manager:v1.18.1" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.18.1" took 298.302µs
I0507 11:54:53.950344   16143 cache.go:66] save to tar file k8s.gcr.io/pause:3.2 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/pause_3.2 succeeded
I0507 11:54:53.950171   16143 cache.go:100] /home/elegos/.minikube/cache/images/kubernetesui/metrics-scraper_v1.0.2 exists
I0507 11:54:53.950353   16143 cache.go:66] save to tar file k8s.gcr.io/kube-controller-manager:v1.18.1 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.18.1 succeeded
I0507 11:54:53.950310   16143 cache.go:66] save to tar file k8s.gcr.io/kube-apiserver:v1.18.1 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.18.1 succeeded
I0507 11:54:53.950257   16143 cache.go:100] /home/elegos/.minikube/cache/images/kubernetesui/dashboard_v2.0.0 exists
I0507 11:54:53.950236   16143 cache.go:81] cache image "k8s.gcr.io/etcd:3.4.3-0" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/etcd_3.4.3-0" took 325.401µs
I0507 11:54:53.950394   16143 cache.go:66] save to tar file k8s.gcr.io/etcd:3.4.3-0 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/etcd_3.4.3-0 succeeded
I0507 11:54:53.950164   16143 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.18.1 exists
I0507 11:54:53.950396   16143 cache.go:81] cache image "kubernetesui/dashboard:v2.0.0" -> "/home/elegos/.minikube/cache/images/kubernetesui/dashboard_v2.0.0" took 414.232µs
I0507 11:54:53.950410   16143 cache.go:66] save to tar file kubernetesui/dashboard:v2.0.0 -> /home/elegos/.minikube/cache/images/kubernetesui/dashboard_v2.0.0 succeeded
I0507 11:54:53.950416   16143 cache.go:81] cache image "k8s.gcr.io/kube-proxy:v1.18.1" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.18.1" took 473.442µs
I0507 11:54:53.950427   16143 cache.go:66] save to tar file k8s.gcr.io/kube-proxy:v1.18.1 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.18.1 succeeded
I0507 11:54:53.950254   16143 cache.go:81] cache image "k8s.gcr.io/kube-scheduler:v1.18.1" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.18.1" took 329.872µs
I0507 11:54:53.950439   16143 cache.go:66] save to tar file k8s.gcr.io/kube-scheduler:v1.18.1 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.18.1 succeeded
I0507 11:54:53.950267   16143 cache.go:66] save to tar file gcr.io/k8s-minikube/storage-provisioner:v1.8.1 -> /home/elegos/.minikube/cache/images/gcr.io/k8s-minikube/storage-provisioner_v1.8.1 succeeded
I0507 11:54:53.950183   16143 cache.go:132] Successfully downloaded all kic artifacts
I0507 11:54:53.950475   16143 start.go:223] acquiring machines lock for minikube: {Name:mk54bbd76b9ba071d84e6139eee3a3cd7ecc36f4 Clock:{} Delay:500ms Timeout:15m0s Cancel:<nil>}
I0507 11:54:53.950367   16143 cache.go:81] cache image "kubernetesui/metrics-scraper:v1.0.2" -> "/home/elegos/.minikube/cache/images/kubernetesui/metrics-scraper_v1.0.2" took 487.052µs
I0507 11:54:53.950543   16143 cache.go:66] save to tar file kubernetesui/metrics-scraper:v1.0.2 -> /home/elegos/.minikube/cache/images/kubernetesui/metrics-scraper_v1.0.2 succeeded
I0507 11:54:53.950557   16143 cache.go:73] Successfully saved all images to host disk.
I0507 11:54:53.950570   16143 start.go:227] acquired machines lock for "minikube" in 82.14µs
I0507 11:54:53.950590   16143 start.go:87] Skipping create...Using existing machine configuration
I0507 11:54:53.950597   16143 fix.go:53] fixHost starting: 
I0507 11:54:53.950946   16143 cli_runner.go:108] Run: sudo podman inspect minikube --format={{.State.Status}}
I0507 11:54:54.018236   16143 fix.go:105] recreateIfNeeded on minikube: state=Stopped err=<nil>
W0507 11:54:54.018265   16143 fix.go:131] unexpected machine state, will restart: <nil>
🔄  Restarting existing podman container for "minikube" ...
I0507 11:54:54.366993   16143 fix.go:55] fixHost completed within 416.385005ms
I0507 11:54:54.367032   16143 start.go:74] releasing machines lock for "minikube", held for 416.445145ms
🤦  StartHost failed, but will try again: driver start: start: exit status 125
I0507 11:54:59.367286   16143 start.go:223] acquiring machines lock for minikube: {Name:mk54bbd76b9ba071d84e6139eee3a3cd7ecc36f4 Clock:{} Delay:500ms Timeout:15m0s Cancel:<nil>}
I0507 11:54:59.367490   16143 start.go:227] acquired machines lock for "minikube" in 168.621µs
I0507 11:54:59.367616   16143 start.go:87] Skipping create...Using existing machine configuration
I0507 11:54:59.367627   16143 fix.go:53] fixHost starting: 
I0507 11:54:59.368126   16143 cli_runner.go:108] Run: sudo podman inspect minikube --format={{.State.Status}}
I0507 11:54:59.443927   16143 fix.go:105] recreateIfNeeded on minikube: state=Stopped err=<nil>
W0507 11:54:59.443967   16143 fix.go:131] unexpected machine state, will restart: <nil>
🔄  Restarting existing podman container for "minikube" ...
I0507 11:54:59.828059   16143 fix.go:55] fixHost completed within 460.429039ms
I0507 11:54:59.828085   16143 start.go:74] releasing machines lock for "minikube", held for 460.576631ms
😿  Failed to start podman container. "minikube start" may fix it: driver start: start: exit status 125
I0507 11:54:59.828370   16143 exit.go:58] WithError(error provisioning host)=Failed to start host: driver start: start: exit status 125 called from:
goroutine 1 [running]:
runtime/debug.Stack(0x40c49a, 0x181f360, 0x199d5a0)
        /usr/lib/golang/src/runtime/debug/stack.go:24 +0x9d
k8s.io/minikube/pkg/minikube/exit.WithError(0x1a7458b, 0x17, 0x1d34700, 0xc0002cc980)
        /home/elegos/Development/minikube/pkg/minikube/exit/exit.go:58 +0x34
k8s.io/minikube/cmd/minikube/cmd.runStart(0x2a5eae0, 0xc0004061a0, 0x0, 0x1)
        /home/elegos/Development/minikube/cmd/minikube/cmd/start.go:170 +0xac2
github.com/spf13/cobra.(*Command).execute(0x2a5eae0, 0xc000406190, 0x1, 0x1, 0x2a5eae0, 0xc000406190)
        /home/elegos/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:846 +0x29d
github.com/spf13/cobra.(*Command).ExecuteC(0x2a5db20, 0x0, 0x1, 0xc000441a20)
        /home/elegos/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
        /home/elegos/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
k8s.io/minikube/cmd/minikube/cmd.Execute()
        /home/elegos/Development/minikube/cmd/minikube/cmd/root.go:108 +0x691
main.main()
        /home/elegos/Development/minikube/cmd/minikube/main.go:66 +0xe6
W0507 11:54:59.828658   16143 out.go:201] error provisioning host: Failed to start host: driver start: start: exit status 125

💣  error provisioning host: Failed to start host: driver start: start: exit status 125

😿  minikube is exiting due to an error. If the above message is not useful, open an issue:
👉  https://github.com/kubernetes/minikube/issues/new/choose

Podman's log for the minikube container:

INFO: ensuring we can execute /bin/mount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: fix cgroup mounts for all subsystems
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: faking /sys/class/dmi/id/product_name to be "kind"
INFO: faking /sys/class/dmi/id/product_uuid to be random
INFO: faking /sys/devices/virtual/dmi/id/product_uuid as well
INFO: setting iptables to detected mode: legacy
systemd 242 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Ubuntu 19.10!

Set hostname to <minikube>.
Failed to bump fs.file-max, ignoring: Invalid argument
Failed to attach 1 to compat systemd cgroup /libpod_parent/libpod-18e0adda97531bd42235adfb494e1b10a91313e669697e99c21296be0f01a101/init.scope: No such file or directory
Failed to open pin file: No such file or directory
Failed to allocate manager object: No such file or directory
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

Searching the internet, I found that balenaOS had a very similar issue (see: https://forums.balena.io/t/container-reboot-and-failed-to-attach-1-to-compat-systemd-cgroup/27739/24). They suggested resolving it by setting systemd.legacy_systemd_cgroup_controller before starting systemd (see the solution in the linked issue).
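
For reference, a minimal sketch of how that suggestion could be applied in a container entrypoint, assuming systemd honours the SYSTEMD_PROC_CMDLINE environment variable (an override that systemd reads in place of /proc/cmdline); this only illustrates the balena workaround, it is not minikube's fix:

# hypothetical entrypoint fragment: make systemd pick the legacy layout
# for its own (compat) cgroup controller before it becomes PID 1
export SYSTEMD_PROC_CMDLINE="systemd.legacy_systemd_cgroup_controller=yes"
exec /sbin/init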

Thank you :)

@elegos changed the title from "Podman|CRI-O: unable to restart minikube once stopped (" to "Podman|CRI-O: unable to restart minikube once stopped" May 7, 2020
@afbjorklund
Collaborator

Sounds like the cgroup manager again (#7996)? Or is there a subtle difference here?

@afbjorklund afbjorklund added co/podman-driver podman driver issues co/runtime/crio CRIO related issues priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels May 7, 2020
@elegos
Contributor Author

elegos commented May 7, 2020

It is still a cgroup problem, but I think it's a different one, as minikube might (I don't know the internals) act differently depending on the container runtime.

@afbjorklund
Collaborator

Does PR #8001 help here?

Since you have issues starting the system container, I don't think it even gets as far as cri-o.

@elegos
Contributor Author

elegos commented May 7, 2020

Actually I am testing minikube with your patch applied; otherwise, as you say, it won't start even the first time.

@afbjorklund
Collaborator

afbjorklund commented May 7, 2020

Wonder why systemd worked in the previous podman container, but not in this one?

Or maybe it didn't, and the same issue is showing here. Will try to reproduce.

@elegos
Contributor Author

elegos commented May 7, 2020

I've just downloaded the master branch again and, without applying the patch and in the same environment as this bug, the error is different. So yes, the patch seems to help, but it also seems incomplete.

For reference, here is the log of the (re)start of minikube with the "vanilla" master branch:

~/Development/minikube(master) » minikube start --alsologtostderr                                             70 ↵ elegos@localhost
I0507 18:11:19.077970   50703 start.go:99] hostinfo: {"hostname":"localhost.localdomain","uptime":18485,"bootTime":1588849394,"procs":601,"os":"linux","platform":"fedora","platformFamily":"fedora","platformVersion":"32","kernelVersion":"5.6.8-300.fc32.x86_64","virtualizationSystem":"kvm","virtualizationRole":"host","hostid":"5a3e2727-374c-4665-8b07-e67a1fc66448"}
I0507 18:11:19.078790   50703 start.go:109] virtualization: kvm host
😄  minikube v1.10.0-beta.2 on Fedora 32
I0507 18:11:19.078939   50703 notify.go:125] Checking for updates...
I0507 18:11:19.079318   50703 driver.go:253] Setting default libvirt URI to qemu:///system
I0507 18:11:19.132923   50703 podman.go:95] podman version: 1.9.1
✨  Using the podman (experimental) driver based on existing profile
I0507 18:11:19.133005   50703 start.go:215] selected driver: podman
I0507 18:11:19.133012   50703 start.go:594] validating driver "podman" against &{Name:minikube KeepContext:false EmbedCerts:false MinikubeISO: KicBaseImage:gcr.io/k8s-minikube/kicbase:v0.0.10@sha256:f58e0c4662bac8a9b5dda7984b185bad8502ade5d9fa364bf2755d636ab51438 Memory:3900 CPUs:2 DiskSize:20000 Driver:podman HyperkitVpnKitSock: HyperkitVSockPorts:[] DockerEnv:[] InsecureRegistry:[] RegistryMirror:[] HostOnlyCIDR:192.168.99.1/24 HypervVirtualSwitch: HypervUseExternalSwitch:false HypervExternalAdapter: KVMNetwork:default KVMQemuURI:qemu:///system KVMGPU:false KVMHidden:false DockerOpt:[] DisableDriverMounts:false NFSShare:[] NFSSharesRoot:/nfsshares UUID: NoVTXCheck:false DNSProxy:false HostDNSResolver:true HostOnlyNicType:virtio NatNicType:virtio KubernetesConfig:{KubernetesVersion:v1.18.1 ClusterName:minikube APIServerName:minikubeCA APIServerNames:[] APIServerIPs:[] DNSDomain:cluster.local ContainerRuntime:cri-o CRISocket: NetworkPlugin:cni FeatureGates: ServiceCIDR:10.96.0.0/12 ImageRepository: LoadBalancerStartIP: LoadBalancerEndIP: ExtraOptions:[] ShouldLoadCachedImages:true EnableDefaultCNI:true NodeIP: NodePort:8443 NodeName:} Nodes:[{Name: IP: Port:8443 KubernetesVersion:v1.18.1 ControlPlane:true Worker:true}] Addons:map[default-storageclass:true storage-provisioner:true] VerifyComponents:map[apiserver:true system_pods:true]}
I0507 18:11:19.133094   50703 start.go:600] status for podman: {Installed:true Healthy:true Error:<nil> Fix: Doc:}
👍  Starting control plane node minikube in cluster minikube
I0507 18:11:19.133213   50703 cache.go:104] Beginning downloading kic artifacts for podman with cri-o
I0507 18:11:19.133222   50703 cache.go:122] Driver isn't docker, skipping base-image download
I0507 18:11:19.133233   50703 preload.go:81] Checking if preload exists for k8s version v1.18.1 and runtime cri-o
W0507 18:11:19.362432   50703 preload.go:109] https://storage.googleapis.com/minikube-preloaded-volume-tarballs/preloaded-images-k8s-v3-v1.18.1-cri-o-overlay2-amd64.tar.lz4 status code: 404
I0507 18:11:19.362848   50703 profile.go:156] Saving config to /home/elegos/.minikube/profiles/minikube/config.json ...
I0507 18:11:19.362924   50703 cache.go:92] acquiring lock: {Name:mk0327cc4045dd112d4082debbde24e24b7272fd Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 18:11:19.362949   50703 cache.go:92] acquiring lock: {Name:mkb7be2f4ecca70480f28a14a4b699a589e40f5c Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 18:11:19.362991   50703 cache.go:92] acquiring lock: {Name:mk8bd4a41f455a37ccd9a1b763eab2040dc6018d Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 18:11:19.362993   50703 cache.go:92] acquiring lock: {Name:mk4483b2d2945ee4358756130ea3cb53d5ef4261 Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 18:11:19.362999   50703 cache.go:92] acquiring lock: {Name:mk7f6cd531fe586d47c6c22ebea058eeae9b58c2 Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 18:11:19.363007   50703 cache.go:92] acquiring lock: {Name:mk72127098ac52ad30658338364f8f84892ebed4 Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 18:11:19.362976   50703 cache.go:92] acquiring lock: {Name:mk5337d2759832364a20d7a798d7d258a3bb4931 Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 18:11:19.362955   50703 cache.go:92] acquiring lock: {Name:mk08a6384a79351ab1784a2c479de7b38add038a Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 18:11:19.362924   50703 cache.go:92] acquiring lock: {Name:mk4dd954adff8fa0d60314aad615dda579b1396d Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 18:11:19.363014   50703 cache.go:92] acquiring lock: {Name:mk8af40c1031f03b61cb0bb04e274c93034489c6 Clock:{} Delay:500ms Timeout:10m0s Cancel:<nil>}
I0507 18:11:19.363120   50703 cache.go:132] Successfully downloaded all kic artifacts
I0507 18:11:19.363238   50703 start.go:223] acquiring machines lock for minikube: {Name:mk54bbd76b9ba071d84e6139eee3a3cd7ecc36f4 Clock:{} Delay:500ms Timeout:15m0s Cancel:<nil>}
I0507 18:11:19.363258   50703 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.18.1 exists
I0507 18:11:19.363270   50703 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/pause_3.2 exists
I0507 18:11:19.363286   50703 cache.go:81] cache image "k8s.gcr.io/kube-proxy:v1.18.1" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.18.1" took 386.692µs
I0507 18:11:19.363305   50703 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.18.1 exists
I0507 18:11:19.363304   50703 cache.go:81] cache image "k8s.gcr.io/pause:3.2" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/pause_3.2" took 379.583µs
I0507 18:11:19.363323   50703 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.18.1 exists
I0507 18:11:19.363324   50703 cache.go:81] cache image "k8s.gcr.io/kube-scheduler:v1.18.1" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.18.1" took 348.182µs
I0507 18:11:19.363337   50703 cache.go:66] save to tar file k8s.gcr.io/kube-scheduler:v1.18.1 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-scheduler_v1.18.1 succeeded
I0507 18:11:19.363343   50703 cache.go:81] cache image "k8s.gcr.io/kube-apiserver:v1.18.1" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.18.1" took 360.822µs
I0507 18:11:19.363357   50703 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/etcd_3.4.3-0 exists
I0507 18:11:19.363369   50703 cache.go:66] save to tar file k8s.gcr.io/kube-apiserver:v1.18.1 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-apiserver_v1.18.1 succeeded
I0507 18:11:19.363307   50703 cache.go:66] save to tar file k8s.gcr.io/kube-proxy:v1.18.1 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-proxy_v1.18.1 succeeded
I0507 18:11:19.363325   50703 cache.go:66] save to tar file k8s.gcr.io/pause:3.2 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/pause_3.2 succeeded
I0507 18:11:19.363299   50703 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/coredns_1.6.7 exists
I0507 18:11:19.363295   50703 cache.go:100] /home/elegos/.minikube/cache/images/kubernetesui/metrics-scraper_v1.0.2 exists
I0507 18:11:19.363410   50703 cache.go:100] /home/elegos/.minikube/cache/images/gcr.io/k8s-minikube/storage-provisioner_v1.8.1 exists
I0507 18:11:19.363409   50703 cache.go:81] cache image "k8s.gcr.io/coredns:1.6.7" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/coredns_1.6.7" took 491.783µs
I0507 18:11:19.363432   50703 cache.go:81] cache image "gcr.io/k8s-minikube/storage-provisioner:v1.8.1" -> "/home/elegos/.minikube/cache/images/gcr.io/k8s-minikube/storage-provisioner_v1.8.1" took 495.563µs
I0507 18:11:19.363382   50703 cache.go:81] cache image "k8s.gcr.io/etcd:3.4.3-0" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/etcd_3.4.3-0" took 449.973µs
I0507 18:11:19.363455   50703 cache.go:66] save to tar file gcr.io/k8s-minikube/storage-provisioner:v1.8.1 -> /home/elegos/.minikube/cache/images/gcr.io/k8s-minikube/storage-provisioner_v1.8.1 succeeded
I0507 18:11:19.363458   50703 cache.go:66] save to tar file k8s.gcr.io/etcd:3.4.3-0 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/etcd_3.4.3-0 succeeded
I0507 18:11:19.363432   50703 cache.go:100] /home/elegos/.minikube/cache/images/kubernetesui/dashboard_v2.0.0 exists
I0507 18:11:19.363494   50703 cache.go:81] cache image "kubernetesui/dashboard:v2.0.0" -> "/home/elegos/.minikube/cache/images/kubernetesui/dashboard_v2.0.0" took 541.443µs
I0507 18:11:19.363507   50703 cache.go:66] save to tar file kubernetesui/dashboard:v2.0.0 -> /home/elegos/.minikube/cache/images/kubernetesui/dashboard_v2.0.0 succeeded
I0507 18:11:19.363447   50703 cache.go:66] save to tar file k8s.gcr.io/coredns:1.6.7 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/coredns_1.6.7 succeeded
I0507 18:11:19.363309   50703 start.go:227] acquired machines lock for "minikube" in 62.631µs
I0507 18:11:19.363536   50703 start.go:87] Skipping create...Using existing machine configuration
I0507 18:11:19.363544   50703 fix.go:53] fixHost starting: 
I0507 18:11:19.363456   50703 cache.go:100] /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.18.1 exists
I0507 18:11:19.363421   50703 cache.go:81] cache image "kubernetesui/metrics-scraper:v1.0.2" -> "/home/elegos/.minikube/cache/images/kubernetesui/metrics-scraper_v1.0.2" took 507.973µs
I0507 18:11:19.363647   50703 cache.go:81] cache image "k8s.gcr.io/kube-controller-manager:v1.18.1" -> "/home/elegos/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.18.1" took 675.343µs
I0507 18:11:19.363666   50703 cache.go:66] save to tar file k8s.gcr.io/kube-controller-manager:v1.18.1 -> /home/elegos/.minikube/cache/images/k8s.gcr.io/kube-controller-manager_v1.18.1 succeeded
I0507 18:11:19.363662   50703 cache.go:66] save to tar file kubernetesui/metrics-scraper:v1.0.2 -> /home/elegos/.minikube/cache/images/kubernetesui/metrics-scraper_v1.0.2 succeeded
I0507 18:11:19.363687   50703 cache.go:73] Successfully saved all images to host disk.
I0507 18:11:19.363857   50703 cli_runner.go:108] Run: sudo podman inspect minikube --format={{.State.Status}}
I0507 18:11:19.432275   50703 fix.go:105] recreateIfNeeded on minikube: state=Stopped err=<nil>
W0507 18:11:19.432293   50703 fix.go:131] unexpected machine state, will restart: <nil>
🔄  Restarting existing podman container for "minikube" ...
I0507 18:11:19.432485   50703 exec_runner.go:49] Run: sudo podman start minikube
I0507 18:11:19.802654   50703 fix.go:55] fixHost completed within 439.108284ms
I0507 18:11:19.802669   50703 start.go:74] releasing machines lock for "minikube", held for 439.141274ms
🤦  StartHost failed, but will try again: driver start: start: sudo podman start minikube: exit status 125
stdout:

stderr:
Error: unable to start container "minikube": container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"Invalid unit name 'libpod_parent'\"": OCI runtime error

I0507 18:11:24.802882   50703 start.go:223] acquiring machines lock for minikube: {Name:mk54bbd76b9ba071d84e6139eee3a3cd7ecc36f4 Clock:{} Delay:500ms Timeout:15m0s Cancel:<nil>}
I0507 18:11:24.803194   50703 start.go:227] acquired machines lock for "minikube" in 278.121µs
I0507 18:11:24.803222   50703 start.go:87] Skipping create...Using existing machine configuration
I0507 18:11:24.803235   50703 fix.go:53] fixHost starting: 
I0507 18:11:24.803713   50703 cli_runner.go:108] Run: sudo podman inspect minikube --format={{.State.Status}}
I0507 18:11:24.878582   50703 fix.go:105] recreateIfNeeded on minikube: state=Stopped err=<nil>
W0507 18:11:24.878599   50703 fix.go:131] unexpected machine state, will restart: <nil>
🔄  Restarting existing podman container for "minikube" ...
I0507 18:11:24.878730   50703 exec_runner.go:49] Run: sudo podman start minikube
I0507 18:11:25.252844   50703 fix.go:55] fixHost completed within 449.608084ms
I0507 18:11:25.252861   50703 start.go:74] releasing machines lock for "minikube", held for 449.652484ms
😿  Failed to start podman container. "minikube start" may fix it: driver start: start: sudo podman start minikube: exit status 125
stdout:

stderr:
Error: unable to start container "minikube": container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"Invalid unit name 'libpod_parent'\"": OCI runtime error

I0507 18:11:25.252982   50703 exit.go:58] WithError(error provisioning host)=Failed to start host: driver start: start: sudo podman start minikube: exit status 125
stdout:

stderr:
Error: unable to start container "minikube": container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"Invalid unit name 'libpod_parent'\"": OCI runtime error
 called from:
goroutine 1 [running]:
runtime/debug.Stack(0x40c49a, 0x181f360, 0x1804900)
        /usr/lib/golang/src/runtime/debug/stack.go:24 +0x9d
k8s.io/minikube/pkg/minikube/exit.WithError(0x1a7457e, 0x17, 0x1d34680, 0xc000a5aa20)
        /home/elegos/Development/minikube/pkg/minikube/exit/exit.go:58 +0x34
k8s.io/minikube/cmd/minikube/cmd.runStart(0x2a5eae0, 0xc00078ae00, 0x0, 0x1)
        /home/elegos/Development/minikube/cmd/minikube/cmd/start.go:170 +0xac2
github.com/spf13/cobra.(*Command).execute(0x2a5eae0, 0xc00078adf0, 0x1, 0x1, 0x2a5eae0, 0xc00078adf0)
        /home/elegos/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:846 +0x29d
github.com/spf13/cobra.(*Command).ExecuteC(0x2a5db20, 0x0, 0x1, 0xc000508460)
        /home/elegos/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
        /home/elegos/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887
k8s.io/minikube/cmd/minikube/cmd.Execute()
        /home/elegos/Development/minikube/cmd/minikube/cmd/root.go:108 +0x691
main.main()
        /home/elegos/Development/minikube/cmd/minikube/main.go:66 +0xe6
W0507 18:11:25.253174   50703 out.go:201] error provisioning host: Failed to start host: driver start: start: sudo podman start minikube: exit status 125
stdout:

stderr:
Error: unable to start container "minikube": container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"Invalid unit name 'libpod_parent'\"": OCI runtime error

💣  error provisioning host: Failed to start host: driver start: start: sudo podman start minikube: exit status 125
stdout:

stderr:
Error: unable to start container "minikube": container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"Invalid unit name 'libpod_parent'\"": OCI runtime error


😿  minikube is exiting due to an error. If the above message is not useful, open an issue:
👉  https://github.com/kubernetes/minikube/issues/new/choose

@afbjorklund
Collaborator

Changing the host podman to use systemd/journald by default reproduces the issue.

# CGroup Manager - valid values are "systemd" and "cgroupfs"
cgroup_manager = "systemd"

# Selects which logging mechanism to use for Podman events.  Valid values
# are `journald` or `file`.
events_logger = "journald"
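
For a quick cross-check, the manager can also be forced back to cgroupfs for a single invocation, which is what minikube itself attempts in the log further down (the flag only overrides the libpod.conf setting for that one command):

sudo podman start --cgroup-manager cgroupfs minikube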

@sharifelgamal sharifelgamal added the kind/bug Categorizes issue or PR as related to a bug. label May 7, 2020
@afbjorklund
Collaborator

afbjorklund commented May 7, 2020

Not sure why it didn't use runCmd, but whatever.

Here is the unhelpful error message that you get:

WithError(error provisioning host)=Failed to start host: driver start: start: sudo podman start --cgroup-manager cgroupfs minikube: exit status 125
stdout:

stderr:
Error: unable to start container "minikube": container create failed (no logs from conmon): EOF

EDIT: Seemed to be a known error with podman 1.6.2, will try again. (should stop using that PPA)

With podman 1.9.1, minikube goes back into waiting for the crashed container (with above error)

@afbjorklund
Collaborator

Seems to be either a kernel option or a compile option; neither is very helpful in this case.

        * The default control group setup mode may be selected both a boot-time
          via a set of kernel command line parameters (specifically:
          systemd.unified_cgroup_hierarchy= and
          systemd.legacy_systemd_cgroup_controller=), as well as a compile-time
          default selected on the configure command line
          (--with-default-hierarchy=). The upstream default is "hybrid"
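
A quick way to check which mode the running systemd actually picked (a small sketch; cgroup2fs means unified, tmpfs means legacy or hybrid, and a mounted /sys/fs/cgroup/unified on top of that indicates hybrid):

stat -fc %T /sys/fs/cgroup/                          # cgroup2fs -> unified, tmpfs -> legacy/hybrid
mountpoint -q /sys/fs/cgroup/unified && echo hybrid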

@elegos
Contributor Author

elegos commented May 7, 2020

I've tried setting systemd.legacy_systemd_cgroup_controller=yes in the host's kernel options with no result...

@elegos
Contributor Author

elegos commented May 7, 2020

Btw, the unhelpful error message might be supplemented with podman's log (as reported in the thread):

Failed to attach 1 to compat systemd cgroup /libpod_parent/libpod-18e0adda97531bd42235adfb494e1b10a91313e669697e99c21296be0f01a101/init.scope: No such file or directory

The strange thing is that it happens only at restart, not at the first start...

@afbjorklund
Collaborator

afbjorklund commented May 7, 2020

It doesn't help that the path is either missing or has duplicated components.

This is the file created by start 1:
/sys/fs/cgroup/systemd/libpod_parent/libpod-8e2ae2e765b057377981d53b15b9deb1f4d88bde88e24ac0b64e8e63ef900d15/libpod_parent/libpod-8e2ae2e765b057377981d53b15b9deb1f4d88bde88e24ac0b64e8e63ef900d15/init.scope

And here is the bogus file in start 2:
Failed to attach 1 to compat systemd cgroup /libpod_parent/libpod-8e2ae2e765b057377981d53b15b9deb1f4d88bde88e24ac0b64e8e63ef900d15/init.scope: No such file or directory

Basically it's a fight between two buggy pieces of software (podman and systemd).

And the whole cgroups "v1" vs "v2" situation is not helping either; not a stable platform, this.
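
To compare what the two starts actually leave on disk, one can list the systemd hierarchy for the container from the host while it is up (a hedged sketch; the path assumes cgroup v1 with the systemd controller mounted at the usual place):

sudo find /sys/fs/cgroup/systemd/libpod_parent -name init.scope 2>/dev/null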

@elegos
Contributor Author

elegos commented May 7, 2020

So someone, either podman or systemd, is actually missing the prefixed path, right? Is there any way we can try to debug this issue?

Also, in theory (AFAIK) both podman and CRI-O should support cgroup v2 out of the box (but not docker and containerd)... shouldn't the pair run just fine with cgroup v2? (Obviously this is not the case.)

@afbjorklund
Collaborator

As far as I know, it is podman that has an issue with cgroup manager=systemd, and kubernetes that has an issue with cgroups v2? And docker too, and also runc, so we might have to switch to crun as well...
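
For reference, a sketch of pointing podman 1.x at crun instead of runc (assuming crun is installed; the default OCI runtime lives in libpod.conf, and can also be overridden per invocation):

# /etc/containers/libpod.conf
runtime = "crun"

# or, one-off, without editing the config:
sudo podman --runtime /usr/bin/crun start minikube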

@afbjorklund
Collaborator

Maybe we are looking in the wrong place? We do get the same error from docker too:

Failed to bump fs.file-max, ignoring: Invalid argument
Failed to attach 1 to compat systemd cgroup /docker/b452a2f2ccf627fb16b9de4d08c9185acde8c73f0a5facb8c4403ef88fd6208c/init.scope: No such file or directory
/lib/systemd/system/docker.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/docker.sock → /run/docker.sock; please update the unit file accordingly.

But we don't get the other errors after the attach that we get with podman:

Failed to bump fs.file-max, ignoring: Invalid argument
Failed to attach 1 to compat systemd cgroup /libpod_parent/libpod-90f917f60f4968c14a6be92e5c8a66102a2994713d5d3fee405014e12fb053df/init.scope: No such file or directory
Failed to open pin file: No such file or directory
Failed to allocate manager object: No such file or directory
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

@afbjorklund
Collaborator

This could be related (but again, it is the same for both drivers):
Found cgroup2 on /sys/fs/cgroup/unified, unified hierarchy for systemd controller

@afbjorklund
Collaborator

I think it is as simple as /sys being mounted read-only (ro).
In the entrypoint we have a function that bind-mounts the cgroups read-write (rw) for docker, but not for podman.

TARGET:  /sys/fs/cgroup/systemd/docker/057d7636a84ac0644e1df8acf88a2c34ab17d7b4046db8862efb0c5a49fea879
SOURCE:  cgroup[/docker/057d7636a84ac0644e1df8acf88a2c34ab17d7b4046db8862efb0c5a49fea879]
FSTYPE:  cgroup
OPTIONS: rw,nosu

Apparently "Failed to allocate manager object: No such file or directory" is a normal error.
So maybe we need to do the same thing for libpod_parent:

https://github.com/kubernetes-sigs/kind/blob/master/images/base/files/usr/local/bin/entrypoint#L48_L73
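
A minimal sketch of what that could look like, adapted from the kind entrypoint's docker-only loop (hedged: the function and variable names follow kind's fix_cgroup_mounts, with the grep widened to also match libpod_parent; the actual fix was done in the forked entrypoint):

fix_cgroup_mounts() {
  # cgroup bind mounts that the engine created for this container,
  # named /docker/<id> by docker and /libpod_parent/<id> by podman
  local cgroup_mounts
  cgroup_mounts=$(grep /sys/fs/cgroup /proc/self/mountinfo | grep -E 'docker|libpod_parent' || true)
  if [ -n "${cgroup_mounts}" ]; then
    # mountinfo field 4 is the root inside the cgroup fs (e.g. /libpod_parent/<id>),
    # field 5 is the mount point inside the container (e.g. /sys/fs/cgroup/memory)
    local cgroup_root subsystem_paths
    cgroup_root=$(echo "${cgroup_mounts}" | head -n 1 | cut -d' ' -f 4)
    subsystem_paths=$(echo "${cgroup_mounts}" | cut -d' ' -f 5)
    echo "${subsystem_paths}" | while IFS= read -r subsystem; do
      # recreate the nested path systemd expects and make it writable
      mkdir -p "${subsystem}${cgroup_root}"
      mount --bind "${subsystem}" "${subsystem}${cgroup_root}"
    done
  fi
}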

@afbjorklund
Collaborator

Yes, that seems to have done the trick! (after start/stop) Custom entrypoint:

😄  minikube v1.10.0-beta.2 on Ubuntu 18.04
✨  Using the podman (experimental) driver based on existing profile
👍  Starting control plane node minikube in cluster minikube
🔄  Restarting existing podman container for "minikube" ...
🎁  Preparing Kubernetes v1.18.1 on CRI-O 1.17.3 ...
🌟  Enabled addons: default-storageclass, storage-provisioner
🏄  Done! kubectl is now configured to use "minikube"

And now it can "allocate" the manager "object" again:

Failed to bump fs.file-max, ignoring: Invalid argument
Failed to attach 1 to compat systemd cgroup /libpod_parent/libpod-79cbc36340b92f82aff8a14148b3519b6d65fe36cb807467e8dff83617a0a430/init.scope: No such file or directory
/lib/systemd/system/docker.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/docker.sock → /run/docker.sock; please update the unit file accordingly.

Basically the same code as for docker, looping over the cgroups.
The only difference being "libpod_parent", and not "podman"...

@afbjorklund afbjorklund added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels May 7, 2020
@elegos
Contributor Author

elegos commented May 8, 2020

@afbjorklund is it possible to inject the podman support into the entrypoint somehow, or do we have to edit the kubernetes image? I'm afraid I know too little to dig into this for a PR from my side... (I was thinking about --entrypoint blablabla && /entrypoint.sh, or a proper patch)

@afbjorklund
Collaborator

@elegos : the entrypoint comes from the KIND project; they don't have podman support (yet?). We should probably replace/patch it in the image - eventually upstream - but for testing I just mounted it:

-v /tmp/entrypoint:/usr/local/bin/entrypoint

containers/podman#4373 (comment)

Just for reference, KinD does not work with podman. There are many small things that need to be addressed in the KinD project to make it work properly.

kubernetes-sigs/kind#154 (comment)

Note that the podman support is experimental and has many limitations. I don't recommend it yet, and we're not committed to keeping it for sure; certain desirable features seem to be infeasible.

But this should at least fix one (additional) podman issue.

Will make a kind PR for it, or fix it ourselves if we fork the entrypoint (#7788).

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 6, 2020
@medyagh medyagh closed this as completed Aug 12, 2020