This repository has been archived by the owner on Dec 15, 2021. It is now read-only.

Trigger pods fail on microk8s #1046

Closed
mehdisadeghi opened this issue Jun 21, 2019 · 12 comments

@mehdisadeghi

mehdisadeghi commented Jun 21, 2019

Is this a BUG REPORT or FEATURE REQUEST?:
This is a bug report, on a fresh Ubuntu 18.04 installation with a single-node microk8s cluster.

What happened:
Functions do not run on schedule due to an error.

What you expected to happen:
Functions should run on schedule.

How to reproduce it (as minimally and precisely as possible):

snap install microk8s --classic
microk8s.kubectl config view --raw > ~/.kube/config
microk8s.kubectl create ns kubeless
export RELEASE=$(curl -s https://api.github.com/repos/kubeless/kubeless/releases/latest | grep tag_name | cut -d '"' -f 4)
microk8s.kubectl create -f https://github.com/kubeless/kubeless/releases/download/$RELEASE/kubeless-$RELEASE.yaml
echo "def hello(event, context): pass" > f1.py
kubeless function deploy f1 --from-file f1.py --handler f1.hello --runtime python3.7 --schedule '*/3 * * * *'

The function deploys and runs perfectly when called directly. The scheduled runs fail, however.

NAMESPACE     NAME                                                  READY   STATUS              RESTARTS   AGE
default       pod/f1-d8684fdc6-67q9f                                1/1     Running             0          9m21s
default       pod/trigger-f1-1561121460-75srj                       0/1     StartError          0          8m51s
default       pod/trigger-f1-1561121460-8db7h                       0/1     StartError          0          7m52s

The error is the same for each trigger pod:

    ➜  kubeless-playbook git:(master) ✗ kubectl describe pod trigger-f1-1561121280-4rxq6                  
Name:               trigger-f1-1561121280-4rxq6
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               ubuntu-2gb-nbg1-1/195.201.121.180
Start Time:         Fri, 21 Jun 2019 14:50:35 +0200
Labels:             controller-uid=29988236-9423-11e9-9d3f-960000290a27
                    job-name=trigger-f1-1561121280
Annotations:        <none>
Status:             Failed
IP:                 10.1.1.11
Controlled By:      Job/trigger-f1-1561121280
Containers:
  trigger:
    Container ID:  containerd://4ba957cde7b4ac318bea2f4e5e5fa3e2f983244e735d789256466dad387cd801
    Image:         kubeless/unzip@sha256:f162c062973cca05459834de6ed14c039d45df8cdb76097f50b028a1621b3697
    Image ID:      docker.io/kubeless/unzip@sha256:f162c062973cca05459834de6ed14c039d45df8cdb76097f50b028a1621b3697
    Port:          <none>
    Host Port:     <none>
    Args:
      curl
      -Lv
       -H "event-id: MtO_OBDHj2ntXL8" -H "event-time: 2019-06-21 12:50:26.356815543 +0000 UTC" -H "event-type: application/json" -H "event-namespace: cronjobtrigger.kubeless.io"
      http://f1.default.svc.cluster.local:8080
    State:          Terminated
      Reason:       StartError
      Message:      failed to create containerd task: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:424: container init caused \"read init-p: connection reset by peer\"": unknown
      Exit Code:    128
      Started:      Thu, 01 Jan 1970 01:00:00 +0100
      Finished:     Fri, 21 Jun 2019 14:50:43 +0200
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1m
      memory:  4Mi
    Requests:
      cpu:        1m
      memory:     4Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-c2s6q (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-c2s6q:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-c2s6q
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age    From                        Message
  ----     ------     ----   ----                        -------
  Normal   Scheduled  2m53s  default-scheduler           Successfully assigned default/trigger-f1-1561121280-4rxq6 to ubuntu-2gb-nbg1-1
  Normal   Pulled     2m50s  kubelet, ubuntu-2gb-nbg1-1  Container image "kubeless/unzip@sha256:f162c062973cca05459834de6ed14c039d45df8cdb76097f50b028a1621b3697" already present on machine
  Normal   Created    2m49s  kubelet, ubuntu-2gb-nbg1-1  Created container trigger
  Warning  Failed     2m45s  kubelet, ubuntu-2gb-nbg1-1  Error: failed to create containerd task: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:424: container init caused \"read init-p: connection reset by peer\"": unknown

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
	Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"archive", BuildDate:"2019-06-08T16:22:01Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
	Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:14:56Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
  • Kubeless version (use kubeless version):
	Kubeless version: v1.0.3-9-g2c847c48
  • Cloud provider or physical cluster:
	hetzner
@andresmgot
Contributor

This issue may be related:

docker/for-linux#597

Some people are reporting similar issues when running Docker on Ubuntu 18.04, but apparently they can be fixed by upgrading/restarting. Can you check if that helps with your issue?

@mehdisadeghi
Author

@andresmgot Thanks for your reply.

Apparently the microk8s snap on Ubuntu 18.04 does not use Docker; it uses containerd directly. There is no Docker daemon running on my host.

@mehdisadeghi
Author

This is what I get every few minutes in the system logs:

Jun 24 10:20:40 k1 microk8s.daemon-controller-manager[14622]: I0624 10:20:40.108274   14622 event.go:258] Event(v1.ObjectReference{Kind:"Job", Namespace:"default", Name:"trigger-f1-1561364280", UID:"e3a91fe1-ef0e-45dc-8317-5cdcc8c358ad", APIVersion:"batch/v1", ResourceVersion:"1050", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: trigger-f1-1561364280-vpnmh
Jun 24 10:20:40 k1 microk8s.daemon-controller-manager[14622]: E0624 10:20:40.120785   14622 job_controller.go:397] Error syncing job: failed pod(s) detected for job key "default/trigger-f1-1561364280"
Jun 24 10:20:40 k1 microk8s.daemon-kubelet[14529]: I0624 10:20:40.197006   14529 reconciler.go:203] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-vmhxs" (UniqueName: "kubernetes.io/secret/65c2648d-885a-44aa-9aff-310537b8d391-default-token-vmhxs") pod "trigger-f1-1561364280-vpnmh" (UID: "65c2648d-885a-44aa-9aff-310537b8d391")
Jun 24 10:20:40 k1 systemd[1]: Started Kubernetes transient mount for /var/snap/microk8s/common/var/lib/kubelet/pods/65c2648d-885a-44aa-9aff-310537b8d391/volumes/kubernetes.io~secret/default-token-vmhxs.
Jun 24 10:20:40 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:40.424585656+02:00" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:trigger-f1-1561364280-vpnmh,Uid:65c2648d-885a-44aa-9aff-310537b8d391,Namespace:default,Attempt:0,}"
Jun 24 10:20:40 k1 kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Jun 24 10:20:40 k1 systemd-udevd[21815]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 24 10:20:40 k1 systemd-udevd[21815]: Could not generate persistent MAC address for vethdaf3a14b: No such file or directory
Jun 24 10:20:40 k1 kernel: IPv6: ADDRCONF(NETDEV_UP): vethdaf3a14b: link is not ready
Jun 24 10:20:40 k1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethdaf3a14b: link becomes ready
Jun 24 10:20:40 k1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Jun 24 10:20:40 k1 kernel: cbr0: port 3(vethdaf3a14b) entered blocking state
Jun 24 10:20:40 k1 kernel: cbr0: port 3(vethdaf3a14b) entered disabled state
Jun 24 10:20:40 k1 kernel: device vethdaf3a14b entered promiscuous mode
Jun 24 10:20:40 k1 kernel: cbr0: port 3(vethdaf3a14b) entered blocking state
Jun 24 10:20:40 k1 kernel: cbr0: port 3(vethdaf3a14b) entered forwarding state
Jun 24 10:20:40 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:40.694784259+02:00" level=info msg="shim containerd-shim started" address="/containerd-shim/k8s.io/06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94/shim.sock" debug=true pid=21835
Jun 24 10:20:40 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:40+02:00" level=debug msg="registering ttrpc server"
Jun 24 10:20:40 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:40+02:00" level=debug msg="serving api on unix socket" socket="[inherited from parent]"
Jun 24 10:20:42 k1 sshd[21716]: Failed password for root from 112.85.42.171 port 46357 ssh2
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42.782408207+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:trigger-f1-1561364280-vpnmh,Uid:65c2648d-885a-44aa-9aff-310537b8d391,Namespace:default,Attempt:0,} returns sandbox id "06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94""
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42.793869260+02:00" level=info msg="CreateContainer within sandbox "06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94" for container &ContainerMetadata{Name:trigger,Attempt:0,}"
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42.870397350+02:00" level=info msg="CreateContainer within sandbox "06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94" for &ContainerMetadata{Name:trigger,Attempt:0,} returns container id "54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b""
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42.871662847+02:00" level=info msg="StartContainer for "54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b""
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42.874014187+02:00" level=info msg="shim containerd-shim started" address="/containerd-shim/k8s.io/54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b/shim.sock" debug=true pid=21937
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42+02:00" level=debug msg="registering ttrpc server"
Jun 24 10:20:42 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:42+02:00" level=debug msg="serving api on unix socket" socket="[inherited from parent]"
Jun 24 10:20:44 k1 microk8s.daemon-containerd[14509]: time="2019-06-24T10:20:44.968381513+02:00" level=info msg="TaskOOM event &TaskOOM{ContainerID:06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94,}"
Jun 24 10:20:44 k1 kernel: runc:[2:INIT] invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-998
Jun 24 10:20:44 k1 kernel: runc:[2:INIT] cpuset=54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b mems_allowed=0
Jun 24 10:20:44 k1 kernel: CPU: 1 PID: 21967 Comm: runc:[2:INIT] Not tainted 4.15.0-50-generic #54-Ubuntu
Jun 24 10:20:44 k1 kernel: Hardware name: Hetzner vServer, BIOS 20171111 11/11/2017
Jun 24 10:20:44 k1 kernel: Call Trace:
Jun 24 10:20:44 k1 kernel:  dump_stack+0x63/0x8b
Jun 24 10:20:44 k1 kernel:  dump_header+0x71/0x285
Jun 24 10:20:44 k1 kernel:  oom_kill_process+0x220/0x440
Jun 24 10:20:44 k1 kernel:  out_of_memory+0x2d1/0x4f0
Jun 24 10:20:44 k1 kernel:  mem_cgroup_out_of_memory+0x4b/0x80
Jun 24 10:20:44 k1 kernel:  mem_cgroup_oom_synchronize+0x2e8/0x320
Jun 24 10:20:44 k1 kernel:  ? mem_cgroup_css_online+0x40/0x40
Jun 24 10:20:44 k1 kernel:  pagefault_out_of_memory+0x36/0x7b
Jun 24 10:20:44 k1 kernel:  mm_fault_error+0x90/0x180
Jun 24 10:20:44 k1 kernel:  __do_page_fault+0x4a5/0x4d0
Jun 24 10:20:44 k1 kernel:  do_page_fault+0x2e/0xe0
Jun 24 10:20:44 k1 kernel:  ? async_page_fault+0x2f/0x50
Jun 24 10:20:44 k1 kernel:  do_async_page_fault+0x51/0x80
Jun 24 10:20:44 k1 kernel:  async_page_fault+0x45/0x50
Jun 24 10:20:44 k1 kernel: RIP: 0033:0x560f61a7721f
Jun 24 10:20:44 k1 kernel: RSP: 002b:000000c420145f78 EFLAGS: 00010206
Jun 24 10:20:44 k1 kernel: RAX: 000000c420145ff8 RBX: 000000c4200ac988 RCX: 000000c420000180
Jun 24 10:20:44 k1 kernel: RDX: 0000000000000000 RSI: 0000560f61f38440 RDI: 000000c420146388
Jun 24 10:20:44 k1 kernel: RBP: 000000c4201469f0 R08: 0000000000000000 R09: 0000000000000000
Jun 24 10:20:44 k1 kernel: R10: 0000560f61c91eb6 R11: 0000000000000010 R12: 0000000000000000
Jun 24 10:20:44 k1 kernel: R13: 00000000000000f1 R14: 0000000000000011 R15: 0000000000000000
Jun 24 10:20:44 k1 kernel: Task in /kubepods/pod65c2648d-885a-44aa-9aff-310537b8d391/54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b killed as a result of limit of /kubepods/pod65c2648d-885a-44aa-9aff-310537b8d391      
Jun 24 10:20:44 k1 kernel: memory: usage 4096kB, limit 4096kB, failcnt 43
Jun 24 10:20:44 k1 kernel: memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Jun 24 10:20:44 k1 kernel: kmem: usage 844kB, limit 9007199254740988kB, failcnt 0
Jun 24 10:20:44 k1 kernel: Memory cgroup stats for /kubepods/pod65c2648d-885a-44aa-9aff-310537b8d391: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 24 10:20:44 k1 kernel: Memory cgroup stats for /kubepods/pod65c2648d-885a-44aa-9aff-310537b8d391/06762d92d62e127c82ec8a15680d438e9a483471ac4ecb1a340684423aa4dc94: cache:0KB rss:44KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:44KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 24 10:20:44 k1 kernel: Memory cgroup stats for /kubepods/pod65c2648d-885a-44aa-9aff-310537b8d391/54eba74f19fe9ccecc926dabfe55cf48d4e1004795234441c8a84032f1c7d46b: cache:0KB rss:3208KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:3208KB inactive_file:0KB active_file:0KB unevictable:0KB
Jun 24 10:20:44 k1 kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jun 24 10:20:44 k1 kernel: [21863]     0 21863      256        1    32768        0          -998 pause
Jun 24 10:20:44 k1 kernel: [21967]     0 21967    15559     1732   126976        0          -998 runc:[2:INIT]
Jun 24 10:20:44 k1 kernel: Memory cgroup out of memory: Kill process 21967 (runc:[2:INIT]) score 696 or sacrifice child
Jun 24 10:20:44 k1 kernel: Killed process 21967 (runc:[2:INIT]) total-vm:62236kB, anon-rss:3032kB, file-rss:3896kB, shmem-rss:0kB

@andresmgot
Contributor

It seems that the container is being killed because of an OOM issue (which is weird because that container is just doing a curl request).

https://stackoverflow.com/questions/55304976/oom-killer-docker

It's a long shot, but this may be related to the fact that the container is running as root. To prove or discard that, can you try to create a similar pod, but specifying a securityContext and changing the running UID? Something like:

securityContext:
  runAsUser: 1001
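
In case it helps with placement: the snippet goes under spec in the pod manifest (pod-level) or under an individual container. A rough standalone test pod you could apply with kubectl — the pod name is made up for this experiment, and the image and limits just mirror the failing trigger pod from the describe output above:

apiVersion: v1
kind: Pod
metadata:
  name: trigger-oom-test          # hypothetical name, only for this experiment
spec:
  securityContext:
    runAsUser: 1001               # run the container process as a non-root UID
  restartPolicy: Never
  containers:
    - name: trigger
      image: kubeless/unzip@sha256:f162c062973cca05459834de6ed14c039d45df8cdb76097f50b028a1621b3697
      args: ["curl", "-Lv", "http://f1.default.svc.cluster.local:8080"]
      resources:
        limits:
          cpu: 1m                 # same limits as the failing trigger pod
          memory: 4Mi

If this pod fails with the same StartError, the UID is probably not the culprit.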

@mehdisadeghi
Author

@andresmgot would you mind describing where I should add the yaml snippet? I'm totally new to k8s and kubeless!

Moreover, on a new VM I installed everything as a normal user and still got the same error. By the way, can you reproduce this with a microk8s install? On an Ubuntu VM it would be sudo snap install microk8s --classic.

@andresmgot
Contributor

I will try to reproduce it. In the meantime you can try Minikube, which is a supported environment.

@mehdisadeghi
Author

mehdisadeghi commented Jun 24, 2019

@andresmgot thanks for giving it a try!

I have already tried Minikube, but it does not work with the emulated CPUs that the VM provider offers. Moreover, it does not work on the tiny VMs that I use (1 core, 2 GB RAM).

@andresmgot
Contributor

Have you tried https://github.com/kubernetes-sigs/kind? It's also a lightweight Kubernetes distribution, and I have tested Kubeless on that platform before.

@mehdisadeghi
Author

mehdisadeghi commented Jun 24, 2019

I haven't tried that yet. In fact, I can run Kubeless on top of Kubernetes, with scheduling, inside a VM with 2 cores and 4 GB of RAM. Here is my Ansible playbook for that. What I am trying to do is fit everything inside a 1-core, 2-GB instance.

@andresmgot
Contributor

I found the problem. I was able to reproduce the issue using the latest Minikube version. The problem is that the memory limit set for the trigger pods is too low. This commit fixes it:

vmware-archive/cronjob-trigger@61e04ea

It will be available in the next release.
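
For anyone stuck on an older release, the change boils down to giving the trigger container a more generous resources block than the cpu: 1m / memory: 4Mi shown in the describe output above. Roughly along these lines (illustrative values only — the exact figures are in the linked commit):

resources:
  requests:
    cpu: 2m           # illustrative values, not taken from the commit
    memory: 16Mi
  limits:
    cpu: 4m
    memory: 32Mi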

@mehdisadeghi
Author

mehdisadeghi commented Jun 27, 2019

@andresmgot I'll try with the git version and report back. Thank you very much!

@mehdisadeghi
Author

mehdisadeghi commented Jul 3, 2019

@andresmgot I tried scheduling functions with the new v1.0.4 release and it worked perfectly. Thank you!

P.S. for future visitors: I had to enable DNS on microk8s first with microk8s.enable dns.
