Model pod enters in CrashLoopBackOff. How to debug? #376

Closed
denisb411 opened this issue Jan 10, 2019 · 1 comment

@denisb411

The model pod enters CrashLoopBackOff and I don't know what is causing it.

How can I debug what is happening in the pod that Seldon created from my image? (I built the image with s2i.)

Output of kubectl describe for the pod:

kubectl describe pod/mnist-model-single-model-183e651-7c6c48d887-49bkr

Name:           mnist-model-single-model-183e651-7c6c48d887-49bkr
Namespace:      default
Node:           minikube/172.25.60.17
Start Time:     Thu, 10 Jan 2019 19:29:31 +0000
Labels:         app=mnist-model-single-model-183e651
                pod-template-hash=3727048443
                seldon-app=mnist-model
                seldon-app-classifier=mnist-model-single-model-classifier-mnist-model-0-1
                seldon-deployment-id=mnist-model
Annotations:    deployment_version: v1
                predictor_version: v1
                project_name: Tensorflow MNIST
                prometheus.io/path: /prometheus
                prometheus.io/port: 8000
                prometheus.io/scrape: true
Status:         Running
IP:             172.17.0.23
Controlled By:  ReplicaSet/mnist-model-single-model-183e651-7c6c48d887
Containers:
  classifier:
    Container ID:   docker://cd0e6c6ab941e5790e166b14570bdf6139730048fae6ae5d61fdbcb9c8237518
    Image:          mnist-model:0.1
    Image ID:       docker://sha256:75ed2467bd3eed44bf56d2a3297684ea51ad0e5201c7041003bad6821e15f7bd
    Port:           9000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 10 Jan 2019 19:32:42 +0000
      Finished:     Thu, 10 Jan 2019 19:32:43 +0000
    Ready:          False
    Restart Count:  5
    Requests:
      memory:   1Mi
    Liveness:   tcp-socket :http delay=60s timeout=1s period=5s #success=1 #failure=3
    Readiness:  tcp-socket :http delay=20s timeout=1s period=5s #success=1 #failure=3
    Environment:
      PREDICTIVE_UNIT_SERVICE_PORT:  9000
      PREDICTIVE_UNIT_PARAMETERS:    []
      PREDICTIVE_UNIT_ID:            classifier
      PREDICTOR_ID:                  single-model
      SELDON_DEPLOYMENT_ID:          mnist-model
    Mounts:
      /etc/podinfo from podinfo (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hwr4b (ro)
  seldon-container-engine:
    Container ID:   docker://04c9e9152c29e02c2f3aca665da1c114bc00f73f2f91ea380995dad9946c043b
    Image:          seldonio/engine:0.2.6-SNAPSHOT
    Image ID:       docker-pullable://seldonio/engine@sha256:f09c50d4766af29c3d7b7dfff1235e7b1ac708d0fcc67d083d87399649fed0de
    Ports:          8000/TCP, 8082/TCP, 9090/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Thu, 10 Jan 2019 19:32:55 +0000
      Finished:     Thu, 10 Jan 2019 19:33:35 +0000
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:      100m
    Liveness:   http-get http://:admin/ready delay=20s timeout=2s period=5s #success=1 #failure=3
    Readiness:  http-get http://:admin/ready delay=20s timeout=2s period=1s #success=1 #failure=1
    Environment:
      ENGINE_PREDICTOR:         eyJuYW1lIjoic2luZ2xlLW1vZGVsIiwiZ3JhcGgiOnsibmFtZSI6ImNsYXNzaWZpZXIiLCJ0eXBlIjoiTU9ERUwiLCJlbmRwb2ludCI6eyJzZXJ2aWNlX2hvc3QiOiJsb2NhbGhvc3QiLCJzZXJ2aWNlX3BvcnQiOjkwMDAsInR5cGUiOiJSRVNUIn19LCJjb21wb25lbnRTcGVjcyI6W3sibWV0YWRhdGEiOnsibGFiZWxzIjp7InNlbGRvbi1hcHAtY2xhc3NpZmllciI6Im1uaXN0LW1vZGVsLXNpbmdsZS1tb2RlbC1jbGFzc2lmaWVyLW1uaXN0LW1vZGVsLTAtMSJ9fSwic3BlYyI6eyJjb250YWluZXJzIjpbeyJuYW1lIjoiY2xhc3NpZmllciIsImltYWdlIjoibW5pc3QtbW9kZWw6MC4xIiwicG9ydHMiOlt7Im5hbWUiOiJodHRwIiwiY29udGFpbmVyUG9ydCI6OTAwMH1dLCJlbnYiOlt7Im5hbWUiOiJQUkVESUNUSVZFX1VOSVRfU0VSVklDRV9QT1JUIiwidmFsdWUiOiI5MDAwIn0seyJuYW1lIjoiUFJFRElDVElWRV9VTklUX1BBUkFNRVRFUlMiLCJ2YWx1ZSI6IltdIn0seyJuYW1lIjoiUFJFRElDVElWRV9VTklUX0lEIiwidmFsdWUiOiJjbGFzc2lmaWVyIn0seyJuYW1lIjoiUFJFRElDVE9SX0lEIiwidmFsdWUiOiJzaW5nbGUtbW9kZWwifSx7Im5hbWUiOiJTRUxET05fREVQTE9ZTUVOVF9JRCIsInZhbHVlIjoibW5pc3QtbW9kZWwifV0sInJlc291cmNlcyI6eyJyZXF1ZXN0cyI6eyJtZW1vcnkiOiIxTWkifX0sInZvbHVtZU1vdW50cyI6W3sibmFtZSI6InBvZGluZm8iLCJyZWFkT25seSI6dHJ1ZSwibW91bnRQYXRoIjoiL2V0Yy9wb2RpbmZvIn1dLCJsaXZlbmVzc1Byb2JlIjp7ImhhbmRsZXIiOnsidGNwU29ja2V0Ijp7InBvcnQiOiJodHRwIn19LCJpbml0aWFsRGVsYXlTZWNvbmRzIjo2MCwicGVyaW9kU2Vjb25kcyI6NX0sInJlYWRpbmVzc1Byb2JlIjp7ImhhbmRsZXIiOnsidGNwU29ja2V0Ijp7InBvcnQiOiJodHRwIn19LCJpbml0aWFsRGVsYXlTZWNvbmRzIjoyMCwicGVyaW9kU2Vjb25kcyI6NX0sImxpZmVjeWNsZSI6eyJwcmVTdG9wIjp7ImV4ZWMiOnsiY29tbWFuZCI6WyIvYmluL3NoIiwiLWMiLCIvYmluL3NsZWVwIDUiXX19fSwiaW1hZ2VQdWxsUG9saWN5IjoiSWZOb3RQcmVzZW50In1dLCJ0ZXJtaW5hdGlvbkdyYWNlUGVyaW9kU2Vjb25kcyI6MjB9fV0sInJlcGxpY2FzIjoxLCJhbm5vdGF0aW9ucyI6eyJwcmVkaWN0b3JfdmVyc2lvbiI6InYxIn19
      DEPLOYMENT_NAME:          mnist-model
      ENGINE_SERVER_PORT:       8000
      ENGINE_SERVER_GRPC_PORT:  5001
      JAVA_OPTS:                -Dcom.sun.management.jmxremote.rmi.port=9090 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9090 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Djava.rmi.server.hostname=127.0.0.1
    Mounts:
      /etc/podinfo from podinfo (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hwr4b (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  default-token-hwr4b:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hwr4b
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age                    From               Message
  ----     ------                 ----                   ----               -------
  Normal   Scheduled              4m57s                  default-scheduler  Successfully assigned mnist-model-single-model-183e651-7c6c48d887-49bkr to minikube
  Normal   SuccessfulMountVolume  4m56s                  kubelet, minikube  MountVolume.SetUp succeeded for volume "default-token-hwr4b"
  Normal   SuccessfulMountVolume  4m56s                  kubelet, minikube  MountVolume.SetUp succeeded for volume "podinfo"
  Normal   Pulled                 4m55s                  kubelet, minikube  Container image "seldonio/engine:0.2.6-SNAPSHOT" already present on machine
  Normal   Created                4m55s                  kubelet, minikube  Created container
  Normal   Started                4m55s                  kubelet, minikube  Started container
  Normal   Pulled                 4m34s (x3 over 4m55s)  kubelet, minikube  Container image "mnist-model:0.1" already present on machine
  Normal   Created                4m34s (x3 over 4m55s)  kubelet, minikube  Created container
  Normal   Started                4m34s (x3 over 4m55s)  kubelet, minikube  Started container
  Warning  Unhealthy              4m34s                  kubelet, minikube  Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff                4m31s (x4 over 4m48s)  kubelet, minikube  Back-off restarting failed container
  Warning  Unhealthy              4m30s (x6 over 4m35s)  kubelet, minikube  Readiness probe failed: HTTP probe failed with statuscode: 503

When I run kubectl logs on the pod, it returns:

kubectl logs  -p pod/mnist-model-single-model-183e651-7c6c48d887-49bkr

Error from server (BadRequest): a container name must be specified for pod mnist-model-single-model-183e651-7c6c48d887-49bkr, choose one of: [classifier seldon-container-engine]
@ukclivecox
Contributor

Yes, use kubectl logs and name the container, for example:

kubectl logs  mnist-model-single-model-183e651-7c6c48d887-49bkr classifier

Your command did not say which container to get logs from.
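
Since the pod runs two containers (classifier and seldon-container-engine) and both are restarting, it may help to pull the logs of each one's previous, crashed instance. The following is a minimal sketch, assuming a POSIX shell. The log_commands helper is hypothetical and only prints the commands so they can be reviewed before running. The -c flag names the container, and --previous (the long form of the -p flag used in the question) asks for the last terminated instance:

```shell
#!/bin/sh
# Pod and container names are copied from the describe output in the question.
POD=mnist-model-single-model-183e651-7c6c48d887-49bkr

# Hypothetical helper: print one `kubectl logs` command per container.
# It does not contact the cluster; it only builds the command lines.
log_commands() {
  pod=$1
  shift
  for container in "$@"; do
    echo "kubectl logs $pod -c $container --previous"
  done
}

log_commands "$POD" classifier seldon-container-engine
```

Piping the output to sh would run the commands. The describe output shows classifier exiting with code 1 about a second after it started, so its log is probably the first place to look.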
