Operator crash if one container in pod not created properly #1104

Closed
yufengshan opened this issue Nov 13, 2019 · 6 comments · Fixed by #1107

@yufengshan

Component: Operator, Version: 0.5.1-SNAPSHOT

When deploying a SeldonDeployment, if one of the containers in the pod fails to be created, the operator panics and keeps restarting until the deployment is completely removed.

========================
The crash log looks like this:

2019-11-13T19:45:44.464Z INFO controllers.SeldonDeployment pSvcName {"seldondeployment": "prehac-mlflow-artifact/mlflow-binary-no-orchestrator", "val": "seldon-9545cbc497aba25c3cb921fc8df42d7f"}
2019-11-13T19:45:44.464Z INFO controllers.SeldonDeployment Not creating container service for output-transformer {"seldondeployment": "prehac-mlflow-artifact/mlflow-binary-no-orchestrator"}
E1113 19:45:44.464932 1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/runtime/runtime.go:76
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/runtime/runtime.go:65
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/panic.go:522
/usr/local/go/src/runtime/panic.go:82
/usr/local/go/src/runtime/signal_unix.go:390
/workspace/controllers/seldondeployment_controller.go:366
/workspace/controllers/seldondeployment_controller.go:1175
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:216
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:192
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:171
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:152
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:153
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:1337
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x120 pc=0x117bcf9]
goroutine 188 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/runtime/runtime.go:58 +0x105
panic(0x12ddce0, 0x2160910)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/seldonio/seldon-core/operator/controllers.createComponents(0xc000284810, 0xc001695ba0, 0x165af00, 0xc001322760, 0x16, 0xc0003fb7a0, 0x1d)
/workspace/controllers/seldondeployment_controller.go:366 +0x799
github.com/seldonio/seldon-core/operator/controllers.(*SeldonDeploymentReconciler).Reconcile(0xc000284810, 0xc0003fb7c0, 0x16, 0xc0003fb7a0, 0x1d, 0x2174b40, 0x42bd21, 0x162f120, 0xc00164bd88)
/workspace/controllers/seldondeployment_controller.go:1175 +0x2e3
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00010a0a0, 0x1326c00, 0xc00000c380, 0x1326c00)
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:216 +0x149
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00010a0a0, 0xc00149a600)
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:192 +0xb5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc00010a0a0)

@ukclivecox ukclivecox added the bug label Nov 14, 2019
@ukclivecox ukclivecox added this to the 0.5.x milestone Nov 14, 2019
@ukclivecox
Contributor

Can you provide yaml to reproduce this?

@ryandawsonuk
Contributor

ryandawsonuk commented Nov 14, 2019

If it's the latest snapshot, then the panic seems to be at the following line (I have referenced a particular commit, as that is what the snapshot currently is and master will move on):

port := int(svc.Spec.Ports[0].Port)

Based on "Not creating container service for output-transformer" I take it there's an output-transformer in the graph?
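
To illustrate why a line of that shape panics, here is a minimal sketch. The `Service`, `ServiceSpec`, and `ServicePort` types and the `firstPort` helper are hypothetical stand-ins, not the operator's code or the real Kubernetes types. It assumes `svc` is a nil pointer when no service was created for a container, which is what the log line suggests.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Kubernetes Service types (assumption, simplified).
type ServicePort struct{ Port int32 }
type ServiceSpec struct{ Ports []ServicePort }
type Service struct{ Spec ServiceSpec }

// firstPort mirrors the shape of the quoted line and converts the
// resulting panic into an error so the failure can be inspected.
func firstPort(svc *Service) (port int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	// Reading a field through a nil *Service triggers a nil pointer dereference.
	return int(svc.Spec.Ports[0].Port), nil
}

func main() {
	_, err := firstPort(nil)
	fmt.Println(err)
}
```

Running this with a nil `svc` should report the same "invalid memory address or nil pointer dereference" runtime error that appears in the crash log.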

@yufengshan
Author

yufengshan commented Nov 14, 2019

Hi Clive and Ryan,

Thanks for the response.

Below is one YAML file that crashes the operator. We have another case that also crashes the operator; that one also has two images (containers). The error message is very similar (the error log is placed after the YAML, at the bottom).

================== yaml =====================

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: mlflow-binary-no-orchestrator
  namespace: prehac-mlflow-artifact
spec:
  #annotations:
  #  seldon.io/headless-svc: "true"
  name: mlflow-binary-no-orchestrator
  predictors:
  - annotations:
      seldon.io/no-engine: "true"
    componentSpecs:
    - spec:
        containers:
        - image: mlflow_prehac_artifact_model:0.4
          imagePullPolicy: Always
          name: mlflow-prehac-artifact-model
          resources:
            limits:
              cpu: 1
              memory: 2Gi
            requests:
              cpu: 1
              memory: 256Mi
        - image: transformer:0.1
          imagePullPolicy: Always
          name: output-transformer
          resources:
            limits:
              cpu: 1
              memory: 2Gi
            requests:
              cpu: 1
              memory: 256Mi

    svcOrchSpec:
      env:
        - name: JAVA_OPTS
          value: -server -Xms512m -Xmx512m -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=20 -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:GCLogFileSize=10485760 -XX:NumberOfGCLogFiles=1 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+UseGCLogFileRotation -Xloggc:/tmp/gc.log -XX:+UseTLAB -XX:+DisableExplicitGC
        - name: SELDON_LOG_LEVEL
          value: DEBUG

      resources:
        limits:
          cpu: 2
          memory: 4Gi
        requests:
          cpu: 500m
          memory: 1Gi
    #graph:
    #  children: []
    #  endpoint:
    #    type: GRPC
    #  name: mlflow-prehac-artifact-model
    #  type: MODEL
    #name: artifact

    graph:

      children: []

      name: mlflow-prehac-artifact-model
      endpoint:
        type: REST
      type: MODEL
    name: artifact

    replicas: 1

=================== error log for another SeldonDep ==================

2019-11-13T15:18:31.533Z        INFO    controllers.SeldonDeployment    Not creating container service for scribe       {"seldondeployment": "exp-image-attractivenessexp-image-attractiveness"}
E1113 15:18:31.533740       1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/runtime/runtime.go:76
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/runtime/runtime.go:65
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/panic.go:522
/usr/local/go/src/runtime/panic.go:82
/usr/local/go/src/runtime/signal_unix.go:390
/workspace/controllers/seldondeployment_controller.go:366
/workspace/controllers/seldondeployment_controller.go:1175
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:216
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:192
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:171
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:152
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:153
/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:1337
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x120 pc=0x117bcf9]
goroutine 339 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/runtime/runtime.go:58 +0x105
panic(0x12ddce0, 0x2160910)
        /usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/seldonio/seldon-core/operator/controllers.createComponents(0xc0004ce120, 0xc000343040, 0x165af00, 0xc001653320, 0xb, 0xc0002e0c60, 0x30)
        /workspace/controllers/seldondeployment_controller.go:366 +0x799
github.com/seldonio/seldon-core/operator/controllers.(*SeldonDeploymentReconciler).Reconcile(0xc0004ce120, 0xc000572620, 0xb, 0xc0002e0c60, 0x30, 0x2174b40, 0x42bd21, 0x162f120, 0xc0015ebd88)
        /workspace/controllers/seldondeployment_controller.go:1175 +0x2e3
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000340140, 0x1326c00, 0xc000672240, 0x1326c00)
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:216 +0x149
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000340140, 0xc00145aa00)
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:192 +0xb5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc000340140)
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:171 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0013f5680)
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:152 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0013f5680, 0x3b9aca00, 0x0, 0x1, 0xc0000e4120)
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc0013f5680, 0x3b9aca00, 0xc0000e4120)
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.0/pkg/internal/controller/controller.go:157 +0x311

@ryandawsonuk
Contributor

Interesting. It looks like the code flow goes through a line commented 'a user-supplied container may not be a pu so we may not create service for that' and then goes on to reference the service, which is naturally nil at that point, so it blows up. Maybe it needs a `continue` there so that it skips the service lookup and moves on to the next container. Will look into this a bit further.
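
The `continue` idea can be sketched roughly as follows. This is not the actual patch from the fix: the `Container`, `Service`, and `ServicePort` types and the `serviceFor` and `collectPorts` helpers are hypothetical, simplified stand-ins, and the port value 9000 is arbitrary.

```go
package main

import "fmt"

// Hypothetical stand-ins; not the operator's or Kubernetes' real definitions.
type ServicePort struct{ Port int32 }
type Service struct{ Ports []ServicePort }
type Container struct{ Name string }

// serviceFor returns nil for containers that are not part of the graph,
// mimicking the "Not creating container service" branch.
func serviceFor(c Container, inGraph map[string]bool) *Service {
	if !inGraph[c.Name] {
		return nil
	}
	return &Service{Ports: []ServicePort{{Port: 9000}}} // arbitrary port
}

// collectPorts skips containers without a service instead of dereferencing nil.
func collectPorts(containers []Container, inGraph map[string]bool) map[string]int {
	ports := make(map[string]int)
	for _, c := range containers {
		svc := serviceFor(c, inGraph)
		if svc == nil || len(svc.Ports) == 0 {
			continue // no service for this container; move on to the next one
		}
		ports[c.Name] = int(svc.Ports[0].Port)
	}
	return ports
}

func main() {
	containers := []Container{
		{Name: "mlflow-prehac-artifact-model"},
		{Name: "output-transformer"},
	}
	inGraph := map[string]bool{"mlflow-prehac-artifact-model": true}
	fmt.Println(collectPorts(containers, inGraph))
}
```

With the guard in place, the container that has no service is simply left out of the result instead of crashing the loop. Checking the length of `Ports` as well avoids an index-out-of-range panic on a service with no ports.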

@ryandawsonuk
Contributor

ryandawsonuk commented Nov 15, 2019

@yufengshan I notice your output-transformer is not part of the Seldon graph definition. Is that intentional? Are you calling the output-transformer directly from your model code? Just double-checking (either way, what you report is a bug and we will be publishing a new snapshot with a fix).

@yufengshan
Author

yufengshan commented Nov 15, 2019

@ryandawsonuk Thanks, Ryan, for the quick response 👍. Looking forward to testing the new image. The transformer is called directly from the model code.
