SeldonPodSpec in SeldonDeployment V1alpha and V1 in seldon v1.4 is not parsing metadata successfully #2983

Closed
Isaacwhyuenac opened this issue Feb 23, 2021 · 1 comment

Isaacwhyuenac commented Feb 23, 2021

import (
	v1 "k8s.io/api/core/v1"                       // PodSpec
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" // ObjectMeta
)

// SeldonPodSpec is one entry of spec.predictors[].componentSpecs[].
// Metadata holds the labels/annotations set on each componentSpec.
type SeldonPodSpec struct {
	Metadata metav1.ObjectMeta       `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
	Spec     v1.PodSpec              `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`
	HpaSpec  *SeldonHpaSpec          `json:"hpaSpec,omitempty" protobuf:"bytes,3,opt,name=hpaSpec"`
	Replicas *int32                  `json:"replicas,omitempty" protobuf:"bytes,4,opt,name=replicas"`
	KedaSpec *SeldonScaledObjectSpec `json:"kedaSpec,omitempty" protobuf:"bytes,5,opt,name=kedaSpec"`
	PdbSpec  *SeldonPdbSpec          `json:"pdbSpec,omitempty" protobuf:"bytes,6,opt,name=pdbSpec"`
}
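A side note on the json tags above: in Go's encoding/json, omitempty has no effect on a non-pointer struct field, so an empty Metadata is still serialized as an object rather than left out. The sketch below illustrates only that encoding rule; it does not explain where the annotations were lost. It uses hypothetical, stripped-down podMeta/podSpec types (labels and annotations only) in place of metav1.ObjectMeta so that it runs with the standard library alone.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podMeta is a hypothetical stand-in for metav1.ObjectMeta that keeps only
// the two fields relevant to this issue.
type podMeta struct {
	Labels      map[string]string `json:"labels,omitempty"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// podSpec mirrors the shape of SeldonPodSpec.Metadata: a struct value
// (not a pointer) tagged with omitempty.
type podSpec struct {
	Metadata podMeta `json:"metadata,omitempty"`
}

// encode marshals a podSpec and returns the JSON text.
func encode(s podSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// omitempty does not drop a struct-valued field: prints {"metadata":{}}
	fmt.Println(encode(podSpec{}))

	// With an annotation set, it is written out as expected:
	// prints {"metadata":{"annotations":{"restartedAt":""}}}
	withAnn := podSpec{Metadata: podMeta{Annotations: map[string]string{"restartedAt": ""}}}
	fmt.Println(encode(withAnn))
}
```

Making the field a pointer (*podMeta) would be the usual way to get it omitted when unset; the real type keeps a value field, which is why an "empty" metadata still appears.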

Example SeldonDeployment YAML:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: main
spec:
  annotations:
    iam.amazonaws.com/role: xxxx
    seldon.io/rest-connect-retries: "1"
#    seldon.io/rest-connection-timeout: "10000"
#    seldon.io/rest-read-timeout: "10000"
  predictors:
  - name: service
    replicas: 1
    svcOrchSpec:
      env:
      - name: SELDON_LOG_LEVEL
        value: ERROR
    graph:
      children:
      - endpoint:
          type: REST
        name: main
        type: MODEL
      - endpoint:
          type: REST
        name: green
        type: MODEL
      endpoint:
        type: REST
      name: router
      type: ROUTER

    componentSpecs:
    # Router spec
    - metadata:
        annotations:
          reloader.stakater.com/auto: "true"
      spec:
        containers:
        - name: router
          image: xxxxxxx:v10-build
          imagePullPolicy: Always
          env:
          - name: ROUTER_MODE
            value: uid
          - name: MODEL__1__CONFIG
            value: |-
              {
                "buckets": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                "whitelist": ["51B3B82F-3629-4699-9D8B-07091C365C41"],
                "blacklist": ["db3ba7e2-e3dd-4f8d-b39d-d60004978807"]
              }

      # HPA
      hpaSpec:
        maxReplicas: 4
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 60
          type: Resource
        minReplicas: 2

    # Model spec
    - metadata:
        annotations:
          restartedAt: ""
      spec:
        containers:
        - name: green
          image: xxx:v21-build-serving
          imagePullPolicy: Always
          env:
          - name: AWS_DEFAULT_REGION
            value: "us-west-2"
          - name: AWS_METADATA_SERVICE_TIMEOUT
            value: "3"
          - name: AWS_METADATA_SERVICE_NUM_ATTEMPTS
            value: "3"

          # Additional env
          - name: AWS_METADATA_SERVICE_TIMEOUT
            value: "3"
          - name: AWS_METADATA_SERVICE_NUM_ATTEMPTS
            value: "3"
          - name: AWS_DEFAULT_REGION
            value: us-west-2
          - name: ASSETS_BASE_PATH
            value: assets
          - name: MODELS_BASE_PATH
            value: s3://xxx

          # Additional spec
          livenessProbe:
            failureThreshold: 10
            httpGet:
              path: /health/status
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 15
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 10
            httpGet:
              path: /health/status
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              memory: 4Gi
            requests:
              cpu: 250m
              memory: 1800Mi

        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                  - key: seldon-app-svc
                    operator: In
                    values:
                    - main-service-green

      # HPA
      hpaSpec:
        maxReplicas: 50
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 60
          type: Resource
        minReplicas: 2
    - metadata:
        annotations:
          restartedAt: ""
      spec:
        containers:
        - name: main
          image: xxxxxxx:v28-build-serving
          imagePullPolicy: Always
          env:
          - name: AWS_DEFAULT_REGION
            value: "us-west-2"
          - name: AWS_METADATA_SERVICE_TIMEOUT
            value: "3"
          - name: AWS_METADATA_SERVICE_NUM_ATTEMPTS
            value: "3"

          # Additional env
          - name: AWS_METADATA_SERVICE_TIMEOUT
            value: "3"
          - name: AWS_METADATA_SERVICE_NUM_ATTEMPTS
            value: "3"
          - name: AWS_DEFAULT_REGION
            value: us-west-2
          - name: ASSETS_BASE_PATH
            value: assets
          - name: MODELS_BASE_PATH
            value: s3://xxx

          # Additional spec
          livenessProbe:
            failureThreshold: 10
            httpGet:
              path: /health/status
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 15
            successThreshold: 1
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 10
            httpGet:
              path: /health/status
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: "2"
              memory: 4Gi
            requests:
              cpu: 250m
              memory: 1800Mi

        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchExpressions:
                  - key: seldon-app-svc
                    operator: In
                    values:
                    - main-service-main

      # HPA
      hpaSpec:
        maxReplicas: 50
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 60
          type: Resource
        minReplicas: 2

To reproduce

kubectl apply -f <the-above-config.yaml>

Expected behaviour

kubectl --namespace <namespace> get sdep main -o yaml should include the metadata under .spec.predictors[].componentSpecs[].metadata, but with Seldon Core v1.4 the metadata is dropped.

We need this field so that we can patch new pods when new S3 files are ready.

Actual output with v1.4 (note metadata: {} on every componentSpec):

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  annotations:
    meta.helm.sh/release-name: test
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2021-02-23T10:03:03Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
  managedFields:
  - apiVersion: machinelearning.seldon.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
      f:spec:
        .: {}
        f:annotations:
          .: {}
          f:iam.amazonaws.com/role: {}
          f:seldon.io/rest-connect-retries: {}
    manager: Go-http-client
    operation: Update
    time: "2021-02-23T10:03:03Z"
  - apiVersion: machinelearning.seldon.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:predictors: {}
      f:status:
        .: {}
        f:address:
          .: {}
          f:url: {}
        f:deploymentStatus:
          .: {}
          f:main-service-0-router:
            .: {}
            f:availableReplicas: {}
            f:replicas: {}
          f:main-service-1-green:
            .: {}
            f:availableReplicas: {}
            f:replicas: {}
          f:main-service-2-main:
            .: {}
            f:availableReplicas: {}
            f:replicas: {}
        f:replicas: {}
        f:serviceStatus:
          .: {}
          f:main-service:
            .: {}
            f:grpcEndpoint: {}
            f:httpEndpoint: {}
            f:svcName: {}
          f:main-service-green:
            .: {}
            f:httpEndpoint: {}
            f:svcName: {}
          f:main-service-main:
            .: {}
            f:httpEndpoint: {}
            f:svcName: {}
          f:main-service-router:
            .: {}
            f:httpEndpoint: {}
            f:svcName: {}
        f:state: {}
    manager: manager
    operation: Update
    time: "2021-02-23T10:04:31Z"
  name: main
  namespace: default
  resourceVersion: "26817957"
  selfLink: /apis/machinelearning.seldon.io/v1/namespaces/default/seldondeployments/main
  uid: e0bd56a3-40bf-4761-be1b-7a5cbce3861c
spec:
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::xxxxxx
    seldon.io/rest-connect-retries: "1"
  predictors:
  - componentSpecs:
    - hpaSpec:
        maxReplicas: 4
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 60
          type: Resource
        minReplicas: 2
      metadata: {}
      spec:
        containers:
        - env:
          - name: ROUTER_MODE
            value: uid
          - name: MODEL__1__CONFIG
            value: |-
              {
                "buckets": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                "whitelist": ["51B3B82F-3629-4699-9D8B-07091C365C41"],
                "blacklist": ["db3ba7e2-e3dd-4f8d-b39d-d60004978807"]
              }
          image: xxxxxx:v10-build
          imagePullPolicy: Always
          name: router
          ports:
          - containerPort: 6000
            name: metrics
            protocol: TCP
          resources: {}
          volumeMounts:
          - mountPath: /etc/podinfo
            name: seldon-podinfo
    - hpaSpec:
        maxReplicas: 50
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 60
          type: Resource
        minReplicas: 2
      metadata: {}
      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: seldon-app-svc
                    operator: In
                    values:
                    - main-service-green
                topologyKey: kubernetes.io/hostname
              weight: 100
        containers:
        - env:
          - name: AWS_DEFAULT_REGION
            value: us-west-2
          - name: AWS_METADATA_SERVICE_TIMEOUT
            value: "3"
          - name: AWS_METADATA_SERVICE_NUM_ATTEMPTS
            value: "3"
          - name: AWS_METADATA_SERVICE_TIMEOUT
            value: "3"
          - name: AWS_METADATA_SERVICE_NUM_ATTEMPTS
            value: "3"
          - name: AWS_DEFAULT_REGION
            value: us-west-2
          - name: ASSETS_BASE_PATH
            value: assets
          - name: MODELS_BASE_PATH
            value: s3://xxxxxx
          image: xxxxx:v21-build-serving
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 10
            httpGet:
              path: /health/status
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 15
            successThreshold: 1
            timeoutSeconds: 1
          name: green
          ports:
          - containerPort: 6001
            name: metrics
            protocol: TCP
          readinessProbe:
            failureThreshold: 10
            httpGet:
              path: /health/status
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              memory: 4Gi
            requests:
              cpu: 250m
              memory: 1800Mi
          volumeMounts:
          - mountPath: /etc/podinfo
            name: seldon-podinfo
    - hpaSpec:
        maxReplicas: 50
        metrics:
        - resource:
            name: cpu
            targetAverageUtilization: 60
          type: Resource
        minReplicas: 2
      metadata: {}
      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                  - key: seldon-app-svc
                    operator: In
                    values:
                    - main-service-main
                topologyKey: kubernetes.io/hostname
              weight: 100
        containers:
        - env:
          - name: AWS_DEFAULT_REGION
            value: us-west-2
          - name: AWS_METADATA_SERVICE_TIMEOUT
            value: "3"
          - name: AWS_METADATA_SERVICE_NUM_ATTEMPTS
            value: "3"
          - name: AWS_METADATA_SERVICE_TIMEOUT
            value: "3"
          - name: AWS_METADATA_SERVICE_NUM_ATTEMPTS
            value: "3"
          - name: AWS_DEFAULT_REGION
            value: us-west-2
          - name: ASSETS_BASE_PATH
            value: assets
          - name: MODELS_BASE_PATH
            value: s3://xxxx
          image: xxxxx:v28-build-serving
          imagePullPolicy: Always
          livenessProbe:
            failureThreshold: 10
            httpGet:
              path: /health/status
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 15
            successThreshold: 1
            timeoutSeconds: 1
          name: main
          ports:
          - containerPort: 6002
            name: metrics
            protocol: TCP
          readinessProbe:
            failureThreshold: 10
            httpGet:
              path: /health/status
              port: http
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: "2"
              memory: 4Gi
            requests:
              cpu: 250m
              memory: 1800Mi
          volumeMounts:
          - mountPath: /etc/podinfo
            name: seldon-podinfo
    engineResources: {}
    graph:
      children:
      - endpoint:
          service_host: main-service-main.default.svc.cluster.local.
          service_port: 9002
          type: REST
        implementation: UNKNOWN_IMPLEMENTATION
        name: main
        type: MODEL
      - endpoint:
          service_host: main-service-green.default.svc.cluster.local.
          service_port: 9001
          type: REST
        implementation: UNKNOWN_IMPLEMENTATION
        name: green
        type: MODEL
      endpoint:
        service_host: localhost
        service_port: 9000
        type: REST
      implementation: UNKNOWN_IMPLEMENTATION
      name: router
      type: ROUTER
    labels:
      version: service
    name: service
    replicas: 1
    svcOrchSpec:
      env:
      - name: SELDON_LOG_LEVEL
        value: ERROR
status:
  address:
    url: http://main-service.default.svc.cluster.local:8000/api/v1.0/predictions
  deploymentStatus:
    main-service-0-router:
      availableReplicas: 2
      replicas: 2
    main-service-1-green:
      availableReplicas: 2
      replicas: 2
    main-service-2-main:
      availableReplicas: 2
      replicas: 2
  replicas: 2
  serviceStatus:
    main-service:
      grpcEndpoint: main-service.default:5001
      httpEndpoint: main-service.default:8000
      svcName: main-service
    main-service-green:
      httpEndpoint: main-service-green.default:9001
      svcName: main-service-green
    main-service-main:
      httpEndpoint: main-service-main.default:9002
      svcName: main-service-main
    main-service-router:
      httpEndpoint: main-service-router.default:9000
      svcName: main-service-router
  state: Available

This does not happen in v1.2.1: the componentSpecs metadata is preserved there.

Environment

seldon-core-operator v1.4
AWS EKS 1.8
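The report needs componentSpecs metadata in order to patch pods when new S3 files are ready, but does not say how the patch is sent. A minimal sketch, assuming kubectl patch sdep main --type=json with a JSON Patch (RFC 6902) document that sets the restartedAt annotation on one componentSpec, and assuming the operator copies that annotation onto the pod template. The helper names (escapePointer, annotationPatch) and the indices used are hypothetical. An add operation needs its parent object to exist, so with the v1.4 behaviour (metadata: {}) a patch targeting .../metadata/annotations/restartedAt would fail instead of rolling the pods.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value"`
}

// escapePointer escapes a key for an RFC 6901 JSON Pointer:
// "~" becomes "~0" first, then "/" becomes "~1".
func escapePointer(key string) string {
	return strings.ReplaceAll(strings.ReplaceAll(key, "~", "~0"), "/", "~1")
}

// annotationPatch builds a JSON Patch that sets one annotation on
// spec.predictors[pred].componentSpecs[comp].metadata. The annotations
// object must already exist, otherwise the add operation fails.
func annotationPatch(pred, comp int, key, value string) ([]byte, error) {
	path := fmt.Sprintf("/spec/predictors/%d/componentSpecs/%d/metadata/annotations/%s",
		pred, comp, escapePointer(key))
	return json.Marshal([]patchOp{{Op: "add", Path: path, Value: value}})
}

func main() {
	// Hypothetical: bump restartedAt on the "green" componentSpec (index 1).
	p, err := annotationPatch(0, 1, "restartedAt", time.Now().UTC().Format(time.RFC3339))
	if err != nil {
		panic(err)
	}
	// Pass the printed JSON to: kubectl patch sdep main --type=json -p '<json>'
	fmt.Println(string(p))
}
```

Escaping matters for keys such as reloader.stakater.com/auto, whose "/" would otherwise be read as a path separator.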

Isaacwhyuenac added the bug and triage (Needs to be triaged and prioritised accordingly) labels on Feb 23, 2021
Isaacwhyuenac changed the title from "SeldonPodSpec in SeldonDeployment V1alpha in seldon v1.4 is not parsing metadata successfully" to "SeldonPodSpec in SeldonDeployment V1alpha and V1 in seldon v1.4 is not parsing metadata successfully" on Feb 23, 2021
ukclivecox removed the triage label on Feb 25, 2021
ukclivecox (Contributor) commented:

Should be fixed in 1.5.2 or master.
Please reopen if still an issue.
