deep_mnist example: failed calling webhook "v1alpha2.mseldondeployment.kb.io" #2107

Closed
marhav20 opened this issue Jul 10, 2020 · 2 comments
Labels
bug triage Needs to be triaged and prioritised accordingly

marhav20 commented Jul 10, 2020

In Seldon Core v1.2.1, installed via Helm on Kubernetes 1.17.8, I tried to run the examples/models/deep_mnist example from the command line and got the following error, which also occurred with the chainer_mnist example:

[root@seldontest deep_mnist]# kubectl create -f deep_mnist.json
Error from server (InternalError): error when creating "deep_mnist.json": Internal error occurred: failed calling webhook "v1alpha2.mseldondeployment.kb.io": Post https://seldon-webhook-service.seldon-system.svc:443/mutate-machinelearning-seldon-io-v1alpha2-seldondeployment?timeout=30s: unexpected EOF

I would appreciate any help on how to debug this.

Steps for deep_mnist (a proxy is needed because the VM sits in a corporate network):

[root@seldontest deep_mnist]# pip3 install -r requirements.txt
...
[root@seldontest deep_mnist]# python3 create_model.py
...
[root@seldontest deep_mnist]# s2i build -e HTTP_PROXY=$http_proxy -e HTTPS_PROXY=$https_proxy . seldonio/seldon-core-s2i-python36:1.2.2-dev deep-mnist:0.1
...
[root@seldontest deep_mnist]# kubectl create -f deep_mnist.json

The Kubernetes cluster was installed with kubeadm in a Vagrant VirtualBox CentOS 7 VM. Here is some configuration output that may help (I abbreviated the CA bundle certs for readability):

[root@seldontest deep_mnist]# kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.8", GitCommit:"35dc4cdc26cfcb6614059c4c6e836e5f0dc61dee", GitTreeState:"clean", BuildDate:"2020-06-26T03:43:27Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
[root@seldontest deep_mnist]# kubectl get pods
NAME                                                             READY   STATUS    RESTARTS   AGE
seldon-controller-manager-769694875-2x7dg                        1/1     Running   0          6h34m
seldon-core-analytics-grafana-6b85b9dd45-mxhvf                   2/2     Running   0          6h34m
seldon-core-analytics-kube-state-metrics-757fc85968-6qj88        1/1     Running   0          6h34m
seldon-core-analytics-prometheus-alertmanager-69cf96b5cb-mkld2   2/2     Running   0          6h34m
seldon-core-analytics-prometheus-node-exporter-2ct5c             1/1     Running   0          6h34m
seldon-core-analytics-prometheus-pushgateway-5db464b864-zt2pf    1/1     Running   0          6h34m
seldon-core-analytics-prometheus-seldon-7756c74cf5-9sps9         2/2     Running   0          6h34m
[root@seldontest deep_mnist]# kubectl get services
NAME                                             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)       AGE
seldon-core-analytics-grafana                    NodePort    10.96.93.61    <none>        80:3000/TCP   6h34m
seldon-core-analytics-kube-state-metrics         ClusterIP   10.96.211.55   <none>        8080/TCP      6h34m
seldon-core-analytics-prometheus-alertmanager    ClusterIP   10.96.25.1     <none>        80/TCP        6h34m
seldon-core-analytics-prometheus-node-exporter   ClusterIP   None           <none>        9100/TCP      6h34m
seldon-core-analytics-prometheus-pushgateway     ClusterIP   10.96.97.113   <none>        9091/TCP      6h34m
seldon-core-analytics-prometheus-seldon          ClusterIP   10.96.206.1    <none>        80/TCP        6h34m
seldon-webhook-service                           ClusterIP   10.96.29.229   <none>        443/TCP       6h34m
[root@seldontest deep_mnist]# kubectl get mutatingwebhookconfigurations
NAME                                                  CREATED AT
seldon-mutating-webhook-configuration-seldon-system   2020-07-10T08:53:02Z
[root@seldontest deep_mnist]# kubectl describe mutatingwebhookconfigurations seldon-mutating-webhook-configuration-seldon-system
Name:         seldon-mutating-webhook-configuration-seldon-system
Namespace:
Labels:       app=seldon
              app.kubernetes.io/instance=seldon-core-operator
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=seldon-core-operator
              app.kubernetes.io/version=1.2.1
Annotations:  cert-manager.io/inject-ca-from: seldon-system/seldon-serving-cert
              meta.helm.sh/release-name: seldon-core-operator
              meta.helm.sh/release-namespace: seldon-system
API Version:  admissionregistration.k8s.io/v1
Kind:         MutatingWebhookConfiguration
Metadata:
  Creation Timestamp:  2020-07-10T08:53:02Z
  Generation:          1
  Resource Version:    1975
  Self Link:           /apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations/seldon-mutating-webhook-configuration-seldon-system
  UID:                 7ec4acb7-78ca-4167-9a91-23b6a68c1886
Webhooks:
  Admission Review Versions:
    v1beta1
  Client Config:
    Ca Bundle:  LS0tLS1CRUdJTi...
    Service:
      Name:        seldon-webhook-service
      Namespace:   seldon-system
      Path:        /mutate-machinelearning-seldon-io-v1-seldondeployment
      Port:        443
  Failure Policy:  Fail
  Match Policy:    Exact
  Name:            v1.mseldondeployment.kb.io
  Namespace Selector:
    Match Expressions:
      Key:       seldon.io/controller-id
      Operator:  DoesNotExist
  Object Selector:
    Match Expressions:
      Key:              seldon.io/controller-id
      Operator:         DoesNotExist
  Reinvocation Policy:  Never
  Rules:
    API Groups:
      machinelearning.seldon.io
    API Versions:
      v1
    Operations:
      CREATE
      UPDATE
    Resources:
      seldondeployments
    Scope:          *
  Side Effects:     Unknown
  Timeout Seconds:  30
  Admission Review Versions:
    v1beta1
  Client Config:
    Ca Bundle:  LS0tLS1CRUdJTi...
    Service:
      Name:        seldon-webhook-service
      Namespace:   seldon-system
      Path:        /mutate-machinelearning-seldon-io-v1alpha2-seldondeployment
      Port:        443
  Failure Policy:  Fail
  Match Policy:    Exact
  Name:            v1alpha2.mseldondeployment.kb.io
  Namespace Selector:
    Match Expressions:
      Key:       seldon.io/controller-id
      Operator:  DoesNotExist
  Object Selector:
    Match Expressions:
      Key:              seldon.io/controller-id
      Operator:         DoesNotExist
  Reinvocation Policy:  Never
  Rules:
    API Groups:
      machinelearning.seldon.io
    API Versions:
      v1alpha2
    Operations:
      CREATE
      UPDATE
    Resources:
      seldondeployments
    Scope:          *
  Side Effects:     Unknown
  Timeout Seconds:  30
  Admission Review Versions:
    v1beta1
  Client Config:
    Ca Bundle:  LS0tLS1CRUdJT...
    Service:
      Name:        seldon-webhook-service
      Namespace:   seldon-system
      Path:        /mutate-machinelearning-seldon-io-v1alpha3-seldondeployment
      Port:        443
  Failure Policy:  Fail
  Match Policy:    Exact
  Name:            v1alpha3.mseldondeployment.kb.io
  Namespace Selector:
    Match Expressions:
      Key:       seldon.io/controller-id
      Operator:  DoesNotExist
  Object Selector:
    Match Expressions:
      Key:              seldon.io/controller-id
      Operator:         DoesNotExist
  Reinvocation Policy:  Never
  Rules:
    API Groups:
      machinelearning.seldon.io
    API Versions:
      v1alpha3
    Operations:
      CREATE
      UPDATE
    Resources:
      seldondeployments
    Scope:          *
  Side Effects:     Unknown
  Timeout Seconds:  30
Events:             <none>

Access to the webhook seems to work:

[root@seldontest deep_mnist]# kubectl port-forward service/seldon-webhook-service 3500:443
..
[root@seldontest deep_mnist]# SECRET_NAME=$(kubectl get secrets | grep ^seldon-webhook | cut -f1 -d ' ')
[root@seldontest deep_mnist]# TOKEN=$(kubectl get secret $SECRET_NAME -o jsonpath='{.data.token}' | base64 --decode)
[root@seldontest deep_mnist]# curl -v -k -X POST -d "crap" --header "Authorization: Bearer $TOKEN" --header "Content-Type: application/json" https://localhost:3500/mutate-machinelearning-seldon-io-v1alpha2-seldondeployment
* About to connect() to localhost port 3500 (#0)
*   Trying ::1...
* Connected to localhost (::1) port 3500 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* Server certificate:
*       subject: CN=seldon-webhook-service
*       start date: Jul 10 08:53:01 2020 GMT
*       expire date: Jul 10 08:53:01 2021 GMT
*       common name: seldon-webhook-service
*       issuer: CN=custom-metrics-ca
> POST /mutate-machinelearning-seldon-io-v1alpha2-seldondeployment HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:3500
> Accept: */*
> Authorization: Bearer
> Content-Type: application/json
> Content-Length: 4
>
* upload completely sent off: 4 out of 4 bytes
< HTTP/1.1 200 OK
< Date: Fri, 10 Jul 2020 15:34:13 GMT
< Content-Length: 297
< Content-Type: text/plain; charset=utf-8
<
{"response":{"uid":"","allowed":false,"status":{"metadata":{},"message":"couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string \"json:\\\"apiVersion,omitempty\\\"\"; Kind string \"json:\\\"kind,omitempty\\\"\" }","code":400}}}
* Connection #0 to host localhost left intact

marhav20 added the bug and triage labels Jul 10, 2020

marhav20 (author) commented:

Something seems to be wrong with the CRD registration. The Kubernetes 1.17 API server logs show:

E0710 08:53:07.473618       1 customresource_handler.go:655] error building openapi models for seldondeployments.machinelearning.seldon.io: ERROR $root.definitions.io.seldon.machinelearning.v1.SeldonDeployment.properties.spec.properties.predictors.items.<array>.properties.componentSpecs.items.<array>.properties.hpaSpec.properties.metrics.items.<array>.properties.external.properties.targetAverageValue has invalid property: anyOf
ERROR $root.definitions.io.seldon.machinelearning.v1.SeldonDeployment.properties.spec.properties.predictors.items.<array>.properties.componentSpecs.items.<array>.properties.hpaSpec.properties.metrics.items.<array>.properties.external.properties.targetValue has invalid property: anyOf
ERROR $root.definitions.io.seldon.machinelearning.v1.SeldonDeployment.properties.spec.properties.predictors.items.<array>.properties.componentSpecs.items.<array>.properties.hpaSpec.properties.metrics.items.<array>.properties.object.properties.averageValue has invalid property: anyOf
...

marhav20 (author) commented:

The issue was an http_proxy/https_proxy configuration in the Kubernetes API server. kubeadm picked up the http_proxy/https_proxy settings from the environment and configured them for the API server. Since the Seldon webhook domain seldon-webhook-service.seldon-system.svc was not covered by no_proxy, the API server's requests to the webhook went to the proxy, which knew nothing about the cluster-internal IP to which seldon-webhook-service.seldon-system.svc resolves.
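
For anyone hitting the same error, a rough way to check for this condition is to look at the proxy variables in the API server's static pod manifest and test whether the webhook host is covered by no_proxy. This is only a sketch: the manifest path is the usual kubeadm default and is an assumption here, host_in_no_proxy is a hypothetical helper, and its matching only approximates how the API server's HTTP client reads no_proxy (no CIDR or port handling).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if HOST is covered by a comma-separated
# no_proxy LIST. "example.com" matches the name and its subdomains,
# ".svc" matches subdomains only, "*" matches everything.
host_in_no_proxy() {
  local host rest entry
  host=$1
  rest=$2
  while [ -n "$rest" ]; do
    entry=${rest%%,*}                       # first item of the list
    if [ "$entry" = "$rest" ]; then rest=; else rest=${rest#*,}; fi
    entry=$(printf '%s' "$entry" | tr -d ' ')
    case "$entry" in
      "")  continue ;;
      "*") return 0 ;;
      .*)  case "$host" in *"$entry") return 0 ;; esac ;;
      *)   case "$host" in "$entry"|*".$entry") return 0 ;; esac ;;
    esac
  done
  return 1
}

# Show the proxy variables kubeadm wrote for the API server, if any
# (assumed default location of the static pod manifest).
manifest=/etc/kubernetes/manifests/kube-apiserver.yaml
if [ -r "$manifest" ]; then
  grep -i -A1 -E 'name: (https?|no)_proxy' "$manifest"
fi

if host_in_no_proxy seldon-webhook-service.seldon-system.svc "${no_proxy:-}"; then
  echo "webhook host bypasses the proxy"
else
  echo "webhook host would be sent to the proxy"
fi
```

The check uses the no_proxy value of the current shell; what matters for the webhook call is the value that ended up in the API server manifest, so compare the two.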

So the solution is either to install the cluster without proxy configuration, i.e. unset http_proxy/https_proxy before calling kubeadm init, or to add .svc to the no_proxy environment variable, i.e. export no_proxy="$no_proxy,.svc"
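
The no_proxy option can be sketched as a small idempotent step to run in the same shell before kubeadm init, so that .svc is appended only when it is missing. The function name ensure_no_proxy_entry is hypothetical and not part of any tool; the kubeadm call is left as a comment because its options depend on the setup.

```shell
#!/bin/sh
# Hypothetical helper: print LIST with ENTRY appended unless it is already there.
ensure_no_proxy_entry() {
  local entry list
  entry=$1
  list=$2
  case ",$list," in
    *",$entry,"*) printf '%s\n' "$list" ;;          # already listed
    ",,")         printf '%s\n' "$entry" ;;          # list was empty
    *)            printf '%s\n' "$list,$entry" ;;    # append
  esac
}

# Make cluster-internal service names bypass the proxy before kubeadm
# copies the proxy environment into the API server configuration.
no_proxy=$(ensure_no_proxy_entry .svc "${no_proxy:-}")
export no_proxy
echo "no_proxy=$no_proxy"
# kubeadm init ...   # run afterwards with your usual options
```

Running the snippet twice leaves no_proxy unchanged the second time, which avoids the duplicate entries that a plain export no_proxy="$no_proxy,.svc" would accumulate.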
