Seldon AB testing - getting an error "info": "Parameter 'ratioA' is missing." #1081

Closed
vackysh opened this issue Nov 9, 2019 · 18 comments
@vackysh

vackysh commented Nov 9, 2019

Hi Experts,

I am working on an A/B testing model, using the YAML file below for the Seldon deployment. The deployment succeeded, but when I call the Predict API it returns an error:

{
  "code": 204,
  "info": "Parameter 'ratioA' is missing.",
  "reason": "Error happened in AB Test Routing",
  "status": "FAILURE"
}

However, I have all the necessary parameters in the YAML file.
Seldon deployment file:


apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: "seldon-deployment-{{workflow.name}}"
  namespace: kubeflow
spec:
  annotations:
    project_name: Creditloan predictor DVC feedback loop
    deployment_version: v1
  name: "seldon-deployment-{{workflow.name}}"
  oauth_key: oauth-key
  oauth_secret: oauth-secret
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: image1
          imagePullPolicy: IfNotPresent
          name: feature-engg
          resources:
            requests:
              memory: 1Mi
          volumeMounts:
          - name: mypvc
            mountPath: /mnt
        - image: image2
          imagePullPolicy: IfNotPresent
          name: sklearn
          resources:
            requests:
              memory: 1Mi
          volumeMounts:
          - name: mypvc
            mountPath: /mnt
        terminationGracePeriodSeconds: 20
        volumes:
        - name: mypvc
          persistentVolumeClaim:
            claimName: "{{workflow.name}}-my-pvc"
    graph:
      name: random-ab-test
      implementation: RANDOM_ABTEST
      parameter:
        name: ratioA
        value: 0.5
        type: FLOAT
      children:
      - name: feature-engg
        endpoint:
          type: REST
        type: MODEL
      - name: sklearn
        endpoint:
          type: REST
        type: MODEL
    name: cld-random-ab-test
    replicas: 1
    annotations:
      predictor_version: v1

Can you please let me know where I am making a mistake?

Thanks in advance

Regards,
Varun

@ukclivecox
Contributor

I think the core reason is that you have parameter above, not parameters.

However, this should be caught by validation. Which version of Seldon are you running, and how did you install it? It should have said something like:

error: error validating "mab2.yaml": error validating data: ValidationError(SeldonDeployment.spec.predictors[0].graph): unknown field "parameter" in io.seldon.machinelearning.v1alpha2.SeldonDeployment.spec.predictors.graph; if you choose to ignore these errors, turn validation off with --validate=false

@ukclivecox
Contributor

if (parameter == null) {
  throw new APIException(
      APIException.ApiExceptionType.ENGINE_INVALID_ABTEST, "Parameter 'ratioA' is missing.");
}
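
The check above suggests that the A/B router refuses to route when no ratioA parameter reaches it. A minimal Python sketch of that behaviour follows. It is not Seldon's implementation: the names route and MissingParameterError, and the rule "send the request to child 0 when a uniform draw is at or below ratioA", are assumptions made for illustration.

```python
import random


class MissingParameterError(Exception):
    """Raised when the router cannot find the parameter it needs (hypothetical name)."""


def route(parameters, rng=random.random):
    """Return the index of the child to route to: 0 for A, 1 for B."""
    ratio_a = None
    for param in parameters or []:
        if param.get("name") == "ratioA":
            ratio_a = float(param["value"])
    if ratio_a is None:
        # Mirrors the message seen in the Predict API response.
        raise MissingParameterError("Parameter 'ratioA' is missing.")
    # Assumed rule: a uniform draw at or below ratioA goes to child 0.
    return 0 if rng() <= ratio_a else 1


params = [{"name": "ratioA", "value": "0.5", "type": "FLOAT"}]
print(route(params))  # prints 0 or 1

try:
    route(None)  # no parameters reached the router
except MissingParameterError as err:
    print(err)
```

In this sketch a graph node that declares its settings under an unrecognised key behaves exactly like a node with no parameters at all, which would match the symptom reported in this issue.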

@vackysh
Author

vackysh commented Nov 11, 2019

Hi,

The Seldon version is 0.4, which is installed on a Kubernetes machine.

However, I tried using parameters instead of parameter and didn't get any validation issue.
During deployment, though, it failed with this error:

This step is in Failed state with this message: Error from server (InternalError): error when creating "/tmp/manifest.yaml": Internal error occurred: admission webhook "mutating-create-update-seldondeployment.seldon.io" denied the request: v1alpha2.SeldonDeployment.Spec: v1alpha2.SeldonDeploymentSpec.Predictors: []v1alpha2.PredictorSpec: v1alpha2.PredictorSpec.Graph: v1alpha2.PredictiveUnit.Parameters: []v1alpha2.Parameter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|ameters":{"name":"ra|..., bigger context ...|DOM_ABTEST","name":"random-ab-test","parameters":{"name":"ratioA","type":"FLOAT","value":0.5}},"name|...

After this, I changed "parameters" back to "parameter" and the deployment succeeded, but I got the response below when the Predict API was called:

{
  "code": 204,
  "info": "Parameter 'ratioA' is missing.",
  "reason": "Error happened in AB Test Routing",
  "status": "FAILURE"
}

I couldn't find the issue, as I created the YAML file based on the Seldon documentation.

Regards,
Vackysh
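
The webhook message above ("decode slice: expect [ or n, but found {") indicates that parameters must be a list of entries rather than a single mapping. The Python sketch below illustrates the difference between the two shapes. The helper check_graph_parameters is hypothetical, and quoting the value as "0.5" assumes the schema expects a string, which this thread does not confirm.

```python
import json


def check_graph_parameters(graph):
    """Return a list of problems with a graph node's parameters (empty if none)."""
    problems = []
    if "parameter" in graph:
        problems.append("unknown field 'parameter' (expected 'parameters')")
    params = graph.get("parameters")
    if params is None:
        problems.append("'parameters' is missing")
    elif not isinstance(params, list):
        problems.append("'parameters' must be a list, got " + type(params).__name__)
    else:
        for i, param in enumerate(params):
            for key in ("name", "value", "type"):
                if key not in param:
                    problems.append(f"parameters[{i}] is missing '{key}'")
    return problems


# Shape rejected by the webhook: a single mapping.
rejected = {
    "name": "random-ab-test",
    "implementation": "RANDOM_ABTEST",
    "parameters": {"name": "ratioA", "value": 0.5, "type": "FLOAT"},
}

# Shape the error message asks for: a list of mappings.
accepted = {
    "name": "random-ab-test",
    "implementation": "RANDOM_ABTEST",
    # Assumption: the value is given as a string.
    "parameters": [{"name": "ratioA", "value": "0.5", "type": "FLOAT"}],
}

print(check_graph_parameters(rejected))
print(check_graph_parameters(accepted))
print(json.dumps(accepted, indent=2))
```

In YAML, the accepted shape corresponds to a parameters key followed by a "- name: ratioA" list item.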

@vackysh
Author

vackysh commented Nov 11, 2019

Hi,

Here is the complete error:

error when creating "/tmp/manifest.yaml": Internal error occurred: admission webhook "mutating-create-update-seldondeployment.seldon.io" denied the request: v1alpha2.SeldonDeployment.Spec: v1alpha2.SeldonDeploymentSpec.Predictors: []v1alpha2.PredictorSpec: v1alpha2.PredictorSpec.Graph: v1alpha2.PredictiveUnit.Parameters: []v1alpha2.Parameter: decode slice: expect [ or n, but found {, error found in #10 byte of ...|ameters":{"name":"ra|..., bigger context ...|DOM_ABTEST","name":"random-ab-test","parameters":{"name":"ratioA","type":"FLOAT","value":0.5}},"name|...
github.com/argoproj/argo/errors.New
    /go/src/github.com/argoproj/argo/errors/errors.go:49
github.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).ExecResource
    /go/src/github.com/argoproj/argo/workflow/executor/resource.go:62
github.com/argoproj/argo/cmd/argoexec/commands.execResource
    /go/src/github.com/argoproj/argo/cmd/argoexec/commands/resource.go:44
github.com/argoproj/argo/cmd/argoexec/commands.NewResourceCommand.func1
    /go/src/github.com/argoproj/argo/cmd/argoexec/commands/resource.go:21
github.com/spf13/cobra.(*Command).execute
    /go/src/github.com/spf13/cobra/command.go:766
github.com/spf13/cobra.(*Command).ExecuteC
    /go/src/github.com/spf13/cobra/command.go:852
github.com/spf13/cobra.(*Command).Execute
    /go/src/github.com/spf13/cobra/command.go:800
main.main
    /go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17
runtime.main
    /usr/local/go/src/runtime/proc.go:201
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1333

Regards,
vackysh

@ukclivecox
Contributor

Can you show the yaml for the parameters you are using?

@vackysh
Author

vackysh commented Nov 11, 2019

Hi,

No, I didn't use a list of parameters.
Here is the graph configuration from the YAML file:


graph:
  name: random-ab-test
  implementation: RANDOM_ABTEST
  parameter:
    name: ratioA
    value: 0.5
    type: FLOAT
  children:
  - name: feature-engg
    endpoint:
      type: REST
    type: MODEL
  - name: sklearn
    endpoint:
      type: REST
    type: MODEL
name: cld-random-ab-test
replicas: 1
annotations:
  predictor_version: v1

Can you please give me the exact code for a list of parameters?

Regards,
Vackysh

@vackysh
Author

vackysh commented Nov 11, 2019

Hi,

I have uploaded the YAML file.

seldon_deployment.yaml.txt

Kindly let me know what the issue is.

Regards,
Varun

@ukclivecox
Contributor

@vackysh
Author

vackysh commented Nov 12, 2019

Hi @cliveseldon ,

Thanks for your support.

We were able to fix the YAML file, but another issue has now arisen.
The Seldon deployment was created on the Kubernetes server, but the pods are still in a waiting state, looking for the input models.

seldon-4d84728ace66503bb615d09e87a9863e-8894fc496-nkc6c 0/3 Running 8 7m17s

We are using a Kubeflow pipeline for model deployment on Seldon.
Here is the pipeline flow:
model downloader -----> predict ---------> seldon AB testing deployment (see the YAML file attached previously)

The volume mount (/mnt) and PVC are specified in the YAML file.

The model downloader downloads the model from external storage and loads it onto the volume mount /mnt; the predict component then reads the model and makes predictions. But the Seldon A/B testing deployment pod is unable to find the model on the same PVC and remains in a waiting state.

Can you please let us know what we are missing?

I appreciate your quick response here.

Regards,
Varun

@ukclivecox
Contributor

Not sure. Have you run kubectl describe on the Deployment to see if it mounted the PVC OK?

@ryandawsonuk
Contributor

You may also be able to do a kubectl exec -it onto the Pod and list what files are in the expected directory.
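
The two checks suggested above can be written out as concrete commands. The Python sketch below only builds and prints the kubectl command lines and does not run them. The helper names and the deployment name my-seldon-deployment are hypothetical placeholders. The pod name, the kubeflow namespace, and the container names are taken from this thread, on the assumption that the pod runs in the namespace given in the manifest.

```python
import shlex


def describe_deployment_cmd(name, namespace):
    """Build a command that shows a Deployment's spec, including its volumes and mounts."""
    return ["kubectl", "describe", "deployment", name, "-n", namespace]


def list_mount_cmd(pod, namespace, container, path="/mnt"):
    """Build a command that lists a directory inside one container of a pod."""
    return ["kubectl", "exec", pod, "-n", namespace, "-c", container,
            "--", "ls", "-la", path]


# Hypothetical Deployment name; replace with the real one.
print(shlex.join(describe_deployment_cmd("my-seldon-deployment", "kubeflow")))

# The pod has several containers, so each one is checked separately.
pod = "seldon-4d84728ace66503bb615d09e87a9863e-8894fc496-nkc6c"
for container in ("feature-engg", "sklearn"):
    print(shlex.join(list_mount_cmd(pod, "kubeflow", container)))
```

Checking each container separately matters here because a volume defined at the pod level is only visible in containers that declare a matching volumeMounts entry.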

@vackysh
Author

vackysh commented Nov 12, 2019

Hi @cliveseldon ,

Yes, I can see the PVC is mounted when I run kubectl describe:

Mounts:
/mnt from mypvc (rw)

But I couldn't find anything with ls /mnt when I ran kubectl exec -it on the pod.
I can only see source code files under /workspace:

ls /workspace

Predictor.py __init__.py __pycache__ build_image.sh pipeline_step.py requirements.txt

It's very strange. It works fine when we deploy a single model, but it fails to locate /mnt when we do a Seldon deployment for A/B testing.

I have attached the YAML file again for your reference.
Could you please have a look and let me know if I made a mistake?

Regards,
Vackysh
seldon_deployment.yaml.txt

Regards,
Varun

@ukclivecox
Contributor

Is your PVC read-many?

@vackysh
Author

vackysh commented Nov 12, 2019

Yes, it is ReadWriteMany access mode.

Regards,
Vackysh

@ukclivecox
Contributor

It's hard for us to debug further unless you can provide some error log showing why the PVC is not being made available to all components.

@vackysh
Author

vackysh commented Nov 13, 2019

The issue has been resolved. It was a problem with the YAML/JSON file: I removed the volumeSource part and it started working. Now we can see the mounted volume with the model artifacts.

    ],
    "volumes": [
        {
            "name": "mypvc",
            "persistentVolumeClaim": {
                "claimName": "{{workflow.name}}-mypvc"
            }
        }
    ]

Thanks for your support.

@ukclivecox
Contributor

Great! Glad it's solved.
