Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

End-to-end tests for Pre-packaged model servers hang if name doesn't match exactly #820

Closed
axsaucedo opened this issue Aug 26, 2019 · 6 comments
Milestone

Comments

@axsaucedo
Copy link
Contributor

When running e2e tests the rollout deployment checks are done with the exact string of automatically generated deployment name - i.e.:

wait_for_rollout("iris-default-8bb3ef6")

If the model is created with a different name the deployment doesn't start. A fix could be to monitor the deployment through the label name as opposed to the generated name.

@ukclivecox
Copy link
Contributor

I agree a better way to wait for rollout that is less brittle is needed.

@ukclivecox ukclivecox added this to the 1.0.x milestone Aug 26, 2019
@ukclivecox ukclivecox reopened this Nov 7, 2019
@ukclivecox ukclivecox modified the milestones: 1.0, 1.1 Nov 7, 2019
@RafalSkolasinski
Copy link
Contributor

Do we have cases when the deployment failed? How could I reproduce the failure?

Do I understand correctly that scope of the fix would be to modify wait_for_rollout function

def wait_for_rollout(deploymentName):

to monitor rollout status using labels and add appropriate labels to object's yaml definitions, e.g. here?

@RafalSkolasinski
Copy link
Contributor

It seems that kubectl rollout status is expecting a deployment name, not labels.
I see two options:

  1. Add labels to deployments and knowing it and the part of name (in context of example above: iris-default) find full name of the deployment using kubectl get ...
  2. Write a python function that process yaml object definition and generate full name

I did try to do 2. Following function

def deployment_name(fname):
    with open(fname, 'r') as f:
        data = yaml.safe_load(f.read())
    
    sdep_name = data['metadata']['name']
    predictor_spec = data['spec']['predictors'][0]
    pod_spec = predictor_spec['componentSpecs'][0]['spec']

    s = []
    for container in pod_spec['containers']:
        s.append(container['name'])
        s.append(container['image'])

    s = ":".join(s) + ";"
    pod_hash = hashlib.md5(s.encode()).hexdigest()[:7]            
    
    sdep_name = "-".join([sdep_name, predictor_spec['graph']['name'], pod_hash])
    return sdep_name

seems to work properly on yaml's that define containers, .e.g this one but does not work on ones that do not define containers, e.g. iris.yml.

@RafalSkolasinski
Copy link
Contributor

In case of iris.yml the name comes from pod having container classifier that uses seldonio/sklearnserver_rest:0.2 image:

>>> name = "classifier"
>>> image = "seldonio/sklearnserver_rest:0.2"
>>> s = f"{name}:{image};"
>>> hashlib.md5(s.encode()).hexdigest()[:7]    
4903e3c

As the yaml file does not contain information about which image will be used it may be better to indeed go with adding labels and filtering by them manually, a.k.a. option 1 in previous comment.

@RafalSkolasinski
Copy link
Contributor

I think I may have found another option. I believe that SeldonDeployment names must be unique in within a namespace. If that is true I can get name of deployments with for example

>>> import yaml
>>> from subprocess import run
>>> ret = run('kubectl get -n seldon seldondeployment sklearn -o yaml', shell=True, capture_output=True)
>>> data = yaml.safe_load(ret.stdout.decode())
>>> list(data['status']['deploymentStatus'])
['iris-default-4903e3c']

@axsaucedo @adriangonz What do you think? It seems like simplest and shortest solution.

@RafalSkolasinski
Copy link
Contributor

I pushed proof of concept fix. Check #1315.

glindsell pushed a commit to glindsell/seldon-core that referenced this issue Jan 15, 2020
New approach is based on getting deyployment names directly from
SeldonDeployment objects. This allow to avoid hard-coded hashes in
test scripts.
glindsell pushed a commit that referenced this issue Jan 15, 2020
* 1297 WIP Update Analytics Helm Chart

Signed-off-by: glindsell <gl@seldon.io>

* Update README.md

ns: seldon -> seldon-system

* first try

* add preprocessor and structure notebook

* pack outlier detection into seldon deployment

* add endpoint that combines the classification and outlier detection

* polish example and return outliers score via tags

* cleanup model wrapper

* push alternative layout of the example

* add combiner to the example

* add comments in new notebook

* use jsonData instead of strData for return values

* add logging

* introduce base image to optimize s2i builds

* remove redundant version of the example

* adjust image names

* add images and remove output from requirement installation cells

* Bump pillow from 6.2.0 to 7.0.0 in /python

Bumps [pillow](https://github.com/python-pillow/Pillow) from 6.2.0 to 7.0.0.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst)
- [Commits](python-pillow/Pillow@6.2.0...7.0.0)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* Bump okhttp from 4.2.2 to 4.3.0 in /engine

Bumps [okhttp](https://github.com/square/okhttp) from 4.2.2 to 4.3.0.
- [Release notes](https://github.com/square/okhttp/releases)
- [Changelog](https://github.com/square/okhttp/blob/master/CHANGELOG.md)
- [Commits](square/okhttp@parent-4.2.2...parent-4.3.0)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* Automatically find deployment names in e2e tests, closes #820

New approach is based on getting deyployment names directly from
SeldonDeployment objects. This allow to avoid hard-coded hashes in
test scripts.

* set deployment replicas

* Use https for training set

* Remove log4j from pom

* Update link

* apply fix to other tests and iterate over deployments in wait_for_rollout

* adjust to tests being run with Python 3.6

* remove note about missing graph, add nblink

* modify local operator tests to use proper namespace and run helm uninstall at the end

* update to new kind

* request ephemeral storage

* exception should be logged

* Bump okhttp from 4.3.0 to 4.3.1 in /engine

Bumps [okhttp](https://github.com/square/okhttp) from 4.3.0 to 4.3.1.
- [Release notes](https://github.com/square/okhttp/releases)
- [Changelog](https://github.com/square/okhttp/blob/master/CHANGELOG.md)
- [Commits](square/okhttp@parent-4.3.0...parent-4.3.1)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* operator build test

* 1297 WIP Update Analytics Helm Chart

Signed-off-by: glindsell <gl@seldon.io>

* typo fix: missing api in io.seldon.wrapper.api.SeldonPredictionService

* Create and use seldonio/core-builder:0.10

* fix operator build - controller-gen install for go modules

* make gpu image Python 3 exclusive, closes #1324

* version 1.0.1

* version 1.0.2-SNAPSHOT

* seldon-core python version 1.0.1

* python wrapper version usage updated

* update images reference doc

Co-authored-by: RafalSkolasinski <r.j.skolasinski@gmail.com>
Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Co-authored-by: Adrian Gonzalez <adrian.gonz.mar@gmail.com>
Co-authored-by: Ryan Dawson <ryandawson@cantab.net>
Co-authored-by: Gurminder Sunner <gsunner2000@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants