End-to-end tests for Pre-packaged model servers hang if name doesn't match exactly #820
Comments
I agree a better way to wait for rollout that is less brittle is needed. |
Do we have cases where the deployment failed? How could I reproduce the failure? Do I understand correctly that the scope of the fix would be to modify
to monitor rollout status using labels, and to add appropriate labels to the objects' YAML definitions, e.g. here? |
It seems that
I tried option 2. The following function

    import hashlib

    import yaml


    def deployment_name(fname):
        # Load the SeldonDeployment manifest.
        with open(fname, 'r') as f:
            data = yaml.safe_load(f.read())
        sdep_name = data['metadata']['name']
        predictor_spec = data['spec']['predictors'][0]
        pod_spec = predictor_spec['componentSpecs'][0]['spec']
        # Collect "name:image" pairs of all containers and hash them.
        s = []
        for container in pod_spec['containers']:
            s.append(container['name'])
            s.append(container['image'])
        s = ":".join(s) + ";"
        pod_hash = hashlib.md5(s.encode()).hexdigest()[:7]
        sdep_name = "-".join([sdep_name, predictor_spec['graph']['name'], pod_hash])
        return sdep_name

seems to work properly on YAML files that define containers, e.g. this one, but does not work on ones that do not define containers, e.g. iris.yml. |
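To make the failure mode on container-less manifests explicit, here is a minimal sketch of the same idea that returns `None` instead of raising `KeyError`. The name `deployment_name_from_spec` is hypothetical, it takes the already-parsed manifest as a dict, and the hashing simply mirrors the function above; it is not confirmed to be how the operator derives names.

```python
import hashlib
from typing import Optional


def deployment_name_from_spec(data: dict) -> Optional[str]:
    """Guess the Deployment name from a parsed SeldonDeployment manifest.

    Returns None when the manifest declares no containers (as with
    pre-packaged servers), because the image is then unknown to the test.
    """
    sdep_name = data["metadata"]["name"]
    predictor_spec = data["spec"]["predictors"][0]

    component_specs = predictor_spec.get("componentSpecs") or []
    if not component_specs:
        return None
    containers = component_specs[0].get("spec", {}).get("containers") or []
    if not containers:
        return None

    # Same "name:image:...;" string as in the function above (an assumption
    # about the naming scheme, taken from this thread only).
    parts = []
    for container in containers:
        parts.append(container["name"])
        parts.append(container["image"])
    s = ":".join(parts) + ";"
    pod_hash = hashlib.md5(s.encode()).hexdigest()[:7]
    return "-".join([sdep_name, predictor_spec["graph"]["name"], pod_hash])
```

A caller can then fall back to another lookup strategy whenever the result is `None`.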
In case of

    >>> import hashlib
    >>> name = "classifier"
    >>> image = "seldonio/sklearnserver_rest:0.2"
    >>> s = f"{name}:{image};"
    >>> hashlib.md5(s.encode()).hexdigest()[:7]
    '4903e3c'

As the YAML file does not contain information about which image will be used, it may indeed be better to go with adding labels and filtering by them manually, i.e. option 1 in the previous comment. |
I think I may have found another option. I believe that

    >>> import yaml
    >>> from subprocess import run
    >>> ret = run('kubectl get -n seldon seldondeployment sklearn -o yaml', shell=True, capture_output=True)
    >>> data = yaml.safe_load(ret.stdout.decode())
    >>> list(data['status']['deploymentStatus'])
    ['iris-default-4903e3c']

@axsaucedo @adriangonz What do you think? It seems like the simplest and shortest solution. |
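As a rough sketch of how the lookup above could feed a rollout check, the helper below reads the deployment names from the SeldonDeployment status and runs `kubectl rollout status` for each one. The function names and the injectable `kubectl` parameter are illustrative assumptions, not the code from the actual fix; it uses `-o json` so that only the standard library is needed, and avoids `capture_output` so it also runs on Python 3.6.

```python
import json
import subprocess
from typing import Callable, List


def run_kubectl(args: List[str]) -> str:
    """Run kubectl with the given arguments and return its stdout."""
    result = subprocess.run(["kubectl", *args], check=True, stdout=subprocess.PIPE)
    return result.stdout.decode()


def get_deployment_names(
    sdep_name: str,
    namespace: str,
    kubectl: Callable[[List[str]], str] = run_kubectl,
) -> List[str]:
    """Read Deployment names from status.deploymentStatus of a SeldonDeployment."""
    out = kubectl(["get", "-n", namespace, "seldondeployment", sdep_name, "-o", "json"])
    data = json.loads(out)
    # The status may be missing or empty right after creation; treat it as "none yet".
    return list(data.get("status", {}).get("deploymentStatus") or {})


def wait_for_rollout(
    sdep_name: str,
    namespace: str,
    kubectl: Callable[[List[str]], str] = run_kubectl,
) -> List[str]:
    """Block until every Deployment of the SeldonDeployment has rolled out."""
    names = get_deployment_names(sdep_name, namespace, kubectl)
    for name in names:
        kubectl(["rollout", "status", "-n", namespace, f"deploy/{name}"])
    return names
```

For example, `wait_for_rollout("sklearn", "seldon")` would wait on whatever names the status reports. A real test would probably need to retry `get_deployment_names` until the list is non-empty.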
I pushed a proof-of-concept fix. See #1315. |
The new approach gets deployment names directly from SeldonDeployment objects, which avoids hard-coded hashes in test scripts.
* 1297 WIP Update Analytics Helm Chart
  Signed-off-by: glindsell <gl@seldon.io>
* Update README.md ns: seldon -> seldon-system
* first try
* add preprocessor and structure notebook
* pack outlier detection into seldon deployment
* add endpoint that combines the classification and outlier detection
* polish example and return outliers score via tags
* cleanup model wrapper
* push alternative layout of the example
* add combiner to the example
* add comments in new notebook
* use jsonData instead of strData for return values
* add logging
* introduce base image to optimize s2i builds
* remove redundant version of the example
* adjust image names
* add images and remove output from requirement installation cells
* Bump pillow from 6.2.0 to 7.0.0 in /python
  Bumps [pillow](https://github.com/python-pillow/Pillow) from 6.2.0 to 7.0.0.
  - [Release notes](https://github.com/python-pillow/Pillow/releases)
  - [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst)
  - [Commits](python-pillow/Pillow@6.2.0...7.0.0)
  Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
* Bump okhttp from 4.2.2 to 4.3.0 in /engine
  Bumps [okhttp](https://github.com/square/okhttp) from 4.2.2 to 4.3.0.
  - [Release notes](https://github.com/square/okhttp/releases)
  - [Changelog](https://github.com/square/okhttp/blob/master/CHANGELOG.md)
  - [Commits](square/okhttp@parent-4.2.2...parent-4.3.0)
  Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
* Automatically find deployment names in e2e tests, closes #820
  New approach is based on getting deployment names directly from SeldonDeployment objects. This avoids hard-coded hashes in test scripts.
* set deployment replicas
* Use https for training set
* Remove log4j from pom
* Update link
* apply fix to other tests and iterate over deployments in wait_for_rollout
* adjust to tests being run with Python 3.6
* remove note about missing graph, add nblink
* modify local operator tests to use proper namespace and run helm uninstall at the end
* update to new kind
* request ephemeral storage
* exception should be logged
* Bump okhttp from 4.3.0 to 4.3.1 in /engine
  Bumps [okhttp](https://github.com/square/okhttp) from 4.3.0 to 4.3.1.
  - [Release notes](https://github.com/square/okhttp/releases)
  - [Changelog](https://github.com/square/okhttp/blob/master/CHANGELOG.md)
  - [Commits](square/okhttp@parent-4.3.0...parent-4.3.1)
  Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
* operator build test
* 1297 WIP Update Analytics Helm Chart
  Signed-off-by: glindsell <gl@seldon.io>
* typo fix: missing api in io.seldon.wrapper.api.SeldonPredictionService
* Create and use seldonio/core-builder:0.10
* fix operator build - controller-gen install for go modules
* make gpu image Python 3 exclusive, closes #1324
* version 1.0.1
* version 1.0.2-SNAPSHOT
* seldon-core python version 1.0.1
* python wrapper version usage updated
* update images reference doc

Co-authored-by: RafalSkolasinski <r.j.skolasinski@gmail.com>
Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Co-authored-by: Adrian Gonzalez <adrian.gonz.mar@gmail.com>
Co-authored-by: Ryan Dawson <ryandawson@cantab.net>
Co-authored-by: Gurminder Sunner <gsunner2000@gmail.com>
When running e2e tests, the rollout checks are done against the exact string of the automatically generated deployment name, i.e.:
seldon-core/testing/scripts/test_prepackaged_servers.py
Line 35 in 60c9fd2
If the model is created with a different name, the check never matches and the test hangs. A fix could be to monitor the deployment through its label rather than the generated name.
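A minimal sketch of the label-based idea: list Deployments by a label selector and wait on each one, so that the generated name never has to be hard-coded. The label key `seldon-deployment-id` used in the usage note is an assumption about what the operator sets and should be checked against the cluster; the function names are hypothetical.

```python
import subprocess
from typing import Callable, List


def run_kubectl(args: List[str]) -> str:
    """Run kubectl with the given arguments and return its stdout."""
    result = subprocess.run(["kubectl", *args], check=True, stdout=subprocess.PIPE)
    return result.stdout.decode()


def deployments_by_label(
    selector: str,
    namespace: str,
    kubectl: Callable[[List[str]], str] = run_kubectl,
) -> List[str]:
    """Return names of Deployments in the namespace that match a label selector."""
    out = kubectl([
        "get", "deployments", "-n", namespace, "-l", selector,
        "-o", "jsonpath={.items[*].metadata.name}",
    ])
    # The jsonpath output is a whitespace-separated list of names.
    return out.split()


def wait_for_rollout_by_label(
    selector: str,
    namespace: str,
    kubectl: Callable[[List[str]], str] = run_kubectl,
) -> List[str]:
    """Wait for the rollout of every Deployment matching the selector."""
    names = deployments_by_label(selector, namespace, kubectl)
    for name in names:
        kubectl(["rollout", "status", "-n", namespace, f"deploy/{name}"])
    return names
```

Usage would look like `wait_for_rollout_by_label("seldon-deployment-id=sklearn", "seldon")`, with the selector adjusted to whichever labels the manifests or the operator actually provide.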