GPU tests failing; ks env doesn't exist #640
jlewi added a commit to jlewi/k8s that referenced this issue on Jun 12, 2018
* Add retries for ksonnet errors because it looks like with 0.11 we start having problems because GPU and non GPU tests both try to add the environment
* If the ksonnet environment already exists this will cause an error; we should keep going.

Fix kubeflow#640
This was referenced Jun 12, 2018
k8s-ci-robot pushed a commit that referenced this issue on Jun 12, 2018
* Add proper error handling for deploying the tests.
* Add retries for ksonnet errors because it looks like with 0.11 we start having problems because GPU and non GPU tests both try to add the environment
* If the ksonnet environment already exists this will cause an error; we should keep going.

Fix #640

* Add retries to test_runner
* Fix lint
* Fix lint.
* Remove YAML files.
yph152 pushed a commit to yph152/tf-operator that referenced this issue on Jun 18, 2018
jetmuffin pushed a commit to jetmuffin/tf-operator that referenced this issue on Jul 9, 2018
The GPU tests have started failing with an error saying that the ks environment doesn't exist.
I suspect a race condition, because the GPU and non-GPU tests both try to add the ksonnet environment at the same time.
This appears to have started when we upgraded to ksonnet 0.11 in the testing container.
kubeflow/kubeflow#727
My guess is that the behavior of adding an environment changed in that release.
Adding retries might help, but it looks like we will also get an error if the environment already exists, so we will need to handle that case as well.
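A rough sketch of what retrying while tolerating an existing environment could look like in the Python test harness. This is not the actual fix: `run_with_retries`, its parameters, and the `"already exists"` marker are hypothetical, and the real ks error text may differ.

```python
import subprocess
import time


def _run_subprocess(command):
    """Run a command and return (exit code, combined stdout/stderr text)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


def run_with_retries(command, runner=_run_subprocess, max_attempts=3,
                     delay_seconds=1.0, ok_marker="already exists"):
    """Run `command`, retrying on failure.

    Returns "ok" if the command succeeded, or "exists" if it failed with
    output containing `ok_marker`. The marker is an assumption about the
    error text; adjust it to whatever ks actually prints.
    """
    last_output = ""
    for attempt in range(1, max_attempts + 1):
        returncode, output = runner(command)
        if returncode == 0:
            return "ok"
        if ok_marker in output:
            # Another test already added the environment; keep going.
            return "exists"
        last_output = output
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(
        "command failed after %d attempts: %s" % (max_attempts, last_output)
    )
```

It would be called with something like `run_with_retries(["ks", "env", "add", env_name])` from the ksonnet app directory, plus whatever flags the tests already pass. Retries alone only narrow the race window; treating "already exists" as success is what lets the second test proceed.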
/assign @jlewi