Add proper error handling for deploying the tests. #642

Merged: 4 commits, Jun 12, 2018


jlewi
Contributor

@jlewi jlewi commented Jun 12, 2018

  • Add retries for ksonnet errors. With 0.11 we appear to start hitting
    failures because the GPU and non-GPU tests both try to add the environment.

  • If the ksonnet environment already exists, adding it causes an error;
    we should keep going in that case.

  • Use kubeflow/testing/py/util.py rather than the util module in tf-operator.

  • When waiting for pods to be deleted, don't print an error when the pods are not found; that is expected, and the error just causes confusion.
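
To illustrate the first two bullets, here is a minimal sketch of retrying a command while tolerating an "already exists" failure. It is not the code from this PR: `run_with_retries`, its parameters, and the `"already exists"` match string are all hypothetical, and the real ksonnet error text may differ.

```python
import subprocess
import time


def _to_text(output):
    """Normalize command output (bytes, str, or None) to str."""
    if output is None:
        return ""
    return output.decode() if isinstance(output, bytes) else output


def run_with_retries(command, attempts=3, delay_seconds=1.0,
                     run=subprocess.check_output):
    """Run `command`, retrying on failure, and return its output as text.

    A failure whose output mentions "already exists" is treated as success,
    since another test may have created the environment first.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return _to_text(run(command, stderr=subprocess.STDOUT))
        except subprocess.CalledProcessError as e:
            output = _to_text(e.output)
            # Assumed match string; adjust to the actual error message.
            if "already exists" in output:
                return output
            last_error = e
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed for some other reason; surface the last error.
    raise last_error
```

A caller might wrap the environment setup as `run_with_retries(["ks", "env", "add", ...])`, with the arguments filled in for the test cluster. The `run` parameter exists only so the function can be exercised without invoking a real command.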

Fix #640
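
For the pod-deletion change described above, a rough sketch of the intended behavior could look like the following. It assumes a caller-supplied `list_pods` function and a stand-in `NotFoundError`; both are hypothetical and are not the names used in the test code or in any Kubernetes client.

```python
import datetime
import logging
import time


class NotFoundError(Exception):
    """Hypothetical stand-in for the API client's 'not found' error."""


def wait_for_pods_deleted(list_pods,
                          timeout=datetime.timedelta(minutes=2),
                          polling_interval=datetime.timedelta(seconds=5)):
    """Poll until `list_pods()` returns no pods or raises NotFoundError.

    Returns True once the pods are gone; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout.total_seconds()
    while True:
        try:
            pods = list_pods()
        except NotFoundError:
            # Expected once deletion finishes, so log at info level
            # instead of reporting an error.
            logging.info("Pods not found; treating them as deleted.")
            return True
        if not pods:
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("Timed out waiting for pods to be deleted.")
        time.sleep(polling_interval.total_seconds())
```

The point of the sketch is only the `except NotFoundError` branch: a missing pod is the success condition here, so it should not show up in the logs as a failure.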



@coveralls

coveralls commented Jun 12, 2018

Coverage Status

Coverage remained the same at 55.947% when pulling df65c56 on jlewi:test_flakes into 16b2a7a on kubeflow:master.

@jlewi jlewi mentioned this pull request Jun 12, 2018
@jlewi
Contributor Author

jlewi commented Jun 12, 2018

/assign @gaocegege
/assign @ankushagarwal

@ankushagarwal

ankushagarwal commented Jun 12, 2018

What do the YAML files do? Were they checked in intentionally?

@jlewi
Contributor Author

jlewi commented Jun 12, 2018

@ankushagarwal Removed the YAML files. They look spurious, probably the result of updating the test app in another branch.

@ankushagarwal

/lgtm
/approve

Thanks for fixing this!

@jlewi
Contributor Author

jlewi commented Jun 12, 2018

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ankushagarwal, jlewi


@k8s-ci-robot k8s-ci-robot merged commit e164ba5 into kubeflow:master Jun 12, 2018
@jlewi jlewi mentioned this pull request Jun 12, 2018
yph152 pushed a commit to yph152/tf-operator that referenced this pull request Jun 18, 2018
* Add proper error handling for deploying the tests.

* Add retries for ksonnet errors because it looks like with 0.11 we start
  having problems because GPU and non GPU tests both try to add the environment

* If the ksonnet environment already exists this will cause an error;
  we should keep going.

Fix kubeflow#640

* Add retries to test_runner
* Fix lint

* Fix lint.

* Remove YAML files.
jetmuffin pushed a commit to jetmuffin/tf-operator that referenced this pull request Jul 9, 2018