Modify presubmits to support testing with v1alpha2 #632
The pylint failures should be fixed by kubeflow/testing#156. The other error looks like a problem with ks being too old in the test image after we upgraded the test app; this should be fixed by kubeflow/testing#155.
/test all
* The tests are currently disabled because they aren't passing yet because termination policy isn't handled correctly (kubeflow#634)
* Changed the v1alpha2 test to use the same smoke test as used by v1alpha1 as opposed to using mnist. mnist causing problems because of issues downloading the data see kubeflow/kubeflow#974
* We want a simpler test that allows for more direct testing of the distributed communication pattern
* Also mnist is expensive in that it tries to download data.
* Add a parameter tfJobVersion to the deploy script so we can control whether we deploy v1alpha1 or v1alpha2
* Parameterize the E2E test workflow by the TFJob version we want to run.
* update test-app - We need to pull in a version of the app which has the TFJobVersion flag.
* Create a script to regenerate the test-app for future use. Related to kubeflow#589
Most recent failure:
Looking at the event logs I see the error
Exec into debug worker and check the ksonnet test app
So it looks like tfJobImage wasn't set correctly. The Argo logs also suggest the image isn't set correctly.
My suspicion is that when we pushed a new testing worker image, we also pushed some updates to run_e2e_workflow.py, and that broke things.
If params.versionTag in the workflow isn't set, we should use the name. I changed versionTag from
Tests are passing; this is ready for review. /assign @ankushagarwal
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ankushagarwal, gaocegege
* Changes to support v1alpha2 testing in presubmits.
* The tests are currently disabled because they aren't passing yet because termination policy isn't handled correctly (kubeflow#634)
* Changed the v1alpha2 test to use the same smoke test as used by v1alpha1 as opposed to using mnist. mnist causing problems because of issues downloading the data see kubeflow/kubeflow#974
* We want a simpler test that allows for more direct testing of the distributed communication pattern
* Also mnist is expensive in that it tries to download data.
* Add a parameter tfJobVersion to the deploy script so we can control whether we deploy v1alpha1 or v1alpha2
* Parameterize the E2E test workflow by the TFJob version we want to run.
* update test-app - We need to pull in a version of the app which has the TFJobVersion flag.
* Create a script to regenerate the test-app for future use. Related to kubeflow#589
* Fix versionTag logic; we need to allow for case where versionTag is an empty string.
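For readers following the versionTag fix, here is a minimal sketch of the fallback behavior the last bullet describes: treat an empty string the same as an unset value. It is an illustration only, not the code from this PR; the function name `resolve_version_tag` and the `fallback_name` argument are hypothetical, and the actual workflow definition may express this differently.

```python
from typing import Optional


def resolve_version_tag(version_tag: Optional[str], fallback_name: str) -> str:
    """Return version_tag, or fallback_name when the tag is unset or empty.

    Hypothetical helper for illustration; both names are assumptions.
    """
    # `not version_tag` is True for None and for "", so an empty string
    # falls back in the same way as a missing parameter.
    if not version_tag:
        return fallback_name
    return version_tag
```

Checking only for `None` would let an empty string through as a valid tag, which appears to be the case the fix needed to cover.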
Changes to support v1alpha2 testing in presubmits.
* The tests are currently disabled because they aren't passing yet: termination policy isn't handled correctly (TFJob not marked as success when master exits but not workers #634)
* Changed the v1alpha2 test to use the same smoke test as used by v1alpha1, as opposed to using mnist.
* mnist was causing problems because of issues downloading the data; see Presubmit failures; Timeout waiting for TFJob v1alpha2 job kubeflow#974
* We want a simpler test that allows for more direct testing of the distributed communication pattern.
* Also, mnist is expensive in that it tries to download data.
* Add a parameter tfJobVersion to the deploy script so we can control whether we deploy v1alpha1 or v1alpha2.
* Parameterize the E2E test workflow by the TFJob version we want to run.
* Update test-app: we need to pull in a version of the app which has the TFJobVersion flag.
* Create a script to regenerate the test-app for future use.
Related to #589
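As a rough illustration of the tfJobVersion parameter on the deploy script, the sketch below shows one way such a flag could be declared and validated with argparse. The flag spelling `--tf_job_version`, the default of v1alpha1, and the helper names are assumptions made for this example; they are not taken from the actual deploy script.

```python
import argparse

# The two TFJob API versions mentioned in this PR.
SUPPORTED_TF_JOB_VERSIONS = ("v1alpha1", "v1alpha2")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical --tf_job_version flag."""
    parser = argparse.ArgumentParser(
        description="Deploy the TFJob operator for E2E tests (sketch).")
    parser.add_argument(
        "--tf_job_version",
        default="v1alpha1",  # assumed default, not confirmed by the PR
        choices=SUPPORTED_TF_JOB_VERSIONS,
        help="Which TFJob API version to deploy.")
    return parser


def parse_tf_job_version(argv):
    """Parse argv (a list of strings) and return the selected version."""
    return build_parser().parse_args(argv).tf_job_version
```

Using `choices` means an unsupported value makes argparse exit with an error at parse time, so a typo in the workflow parameter would fail before any deployment step runs.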