Update releaser to use Argo. #400

Closed
jlewi opened this issue Feb 23, 2018 · 2 comments

Comments

@jlewi
Contributor

jlewi commented Feb 23, 2018

Our releaser for tf-operator is currently broken due to various issues. Here is the output from a failing run:

HEAD is now at 330eb92... Add an option to release.py to specify the tag for the image to use. (#357)

INFO|2018-02-23T00:51:58|py/util.py|79| Running: git rev-parse HEAD 
cwd=/tmp/tfk8s.src.tmp.g8aL4E
INFO|2018-02-23T00:51:58|py/util.py|88| Subprocess output:
330eb9239e006c2bcd04f90d57b01f873124c3e9

+ GOPATH=/tmp/tfk8s.src.tmp.g8aL4E/go
+ mkdir -p /tmp/tfk8s.src.tmp.g8aL4E/go
+ cd /tmp/tfk8s.src.tmp.g8aL4E
+ python -m py.release build_new_release --src_dir=/tmp/tfk8s.src.tmp.g8aL4E --registry=gcr.io/tf-on-k8s-dogfood --project=tf-on-k8s-releasing --releases_path=gs://tf-on-k8s-dogfood-releases
INFO|2018-02-23T00:51:59|/tmp/tfk8s.src.tmp.g8aL4E/py/release.py|469| Latest passing postsubmit is 330eb9239e006c2bcd04f90d57b01f873124c3e9
INFO|2018-02-23T00:52:00|/tmp/tfk8s.src.tmp.g8aL4E/py/release.py|472| Most recent release was for cabc1c0-dirty-e3b0c44
INFO|2018-02-23T00:52:00|/tmp/tfk8s.src.tmp.g8aL4E/py/release.py|330| Use --src_dir=/tmp/tfk8s.src.tmp.g8aL4E
INFO|2018-02-23T00:52:00|/tmp/tfk8s.src.tmp.g8aL4E/py/release.py|339| /tmp/tfk8s.src.tmp.g8aL4E/go/src/github.com/tensorflow/k8s does not exist.
INFO|2018-02-23T00:52:00|/tmp/tfk8s.src.tmp.g8aL4E/py/release.py|346| Creating symbolic link /tmp/tfk8s.src.tmp.g8aL4E/go/src/github.com/tensorflow/k8s pointing to /tmp/tfk8s.src.tmp.g8aL4E
INFO|2018-02-23T00:52:00|/tmp/tfk8s.src.tmp.g8aL4E/py/release.py|369| vendor directory exists; not installing go dependencies.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/tmp/tfk8s.src.tmp.g8aL4E/py/release.py", line 700, in <module>
    main()
  File "/tmp/tfk8s.src.tmp.g8aL4E/py/release.py", line 697, in main
    args.func(args)
  File "/tmp/tfk8s.src.tmp.g8aL4E/py/release.py", line 480, in build_new_release
    build(args)
  File "/tmp/tfk8s.src.tmp.g8aL4E/py/release.py", line 373, in build
    build_and_push(go_dir, args.src_dir, args)
  File "/tmp/tfk8s.src.tmp.g8aL4E/py/release.py", line 393, in build_and_push
    version_tag=args.version_tag)
AttributeError: 'Namespace' object has no attribute 'version_tag'
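
The `AttributeError` above is what argparse produces when a handler reads a flag that was never registered on the subcommand that was invoked. The sketch below is not the actual `release.py`; it is a minimal illustration, assuming the `build` and `build_new_release` subcommands share one handler. The helper names `add_build_args` and `make_parser` are hypothetical. It shows two hedges against this class of bug: registering shared flags through one helper, and reading optional flags with `getattr` and a default.

```python
import argparse


def add_build_args(parser):
    """Register the flags that every build-style subcommand needs (hypothetical helper)."""
    parser.add_argument("--src_dir", default="", help="Directory containing the source.")
    parser.add_argument("--version_tag", default=None, help="Optional tag for the image.")


def build(args):
    # getattr with a default tolerates a Namespace that lacks the attribute,
    # which is the situation the traceback above reports.
    version_tag = getattr(args, "version_tag", None)
    return {"src_dir": args.src_dir, "version_tag": version_tag}


def make_parser():
    parser = argparse.ArgumentParser(prog="release")
    subparsers = parser.add_subparsers(dest="command")

    # Both subcommands go through the same helper, so a flag added for one
    # cannot be forgotten on the other.
    build_parser = subparsers.add_parser("build")
    add_build_args(build_parser)
    build_parser.set_defaults(func=build)

    new_release_parser = subparsers.add_parser("build_new_release")
    add_build_args(new_release_parser)
    new_release_parser.set_defaults(func=build)
    return parser
```

Either hedge alone would avoid the crash; using both keeps the handler safe even if a future subcommand skips the helper.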

I think we should refactor this into an Argo workflow that builds on top of our E2E workflows, so that we validate the image as part of a release.

We could start with what our E2E tests do and then, if the tests pass, push the image to the appropriate registry with the appropriate tag.
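
The proposed ordering (build, run the E2E tests, push only on success) can be sketched in plain Python. This is an illustration of the gating logic only, not an Argo workflow definition; `run_release` and the three callables are hypothetical stand-ins for the steps the workflow would run.

```python
def run_release(build_image, run_e2e_tests, push_image, tag):
    """Build an image, test it, and push it only if the tests pass.

    All three callables are hypothetical placeholders for workflow steps.
    Returns whatever push_image returns, or None when the tests fail.
    """
    image = build_image()
    if not run_e2e_tests(image):
        # A failing test run means nothing is published.
        return None
    return push_image(image, tag)


# Example wiring with stub steps that record what was pushed.
pushed = []


def fake_push(image, tag):
    pushed.append((image, tag))
    return "{}:{}".format(image, tag)


ok = run_release(lambda: "tf_operator", lambda image: True, fake_push, "v20180223")
failed = run_release(lambda: "tf_operator", lambda image: False, fake_push, "bad")
```

The point of the design is that the push step is unreachable unless validation succeeds, so a broken image is never tagged as a release.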

jlewi added a commit that referenced this issue Feb 27, 2018
…403)

Update our E2E test to use ksonnet not helm to deploy

As part of this we have a slight loss in test coverage because our
helm test provides some verification that our Python tests don't.
Create a releasing environment for workflows that has parameters set
as needed to build using our release cluster.

Related to
#400 Use Argo for releases
#373 Improve our test harness
@jlewi
Contributor Author

jlewi commented Feb 27, 2018

I don't think it makes sense to update the cron job/releaser that regularly rebuilds the TFJob operator image. We should just delete that code now, since it's no longer used.

Instead of updating that, we should think about how to continuously build Docker images as needed as part of our release process for Kubeflow ksonnet packages.

Per kubeflow/testing#40, for now we would just manually run the Argo workflow to build the TFJob image before cutting the ksonnet release.

@gaocegege
Member

SGTM. I think we should establish a continuous deployment process to build and publish the Docker images.

jlewi added a commit to kubeflow/testing that referenced this issue Feb 28, 2018
Our release infra is pretty much a mirror of our test infra except more restricted (e.g. we don't expose the Argo UI.)

We also need to grant the service account permissions on projects used as
GCR registries.

Update the instructions for setting up our test infra.
Provide gcloud commands for some steps.

Related to kubeflow/trainer#400
@jlewi jlewi closed this as completed Jul 23, 2018