This repository has been archived by the owner on Sep 19, 2022. It is now read-only.
Features
- Support Running as a cleanPodPolicy value, same as tf operator (#288, @jiaqianjing)
- Migrate pytorch-operator to go modules (#272, @Jeffwan)
- feat(init_container): Add init container image CLI argument (#265, @gaocegege)
- SDK supports getting PyTorchJob training process or logs (#252, @jinchihe)
- Add watch function for PyTorchJob python Client API (#246, @jinchihe)
- Add more APIs for SDK (#240, @jinchihe)
- feat(deletePodsAndServices): only delete master service (#233, @leileiwan)
- Generate Kubeflow PyTorchJob SDK (#227, @jinchihe)
- feat: Replace common with kubeflow/common (#225, @gaocegege)
- feat: Use golanglint (#226, @gaocegege)
- Removing v1beta2 support (#222, @johnugeorge)
- feat: Support running although it is useless (#194, @gaocegege)
- Use init container for worker pod to wait for master pod to be ready (#187, @zlcnju)
- fix: Fix the comments (#193, @gaocegege)
- Minor fix to add CoreV1 to scheme (#184, @johnugeorge)
- update release script; fix post submit (#189, @johnugeorge)
Bug fixes
- Fix Unit Tests (#293, @andreyvelich)
- Fix minor OpenShift issues - resource requests, Dockerfile (#276, @vpavlin)
- Fix the link to run_e2e_workflow.py script (#266, @terrytangyuan)
- fix: Add resource limits for init container (#253, @gaocegege)
- fix the reconcile flow (#242, @ChanYiLin)
- fix(*) rm worker service in controller_test.go (#235, @leileiwan)
- fix(job_test) test case should not include worker service (#231, @leileiwan)
Chores
- Change mnist example to use FashionMNIST (#327, @Jeffwan)
- pytorch-operator: Consolidate manifests (#323, @yanniszark)
- Temporarily disable mnist test case (#326, @Jeffwan)
- PyTorch Operator: Move manifests development upstream (#320, @yanniszark)
- Migrate to new test-infra (#316, @PatrickXYS)
- update pytorch-operator deployment manifests file (#295, @myonlyzzy)
- Add @andreyvelich to approvers (#309, @andreyvelich)
- Reuse Common Scripts for Creating / Deleting EKS clusters (#308, @PatrickXYS)
- Add Jeffwan@ to OWNERS (#306, @Jeffwan)
- Move PyTorch Operator e2e tests to AWS Prow (#305, @Jeffwan)
- Update openapi-gen to not rely on vendor (#274, @Jeffwan)
- Update README.md (#290, @pingsutw)
- Update CRD link (#289, @pingsutw)
- Adds notes and example annotation for pytorch job (#285, @shawnzhu)
- chore: Update OWNERS (#286, @gaocegege)
- fix Dockerfile-mpi download miniconda.sh (#277, @jiaqianjing)
- Update swagger-codegen-cli URL (#280, @jinchihe)
- Pin kubernetes client version to work around a bug (#262, @jinchihe)
- Added The Pytorch GPU Docker under the appropriate folder (#255, @MATRIX4284)
- Copy third party vendor source code to Docker image (#251, @johnugeorge)
- Add third party license info (#250, @johnugeorge)
- ConvertPyTorchJobToUnstructured uses function ToUnstructured to convert PyTorchJob to Unstructured instead of json (#241, @leileiwan)
- replace gopkg.in/yaml.v2 with github.com/kubernetes-sigs/yaml repo (#238, @xrmzju)
- Update tf operator branch dep (#223, @johnugeorge)
- Avoiding unnecessary status update (#220, @johnugeorge)
- Removing unnecessary rbac permissions (#221, @johnugeorge)
- add mnist example dockerfile for ppc64le (#218, @zheddie)
- Fix nslookup not working well in initContainerTemplate (#216, @hougangliu)
- Minor change in log (#213, @johnugeorge)
- Delete v1beta2 code (#212, @johnugeorge)
- Add qps and burst options (#210, @ohmystack)
- Set pytorchjob defaults in test utils (#208, @ohmystack)
- Update codegen and verify in CI (#207, @ohmystack)
- Update manifest to v0.6.0 (#200, @hougangliu)
- Common label changes with K8s upgrade to 1.12.3 (#204, @johnugeorge)
- Use multi-build to build pytorch-operator image (#198, @hmtai)
- add total suffix in counter metrics (#201, @yeya24)
- add kubeconfig flag (#192, @yeya24)
- Remove unnecessary services for worker (#191, @hougangliu)