Paddle Operator makes it easy to run PaddlePaddle distributed training jobs on Kubernetes by providing the PaddleJob custom resource.
- Kubernetes >= 1.8
- kubectl
With Kubernetes ready, you can install the paddle operator using the configuration in the deploy folder (use deploy/v1 for Kubernetes v1.16 and later, or deploy/v1beta1 for Kubernetes v1.15 and earlier).
Create the PaddleJob CRD:
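The version rule above can be sketched as a small shell helper. This is only an illustration: `pick_deploy_dir` is a hypothetical name that is not part of paddle-operator, and it assumes the cluster version is passed in as a string such as `v1.16.3` or `1.15`.

```shell
#!/bin/sh
# pick_deploy_dir is a hypothetical helper, not part of paddle-operator.
# It maps a Kubernetes version string such as "v1.16.3" or "1.15" to the
# manifest folder named in the install instructions.
pick_deploy_dir() {
    version="${1#v}"              # drop a leading "v" if present
    major="${version%%.*}"        # text before the first dot
    rest="${version#*.}"          # text after the first dot
    minor="${rest%%.*}"           # minor version, patch level dropped
    minor="${minor%%[!0-9]*}"     # strip suffixes such as "+" in "16+"
    if [ "$major" -gt 1 ] || [ "$minor" -ge 16 ]; then
        echo "deploy/v1"
    else
        echo "deploy/v1beta1"
    fi
}
```

For example, `pick_deploy_dir 1.15` selects the v1beta1 manifests, while `pick_deploy_dir v1.16.3` selects the v1 manifests.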
kubectl apply -f https://raw.githubusercontent.com/PaddleFlow/paddle-operator/main/deploy/v1/crd.yaml
After a successful creation, the CRD is listed as follows:
kubectl get crd
NAME CREATED AT
paddlejobs.batch.paddlepaddle.org 2021-02-08T07:43:24Z
Then deploy the controller:
kubectl apply -f https://raw.githubusercontent.com/PaddleFlow/paddle-operator/main/deploy/v1/operator.yaml
Once the controller is ready, its pod is shown as Running:
kubectl -n paddle-system get pods
NAME READY STATUS RESTARTS AGE
paddle-controller-manager-698dd7b855-n65jr 1/1 Running 0 1m
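The readiness check above can also be scripted. The following is a minimal sketch, assuming the default table layout of `kubectl get pods` shown above (NAME, READY, STATUS, ...); `pods_ready` is a hypothetical helper name, not something shipped with the operator.

```shell
#!/bin/sh
# pods_ready is a hypothetical helper: it reads `kubectl get pods` table
# output on stdin and succeeds only if at least one pod is listed and every
# pod shows STATUS "Running" with all of its containers ready (e.g. 1/1).
pods_ready() {
    awk '
        NR == 1 { next }                      # skip the header row
        NF == 0 { next }                      # ignore blank lines
        {
            seen++
            split($2, ready, "/")             # READY column, e.g. "1/1"
            if ($3 != "Running" || ready[1] != ready[2]) bad++
        }
        END { exit ((seen > 0 && bad == 0) ? 0 : 1) }
    '
}

# Usage (requires access to the cluster):
#   kubectl -n paddle-system get pods | pods_ready && echo "controller is ready"
```

Parsing the table keeps the sketch independent of the Deployment name, which is not stated here.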
By default, the paddle controller runs in the paddle-system namespace and only controls jobs in that namespace.
To run the controller in a different namespace or to control jobs in other namespaces, edit charts/paddle-operator/values.yaml and install the helm chart.
You can also edit the kustomization files or edit deploy/v1/operator.yaml directly for that purpose.
Deploy your first PaddleJob demo with:
kubectl -n paddle-system apply -f https://raw.githubusercontent.com/PaddleFlow/paddle-operator/main/deploy/examples/wide_and_deep.yaml
Check the pod status:
kubectl -n paddle-system get pods
Check the PaddleJob status:
kubectl -n paddle-system get pdj
To schedule jobs with Volcano, enable it before installation by adding the following args in deploy/v1/operator.yaml:
    containers:
    - args:
      - --leader-elect
      - --namespace=paddle-system   # watch this ns only
      - --scheduling=volcano        # enable volcano
      command:
      - /manager
With this setting, jobs such as the one in deploy/examples/wide_and_deep_volcano.yaml are handled correctly.
To customize the controller further, change the following args in deploy/v1/operator.yaml before deployment:
    - args:
      - --leader-elect              # enable leader election
      - --namespace=paddle-system   # watch this ns only, set to "" for all namespaces
      - --scheduling=volcano        # enable volcano
      - --initImage=                # init container image, defaults to alpine:3.10, empty to disable
      command:
      - /manager
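Editing these args can be scripted against a local copy of the manifest. The sketch below is an assumption-laden illustration: `set_watch_namespace` is a hypothetical helper, it assumes the manifest already contains a `--namespace=<value>` argument as shown above, and it only changes which namespace is watched, not the namespace the controller itself is deployed into.

```shell
#!/bin/sh
# set_watch_namespace is a hypothetical helper. It rewrites the
# --namespace argument in a local copy of the operator manifest and
# assumes that argument is present as "--namespace=<value>".
set_watch_namespace() {
    manifest="$1"
    namespace="$2"
    tmp="${manifest}.tmp"
    # Replace the value up to the next space so trailing comments survive.
    sed "s|--namespace=[^ ]*|--namespace=${namespace}|" "$manifest" > "$tmp" &&
        mv "$tmp" "$manifest"
}

# Usage ("my-training" is a placeholder namespace):
#   curl -fsSLO https://raw.githubusercontent.com/PaddleFlow/paddle-operator/main/deploy/v1/operator.yaml
#   set_watch_namespace operator.yaml my-training
#   kubectl apply -f operator.yaml
```

Writing to a temporary file and moving it into place avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed.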
To uninstall, simply delete the CRD and the operator:
kubectl delete -f https://raw.githubusercontent.com/PaddleFlow/paddle-operator/main/deploy/v1/crd.yaml -f https://raw.githubusercontent.com/PaddleFlow/paddle-operator/main/deploy/v1/operator.yaml
More configuration options can be found in the Makefile; clone this repo to explore them. If you have any questions or concerns about usage, please do not hesitate to contact us.
Please refer to the Chinese documentation (中文文档) for more information about paddle configuration.