diff --git a/docs/images/gpu-share-flow.png b/docs/images/gpu-share-flow.png new file mode 100644 index 0000000000..bf54b55e12 Binary files /dev/null and b/docs/images/gpu-share-flow.png differ diff --git a/docs/user-guide/how_to_use_gpu_sharing.md b/docs/user-guide/how_to_use_gpu_sharing.md new file mode 100644 index 0000000000..c7040ca36f --- /dev/null +++ b/docs/user-guide/how_to_use_gpu_sharing.md @@ -0,0 +1,165 @@ +# GPU Sharing User guide + +## Environment setup + +### Install volcano + + +#### 1. Install from source + +Refer to [Install Guide](../../installer/README.md) to install volcano. + +After installed, update the scheduler configuration: + +```shell script +kubectl edit cm -n volcano-system volcano-scheduler-configmap +``` + +```yaml +kind: ConfigMap +apiVersion: v1 +metadata: + name: volcano-scheduler-configmap + namespace: volcano-system +data: + volcano-scheduler.conf: | + actions: "enqueue, allocate, backfill" + tiers: + - plugins: + - name: priority + - name: gang + - name: conformance + - plugins: + - name: drf + - name: predicates + arguments: + predicate.GPUSharingEnable: true # enable gpu sharing + - name: proportion + - name: nodeorder + - name: binpack +``` + +#### 2. Install from release package. + +Same as above, after installed, update the scheduler configuration in `volcano-scheduler-configmap` configmap. + +### Install Volcano device plugin + +Please refer to [volcano device plugin](https://github.com/volcano-sh/devices/blob/master/README.md#quick-start) + +### Verify environment is ready + +Check the node status, it is ok if `volcano.sh/gpu-memory` and `volcano.sh/gpu-number` are included in the allocatable resources. + +```shell script +$ kubectl get node {node name} -oyaml +... +status: + addresses: + - address: 172.17.0.3 + type: InternalIP + - address: volcano-control-plane + type: Hostname + allocatable: + cpu: "4" + ephemeral-storage: 123722704Ki + hugepages-1Gi: "0" + hugepages-2Mi: "0" + memory: 8174332Ki + pods: "110" + volcano.sh/gpu-memory: "89424" + volcano.sh/gpu-number: "8" # GPU resource + capacity: + cpu: "4" + ephemeral-storage: 123722704Ki + hugepages-1Gi: "0" + hugepages-2Mi: "0" + memory: 8174332Ki + pods: "110" + volcano.sh/gpu-memory: "89424" + volcano.sh/gpu-number: "8" # GPU resource +``` + +### Running GPU Sharing Jobs + +NVIDIA GPUs can now be shared via container level resource requirements using the resource name `volcano.sh/gpu-memory`: +```shell script +$ cat <