This guide assumes that the NVIDIA drivers and nvidia-docker2 have been installed.
Enable the Nvidia runtime as your default runtime on your node. To do this, please edit the docker daemon config file which is usually present at /etc/docker/daemon.json:
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
if
runtimes
is not already present, head to the install page of nvidia-docker
cd /etc/kubernetes/
curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/scheduler-policy-config.json
cd /tmp/
curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml
kubectl create -f gpushare-schd-extender.yaml
The goal is to include /etc/kubernetes/scheduler-policy-config.json
into the scheduler configuration.
Here is the sample of the modified kube-scheduler.yaml
- --policy-config-file=/etc/kubernetes/scheduler-policy-config.json
- mountPath: /etc/kubernetes/scheduler-policy-config.json
name: scheduler-policy-config
readOnly: true
- hostPath:
path: /etc/kubernetes/scheduler-policy-config.json
type: FileOrCreate
name: scheduler-policy-config
Notice: If your Kubernetes default scheduler is deployed as static pod, don't edit the yaml file inside /etc/kubernetes/manifest. You need to edit the yaml file outside the
/etc/kubernetes/manifest
directory. and copy the yaml file you edited to the '/etc/kubernetes/manifest/' directory, and then kubernetes will update the default static pod with the yaml file automatically.
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f device-plugin-rbac.yaml
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
kubectl create -f device-plugin-ds.yaml
Notice: please remove default GPU device plugin, for example, if you are using nvidia-device-plugin, you can run
kubectl delete ds -n kube-system nvidia-device-plugin-daemonset
to delete.
You need to add a label "gpushare=true" to all node where you want to install device plugin because the device plugin is deamonset.
kubectl label node <target_node> gpushare=true
For example:
kubectl label no mynode gpushare=true
You can download and install kubectl
for linux
curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.12.1/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/bin/kubectl
cd /usr/bin/
wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
chmod u+x /usr/bin/kubectl-inspect-gpushare