This how-to shows how to deploy a Job that runs nvidia-smi on a specific node in a Kubernetes cluster.
Before you start

Note: This how-to assumes that the NVIDIA GPU Operator is already deployed in the cluster; a quick way to check is shown after the list below. You also need:

- A Kubernetes cluster.
- A system with kubectl and a text editor that can reach the cluster.
- A cluster node with an NVIDIA GPU.
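A minimal check, assuming the GPU Operator was installed into the gpu-operator namespace (adjust the namespace and node name to your installation):
kubectl get pods -n gpu-operator
kubectl describe node <node name> | grep nvidia.com/gpu
The first command should show the operator pods in a Running or Completed state; if the device plugin is running, the second should list nvidia.com/gpu under the node's capacity and allocatable resources.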
Use your favorite text editor and a system with kubectl that can access the cluster.
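If you want to confirm that kubectl is pointed at the right cluster before you begin, you can run, for example:
kubectl cluster-info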
- Create a file to describe the Job:
touch nvidia-smi-job.yaml
- Open the file in a text editor. For example:
nano nvidia-smi-job.yaml
- Paste the following Job description into the file.
Make sure to change the metadata.namespace value to the namespace in which you want to launch the Job. Similarly, change spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].values to the name of the node on which you want to run the Job (commands for listing namespaces and node names are shown after the manifest).
apiVersion: batch/v1
kind: Job
metadata:
  name: nvidia-smi
  namespace: playground
spec:
  template:
    metadata:
      name: nvidia-smi
    spec:
      containers:
        - name: nvidia-smi
          image: 'nvcr.io/nvidia/cuda:12.1.0-runtime-ubuntu20.04'
          command:
            - nvidia-smi
      restartPolicy: Never
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - us-5501-nb.supermicro.com
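If you are unsure which values to use, you can list the namespaces and the nodes' kubernetes.io/hostname labels:
kubectl get namespaces
kubectl get nodes -L kubernetes.io/hostname
The added column shows each node's kubernetes.io/hostname label, which is the value that goes under values in the node affinity rule.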
- Deploy the Job to the cluster:
kubectl apply -f nvidia-smi-job.yaml
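To check that the Job was created and wait for it to finish, you can run, for example:
kubectl get -n <namespace> job nvidia-smi
kubectl wait -n <namespace> --for=condition=complete job/nvidia-smi --timeout=120s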
- Examine the logs of the Job.
Get the pods with the first command, then use the pod's name in the second command to get the logs of the pod created for the Job:
kubectl get -n <namespace> po
kubectl logs -n <namespace> <pod name> -f
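If the Job ran successfully, the logs should show nvidia-smi's usual status output for the GPU on the selected node. As an optional cleanup step, you can delete the Job once you have read the logs (this also removes its pod):
kubectl delete -n <namespace> job nvidia-smi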