Introduce gpu_operator_preinstalled_nvidia_software to manually tell GPU Operator to use driver containers
supertetelman committed Feb 11, 2022
1 parent 1075a47 commit 740eaaf
Showing 4 changed files with 13 additions and 9 deletions.
3 changes: 3 additions & 0 deletions config.example/group_vars/k8s-cluster.yml
@@ -20,6 +20,9 @@ kubelet_flexvolumes_plugins_dir: /usr/libexec/kubernetes/kubelet-plugins/volume/
 # Docker configuration.
 deepops_gpu_operator_enabled: false

+# Install NVIDIA Driver and nvidia-docker on node (true), not as part of GPU Operator (false)
+gpu_operator_preinstalled_nvidia_software: false
+
 # Set the MIG labeling and use strategy to none, single, or mixed. See https://github.com/NVIDIA/k8s-device-plugin
 k8s_gpu_mig_strategy: "mixed"

4 changes: 2 additions & 2 deletions playbooks/k8s-cluster.yml
@@ -127,13 +127,13 @@
 - include: nvidia-software/nvidia-driver.yml
   tags:
     - nvidia
-  when: deepops_gpu_operator_enabled | default('false') | bool == false
+  when: deepops_gpu_operator_enabled | default('false') | bool == false or gpu_operator_preinstalled_nvidia_software

 # Install NVIDIA container runtime on GPU servers
 - include: container/nvidia-docker.yml
   tags:
     - nvidia
-  when: deepops_gpu_operator_enabled | default('false') | bool == false
+  when: deepops_gpu_operator_enabled | default('false') | bool == false or gpu_operator_preinstalled_nvidia_software

 # Install k8s GPU feature discovery
 - include: k8s-cluster/nvidia-k8s-gpu-feature-discovery.yml
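
Review note: the new condition references `gpu_operator_preinstalled_nvidia_software` as a bare variable. A more defensive form is sketched below, as a suggestion rather than part of this commit. It adds `default(false) | bool` so the condition still evaluates if the variable is undefined at playbook scope, and so a string such as `"false"` passed with `-e` is not treated as truthy. The indentation and the `include` style are assumed to mirror the surrounding playbook.

```yaml
# Sketch only, not part of commit 740eaaf.
# Install the NVIDIA driver on the node when GPU Operator is disabled,
# or when the nodes are expected to carry NVIDIA software themselves.
- include: nvidia-software/nvidia-driver.yml
  tags:
    - nvidia
  when: >-
    not (deepops_gpu_operator_enabled | default(false) | bool)
    or (gpu_operator_preinstalled_nvidia_software | default(false) | bool)
```

The same expression would apply to the `container/nvidia-docker.yml` include, since both tasks share one condition in this commit.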
6 changes: 6 additions & 0 deletions roles/nvidia-gpu-operator/defaults/main.yml
@@ -22,6 +22,12 @@ gpu_operator_enable_toolkit: true
 gpu_operator_enable_dcgm: true
 gpu_operator_enable_migmanager: true

+# Set to true for DGX and other systems with pre-installed drivers
+# In an upcoming GPU Operator release, components can be enabled/disabled per-node
+# In the current version, the cluster is expected to be more homogeneous
+# TODO: Remove this flag and make detection dynamic per-node in this new release
+gpu_operator_preinstalled_nvidia_software: false
+
 # Configuration customization
 gpu_operator_namespace: "gpu-operator-resources"
 gpu_operator_grid_config_dir: "{{ deepops_dir }}/gpu_operator"
9 changes: 2 additions & 7 deletions roles/nvidia-gpu-operator/tasks/main.yml
@@ -1,14 +1,9 @@
 ---
-- name: Check for DGX packages
-  stat:
-    path: /etc/dgx-release
-  register: is_dgx
-
-- name: Set DGX specific GPU Operator flags
+- name: Set GPU Operator flags for systems with preinstalled NVIDIA software (DGX, etc).
   set_fact:
     gpu_operator_enable_driver: false
     gpu_operator_enable_toolkit: false
-  when: is_dgx.stat.exists == False
+  when: gpu_operator_preinstalled_nvidia_software

 - include: k8s.yml
   when: not gpu_operator_nvaie_enable
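
Taken together, the four files in this commit suggest the following override for a cluster of DGX systems, or other nodes that run the NVIDIA driver and container runtime themselves. This is a minimal sketch. The path `config/group_vars/k8s-cluster.yml` is an assumption (a working copy of `config.example/`), and the comments summarize only what the diff above shows.

```yaml
# config/group_vars/k8s-cluster.yml  (assumed working copy of config.example/)

# Deploy GPU Operator on the cluster.
deepops_gpu_operator_enabled: true

# Nodes carry the driver and container runtime themselves, so:
#  - playbooks/k8s-cluster.yml still runs nvidia-driver.yml and nvidia-docker.yml
#  - the nvidia-gpu-operator role sets gpu_operator_enable_driver and
#    gpu_operator_enable_toolkit to false
gpu_operator_preinstalled_nvidia_software: true
```

With the default of `false` and GPU Operator enabled, both effects are reversed: the playbook skips the node-level installs and the role leaves its driver and toolkit settings untouched.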