Skip to content

Commit

Permalink
Account for HA clusters with K8s monitoring stack deployment
Browse files Browse the repository at this point in the history
  • Loading branch information
supertetelman committed Mar 23, 2022
1 parent bb86d9d commit 0e1ab1f
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions scripts/k8s/deploy_monitoring.sh
Original file line number Diff line number Diff line change
Expand Up @@ -276,8 +276,8 @@ install_dependencies
setup_prom_monitoring

# Install DCGM-Exporter and setup custom metrics, if needed
# # GPU Device Plugin is installed into kube-system, GPU Operator installs it into gpu-operator-resources
plugin_namespace=$( kubectl get pods -A -l app.kubernetes.io/instance=nvidia-device-plugin --no-headers --no-headers -o custom-columns=NAMESPACE:.metadata.namespace)
# # GPU Device Plugin is installed into kube-system, GPU Operator installs it into gpu-operator-resources, use uniq for HA K8s clusters
plugin_namespace=$( kubectl get pods -A -l app.kubernetes.io/instance=nvidia-device-plugin --no-headers --no-headers -o custom-columns=NAMESPACE:.metadata.namespace | uniq)
if [ "${plugin_namespace}" == "kube-system" ] ; then
# No GPU Operator DCGM-Exporter Stack
setup_gpu_monitoring
Expand Down

0 comments on commit 0e1ab1f

Please sign in to comment.