no namespace "nvidia-device-plugin" when attempting to provision GPU nodes #515

Closed
SamuelJenkinsML opened this issue Nov 29, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@SamuelJenkinsML

Describe the bug
After modifying the local Terraform variables to provision GPU nodes, Terraform fails while deploying the nvidia-device-plugin add-on, reporting that the plugin's namespace does not exist. The variables in the Terraform code suggest that the namespace should be created, but it is not.

Steps To Reproduce

1. Set the variable "node_instance_type_gpu" to a GPU-based instance type
2. terraform init && terraform plan
3. make deploy

Expected behavior
I expect an autoscaled cluster of GPU nodes to be provisioned by terraform on the EKS cluster.

Environment

  • Kubernetes version - 1.22
  • Using EKS (yes/no), if so version? yes 1.22
  • Kubeflow version - 1.6.1
  • AWS build number
  • AWS service targeted (S3, RDS, etc.) - fails on S3/RDS and S3/RDS/Cognito

Screenshots

╷
│ Error: create: failed to create: namespaces "nvidia-device-plugin" not found
│ 
│   with module.eks_blueprints_kubernetes_addons.module.nvidia_device_plugin[0].module.helm_addon.helm_release.addon[0],
│   on .terraform/modules/eks_blueprints_kubernetes_addons/modules/kubernetes-addons/helm-addon/main.tf line 1, in resource "helm_release" "addon":
│    1: resource "helm_release" "addon" {
│ 
╵
SamuelJenkinsML added the bug label Nov 29, 2022
@jbgerth

jbgerth commented Nov 29, 2022

I ran into the same issue on EKS 1.23 / KF 1.6.1.
As a hotfix, I added the namespace creation to the make target deploy-eks-blueprints-k8s-addons, ahead of the terraform apply step:

deploy-eks-blueprints-k8s-addons:
	kubectl create ns nvidia-device-plugin --dry-run=client -o yaml | kubectl apply -f -
	terraform apply -target="module.eks_blueprints_kubernetes_addons" -auto-approve

@ghost

ghost commented Dec 4, 2022

This does not work.

@surajkota
Contributor

Hey folks, as a workaround, can you create the namespace as @jbgerth suggested, or by using a Terraform namespace module?
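
For reference, the Terraform route could look roughly like the sketch below. It assumes the hashicorp/kubernetes provider is already configured against the EKS cluster; the resource label nvidia_device_plugin is made up for illustration, and this is not necessarily how the upstream fix handles it.

```hcl
# Sketch only: pre-create the namespace that the nvidia-device-plugin
# Helm release expects to find.
# Assumption: the hashicorp/kubernetes provider is configured for the cluster.
# The resource label "nvidia_device_plugin" is hypothetical.
resource "kubernetes_namespace" "nvidia_device_plugin" {
  metadata {
    # Must match the namespace named in the error message.
    name = "nvidia-device-plugin"
  }
}
```

The namespace has to exist before the add-ons module is applied, for example by running terraform apply -target="kubernetes_namespace.nvidia_device_plugin" first. If the module later starts creating the namespace itself, this resource would need to be removed to avoid a conflict.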

Let me know if any of you are interested in contributing the fix. I will try to reproduce the issue and file a PR to fix it this week.

@surajkota
Contributor

@chiennh2 can you please provide details on what is not working?

@surajkota
Contributor

surajkota commented Dec 6, 2022

It looks like we have a PR to fix it; it's a known issue: #516, aws-ia/terraform-aws-eks-blueprints#1019

@ryansteakley
Contributor

Closing as resolved.
