
Update Azure Terraform config to actually create dask pools #839

Merged
sgibson91 merged 3 commits into 2i2c-org:master from tf-az-create-dask-pools on Nov 19, 2021

Conversation

@sgibson91 sgibson91 (Member) commented Nov 19, 2021

This PR fixes a conditional bug in our Azure Terraform config that was preventing our dask node pools from actually being created: the try(var.dask_nodes, var.notebook_nodes) fallback never triggered because dask_nodes defaults to an empty map, which evaluates without error, so for_each iterated over nothing. It also edits the node labels and taints to match those in our GKE config, allowing dask pods to be scheduled onto these nodes. I ran this against the Azure CarbonPlan cluster. This should fix the bug I was seeing in #838.

Output of Terraform Plan:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # azurerm_kubernetes_cluster.jupyterhub will be updated in-place
  ~ resource "azurerm_kubernetes_cluster" "jupyterhub" {
        id                                  = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
        name                                = "hub-cluster"
        tags                                = {}

        # (17 unchanged attributes hidden)

      ~ default_node_pool {
            name                         = "core"
          ~ node_count                   = 2 -> 1
            tags                         = {}
            # (20 unchanged attributes hidden)
        }

        # (7 unchanged blocks hidden)
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["huge"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "daskhuge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_E32s_v4"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E32s_v4"
      + vnet_subnet_id        = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet"
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["large"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "dasklarge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_E8s_v4"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E8s_v4"
      + vnet_subnet_id        = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet"
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["medium"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "daskmedium"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_E4s_v4"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E4s_v4"
      + vnet_subnet_id        = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet"
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["small"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "dasksmall"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_E2s_v4"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E2s_v4"
      + vnet_subnet_id        = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet"
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["vhuge"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "daskvhuge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_M64s_v2"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_M64s_v2"
      + vnet_subnet_id        = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet"
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["vvhuge"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourcegroups/2i2c-carbonplan-cluster/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "daskvvhuge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_M128s_v2"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_M128s_v2"
      + vnet_subnet_id        = "/subscriptions/c5e7a734-3dbf-4285-80e5-4c0afb1f65dc/resourceGroups/2i2c-carbonplan-cluster/providers/Microsoft.Network/virtualNetworks/k8s-network/subnets/k8s-nodes-subnet"
    }

Plan: 6 to add, 1 to change, 0 to destroy.

@sgibson91 sgibson91 requested a review from yuvipanda November 19, 2021 14:25
@@ -117,7 +117,7 @@ resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
 resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
   # If dask_nodes is set, we use that. If it isn't, we use notebook_nodes.
   # This lets us set dask_nodes to an empty array to get no dask nodes
-  for_each = try(var.dask_nodes, var.notebook_nodes)
+  for_each = length(var.dask_nodes) == 0 ? var.notebook_nodes : var.dask_nodes
@yuvipanda (Member)

The reason I used try instead in #834 is that without that, I can't tell JIL not to have any dask nodes - nil or null weren't accepted as variable values, and the default of {} meant it wasn't possible to figure out a way to not have any nodes :( Maybe setting the default for dask_nodes to something other than {} would help? Not sure.
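
As a rough sketch of that idea (hypothetical and simplified - the real node definitions carry more fields than vm_size), a null default would let us distinguish "never set" (fall back to notebook_nodes) from an explicit {} (no dask pools at all):

variable "notebook_nodes" {
  type    = map(object({ vm_size = string }))
  default = {}
}

variable "dask_nodes" {
  # null (rather than {}) means "not set", so an explicit {} can still
  # request zero dask pools
  type    = map(object({ vm_size = string }))
  default = null
}

locals {
  # what the dask_pool resource's for_each would then iterate over
  dask_pool_nodes = var.dask_nodes == null ? var.notebook_nodes : var.dask_nodes
}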

@sgibson91 (Member, Author)

Yeah, I'm not sure how to fix this, because without the suggested change I'm struggling to tell CarbonPlan to create the nodes. Unless we switch to not automatically creating dask pools from the defined notebook pools?

@sgibson91 (Member, Author)

Given that Terraform is not a procedural language, we may be trying to do something that is too complex for its capabilities.

@yuvipanda (Member)

ok, how about we:

  1. Remove the one-liner setting dask_nodes = {} in the JIL tfvars,
  2. Merge this as is?

This means that when I tf apply JIL later, it'll also create those extra nodepools, but they should be empty and cost nothing. Let's just do that?

@sgibson91 (Member, Author)

Seems ok to me for the time being - I've pushed that change

While this creates nodepools, they should be empty and cost nothing. This
helps to unblock the CarbonPlan deployment.
@yuvipanda (Member)

@sgibson91 longer term, if you think we should decouple these two lists, I am very happy for you to pursue that instead.
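
If the two lists were decoupled, a rough sketch (hypothetical, same simplified variable shape as above) would drop the fallback entirely, so a cluster only gets the dask pools it explicitly lists:

variable "dask_nodes" {
  # no implicit fallback to notebook_nodes: each cluster opts in to dask
  # pools by listing them; the default creates none
  type    = map(object({ vm_size = string }))
  default = {}
}

# the dask_pool resource's for_each would then simply be:
#   for_each = var.dask_nodes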

@sgibson91 (Member, Author)

@yuvipanda I think that might be the only way without doing something really hacky and tough to understand, but I've only had a half hour's worth of thought on it :D

@sgibson91 (Member, Author)

Terraform apply complete!

@sgibson91 sgibson91 merged commit 2f49365 into 2i2c-org:master Nov 19, 2021
@sgibson91 sgibson91 deleted the tf-az-create-dask-pools branch November 19, 2021 15:46