
Tweaking NFS settings and enabling for UToronto and CarbonPlan on Azure #887

Merged: 19 commits from redeploy-carbonplan-azure into 2i2c-org:master, Dec 10, 2021

Conversation

sgibson91
Member

@sgibson91 sgibson91 commented Dec 9, 2021

Summary

This PR is doing a few things.

In terraform:

For CarbonPlan on Azure:

  • Enables NFS protocol on the Fileshare
  • Renames some files because I regret my choice of carbonplan-azure. I have gone with azure.carbonplan.* instead, to match URLs.
  • Updates the kubeconfig secret

For UToronto:

  • Enables NFS on the Azure Fileshare (a rough sketch of the terraform involved is below)
  • Updates the hub config to mount the NFS share and correctly use uid 1000
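
For context, NFS on an Azure Fileshare needs a Premium FileStorage storage account plus enabled_protocol = "NFS" on the share itself. A rough sketch of what that looks like in terraform, with values taken from the plan output further down (not the exact file contents):

resource "azurerm_storage_account" "homes" {
  name                     = "2i2ccarbonplanhubstorage"
  resource_group_name      = azurerm_resource_group.jupyterhub.name
  location                 = azurerm_resource_group.jupyterhub.location
  # NFS is only available when kind=FileStorage and tier=Premium
  account_kind             = "FileStorage"
  account_tier             = "Premium"
  account_replication_type = "LRS"
}

resource "azurerm_storage_share" "homes" {
  name                 = "homes"
  storage_account_name = azurerm_storage_account.homes.name
  # Export the share over NFS rather than the default SMB
  enabled_protocol     = "NFS"
  quota                = 100
}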

Note

After applying the terraform fix @yuvipanda made in sgibson91#94, I'm worried that, while we have granted the k8s cluster access to the NFS server, we may have also blocked terraform itself: I could not run a successful terraform plan after making this upgrade. See #887 (comment)

NFS is only available on Storage Accounts with kind=FileStorage and
tier=Premium
@sgibson91 sgibson91 changed the title Redeploy-carbonplan-azure Tweaking NFS settings and enabling for CarbonPlan on Azure Dec 9, 2021
@sgibson91
Member Author

Output of terraform plan
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # azurerm_container_registry.container_registry will be created
  + resource "azurerm_container_registry" "container_registry" {
      + admin_enabled                 = true
      + admin_password                = (sensitive value)
      + admin_username                = (known after apply)
      + encryption                    = (known after apply)
      + georeplication_locations      = (known after apply)
      + georeplications               = (known after apply)
      + id                            = (known after apply)
      + location                      = "westus2"
      + login_server                  = (known after apply)
      + name                          = "2i2ccarbonplanhubregistry"
      + network_rule_bypass_option    = "AzureServices"
      + network_rule_set              = (known after apply)
      + public_network_access_enabled = true
      + resource_group_name           = "2i2c-carbonplan-cluster"
      + retention_policy              = (known after apply)
      + sku                           = "premium"
      + storage_account_id            = (known after apply)
      + trust_policy                  = (known after apply)
      + zone_redundancy_enabled       = false

      + identity {
          + identity_ids = (known after apply)
          + principal_id = (known after apply)
          + tenant_id    = (known after apply)
          + type         = (known after apply)
        }
    }

  # azurerm_kubernetes_cluster.jupyterhub will be created
  + resource "azurerm_kubernetes_cluster" "jupyterhub" {
      + dns_prefix                          = "k8s"
      + fqdn                                = (known after apply)
      + id                                  = (known after apply)
      + kube_admin_config                   = (known after apply)
      + kube_admin_config_raw               = (sensitive value)
      + kube_config                         = (known after apply)
      + kube_config_raw                     = (sensitive value)
      + kubernetes_version                  = "1.20.7"
      + location                            = "westus2"
      + name                                = "hub-cluster"
      + node_resource_group                 = (known after apply)
      + portal_fqdn                         = (known after apply)
      + private_cluster_enabled             = (known after apply)
      + private_cluster_public_fqdn_enabled = false
      + private_dns_zone_id                 = (known after apply)
      + private_fqdn                        = (known after apply)
      + private_link_enabled                = (known after apply)
      + resource_group_name                 = "2i2c-carbonplan-cluster"
      + sku_tier                            = "Free"

      + addon_profile {
          + aci_connector_linux {
              + enabled     = (known after apply)
              + subnet_name = (known after apply)
            }

          + azure_policy {
              + enabled = (known after apply)
            }

          + http_application_routing {
              + enabled                            = (known after apply)
              + http_application_routing_zone_name = (known after apply)
            }

          + ingress_application_gateway {
              + effective_gateway_id                 = (known after apply)
              + enabled                              = (known after apply)
              + gateway_id                           = (known after apply)
              + gateway_name                         = (known after apply)
              + ingress_application_gateway_identity = (known after apply)
              + subnet_cidr                          = (known after apply)
              + subnet_id                            = (known after apply)
            }

          + kube_dashboard {
              + enabled = (known after apply)
            }

          + oms_agent {
              + enabled                    = (known after apply)
              + log_analytics_workspace_id = (known after apply)
              + oms_agent_identity         = (known after apply)
            }

          + open_service_mesh {
              + enabled = (known after apply)
            }
        }

      + auto_scaler_profile {
          + balance_similar_node_groups      = false
          + empty_bulk_delete_max            = (known after apply)
          + expander                         = (known after apply)
          + max_graceful_termination_sec     = (known after apply)
          + max_node_provisioning_time       = "15m"
          + max_unready_nodes                = 3
          + max_unready_percentage           = 45
          + new_pod_scale_up_delay           = (known after apply)
          + scale_down_delay_after_add       = (known after apply)
          + scale_down_delay_after_delete    = (known after apply)
          + scale_down_delay_after_failure   = (known after apply)
          + scale_down_unneeded              = (known after apply)
          + scale_down_unready               = (known after apply)
          + scale_down_utilization_threshold = (known after apply)
          + scan_interval                    = (known after apply)
          + skip_nodes_with_local_storage    = true
          + skip_nodes_with_system_pods      = true
        }

      + default_node_pool {
          + enable_auto_scaling  = true
          + kubelet_disk_type    = (known after apply)
          + max_count            = 10
          + max_pods             = (known after apply)
          + min_count            = 1
          + name                 = "core"
          + node_count           = 1
          + node_labels          = {
              + "hub.jupyter.org/node-purpose" = "core"
              + "k8s.dask.org/node-purpose"    = "core"
            }
          + orchestrator_version = "1.20.7"
          + os_disk_size_gb      = 40
          + os_disk_type         = "Managed"
          + os_sku               = (known after apply)
          + type                 = "VirtualMachineScaleSets"
          + ultra_ssd_enabled    = false
          + vm_size              = "Standard_E4s_v3"
          + vnet_subnet_id       = (known after apply)
        }

      + identity {
          + principal_id = (known after apply)
          + tenant_id    = (known after apply)
          + type         = "SystemAssigned"
        }

      + kubelet_identity {
          + client_id                 = (known after apply)
          + object_id                 = (known after apply)
          + user_assigned_identity_id = (known after apply)
        }

      + linux_profile {
          + admin_username = "hub-admin"

          + ssh_key {
              + key_data = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCrtD6l/2S2dDT1OFLIUa46KAuzlUkckneQ4oyK5c2izT0nlBOxOcWACvnJ4rqntZVuhML7GYS35tQQnXyEXctTS2LnB7illl+oCT63WQzhhutfPLZSW3gJ+/CFVVp0VdHq4t1O8CQO+EgeV1cvuBre/neNZo5tcqXEsoV7tw/qw8XmD7Dk3yPz4NEFYZyZ4xQM7Qn2j9krZlRO2wCrosMFLo5Rv108PSDTHn3VtXEpJwemUO93D0ipJvCDE62M3vJxamqP4k5HLVn3Jun5KEvA7fINvgP3NSXkPVEEE4Prcqc1IJIzSxzygEN7xkZYsUEfJkybGXUWtUXdNjOEfLLlhYToMhdcP4jZluAjTtQmnJTr1sLegH+kbGIvI8MFIJAgXXLXS8/ubXHMozb+4gXRhyT8SMPwmgXGA+FtDduy5gnpPMSgmozQY17av5r25T6viWX5ivtL+n/CNJGf4npM1vKWcUNKiGyJrwzxQdpwJZ2uMP3EIdCYxOvqCQ+7UEM= sgibson@Athena.broadband"
            }
        }

      + network_profile {
          + dns_service_ip     = (known after apply)
          + docker_bridge_cidr = (known after apply)
          + load_balancer_sku  = "standard"
          + network_mode       = (known after apply)
          + network_plugin     = "kubenet"
          + network_policy     = "calico"
          + outbound_type      = "loadBalancer"
          + pod_cidr           = (known after apply)
          + service_cidr       = (known after apply)

          + load_balancer_profile {
              + effective_outbound_ips    = (known after apply)
              + idle_timeout_in_minutes   = (known after apply)
              + managed_outbound_ip_count = (known after apply)
              + outbound_ip_address_ids   = (known after apply)
              + outbound_ip_prefix_ids    = (known after apply)
              + outbound_ports_allocated  = (known after apply)
            }

          + nat_gateway_profile {
              + effective_outbound_ips    = (known after apply)
              + idle_timeout_in_minutes   = (known after apply)
              + managed_outbound_ip_count = (known after apply)
            }
        }

      + role_based_access_control {
          + enabled = (known after apply)

          + azure_active_directory {
              + admin_group_object_ids = (known after apply)
              + azure_rbac_enabled     = (known after apply)
              + client_app_id          = (known after apply)
              + managed                = (known after apply)
              + server_app_id          = (known after apply)
              + server_app_secret      = (sensitive value)
              + tenant_id              = (known after apply)
            }
        }

      + windows_profile {
          + admin_password = (sensitive value)
          + admin_username = (known after apply)
          + license        = (known after apply)
        }
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["huge"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "daskhuge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_E32s_v4"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E32s_v4"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["large"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "dasklarge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_E8s_v4"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E8s_v4"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["medium"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "daskmedium"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_E4s_v4"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E4s_v4"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["small"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "dasksmall"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_E2s_v4"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E2s_v4"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["vhuge"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "daskvhuge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_M64s_v2"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_M64s_v2"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.dask_pool["vvhuge"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "dask_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "daskvvhuge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-size" = "Standard_M128s_v2"
          + "k8s.dask.org/node-purpose" = "worker"
        }
      + node_taints           = [
          + "k8s.dask.org_dedicated=worker:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_M128s_v2"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.user_pool["huge"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "nbhuge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-purpose" = "user"
          + "hub.jupyter.org/node-size"    = "Standard_E32s_v4"
          + "k8s.dask.org/node-purpose"    = "scheduler"
        }
      + node_taints           = [
          + "hub.jupyter.org_dedicated=user:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E32s_v4"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.user_pool["large"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "nblarge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-purpose" = "user"
          + "hub.jupyter.org/node-size"    = "Standard_E8s_v4"
          + "k8s.dask.org/node-purpose"    = "scheduler"
        }
      + node_taints           = [
          + "hub.jupyter.org_dedicated=user:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E8s_v4"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.user_pool["medium"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "nbmedium"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-purpose" = "user"
          + "hub.jupyter.org/node-size"    = "Standard_E4s_v4"
          + "k8s.dask.org/node-purpose"    = "scheduler"
        }
      + node_taints           = [
          + "hub.jupyter.org_dedicated=user:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E4s_v4"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.user_pool["small"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "nbsmall"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-purpose" = "user"
          + "hub.jupyter.org/node-size"    = "Standard_E2s_v4"
          + "k8s.dask.org/node-purpose"    = "scheduler"
        }
      + node_taints           = [
          + "hub.jupyter.org_dedicated=user:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_E2s_v4"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.user_pool["vhuge"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "nbvhuge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-purpose" = "user"
          + "hub.jupyter.org/node-size"    = "Standard_M64s_v2"
          + "k8s.dask.org/node-purpose"    = "scheduler"
        }
      + node_taints           = [
          + "hub.jupyter.org_dedicated=user:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_M64s_v2"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_kubernetes_cluster_node_pool.user_pool["vvhuge"] will be created
  + resource "azurerm_kubernetes_cluster_node_pool" "user_pool" {
      + enable_auto_scaling   = true
      + eviction_policy       = (known after apply)
      + id                    = (known after apply)
      + kubelet_disk_type     = (known after apply)
      + kubernetes_cluster_id = (known after apply)
      + max_count             = 20
      + max_pods              = (known after apply)
      + min_count             = 0
      + mode                  = "User"
      + name                  = "nbvvhuge"
      + node_count            = (known after apply)
      + node_labels           = {
          + "hub.jupyter.org/node-purpose" = "user"
          + "hub.jupyter.org/node-size"    = "Standard_M128s_v2"
          + "k8s.dask.org/node-purpose"    = "scheduler"
        }
      + node_taints           = [
          + "hub.jupyter.org_dedicated=user:NoSchedule",
        ]
      + orchestrator_version  = "1.20.7"
      + os_disk_size_gb       = 200
      + os_disk_type          = "Managed"
      + os_sku                = (known after apply)
      + os_type               = "Linux"
      + priority              = "Regular"
      + spot_max_price        = -1
      + ultra_ssd_enabled     = false
      + vm_size               = "Standard_M128s_v2"
      + vnet_subnet_id        = (known after apply)
    }

  # azurerm_resource_group.jupyterhub will be created
  + resource "azurerm_resource_group" "jupyterhub" {
      + id       = (known after apply)
      + location = "westus2"
      + name     = "2i2c-carbonplan-cluster"
    }

  # azurerm_storage_account.homes will be created
  + resource "azurerm_storage_account" "homes" {
      + access_tier                      = (known after apply)
      + account_kind                     = "FileStorage"
      + account_replication_type         = "LRS"
      + account_tier                     = "Premium"
      + allow_blob_public_access         = false
      + enable_https_traffic_only        = true
      + id                               = (known after apply)
      + is_hns_enabled                   = false
      + large_file_share_enabled         = (known after apply)
      + location                         = "westus2"
      + min_tls_version                  = "TLS1_0"
      + name                             = "2i2ccarbonplanhubstorage"
      + nfsv3_enabled                    = false
      + primary_access_key               = (sensitive value)
      + primary_blob_connection_string   = (sensitive value)
      + primary_blob_endpoint            = (known after apply)
      + primary_blob_host                = (known after apply)
      + primary_connection_string        = (sensitive value)
      + primary_dfs_endpoint             = (known after apply)
      + primary_dfs_host                 = (known after apply)
      + primary_file_endpoint            = (known after apply)
      + primary_file_host                = (known after apply)
      + primary_location                 = (known after apply)
      + primary_queue_endpoint           = (known after apply)
      + primary_queue_host               = (known after apply)
      + primary_table_endpoint           = (known after apply)
      + primary_table_host               = (known after apply)
      + primary_web_endpoint             = (known after apply)
      + primary_web_host                 = (known after apply)
      + queue_encryption_key_type        = "Service"
      + resource_group_name              = "2i2c-carbonplan-cluster"
      + secondary_access_key             = (sensitive value)
      + secondary_blob_connection_string = (sensitive value)
      + secondary_blob_endpoint          = (known after apply)
      + secondary_blob_host              = (known after apply)
      + secondary_connection_string      = (sensitive value)
      + secondary_dfs_endpoint           = (known after apply)
      + secondary_dfs_host               = (known after apply)
      + secondary_file_endpoint          = (known after apply)
      + secondary_file_host              = (known after apply)
      + secondary_location               = (known after apply)
      + secondary_queue_endpoint         = (known after apply)
      + secondary_queue_host             = (known after apply)
      + secondary_table_endpoint         = (known after apply)
      + secondary_table_host             = (known after apply)
      + secondary_web_endpoint           = (known after apply)
      + secondary_web_host               = (known after apply)
      + shared_access_key_enabled        = true
      + table_encryption_key_type        = "Service"

      + blob_properties {
          + change_feed_enabled      = (known after apply)
          + default_service_version  = (known after apply)
          + last_access_time_enabled = (known after apply)
          + versioning_enabled       = (known after apply)

          + container_delete_retention_policy {
              + days = (known after apply)
            }

          + cors_rule {
              + allowed_headers    = (known after apply)
              + allowed_methods    = (known after apply)
              + allowed_origins    = (known after apply)
              + exposed_headers    = (known after apply)
              + max_age_in_seconds = (known after apply)
            }

          + delete_retention_policy {
              + days = (known after apply)
            }
        }

      + identity {
          + identity_ids = (known after apply)
          + principal_id = (known after apply)
          + tenant_id    = (known after apply)
          + type         = (known after apply)
        }

      + network_rules {
          + bypass                     = (known after apply)
          + default_action             = (known after apply)
          + ip_rules                   = (known after apply)
          + virtual_network_subnet_ids = (known after apply)

          + private_link_access {
              + endpoint_resource_id = (known after apply)
              + endpoint_tenant_id   = (known after apply)
            }
        }

      + queue_properties {
          + cors_rule {
              + allowed_headers    = (known after apply)
              + allowed_methods    = (known after apply)
              + allowed_origins    = (known after apply)
              + exposed_headers    = (known after apply)
              + max_age_in_seconds = (known after apply)
            }

          + hour_metrics {
              + enabled               = (known after apply)
              + include_apis          = (known after apply)
              + retention_policy_days = (known after apply)
              + version               = (known after apply)
            }

          + logging {
              + delete                = (known after apply)
              + read                  = (known after apply)
              + retention_policy_days = (known after apply)
              + version               = (known after apply)
              + write                 = (known after apply)
            }

          + minute_metrics {
              + enabled               = (known after apply)
              + include_apis          = (known after apply)
              + retention_policy_days = (known after apply)
              + version               = (known after apply)
            }
        }

      + routing {
          + choice                      = (known after apply)
          + publish_internet_endpoints  = (known after apply)
          + publish_microsoft_endpoints = (known after apply)
        }

      + share_properties {
          + cors_rule {
              + allowed_headers    = (known after apply)
              + allowed_methods    = (known after apply)
              + allowed_origins    = (known after apply)
              + exposed_headers    = (known after apply)
              + max_age_in_seconds = (known after apply)
            }

          + retention_policy {
              + days = (known after apply)
            }

          + smb {
              + authentication_types            = (known after apply)
              + channel_encryption_type         = (known after apply)
              + kerberos_ticket_encryption_type = (known after apply)
              + versions                        = (known after apply)
            }
        }
    }

  # azurerm_storage_share.homes will be created
  + resource "azurerm_storage_share" "homes" {
      + enabled_protocol     = "NFS"
      + id                   = (known after apply)
      + metadata             = (known after apply)
      + name                 = "homes"
      + quota                = 100
      + resource_manager_id  = (known after apply)
      + storage_account_name = "2i2ccarbonplanhubstorage"
      + url                  = (known after apply)
    }

  # azurerm_subnet.node_subnet will be created
  + resource "azurerm_subnet" "node_subnet" {
      + address_prefix                                 = (known after apply)
      + address_prefixes                               = [
          + "10.1.0.0/16",
        ]
      + enforce_private_link_endpoint_network_policies = false
      + enforce_private_link_service_network_policies  = false
      + id                                             = (known after apply)
      + name                                           = "k8s-nodes-subnet"
      + resource_group_name                            = "2i2c-carbonplan-cluster"
      + virtual_network_name                           = "k8s-network"
    }

  # azurerm_virtual_network.jupyterhub will be created
  + resource "azurerm_virtual_network" "jupyterhub" {
      + address_space         = [
          + "10.0.0.0/8",
        ]
      + dns_servers           = (known after apply)
      + guid                  = (known after apply)
      + id                    = (known after apply)
      + location              = "westus2"
      + name                  = "k8s-network"
      + resource_group_name   = "2i2c-carbonplan-cluster"
      + subnet                = (known after apply)
      + vm_protection_enabled = false
    }

  # kubernetes_namespace.homes will be created
  + resource "kubernetes_namespace" "homes" {
      + id = (known after apply)

      + metadata {
          + generation       = (known after apply)
          + name             = "azure-file"
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }
    }

  # kubernetes_secret.homes will be created
  + resource "kubernetes_secret" "homes" {
      + data = (sensitive value)
      + id   = (known after apply)
      + type = "Opaque"

      + metadata {
          + generation       = (known after apply)
          + name             = "access-credentials"
          + namespace        = "azure-file"
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }
    }

Plan: 21 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + kubeconfig               = (sensitive value)
  + registry_creds_config    = (sensitive value)
  + service_principal_config = (sensitive value)

I'm going to go ahead and apply this straight away so that I can deploy a hub and test if the NFS protocol resolves #871

@sgibson91
Member Author

Support chart successfully deployed

@sgibson91
Member Author

I'm now trying to deploy a staging hub and figuring out the mount args.


I'm pretty sure that in the first attempt I did not get the value of serverIP right:

Events:
  Type     Reason       Age               From               Message
  ----     ------       ----              ----               -------
  Normal   Scheduled    5m11s             default-scheduler  Successfully assigned staging/nfs-share-creator-tmlzz to aks-core-34239724-vmss000000
  Warning  FailedMount  3m8s              kubelet            Unable to attach or mount volumes: unmounted volumes=[home-base], unattached volumes=[default-token-gs7xg home-base]: timed out waiting for the condition
  Warning  FailedMount  52s               kubelet            Unable to attach or mount volumes: unmounted volumes=[home-base], unattached volumes=[home-base default-token-gs7xg]: timed out waiting for the condition
  Warning  FailedMount  48s (x2 over 3m)  kubelet            MountVolume.SetUp failed for volume "home-base" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs 2i2ccarbonplanhubstorage.file.core.windows.net:/homes/ /var/lib/kubelet/pods/e2bd3d46-647e-41cf-95de-a09be1281e33/volumes/kubernetes.io~nfs/home-base
Output: mount.nfs: Connection timed out

I don't think the colon between ...windows.net and /homes/ should be there judging by the Fileshare URL in the console.

Screenshot 2021-12-09 at 15 30 56


The second attempt also failed.

Events:
  Type     Reason       Age               From               Message
  ----     ------       ----              ----               -------
  Normal   Scheduled    25s               default-scheduler  Successfully assigned staging/nfs-share-creator-thlx2 to aks-core-34239724-vmss000000
  Warning  FailedMount  8s (x6 over 24s)  kubelet            MountVolume.SetUp failed for volume "home-base" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs 2i2ccarbonplanhubstorage.file.core.windows.net/homes:/export/home-01/homes/ /var/lib/kubelet/pods/5f4bb712-8a21-415c-87a4-47ade6b9b41c/volumes/kubernetes.io~nfs/home-base
Output: mount.nfs: Failed to resolve server 2i2ccarbonplanhubstorage.file.core.windows.net/homes: Name or service not known
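
For reference, the NFS mount target Azure documents for these shares is <account>.file.core.windows.net:/<account>/<share>, i.e. the colon stays and the export path starts with the storage account name. A minimal, hypothetical sketch of how that splits into server and path for a Kubernetes NFS volume, written as terraform purely for illustration (resource and volume names are made up, not what we actually use):

resource "kubernetes_persistent_volume" "home_nfs" {
  metadata {
    name = "home-nfs"
  }
  spec {
    capacity = {
      storage = "100Gi"
    }
    access_modes = ["ReadWriteMany"]
    persistent_volume_source {
      nfs {
        # The server is just the hostname; the account name is repeated in the export path
        server = "2i2ccarbonplanhubstorage.file.core.windows.net"
        path   = "/2i2ccarbonplanhubstorage/homes"
      }
    }
  }
}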

@sgibson91 sgibson91 mentioned this pull request Dec 9, 2021
@sgibson91
Member Author

Deployed grafana dashboards in the interim.

@yuvipanda will brainstorm the above tomorrow. These warnings in the console feel very relevant.

Screenshot 2021-12-09 at 15 38 15

@yuvipanda yuvipanda force-pushed the redeploy-carbonplan-azure branch from 9694275 to c48d8da on December 10, 2021 06:13
@yuvipanda
Member

I've pushed two new commits here and am trying them out on the UToronto cluster. The cluster can mount the NFS share now! \o/ It's still owned by 'root' though, so we need to figure out how to make that be uid 1000 instead.

yuvipanda and others added 4 commits December 10, 2021 12:37
Co-authored-by: GeorgianaElena <georgiana.dolocan@gmail.com>
This was a replacement for nfs-share-creator for Azure File
when we were using it with SMB / CIFS. However, we are now using
NFS directly, and this was just causing problems where the
actual notebook container was mounting a different path than
the volume-mount initcontainer that was setting uid!
@sgibson91 sgibson91 changed the title Tweaking NFS settings and enabling for CarbonPlan on Azure Tweaking NFS settings and enabling for UToronto and CarbonPlan on Azure Dec 10, 2021
@sgibson91
Member Author

Hmmm, whatever network restriction we made in terraform may block us from continuing to manage the infrastructure with terraform once applied 😕

I applied the latest changes, and then did another plan expecting a "Your infrastructure is up-to-date" message, and instead got:

│ Error: shares.Client#GetProperties: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailure" Message="This request is not authorized to perform this operation.\nRequestId:8150832e-d01a-0012-63ad-edeedb000000\nTime:2021-12-10T10:02:58.0047296Z"
│ 
│   with azurerm_storage_share.homes,
│   on storage.tf line 21, in resource "azurerm_storage_share" "homes":
│   21: resource "azurerm_storage_share" "homes" {
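
My best guess is that this comes from the storage networking rules added in sgibson91#94: if the storage account's default_action is Deny and only the node subnet is allowed, then the data-plane call terraform makes to read the share's properties (from wherever plan runs) gets this 403. Roughly this shape, as a hedged sketch rather than the exact config (the specific rule values are assumed):

resource "azurerm_subnet" "node_subnet" {
  name                 = "k8s-nodes-subnet"
  resource_group_name  = "2i2c-carbonplan-cluster"
  virtual_network_name = "k8s-network"
  address_prefixes     = ["10.1.0.0/16"]
  # Lets the subnet reach the storage account over a service endpoint
  service_endpoints    = ["Microsoft.Storage"]
}

resource "azurerm_storage_account" "homes" {
  name                     = "2i2ccarbonplanhubstorage"
  resource_group_name      = "2i2c-carbonplan-cluster"
  location                 = "westus2"
  account_kind             = "FileStorage"
  account_tier             = "Premium"
  account_replication_type = "LRS"

  network_rules {
    # Deny everything except traffic from the k8s node subnet,
    # which also denies the machine running `terraform plan`
    default_action             = "Deny"
    virtual_network_subnet_ids = [azurerm_subnet.node_subnet.id]
  }
}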

@sgibson91
Member Author

I have deployed the staging hub to the cluster and can confirm that chmod does appropriate things now!

Screenshot 2021-12-10 at 10 34 24

My only concern is the terraform issue #887 (comment)

@sgibson91 sgibson91 marked this pull request as ready for review December 10, 2021 10:37
Member

@GeorgianaElena GeorgianaElena left a comment

This looks good to me @sgibson91!
We should however investigate that 403 when running terraform apply 😕 But I have no idea if that should block merging this PR. We already deployed the changes here...

@sgibson91
Member Author

We should however investigate that 403 when running terraform apply 😕 But I have no idea if that should block merging this PR. We already deployed the changes here...

I have opened #890 to track this since the damage is already done to these clusters

@sgibson91
Member Author

sgibson91 commented Dec 10, 2021

I have deployed the prod hub and grafana dashboards again. Merging this!

@sgibson91 sgibson91 merged commit 6688499 into 2i2c-org:master Dec 10, 2021
@sgibson91 sgibson91 deleted the redeploy-carbonplan-azure branch December 10, 2021 14:21