[Bug]: Deployment issues with Intel CPU #626

Closed
tripadvisor101 opened this issue Mar 6, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@tripadvisor101
Contributor

### Description

Last week I was able to deploy a single-node cluster using a CX31 machine for the control plane. As of today, that deployment no longer works; it only succeeds with AMD-based server types such as CPX11, CPX21, and CPX31. The deployment gets stuck at the kustomization provisioning step:

```
module.kube-hetzner.null_resource.kustomization: Provisioning with 'file'...
module.kube-hetzner.null_resource.kustomization: Provisioning with 'file'...
module.kube-hetzner.null_resource.kustomization: Provisioning with 'file'...
module.kube-hetzner.null_resource.kustomization: Provisioning with 'file'...
module.kube-hetzner.null_resource.kustomization: Provisioning with 'remote-exec'...
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization: Still creating... [10s elapsed]
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization (remote-exec): (output suppressed due to sensitive value in config)
module.kube-hetzner.null_resource.kustomization: Provisioning with 'remote-exec'...
module.kube-hetzner.null_resource.kustomization (remote-exec): Connecting to remote host via SSH...
module.kube-hetzner.null_resource.kustomization (remote-exec):   Host:
module.kube-hetzner.null_resource.kustomization (remote-exec):   User: root
module.kube-hetzner.null_resource.kustomization (remote-exec):   Password: false
module.kube-hetzner.null_resource.kustomization (remote-exec):   Private key: true
module.kube-hetzner.null_resource.kustomization (remote-exec):   Certificate: false
module.kube-hetzner.null_resource.kustomization (remote-exec):   SSH Agent: true
module.kube-hetzner.null_resource.kustomization (remote-exec):   Checking Host Key: false
module.kube-hetzner.null_resource.kustomization (remote-exec):   Target Platform: unix
module.kube-hetzner.null_resource.kustomization (remote-exec): Connected!
module.kube-hetzner.null_resource.kustomization (remote-exec): + sed -i 's/^- |[0-9]\+$/- |/g' /var/post_install/kustomization.yaml
module.kube-hetzner.null_resource.kustomization (remote-exec): + timeout 180 bash
module.kube-hetzner.null_resource.kustomization (remote-exec): + kubectl apply -k /var/post_install
module.kube-hetzner.null_resource.kustomization (remote-exec): namespace/cert-manager created
module.kube-hetzner.null_resource.kustomization (remote-exec): namespace/system-upgrade created
module.kube-hetzner.null_resource.kustomization (remote-exec): namespace/traefik created
module.kube-hetzner.null_resource.kustomization (remote-exec): storageclass.storage.k8s.io/hcloud-volumes created
module.kube-hetzner.null_resource.kustomization (remote-exec): serviceaccount/cloud-controller-manager created
module.kube-hetzner.null_resource.kustomization (remote-exec): serviceaccount/hcloud-csi-controller created
module.kube-hetzner.null_resource.kustomization (remote-exec): serviceaccount/kured created
module.kube-hetzner.null_resource.kustomization (remote-exec): serviceaccount/system-upgrade created
module.kube-hetzner.null_resource.kustomization (remote-exec): role.rbac.authorization.k8s.io/kured created
module.kube-hetzner.null_resource.kustomization (remote-exec): clusterrole.rbac.authorization.k8s.io/hcloud-csi-controller created
module.kube-hetzner.null_resource.kustomization (remote-exec): clusterrole.rbac.authorization.k8s.io/kured created
module.kube-hetzner.null_resource.kustomization (remote-exec): rolebinding.rbac.authorization.k8s.io/kured created
module.kube-hetzner.null_resource.kustomization (remote-exec): clusterrolebinding.rbac.authorization.k8s.io/hcloud-csi-controller created
module.kube-hetzner.null_resource.kustomization (remote-exec): clusterrolebinding.rbac.authorization.k8s.io/kured created
module.kube-hetzner.null_resource.kustomization (remote-exec): clusterrolebinding.rbac.authorization.k8s.io/system-upgrade created
module.kube-hetzner.null_resource.kustomization (remote-exec): clusterrolebinding.rbac.authorization.k8s.io/system:cloud-controller-manager created
module.kube-hetzner.null_resource.kustomization (remote-exec): configmap/default-controller-env created
module.kube-hetzner.null_resource.kustomization (remote-exec): service/hcloud-csi-controller-metrics created
module.kube-hetzner.null_resource.kustomization (remote-exec): service/hcloud-csi-node-metrics created
module.kube-hetzner.null_resource.kustomization (remote-exec): deployment.apps/hcloud-cloud-controller-manager created
module.kube-hetzner.null_resource.kustomization (remote-exec): deployment.apps/hcloud-csi-controller created
module.kube-hetzner.null_resource.kustomization (remote-exec): deployment.apps/system-upgrade-controller created
module.kube-hetzner.null_resource.kustomization (remote-exec): daemonset.apps/hcloud-csi-node created
module.kube-hetzner.null_resource.kustomization (remote-exec): daemonset.apps/kured created
module.kube-hetzner.null_resource.kustomization (remote-exec): helmchart.helm.cattle.io/cert-manager created
module.kube-hetzner.null_resource.kustomization (remote-exec): helmchart.helm.cattle.io/traefik created
module.kube-hetzner.null_resource.kustomization (remote-exec): csidriver.storage.k8s.io/csi.hetzner.cloud created
module.kube-hetzner.null_resource.kustomization (remote-exec): + echo 'Waiting for the system-upgrade-controller deployment to become available...'
module.kube-hetzner.null_resource.kustomization (remote-exec): Waiting for the system-upgrade-controller deployment to become available...
module.kube-hetzner.null_resource.kustomization (remote-exec): + kubectl -n system-upgrade wait --for=condition=available --timeout=180s deployment/system-upgrade-controller
module.kube-hetzner.null_resource.kustomization: Still creating... [20s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [30s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [40s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [50s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [1m0s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [1m10s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [1m20s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [1m30s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [1m40s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [1m50s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [2m0s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [2m10s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [2m20s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [2m30s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [2m40s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [2m50s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [3m0s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [3m10s elapsed]
module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller
╷
│ Error: remote-exec provisioner error
│
│   with module.kube-hetzner.null_resource.kustomization,
│   on .terraform/modules/kube-hetzner/init.tf line 247, in resource "null_resource" "kustomization":
│  247:   provisioner "remote-exec" {
│
│ error executing "/tmp/terraform_1019175183.sh": Process exited with status 1
╵
```

### Kube.tf file

```terraform
locals {
  hcloud_token = ""
}

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token

  source = "kube-hetzner/kube-hetzner/hcloud"

  ssh_public_key = file("id_ed25519.pub")
  ssh_private_key = file("id_ed25519")

  network_region = "eu-central"

  control_plane_nodepools = [
    {
      name        = "control-plane-nbg1",
      server_type = "cx31",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 1
    }
  ]

  agent_nodepools = [
    {
      name        = "agent-small",
      server_type = "cx21",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 0
    }
  ]

  control_planes_custom_config = {
    etcd-expose-metrics         = true,
    kube-controller-manager-arg = "bind-address=0.0.0.0",
    kube-proxy-arg              = "metrics-bind-address=0.0.0.0",
    kube-scheduler-arg          = "bind-address=0.0.0.0",
  }

  load_balancer_type     = "lb11"
  load_balancer_location = "nbg1"

  automatically_upgrade_os = false
}

provider "hcloud" {
  token = var.hcloud_token != "" ? var.hcloud_token : local.hcloud_token
}

terraform {
  required_version = ">= 1.3.3"
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = ">= 1.35.2"
    }
  }
}

output "kubeconfig" {
  value     = module.kube-hetzner.kubeconfig
  sensitive = true
}

variable "hcloud_token" {
  sensitive = true
  default   = ""
}
```

### Screenshots

No response

### Platform

Mac

@tripadvisor101 added the bug label on Mar 6, 2023
@mysticaltech
Collaborator

@tripadvisor101 This may be due to #623. Please destroy the cluster, run `terraform init -upgrade` to get the latest version of the module, and apply again.
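For readers following along, the suggested sequence can be sketched as a small shell function. This is only an illustration, assuming it is run from the directory that contains `kube.tf`; `redeploy` and the `TF_BIN` override are hypothetical names introduced here (setting `TF_BIN=echo` gives a dry run that just prints the steps).

```shell
#!/usr/bin/env bash
# Sketch of the destroy / init -upgrade / apply sequence suggested above.
# TF_BIN is a hypothetical override for the terraform binary (e.g. TF_BIN=echo for a dry run).
redeploy() {
  tf_bin="${TF_BIN:-terraform}"
  "$tf_bin" destroy &&        # tear down the partially provisioned cluster
  "$tf_bin" init -upgrade &&  # re-resolve modules and providers to the newest allowed versions
  "$tf_bin" apply             # provision again with the upgraded module
}

# Usage (from the directory containing kube.tf): redeploy
```

Because the `module "kube-hetzner"` block in the posted `kube.tf` does not pin a `version`, `terraform init -upgrade` should pick up the newest published release of the module.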

@bulnv
Contributor

bulnv commented Mar 6, 2023

@tripadvisor101 Which version of the module works for you with AMD machines?
@mysticaltech I took a quick look: it seems the secondary network interface can't get an IP address and is down, which is why k3s can't start.

@tripadvisor101
Contributor Author

@mysticaltech Unfortunately, version 1.9.8 does not fix this issue.
@bulnv Sorry, I didn't understand your question.

@bulnv
Contributor

bulnv commented Mar 6, 2023

> @mysticaltech Unfortunately, version 1.9.8 does not fix this issue. @bulnv Sorry, didn't understand your question.

Sorry for the typo. Which version of the terraform-hcloud-kube-hetzner Terraform module works for you with AMD-based instances?

@tripadvisor101
Contributor Author

@bulnv It works with versions 1.9.7 and 1.9.8.

@bulnv
Contributor

bulnv commented Mar 6, 2023

Strange. For me, 1.9.7 does not work with AMD either; I am running into exactly the same issue.

@mysticaltech
Collaborator

@bulnv Please SSH into the node (see the README for how) and run `ip address show`; I want to confirm something. Maybe the interface name is not the same for all node kinds, in which case we need to determine it dynamically and inject it into the config.
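As a rough aid for that check, the sketch below filters the brief output of iproute2 (`ip -br address show`, one line per interface: name, state, addresses) down to interfaces that are DOWN or have no address at all. `broken_ifaces` is a hypothetical helper name, and the filter is only a heuristic, not something the module itself does.

```shell
#!/usr/bin/env bash
# Sketch: list interfaces that are DOWN or have no address assigned.
# Reads `ip -br address show` style lines on stdin: NAME STATE [ADDR...]
broken_ifaces() {
  awk '$2 == "DOWN" || NF < 3 { print $1 }'
}

# On a node: ip -br address show | broken_ifaces
```

An interface showing up in this list on a freshly provisioned node would be consistent with the report above that the secondary interface never gets an address.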

@mysticaltech
Collaborator

Folks, let's move the discussion over to the original issue #623

@bulnv
Contributor

bulnv commented Mar 6, 2023

@mysticaltech My bad, I've been running the new version of the module with the old MicroOS, because I was struggling with this all weekend. Checking from scratch.

@mysticaltech
Collaborator

I know it's not completely fixed, because on a quick test, I got ens10 as the name for cx31, which will not work now, so we will need to determine the name more dynamically via remote-exec.
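A minimal sketch of what determining the name dynamically could look like on the node, not the module's actual fix. It assumes the public interface is called `eth0` (an assumption, not stated in this thread) and treats the first remaining non-loopback interface as the private one; `private_iface` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: pick the first interface that is neither loopback nor the public NIC.
# Reads `ip -o link show` style lines on stdin, e.g. "3: ens10: <BROADCAST,...> mtu 1450 ..."
# $1 is the public interface name to skip; eth0 is an assumed default.
private_iface() {
  awk -F': ' -v public="${1:-eth0}" '
    $2 != "lo" && $2 != public {
      sub(/@.*/, "", $2)   # strip "@ifN" suffixes that appear on some virtual links
      print $2
      exit
    }'
}

# On a node: ip -o link show | private_iface
```

The detected name could then be captured from a remote-exec step and injected into the node configuration instead of a hard-coded interface name.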
