
[Bug]: agents hang in status "still creating" #623

Closed
bulnv opened this issue Mar 4, 2023 · 30 comments
Labels
bug Something isn't working

Comments

@bulnv
Contributor

bulnv commented Mar 4, 2023

Description

  • agents hang in status "still creating" for more than 1 h
  • /bin/sh /tmp/terraform_748037606.sh is still running in the process list
  • "server is not ready: unable to find interface: route ip+net: no such network interface" in the k3s agent logs

Kube.tf file

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token
  source = "kube-hetzner/kube-hetzner/hcloud"
  version = "1.9.6"
  ssh_public_key  = file("./id_ed25519_k8s.pub")
  ssh_private_key = file("./id_ed25519_k8s")
  network_region  = "eu-central" # change to `us-east` if location is ash
  # control_plane_nodepools = local.env == "staging" ? local.staging_control_plane_nodepools : local.production_control_plane_nodepools
  # agent_nodepools = local.env == "staging" ? local.staging_agent_nodepools : local.production_agent_nodepools
  load_balancer_type     = "lb11"
  load_balancer_location = "nbg1"
  control_plane_nodepools = [
    {
      name        = "control-plane",
      server_type = "cpx21",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 1
    }
  ]
  agent_nodepools = [
    {
      name        = "agent",
      server_type = "cpx31",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 2
    }
  ]
  enable_cert_manager = true
  # etcd_s3_backup = {
  #   etcd-s3-endpoint   = "***"
  #   etcd-s3-access-key = "k8s-backups"
  #   etcd-s3-secret-key = "***"
  #   etcd-s3-bucket     = "k8s-backups"
  # }
  automatically_upgrade_k3s = false
  automatically_upgrade_os = false
  cluster_name = format("%s-%s-k8s", local.project, local.env)
  restrict_outbound_traffic = false
  disable_network_policy = true
}

Screenshots

No response

Platform

linux

@bulnv bulnv added the bug Something isn't working label Mar 4, 2023
@ianwesleyarmstrong
Contributor

Just ran into this as well! The only error I can see from the logs is a timeout for the system-upgrade-controller:

null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller

@zek

zek commented Mar 5, 2023

Same here. It was working a few days ago.

@Mammut-Felix

I also have the same problem.
For me it was working as expected with the latest version on Friday.
Yesterday it stopped working. Maybe there is a correlation with the new snapshot of MicroOS released yesterday.

@bulnv
Contributor Author

bulnv commented Mar 5, 2023

I also have the same problem. For me it was working as expected with the latest version on Friday. Yesterday it stopped working. Maybe there is a correlation with the new snapshot of MicroOS released yesterday.

Any chance to pin an old version?

@aleksasiriski
Member

aleksasiriski commented Mar 5, 2023

Any chance to pin an old version?

In kube.tf:

version = "1.9.5"

Or whichever was the latest that worked for you; after changing that, run terraform init -upgrade
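Put together, the pin looks like this in kube.tf (the 1.9.5 value comes from the comment above; use whichever release last worked for you), followed by terraform init -upgrade:

```
module "kube-hetzner" {
  source  = "kube-hetzner/kube-hetzner/hcloud"
  version = "1.9.5" # pin to the last known-good release
  # ... rest of the configuration unchanged ...
}
```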

@zek

zek commented Mar 5, 2023

Pinning an old version didn't work for me.

@ifeulner
Contributor

ifeulner commented Mar 5, 2023

I just did a quick test - unfortunately it seems that with the newest MicroOS image the kustomization process got stuck... currently no time to look into it, but I only changed the image and got this problem, too.

Newest MicroOS image reports kernel 6.2.1-1

terraform console output:

...
module.kube-hetzner.null_resource.kustomization: Still creating... [2m20s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-small"]: Still creating... [2m20s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [2m30s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-small"]: Still creating... [2m30s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-small"]: Still creating... [2m40s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [2m40s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [2m50s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-small"]: Still creating... [2m50s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-small"]: Still creating... [3m0s elapsed]
module.kube-hetzner.null_resource.kustomization: Still creating... [3m0s elapsed]
module.kube-hetzner.null_resource.kustomization (remote-exec): error: timed out waiting for the condition on deployments/system-upgrade-controller
module.kube-hetzner.null_resource.agents["0-0-agent-small"]: Still creating... [3m10s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-small"]: Still creating... [3m20s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-small"]: Still creating... [3m30s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-small"]: Still creating... [3m40s elapsed]
module.kube-hetzner.null_resource.agents["0-0-agent-small"]: Still creating... [3m50s elapsed]
...

It's a pity that these MicroOS images are not versioned...

@bulnv
Contributor Author

bulnv commented Mar 5, 2023

I just did a quick test - unfortunately it seems that with the newest MicroOS image the kustomization process got stuck... currently no time to look into it, but I only changed the image and got this problem, too.

Newest MicroOS image reports kernel 6.2.1-1


I've tried so far:

  • MicroOS from 20230303, which seems to be the previous snapshot
  • K3s 1.25, which I've run before
  • now trying the 1.9.5 module, but I guess in vain

@ianwesleyarmstrong
Contributor

ianwesleyarmstrong commented Mar 5, 2023

I started digging through https://mirror.dogado.de/opensuse/tumbleweed/appliances/ and was able to find a mirror link that pinned the version.

opensuse_microos_mirror_link = "https://mirror.dogado.de/opensuse/tumbleweed/appliances/openSUSE-MicroOS.x86_64-16.0.0-OpenStack-Cloud-Snapshot20230222.qcow2"

I'm sure the break was more recent (~20230302?), but I just used the pin from the last time I successfully deployed. Can confirm that pinning the version will fix this specific issue.

@bulnv
Contributor Author

bulnv commented Mar 5, 2023

I started digging through https://mirror.dogado.de/opensuse/tumbleweed/appliances/ and was able to find a mirror link that pinned the version.

opensuse_microos_mirror_link = "https://mirror.dogado.de/opensuse/tumbleweed/appliances/openSUSE-MicroOS.x86_64-16.0.0-OpenStack-Cloud-Snapshot20230222.qcow2"

I'm sure the break was more recent (~20230302?), but I just used the pin from the last time I successfully deployed. Can confirm that pinning the version will fix this specific issue.

Getting a remote executor timeout now:

╷
│ Error: remote-exec provisioner error
│ 
│   with module.kube-hetzner.null_resource.kustomization,
│   on .terraform/modules/kube-hetzner/init.tf line 247, in resource "null_resource" "kustomization":
│  247:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_1215862405.sh": Process exited with status 1

@ianwesleyarmstrong
Contributor

Getting a remote executor timeout now:


Did you redeploy from scratch?

@bulnv
Contributor Author

bulnv commented Mar 6, 2023

Getting a remote executor timeout now:


Did you redeploy from scratch?

Sure thing. Twice or thrice; I even deleted the .terraform folder.

@mysticaltech
Collaborator

People, no need to look into MicroOS; all we need to do is SSH into the node, execute the failing bash files manually, and see what's happening. Also look at the logs via journalctl. Will do ASAP, keep you posted.

@bulnv
Contributor Author

bulnv commented Mar 6, 2023

People, no need to look into MicroOS; all we need to do is SSH into the node, execute the failing bash files manually, and see what's happening. Also look at the logs via journalctl. Will do ASAP, keep you posted.

Thanks for jumping in! As I posted in the beginning, it seems the k3s agent is unable to start because of "server is not ready: unable to find interface: route ip+net: no such network interface" in its logs, but tbh I haven't found any solution or reason for that.

@mysticaltech
Collaborator

@bulnv Exactly! The private interface name has changed, could be coming from Hetzner themselves, working on a backward compatible fix now.


@mysticaltech
Collaborator

Alright folks, that's fixed as part of v1.9.7, just released now. I renamed eth1 everywhere in the code to the new name enp7s0. For reference, the details of the fix are here: e5225d3

According to ChatGPT, the newest Linux kernels now work this way, so it should be permanent from now on. The interface also gets discovered automatically now, so we were able to drop a few lines from the cloud-init too! Enjoy 🚀 ✨
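The automatic discovery mentioned above can be sketched roughly as follows; this is an illustration, not the module's actual code. It assumes, based on the ip address show outputs later in this thread, that the Hetzner private NIC is the one holding a 10.x address, whatever name the kernel gave it (eth1, ens10, or enp7s0):

```shell
# Hypothetical sketch: identify the private NIC by its 10.x address instead of
# by a hard-coded name. The sample imitates `ip -o -4 addr show` output; on a
# live node you would pipe the real command instead of this variable.
sample='2: eth0    inet 5.75.178.42/32 scope global dynamic noprefixroute eth0
3: enp7s0    inet 10.255.0.101/32 scope global dynamic noprefixroute enp7s0'

# Field 4 is the CIDR address; field 2 is the interface name.
priv_if=$(printf '%s\n' "$sample" | awk '$4 ~ /^10\./ {print $2; exit}')
echo "$priv_if"   # prints: enp7s0
```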

@bulnv
Contributor Author

bulnv commented Mar 6, 2023

@mysticaltech Still not working for me, and I can't dig deeper because SSH has been blocked on the server side:

module.kube-hetzner.null_resource.first_control_plane (remote-exec): Waiting for the k3s server to start...
╷
│ Error: remote-exec provisioner error
│ 
│   with module.kube-hetzner.null_resource.first_control_plane,
│   on .terraform/modules/kube-hetzner/init.tf line 48, in resource "null_resource" "first_control_plane":
│   48:   provisioner "remote-exec" {
│ 
│ error executing "/tmp/terraform_462273567.sh": Process exited with status 124
22:2: Too many authentication failures

@mysticaltech
Collaborator

mysticaltech commented Mar 6, 2023

@bulnv Just terraform destroy and try again fresh; it will unblock SSH. Also, make sure to run terraform init -upgrade, and before that remove the version tag that you might have set manually in your kube.tf.

@bulnv
Contributor Author

bulnv commented Mar 6, 2023

@mysticaltech I did it. SSH was blocked through my own fault. I logged in to the machine and found that the secondary IP is not assigned to eth1, which is why k3s can't start. Let's proceed in the open issue #626.

@mysticaltech
Collaborator

mysticaltech commented Mar 6, 2023

@bulnv Yes, the name has changed now; what is it assigned to on your end? I expect it to be enp7s0. If it comes up as eth1 or ens10 instead, it will not work, so we will need to determine the name of the interface dynamically.

And rename it with something like:

  if ip link show enp7s0 >/dev/null 2>&1; then
    ip link set enp7s0 down
    ip link set enp7s0 name eth1
    ip link set eth1 up
  fi

So that we can move all config back to using eth1. The thing is that the above command will not be permanent; we need to make it permanent.

Right now I am at work and cannot focus on that. Please don't hesitate to send PR fixes; otherwise, I will look into it tonight ASAP.
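One way to make such a rename survive reboots, sketched here purely as an assumption (not necessarily what the module ended up doing), is a classic persistent-net udev rule. The 86:00:00 MAC prefix is taken from the ip address show outputs posted below in this thread; verify it on your own nodes before relying on it:

```
# /etc/udev/rules.d/70-persistent-private-net.rules  (hypothetical file name)
# Rename the private NIC to eth1 when it appears, whatever the kernel called it.
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="86:00:00*", NAME="eth1"
```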

@mysticaltech mysticaltech reopened this Mar 6, 2023
@mysticaltech
Collaborator

If you folks could SSH into your nodes (see the readme), run ip address show, and post the result here, it would be super useful. I need to know which names are dynamically chosen, so that I can rename them effectively during the deploy.

@tripadvisor101
Contributor

@mysticaltech

k3s-control-plane-nbg1-vzm:~ # ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether *removed* brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet *removed*/32 scope global dynamic noprefixroute eth0
       valid_lft 84222sec preferred_lft 84222sec
    inet6 *removed*/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 86:00:00:3b:4c:69 brd ff:ff:ff:ff:ff:ff
    inet 10.255.0.101/32 scope global dynamic noprefixroute enp7s0
       valid_lft 84233sec preferred_lft 84233sec
    inet6 fe80::378b:7622:31b2:b3ab/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default
    link/ether 86:15:e0:4f:e6:99 brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::8415:e0ff:fe4f:e699/64 scope link
       valid_lft forever preferred_lft forever
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default qlen 1000
    link/ether 46:99:00:47:e5:23 brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.1/24 brd 10.42.0.255 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::4499:ff:fe47:e523/64 scope link
       valid_lft forever preferred_lft forever
6: veth7716d907@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether fa:47:6f:df:1c:c6 brd ff:ff:ff:ff:ff:ff link-netns cni-dbbdaacd-0f37-e892-0555-4733d5768eb6
    inet6 fe80::f847:6fff:fedf:1cc6/64 scope link
       valid_lft forever preferred_lft forever
7: veth37ad7a3f@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether f6:ae:5b:6c:a1:54 brd ff:ff:ff:ff:ff:ff link-netns cni-4179eed4-230c-f100-45a7-886e263a56f4
    inet6 fe80::f4ae:5bff:fe6c:a154/64 scope link
       valid_lft forever preferred_lft forever
8: veth5303db59@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether 16:3e:da:e5:eb:75 brd ff:ff:ff:ff:ff:ff link-netns cni-b7e81d3c-8b09-22c1-56d0-6c0a726119c1
    inet6 fe80::143e:daff:fee5:eb75/64 scope link
       valid_lft forever preferred_lft forever
9: vethb9b121c6@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether 9a:cb:f2:ef:08:2e brd ff:ff:ff:ff:ff:ff link-netns cni-0b4700ce-4823-8500-f11e-380b542f81b2
    inet6 fe80::98cb:f2ff:feef:82e/64 scope link
       valid_lft forever preferred_lft forever
10: vethb8709d02@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether 72:c4:dd:d0:ff:73 brd ff:ff:ff:ff:ff:ff link-netns cni-149f2045-9fd1-a6e1-d41b-3608163c1672
    inet6 fe80::70c4:ddff:fed0:ff73/64 scope link
       valid_lft forever preferred_lft forever
11: veth7d7c83b9@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether 06:c1:18:a4:e9:f8 brd ff:ff:ff:ff:ff:ff link-netns cni-b467a1e2-5f7c-7766-78e0-b7873211950e
    inet6 fe80::4c1:18ff:fea4:e9f8/64 scope link
       valid_lft forever preferred_lft forever
13: veth60661c45@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether de:3f:33:30:87:7e brd ff:ff:ff:ff:ff:ff link-netns cni-75ba1060-5bfd-612c-275c-805a838707ad
    inet6 fe80::dc3f:33ff:fe30:877e/64 scope link
       valid_lft forever preferred_lft forever
14: veth42ff93e0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether 3a:28:ac:c2:8d:b4 brd ff:ff:ff:ff:ff:ff link-netns cni-886098e8-f2bd-fb4e-da73-68b85c426a3a
    inet6 fe80::3828:acff:fec2:8db4/64 scope link
       valid_lft forever preferred_lft forever
15: veth2400d7d9@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether 46:ff:ca:4d:9b:3e brd ff:ff:ff:ff:ff:ff link-netns cni-69c3e090-767e-987b-f1a9-db57287f5d70
    inet6 fe80::44ff:caff:fe4d:9b3e/64 scope link
       valid_lft forever preferred_lft forever
16: veth29c6c173@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether 62:01:56:ad:ab:30 brd ff:ff:ff:ff:ff:ff link-netns cni-779f91c9-ae37-1daf-f600-a73584438192
    inet6 fe80::6001:56ff:fead:ab30/64 scope link
       valid_lft forever preferred_lft forever
17: veth0acb701a@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master cni0 state UP group default
    link/ether 0e:18:df:a7:5a:2e brd ff:ff:ff:ff:ff:ff link-netns cni-79b6b2b7-e576-a9a4-fbcc-7242a05b77cb
    inet6 fe80::c18:dfff:fea7:5a2e/64 scope link
       valid_lft forever preferred_lft forever

@bulnv
Contributor Author

bulnv commented Mar 6, 2023

CPX31

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 96:00:01:f9:a6:c0 brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet 5.75.178.42/32 scope global dynamic noprefixroute eth0
       valid_lft 86041sec preferred_lft 86041sec
    inet6 fe80::dbad:ea5:ad0e:7c83/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 86:00:00:3b:4e:22 brd ff:ff:ff:ff:ff:ff
    inet 10.255.0.101/32 scope global dynamic noprefixroute enp7s0
       valid_lft 86051sec preferred_lft 86051sec
    inet6 fe80::64de:71f9:af66:94f5/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1430 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.42.113.128/32 scope global tunl0
       valid_lft forever preferred_lft forever
7: calif79dd18072c@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-16e2cc94-18b7-67a1-7eae-f818756eaa1f
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link 
       valid_lft forever preferred_lft forever

@bulnv
Contributor Author

bulnv commented Mar 6, 2023

@mysticaltech I guess for me and CPX nodes the issue is resolved so far. Thanks a bunch for the help.

@mysticaltech
Collaborator

We now support all kinds of interface names; it doesn't matter which one comes up, we end up renaming it to eth1.

@tripadvisor101
Contributor

@mysticaltech Thanks! I can confirm it is now working on CX-series VMs with Intel CPUs.

@mysticaltech
Collaborator

Good to hear @tripadvisor101, thanks for the confirmation!

@valkenburg-prevue-ch
Contributor

Hi there, sorry for being late to the party, I work on a cached image of microos and didn't update kube-hetzner for a while, so had no issues. However, now I checked out the latest updates (kube-hetzner:1.10.0), still using my cached image of microos (from January), and guess what, networking all broken... Now on yesterday's microos and kube-hetzner:1.10.0, things work again.

I would say that for stability reasons, each release of kube-hetzner should be explicitly tied to a version of microos. In reality it is tied, but we just don't keep track of it.

I'm afraid that MicroOS turns out to be quite a pain point. Its image gets updated every few days without version tracking, so there is hardly an easy way to freeze the version you're working with! The different releases of kube-hetzner only work with specific historic MicroOS versions, which are now mostly lost. @mysticaltech 's hard work to fix this (THANKS!) is not the first iteration of this cat-and-mouse game.

What is our way out of this?

@aleksasiriski
Member

What is our way out of this?

Maybe Fedora CoreOS? Idk if it's been discussed before, but it's used by Red Hat for OpenShift.

@mysticaltech
Collaborator

@valkenburg-prevue-ch @aleksasiriski In one year we had one breaking change of MicroOS that broke the deployment of new nodes, and it was in fact due to an improvement in the networking stack. So it's a non-issue for me! It's actually excellent and has no versions; it's a rolling release based on Tumbleweed.

9 participants