Shoot worker node hostname changes after machine reboot #569

Open
timebertt opened this issue Feb 8, 2023 · 0 comments
Labels
area/robustness Robustness, reliability, resilience related kind/bug Bug lifecycle/rotten Nobody worked on this for 12 months (final aging stage) platform/openstack OpenStack platform/infrastructure

How to categorize this issue?

/area robustness
/kind bug
/platform openstack

What happened:

When rebooting a shoot worker node, its hostname changes.

# cat /etc/hostname # before
shoot--1ad1ca31bc--migrate0-pool-xuyn9ea3hs-z1-54964-td9lx
# cat /etc/hostname # after
shoot--1ad1ca31bc--migrate0-pool-xuyn9ea3hs-z1-54964-td9lx.noval

This causes kubelet to fail to start after the machine reboot because it can't get the Node object with the new name:

kubelet.go:2424] "Error getting node" err="node \"shoot--1ad1ca31bc--migrate0-pool-xuyn9ea3hs-z1-54964-td9lx.noval\" not found"

Note: in our installation, the default dns_domain for neutron networks is novalocal, and it is appended to the server name. Because the resulting FQDN is too long for a hostname, it is truncated, which is why the name in the example above ends in .noval.
provider-openstack doesn't explicitly set dns_domain in the neutron networks it creates.
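
To illustrate where the .noval suffix comes from, here is a minimal sketch that appends the dns_domain to the server name and cuts the result down. The 64-character limit is an assumption (it is the usual Linux hostname length limit, and the "after" hostname above is exactly 64 characters long); the issue itself only says the name is shortened.

```shell
#!/bin/sh
# Server name and dns_domain as reported in this issue.
name="shoot--1ad1ca31bc--migrate0-pool-xuyn9ea3hs-z1-54964-td9lx"
fqdn="${name}.novalocal"

# Assumption: the hostname is cut to its first 64 characters.
truncated=$(printf '%s' "$fqdn" | cut -c1-64)

echo "FQDN length: ${#fqdn}"
echo "truncated:   $truncated"
```

Under that assumption, the truncated value matches the hostname observed after the reboot.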

What you expected to happen:

The hostname should be stable and kubelet should be able to start again after a node reboot.

How to reproduce it (as minimally and precisely as possible):

  • SSH into a node
  • reboot the machine
  • observe that kubelet fails to start and the Node cannot recover from the NotReady state

Anything else we need to know?:

This extension adds an ExecStartPre directive to the kubelet unit which changes the hostname:

Section: "Service",
Name: "ExecStartPre",
Value: `/bin/sh -c 'hostnamectl set-hostname $(cat /etc/hostname | cut -d '.' -f 1)'`,
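
As a rough illustration of what this directive computes, the sketch below runs the same cut pipeline on a string instead of on /etc/hostname. The function name short_hostname is hypothetical and only used here; the real unit passes the result to hostnamectl set-hostname, which this sketch does not do.

```shell
#!/bin/sh
# Hypothetical helper: keep everything before the first dot,
# like `cat /etc/hostname | cut -d '.' -f 1` in the ExecStartPre value.
short_hostname() {
    printf '%s\n' "$1" | cut -d '.' -f 1
}

# Hostname from this issue as seen after the reboot.
after="shoot--1ad1ca31bc--migrate0-pool-xuyn9ea3hs-z1-54964-td9lx.noval"
short_hostname "$after"
```

For a name without a dot, cut prints the input unchanged, so the command leaves an already short hostname as it is.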

On the initial boot of the machine, this always works because the kubelet unit and its hostnamectl command are always invoked after any cloud-init mechanisms (the unit is only present after the first successful run of the cloud-config downloader/executor).
However, after a reboot, the kubelet unit and its hostnamectl command race with other cloud-init mechanisms, which can lead to a changed hostname.
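
To check on a rebooted node whether the race ended with the domain suffix back in place, a small test like the following could be used. This is only a sketch: the helper name hostname_has_domain_suffix is hypothetical, and it assumes that any dot in the hostname means the short name was overwritten.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the given hostname contains a dot,
# i.e. it is not the short name that the ExecStartPre command would set.
hostname_has_domain_suffix() {
    case "$1" in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# On a node this would be: current=$(cat /etc/hostname)
current="shoot--1ad1ca31bc--migrate0-pool-xuyn9ea3hs-z1-54964-td9lx.noval"

if hostname_has_domain_suffix "$current"; then
    echo "hostname has a domain suffix: $current"
else
    echo "hostname is a short name: $current"
fi
```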

Environment:

  • Gardener version (if relevant): v1.62.1
  • Extension version: v1.31.0
  • Kubernetes version (use kubectl version): v1.24.8
  • Cloud provider or hardware configuration: STACKIT / OpenStack (Queens/Yoga)
@gardener-robot gardener-robot added area/robustness Robustness, reliability, resilience related kind/bug Bug platform/openstack OpenStack platform/infrastructure labels Feb 8, 2023
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Oct 18, 2023
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Jun 26, 2024