dracut/hostname: shorten overlong hostname #509

Closed
lucab opened this issue Oct 26, 2020 · 7 comments · Fixed by #673
Labels
jira for syncing to jira

Comments

@lucab
Contributor

lucab commented Oct 26, 2020

On a subset of platforms, the afterburn-hostname.service tries to retrieve the node hostname from a metadata field and forward any value from there into /etc/hostname as the machine static hostname.

According to POSIX.1, "Host names (not including the terminating null byte) are limited to HOST_NAME_MAX bytes". On Linux, HOST_NAME_MAX is defined as 64 (see https://man7.org/linux/man-pages/man2/gethostname.2.html).

From prior experience, we know that either humans or platforms may try to set up names longer than that (e.g. FQDNs of up to 255 bytes), resulting in much pain.

Afterburn should have some additional logic in write_hostname() to truncate overlong hostnames to the first dot or to HOST_NAME_MAX, whichever comes first.
Note that this may result in a desync between the hostname file and the metadata attribute files.

I wrote similar logic (with tests) for systemd-networkd once (systemd/systemd#7616) and it could be ported from there.
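A minimal sketch of the truncation rule described above, in Rust since that is what Afterburn is written in. It is not a port of the systemd code linked above and not Afterburn's actual write_hostname(); the helper name `shorten_overlong` and the hard-coded 64-byte limit are assumptions for illustration, and no character validation is performed. Names that already fit are left untouched; overlong ones are cut at the first dot and then capped at HOST_NAME_MAX bytes.

```rust
// Assumed value of Linux HOST_NAME_MAX: hostname limit in bytes, excluding the NUL.
const HOST_NAME_MAX: usize = 64;

/// Hypothetical helper: shorten an overlong hostname to the first dot or to
/// HOST_NAME_MAX bytes, whichever comes first. Names that already fit are
/// returned unchanged (so a short FQDN keeps its domain part).
fn shorten_overlong(name: &str) -> String {
    if name.len() <= HOST_NAME_MAX {
        return name.to_string();
    }
    // Keep only the first label, i.e. everything before the first dot.
    let first_label = name.split('.').next().unwrap_or(name);
    // Cap at HOST_NAME_MAX bytes, backing off to a char boundary so a
    // multi-byte UTF-8 sequence is never split.
    let mut end = first_label.len().min(HOST_NAME_MAX);
    while !first_label.is_char_boundary(end) {
        end -= 1;
    }
    first_label[..end].to_string()
}

fn main() {
    // An overlong FQDN whose first label is short.
    let fqdn = format!("worker-0.{}.internal", "a".repeat(60));
    println!("{} bytes -> {}", fqdn.len(), shorten_overlong(&fqdn));
}
```

Cutting at the first dot before applying the byte cap means an overlong FQDN degrades to its short host label rather than to an arbitrary 64-byte prefix that ends mid-label.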


@cgwalters
Member

For Kubernetes, the hostname is required to be routable, which means that the "hostname shortening" logic must match whatever the cloud provider (GCP) does.

@cgwalters
Member

OK, this came up again recently in openshift/machine-config-operator#2401, and here's my proposal:

We ship an afterburn-transient-hostname.service that is disabled by default but can be enabled, e.g. by the MCO. We also move the special GCP hostname truncation into this project.

That way the MCO doesn't need to carry bash code that shleps the hostname to a file, etc.

@cgwalters
Member

It turned out this bit us badly in https://bugzilla.redhat.com/show_bug.cgi?id=2008521#c40

Clearly, we should have gone this route from the start.

@cgwalters
Member

@lucab Did you have any further thoughts on this?

It seems like we're all in agreement that afterburn should do this?

@lucab
Contributor Author

lucab commented Jan 10, 2022

Yes, I agree Afterburn should more strictly honor HOST_NAME_MAX if not doing so causes further bugs in downstream components. That may cause a regression for folks that were previously using overlong hostnames, but I think those should be considered buggy cases.

However I don't think this would resolve the BZ you linked, for two reasons:

  • on GCP, the hostname comes from the DHCP. There, NM is in a better position to properly handle localname and search-domains than Afterburn is. Something like "Turn on Afterburn hostname support for GCP" (#512) has the potential to introduce further split-brain situations in an already messy scenario, so we'd better avoid interposing and let NM lead the dance instead.
  • on OCP, the kubelet-imposed limit is shorter (by 1 char) than the kernel limit. So there is a subset of Linux- (and systemd-, afterburn-, etc.) valid names which are not OCP-valid node names.

So the bottom line is that we should avoid bringing afterburn-hostname into more cases where DHCP is functional (e.g. GCP); it is only meant to handle those platforms where DHCP does not provide the hostname (e.g. Azure).
For platforms in the latter bucket, we should do some sanity truncation if other components do so. But let's keep in mind that the trade-offs of doing so are that 1) it will introduce further drift from the infrastructure inventory, 2) it may not respect search-domain lookup, and 3) it won't satisfy the kubelet's stricter requirement anyway.
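To make the kubelet point from the second bullet concrete, the sketch below classifies a name against both limits. It assumes the kubelet limit is exactly HOST_NAME_MAX - 1 = 63 bytes, as stated above, and that names are ASCII so bytes and characters coincide; `classify` and `NameFit` are hypothetical names. A 64-byte name is the boundary case: valid for Linux, but not as an OCP node name, which is why truncating to the kernel limit alone does not satisfy the kubelet.

```rust
// Assumed Linux HOST_NAME_MAX, in bytes.
const HOST_NAME_MAX: usize = 64;
// Assumption from the comment above: the kubelet limit is one char shorter.
const KUBELET_NAME_MAX: usize = HOST_NAME_MAX - 1;

/// Where a candidate hostname falls relative to the two limits.
#[derive(Debug, PartialEq)]
enum NameFit {
    /// Acceptable to both the kernel and the kubelet.
    Both,
    /// Valid for Linux (and systemd, Afterburn) but too long for the kubelet.
    KernelOnly,
    /// Exceeds HOST_NAME_MAX; would need truncation first.
    Neither,
}

/// Hypothetical helper comparing the name's byte length to both limits.
fn classify(name: &str) -> NameFit {
    match name.len() {
        n if n <= KUBELET_NAME_MAX => NameFit::Both,
        n if n <= HOST_NAME_MAX => NameFit::KernelOnly,
        _ => NameFit::Neither,
    }
}

fn main() {
    for len in [63, 64, 65] {
        let name = "n".repeat(len);
        println!("{len} bytes: {:?}", classify(&name));
    }
}
```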

@cgwalters
Member

> on GCP, the hostname comes from the DHCP.

Yes, but not in OpenShift today. The mco-hostname.service owns it, but only after the pivot, and crucially only after NM has run.

> There, NM is in a better position to properly handle localname and search-domains than Afterburn is.

Yes, I think everyone agrees with this. However, since it's afterburn that is writing a too-long /sysroot/etc/hostname today, the idea is that it's easier to fix it there first.

> on OCP, the kubelet-imposed limit is shorter (by 1 char) than the kernel limit. So there is a subset of Linux- (and systemd-, afterburn-, etc.) valid names which are not OCP-valid node names.

This is a good point. I really hope there aren't cases where we're hitting that today. In the BZ above, I think it's much more about having a better truncation strategy for the FQDN.

4 participants