Faulty DNS queries in musl-based containers due to "search ." in resolv.conf #1287
Comments
Thanks for the report.
For reference, which Additionally I'd wait to see how the upstream Kubernetes bug discussion goes, as the in-container DNS resolution is really owned by the orchestrator. Overall I feel this may be a
I think I can see what's going wrong with musl DNS resolution. Even outside of FCOS or kubernetes, just running a plain container with the
The interesting part is what happens under the hood.
An even smaller reproducer is:
For some reason, the bug seems to happen only if multiple search domains are configured.
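To make the interplay between the search list and `ndots` concrete, here is a minimal Python model of the lookup order described in resolv.conf(5): a name with fewer than `ndots` dots is tried against each search domain first, otherwise it is tried as-is first. The function names `candidate_names` and `has_empty_label` are hypothetical, and the naive "name + '.' + domain" join is an assumption for illustration. This is not musl's code and does not claim to show exactly how musl misbehaves; it only shows why a bare `.` entry is awkward: joined naively, it yields a name with an empty label.

```python
def candidate_names(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """List the names a stub resolver would try, in order.

    Illustrative model of the resolv.conf(5) search/ndots ordering;
    NOT musl's (or glibc's) actual implementation.
    """
    if name.endswith("."):
        # A trailing dot marks the name as absolute: the search list is skipped.
        return [name]
    # Assumed naive expansion: join the name and each search domain with a dot.
    # A search entry of "." therefore produces "name.." (an empty label).
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + expanded
    # Too few dots (e.g. ndots:5 in Kubernetes pods): search list first.
    return expanded + [name]


def has_empty_label(name: str) -> bool:
    """True if the name contains an empty label, e.g. 'example.com..'."""
    # Drop at most one trailing dot (the root label) before checking.
    body = name[:-1] if name.endswith(".") else name
    return body != "" and "" in body.split(".")


if __name__ == "__main__":
    # Kubernetes-like search list with the host's "." appended, and ndots:5.
    names = candidate_names("example.com", ["svc.cluster.local", "cluster.local", "."], ndots=5)
    for n in names:
        print(n, "(malformed)" if has_empty_label(n) else "")
```

With a single, normal search domain and the default `ndots:1`, no malformed candidate appears, which is at least consistent with the observation that the extra search domains and the `ndots` option matter for reproducing the problem.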
@lucab awesome digging!
That, or having the "options ndots:N" line (as happens in k8s), seems to be key for reproducing it. Now I can see this with a Podman container, adjusting
Fully agreed. This seems like a musl DNS issue, surfaced on certain hosts via the systemd change and recently propagated into Kubernetes by kubernetes/kubernetes#109441. I think the workaround in openshift/okd-machine-os#159 makes sense to apply at the k8s distro level for now, not at the OS level; forget my original ask. Hopefully Kubernetes can react, or musl eventually gets a fix. I was also using
* systemd adds "search ." to the host's /run/systemd/resolve/resolv.conf on hosts with an FQDN hostname
* Kubelet v1.25 began propagating "search ." from the host node into containers' `/etc/resolv.conf`
* musl-based DNS resolvers don't behave correctly when `search .` is used in their `/etc/resolv.conf`. This breaks Alpine images
* Adapt the same workaround used by OpenShift to strip the "search ."
* This only applies to bare-metal Typhoon nodes (where hostnames are set to FQDNs); nodes on cloud platforms aren't affected in the Typhoon configuration

Kubernetes tracking issue: kubernetes/kubernetes#112135

Rel:

* systemd/systemd#17201
* kubernetes/kubernetes#109441
* coreos/fedora-coreos-tracker#1287
* openshift/okd-machine-os#159
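The workaround described in that commit message is essentially a text transformation on resolv.conf. Below is a minimal Python sketch of it, assuming the plain `keyword value...` line format described in resolv.conf(5). The function name `strip_root_search` is hypothetical, and this is not the unit that OpenShift or Typhoon actually ships.

```python
def strip_root_search(resolv_conf: str) -> str:
    """Remove the root domain "." from every `search` line of resolv.conf text.

    Hypothetical sketch of the "strip search ." workaround; not the
    OpenShift/Typhoon implementation.
    """
    out_lines = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            # Keep every search domain except the bare root ".".
            domains = [d for d in fields[1:] if d != "."]
            if not domains:
                # Only "." was listed: drop the search line entirely.
                continue
            line = "search " + " ".join(domains)
        # Comments, nameserver/options lines, etc. pass through unchanged.
        out_lines.append(line)
    return "".join(l + "\n" for l in out_lines)


if __name__ == "__main__":
    host = "nameserver 127.0.0.53\noptions edns0 trust-ad\nsearch example.com .\n"
    print(strip_root_search(host), end="")
```

One way to consume the result would be to write it to a separate file and point Kubelet at it via its `--resolv-conf` flag (or the `resolvConf` field of the Kubelet configuration file); whether the OpenShift unit works that way is an assumption here.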
I agree with this assessment. Were you planning to report the musl bug to their ML too?
I sent an email to musl at lists.openwall.com. EDIT: visible at https://www.openwall.com/lists/musl/2022/08/31/1
fyi @vrutkovs
Looks like Kubernetes will stop propagating Thank you @lucab for the digging to help confirm and fix!
Thanks all for the discussion!
Describe the bug
Could we pull openshift/okd-machine-os#159 into FCOS? Kubernetes v1.25.0 has landed, and the systemd behavior that adds "search ." on nodes with FQDNs now propagates into containers, which apparently breaks musl-based DNS resolution. I can see why OpenShift reverted it.
Reproduction steps
Originally I was preparing the Typhoon Kubernetes distro for the v1.25.0 release and did some sleuthing. The full write-up for Kubernetes is kubernetes/kubernetes#112135. The Kubelet change behind this was made to fix a problem Red Hat customers saw, but I imagine the side effect wasn't noticed because OpenShift has openshift/okd-machine-os#159 to mitigate it.
Unfortunately, just starting a container with Podman (and mucking with /etc/resolv.conf) doesn't seem to demonstrate the issue. I'd like to try to get a single-node reproducer for you guys. But for now it seems to just be Typhoon and OpenShift.
Expected behavior
Don't put "search ." at the end of /etc/resolv.conf on hosts.
Actual behavior
https://github.com/systemd/systemd/pull/17201/files adds "search ." to /run/systemd/resolve/stub-resolv.conf, which seems to confuse any musl-based DNS resolution when this propagates into Kubernetes containers. This didn't become apparent until Kubernetes v1.25.0, when Kubelet started propagating that "search ." directly into a container's /etc/resolv.conf in kubernetes/kubernetes#109441.
System details
Bare Metal
Ignition config
As a workaround, I can apply the same unit OpenShift is using. Removing the "." fixes the musl DNS problems.