[BUG] Fails to resolve nameserver when run own DNS server through docker. #4144

Closed
phantomjinx opened this issue May 3, 2024 · 21 comments
Labels
kind/bug Something isn't working status/need triage

Comments

@phantomjinx

General information

  • OS: Linux (Fedora 38)
  • Hypervisor: KVM
  • Did you run crc setup before starting it (Yes/No)? Yes
  • Running CRC on: Baremetal-Server

CRC version

CRC version: 2.35.0+3956e8
OpenShift version: 4.15.10
Podman version: 4.4.4

CRC status

# Put `crc status --log-level debug` output here
DEBU CRC version: 2.35.0+3956e8                   
DEBU OpenShift version: 4.15.10                   
DEBU Podman version: 4.4.4                        
DEBU Running 'crc status'                         
crc does not seem to be setup correctly, have you run 'crc setup'?

CRC config

# Put `crc config view` output here
- consent-telemetry                     : no
- disk-size                             : 100
- kubeadmin-password                    : <my-password>
- memory                                : 32768
- nameserver                            : 192.168.200.1
- skip-check-crc-dnsmasq-file           : true
- skip-check-network-manager-config     : true

Host Operating System

# Put the output of `cat /etc/os-release` in case of Linux
NAME="Fedora Linux"
VERSION="38 (KDE Plasma)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora Linux 38 (KDE Plasma)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f38/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
VARIANT="KDE Plasma"
VARIANT_ID=kde

Steps to reproduce

  1. crc setup -> reports crc is setup correctly
  2. crc start --log-level info --nameserver 192.168.200.1 -p crc-pull-secret.json

Expected

crc to validate pull-secret and continue starting ...

Actual

....
INFO Configuring shared directories               
INFO Check internal and public DNS query...       
INFO Check DNS query from host...                 
INFO Verifying validity of the kubelet certificates... 
INFO Starting kubelet service                     
INFO Waiting for kube-apiserver availability... [takes around 2min] 
INFO Waiting until the user's pull secret is written to the instance disk... 
Failed to update pull secret on the disk: Temporary error: pull secret not updated to disk (x202)

Logs

Before gathering the logs, try the following to see if it fixes your issue - I have tried the following:

$ crc delete -f
$ crc cleanup
$ crc setup
$ crc start --log-level debug

Please consider posting the output of crc start --log-level debug on http://gist.github.com/ and post the link in the issue.

https://gist.github.com/phantomjinx/2e87e94860df04d4a0275bed88e52d19

@phantomjinx phantomjinx added kind/bug Something isn't working status/need triage labels May 3, 2024
@adrianriobo
Contributor

adrianriobo commented May 3, 2024

Hey, this one is already reported; can you check #4110? Also, the fix is coming in #4143.

@adrianriobo adrianriobo added the resolution/duplicate This issue or pull request already exists label May 3, 2024
@phantomjinx
Author

phantomjinx commented May 3, 2024

The workaround does not apply to me, as I routinely disable systemd-resolved and run my own DNS server through docker.

So I am not convinced this can necessarily be closed, since my setup is different and the workaround is not really a resolution.

@phantomjinx
Author

phantomjinx commented May 3, 2024

Needless to say I was running 2.27 and everything was working without a problem.

@phantomjinx
Author

If required, I can post the configuration of my /etc/sysconfig/iptables and docker ps.

@praveenkumar praveenkumar changed the title [BUG] [BUG] Fails to resolve nameserver when run own DNS server through docker. May 3, 2024
@cfergeau
Contributor

cfergeau commented May 3, 2024

@praveenkumar should be able to provide more details next week.

My understanding is that there are 2 bugs.
crc changes the guest's resolv.conf to something like

search crc.testing
nameserver 192.168.130.11
nameserver 192.168.130.1

This allows the processes running inside the cluster to resolve (for example) api.crc.testing.
This guest resolv.conf modification broke when we upgraded OpenShift to 4.15.

This means that with 4.15 bundles, the guest relies on the host DNS resolution to resolve *.crc.testing and *.apps-crc.testing to the guest external IP, which is 192.168.130.11. Bug #4110 is one situation where the host DNS resolution of *crc.testing does not work as expected, and this causes cluster startup failures.

Given your custom DNS configuration, I would guess your host knows nothing about *crc.testing? If you can configure it to return 192.168.130.11 for *.crc.testing and *.apps-crc.testing, this might help you get further.
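
For readers following along, one way to make a host-side DNS server answer as suggested above is a pair of wildcard rules. The sketch below prints them in dnsmasq's address=/domain/ip syntax, which matches the domain and every name under it. It is an illustration only: the helper name crc_dnsmasq_rules is hypothetical, the default IP is the guest address quoted in this thread, and a server other than dnsmasq (bind, for example) would need equivalent zone records instead.

```shell
# Hypothetical helper (not part of crc): print dnsmasq "address=" rules that
# answer each CRC domain, and every name under it, with a fixed IP.
crc_dnsmasq_rules() (
  # Default is the guest external IP quoted in this thread.
  ip="${1:-192.168.130.11}"
  for domain in crc.testing apps-crc.testing; do
    printf 'address=/%s/%s\n' "$domain" "$ip"
  done
)
```

Calling crc_dnsmasq_rules with no argument prints one rule for crc.testing and one for apps-crc.testing; the output could be saved into a dnsmasq configuration file of your choosing.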

@praveenkumar
Member

I can see that you skipped the crc-dnsmasq file check and also the network-manager configuration check. You are also passing the nameserver, which I am hoping is the IP of the container in which you are running dnsmasq. Before the 4.15 bundle everything worked because we were using openshift-sdn, which did not make changes to the network or restart NetworkManager (NM) in the VM. That is why, up to the 4.14 bundle, the resolv.conf in the VM stayed as follows in your case and worked fine:

search crc.testing
nameserver 192.168.200.1
nameserver 192.168.130.1

But if you log in to the VM (in the current case), you will see only the following:

nameserver 192.168.130.1

Try what @cfergeau suggested in #4144 (comment) and see if that works.
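
The difference between the two resolv.conf listings above can be checked mechanically. The following is a minimal sketch, assuming it is run somewhere the VM's resolv.conf text is available on stdin; the function names list_nameservers and has_nameserver are hypothetical and not part of crc.

```shell
# list_nameservers: print the nameserver addresses found in resolv.conf-style
# text read from stdin, one per line, in file order.
list_nameservers() {
  awk '$1 == "nameserver" && NF >= 2 { print $2 }'
}

# has_nameserver IP: succeed only if IP is exactly one of the nameservers
# listed on stdin (so 192.168.130.1 does not match 192.168.130.11).
has_nameserver() {
  list_nameservers | grep -Fxq -- "$1"
}
```

Inside the VM, something like has_nameserver 192.168.200.1 < /etc/resolv.conf would succeed for the first listing and fail for the second, which only contains 192.168.130.1.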

@phantomjinx
Author

So 192.168.200.1 is my DNS container, piedharrier (running bind):

            "Networks": {
                "skynet": {
                    "IPAMConfig": {
                        "IPv4Address": "192.168.200.1",
                        "IPv6Address": "2001:8b0:1103:6ed4:cccc:1111::9"
                    },
                    "Links": null,
                    "Aliases": [
                        "f816a3454037",
                        "piedharrier"
                    ],

piedharrier is already capable of resolving the cluster:

[root@piedharrier /]# ping api.crc.testing
PING api.crc.testing (192.168.130.11) 56(84) bytes of data.
64 bytes from 192.168.130.11 (192.168.130.11): icmp_seq=1 ttl=63 time=0.366 ms
64 bytes from 192.168.130.11 (192.168.130.11): icmp_seq=2 ttl=63 time=0.340 ms
^C
--- api.crc.testing ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.340/0.353/0.366/0.013 ms

@praveenkumar
Member

What about other domains, like api-int.crc.testing or foo.apps-crc.testing, etc.?
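
A quick way to answer this for several names at once is to loop over them and compare each answer with the expected guest IP. The sketch below is an assumption-laden illustration: check_crc_dns and resolve_name are hypothetical names, and resolve_name uses glibc's getent, which goes through the resolver of whatever machine runs it rather than a specific DNS server.

```shell
# resolve_name NAME: print the first IPv4 address the local resolver returns.
# Replace this function if you want to query one specific DNS server instead.
resolve_name() {
  getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# check_crc_dns EXPECTED_IP NAME...: report every name and return non-zero
# if any name has no answer or resolves to a different address.
check_crc_dns() (
  expected="$1"; shift
  rc=0
  for name in "$@"; do
    ip="$(resolve_name "$name")"
    if [ "$ip" = "$expected" ]; then
      echo "OK   $name -> $ip"
    else
      echo "FAIL $name -> ${ip:-no answer} (expected $expected)"
      rc=1
    fi
  done
  exit "$rc"
)
```

For the names mentioned in this thread, the call would be check_crc_dns 192.168.130.11 api.crc.testing api-int.crc.testing foo.apps-crc.testing, run on the host or inside the DNS container.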

@phantomjinx
Author

phantomjinx commented May 3, 2024

Screenshot_20240503_165414

So I add aliases whenever I install a new app and they all alias api.crc.testing

@praveenkumar
Member

@phantomjinx Thanks for all the info; I am working on resolving this issue.

@praveenkumar
Member

@phantomjinx can you try to download the artifact from https://github.com/crc-org/crc/actions/runs/8985004693 and extract it (it contains the crc binary and rpm), then use that crc binary and see if that works for you?

@phantomjinx
Author

Unfortunately, crc could not start ...

...
INFO Using bundle path /home/phantomjinx/.crc/cache/crc_libvirt_4.15.10_amd64.crcbundle 
INFO Checking if running as non-root              
INFO Checking if running inside WSL2              
INFO Checking if crc-admin-helper executable is cached 
WARN Preflight checks failed during `crc start`, please try to run `crc setup` first in case you haven't done so yet 
unexpected version of the crc-admin-helper executable: crc-admin-helper-linux version mismatch: 0.5.2 expected but 0.0.12 found in the cache

@praveenkumar
Member

@phantomjinx you have to run crc setup first, before crc start.

@phantomjinx
Author

@praveenkumar I ran crc setup first (this time separately) and it completed with no problem. Then I ran crc start and got the same error:

[phantomjinx@microraptor:/home/openshift/bin] 20s $ crc setup
INFO Using bundle path /home/phantomjinx/.crc/cache/crc_libvirt_4.15.10_amd64.crcbundle 
INFO .... <snip>
INFO Checking if libvirt 'crc' network is active  
INFO Checking if CRC bundle is extracted in '$HOME/.crc' 
INFO Checking if /home/phantomjinx/.crc/cache/crc_libvirt_4.15.10_amd64.crcbundle exists 
INFO Getting bundle for the CRC executable        
INFO Downloading bundle: /home/phantomjinx/.crc/cache/crc_libvirt_4.15.10_amd64.crcbundle... 
4.81 GiB / 4.81 GiB [--------------------------------------------------------------------------------------------------] 100.00% 15.34 MiB/s
INFO Uncompressing /home/phantomjinx/.crc/cache/crc_libvirt_4.15.10_amd64.crcbundle 
crc.qcow2:  20.41 GiB / 20.41 GiB [------------------------------------------------------------------------------------------------] 100.00%
oc:  149.79 MiB / 149.79 MiB [-----------------------------------------------------------------------------------------------------] 100.00%
Your system is correctly setup for using CRC. Use 'crc start' to start the instance

[phantomjinx@microraptor:/home/openshift/bin] 17m50s $ start-crc
Starting with current configuration ... continue? y
Changes to configuration property 'memory' are only applied when the CRC instance is started.
If you already have a running CRC instance, then for this configuration change to take effect, stop the CRC instance with 'crc stop' and restart it with 'crc start'.
Changes to configuration property 'disk-size' are only applied when the CRC instance is started.
If you already have a running CRC instance, then for this configuration change to take effect, stop the CRC instance with 'crc stop' and restart it with 'crc start'.
Successfully configured nameserver to 192.168.200.1
Successfully configured skip-check-crc-dnsmasq-file to true
Successfully configured skip-check-network-manager-config to true
Successfully configured kubeadmin-password to ########
INFO Using bundle path /home/phantomjinx/.crc/cache/crc_libvirt_4.15.10_amd64.crcbundle 
INFO Checking if running as non-root              
INFO Checking if running inside WSL2              
INFO Checking if crc-admin-helper executable is cached 
WARN Preflight checks failed during `crc start`, please try to run `crc setup` first in case you haven't done so yet 
unexpected version of the crc-admin-helper executable: crc-admin-helper-linux version mismatch: 0.5.2 expected but 0.0.12 found in the cache

Just to avoid any confusion, I execute a script start-crc which first runs crc config ... options.
Here it is:

#!/bin/bash

export CRCHOME="/home/openshift/bin"

help() {
  echo "$0 [-c] [-h]"
  echo "    -c: clear out and reset as new"
  echo "    -h: show this help"
  exit 1
}

while getopts ":ch" opt ; do
  case "${opt}" in
    c) CLEAR=1 ;;
    h) help ;;
    \?) help ;;
  esac
done

if [ "${CLEAR}" == "1" ]; then
  echo "Clearing out and resetting ..."
  crc delete
  rm -rf "${HOME}"/.crc/*
fi

read -p "Starting with current configuration ... continue? " -n 1 -r
echo    # (optional) move to a new line
if [[ ${REPLY} =~ ^[Yy]$ ]]; then
  START_CRC=1
fi

if [ "${START_CRC}" != 1 ]; then
  echo "Not starting crc... Exiting"
  exit 0
fi

. "${CRCHOME}/crc-configure"

if [ "${CLEAR}" == "1" ]; then
  echo "Running crc setup ..."
  crc setup
fi

crc start \
  --log-level info \
  --nameserver 192.168.200.1 \
  -p crc-pull-secret.json

if [ $? == 0 ]; then
  echo "Restarting iptables ..."
  sudo service iptables restart
fi

@praveenkumar
Member

@phantomjinx Did you use the same path (export CRCHOME="/home/openshift/bin") for crc setup? That should have fixed the issue. Check with which crc before both commands to make sure the same path is used.

@phantomjinx
Author

@praveenkumar

Yes. I have been using CRCHOME and explicitly setting it.

So I have successfully started up a cluster with the downloaded artifact, but in order to get around the version checks I had to compile the following separately and drop them into the .crc/bin directory.

  • crc-admin-helper-linux (downloaded binary reports its version as 0.0.12 - compiling returned correct version of 0.5.2)
  • crc-driver-libvirt (downloaded binary reports its version as 0.13.5 - compiling returned correct version of 0.13.7)

So the upshot is that whatever was changed in the downloaded snapshot did fix the DNS issues and allowed the pull secret to be correctly installed.

@cfergeau
Contributor

For what it's worth, I've just checked that if I download https://developers.redhat.com/content-gateway/file/pub/openshift-v4/clients/crc/2.35.0/crc-linux-amd64.tar.xz , after running crc setup, I have the right version:

$ ./crc version
CRC version: 2.35.0+3956e8
OpenShift version: 4.15.10
Podman version: 4.4.4

$ ./crc setup
[...]
INFO Checking if crc-admin-helper executable is cached 
INFO Caching crc-admin-helper executable          
[...]

$ ~/.crc/bin/crc-admin-helper-linux --version
admin-helper version 0.5.2
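
The same check can be scripted so that a stale cached helper is spotted before crc start fails. This is a minimal sketch that assumes the helper prints a line of the form shown above ("admin-helper version 0.5.2") and lives at the default path used in this comment; helper_version and check_helper_version are hypothetical names, not crc commands.

```shell
# helper_version: read "--version" output on stdin, for example
# "admin-helper version 0.5.2", and print its last field ("0.5.2").
helper_version() {
  awk 'NR == 1 { print $NF }'
}

# check_helper_version EXPECTED [BINARY]: compare the version reported by the
# cached helper with EXPECTED; return non-zero on mismatch or missing binary.
check_helper_version() (
  expected="$1"
  bin="${2:-$HOME/.crc/bin/crc-admin-helper-linux}"
  found="$("$bin" --version 2>/dev/null | helper_version)"
  if [ "$found" = "$expected" ]; then
    echo "ok: $bin is $found"
  else
    echo "mismatch: $expected expected but ${found:-unknown} found in $bin"
    exit 1
  fi
)
```

With the versions from this thread, check_helper_version 0.5.2 would pass for the release tarball and fail for the artifact that cached 0.0.12.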

@cfergeau
Contributor

cfergeau commented May 14, 2024

(checking the gh actions artifact now...). It seems to be embedding an older version indeed, we'd need to fix that :)

@cfergeau
Contributor

cfergeau added a commit to cfergeau/crc that referenced this issue May 14, 2024
`make test-rpmbuild` uses a container images with preinstalled
admin-helper/machine-driver-libvirt RPMs to generate a `crc` binary/rpm
embedding these binaries.
However, the versions used no longer match what crc expects, which
is causing issues.

This was reported in crc-org#4144

Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
praveenkumar pushed a commit that referenced this issue May 15, 2024
`make test-rpmbuild` uses a container images with preinstalled
admin-helper/machine-driver-libvirt RPMs to generate a `crc` binary/rpm
embedding these binaries.
However, the versions used no longer match what crc expects, which
is causing issues.

This was reported in #4144

Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
@praveenkumar
Member

@phantomjinx We just released 2.36.0, which has the fix. Can you try that version and close this issue if it is working for you?

@phantomjinx
Author

2.36.0 is good. Thanks for sorting this.
