RHEL9: libvirt-sock file not found error during cluster bringup #1657

Open
pperiyasamy opened this issue May 2, 2024 · 0 comments

Describe the bug

OCP cluster installation failed with the following error:

failed to dial libvirt: dial unix /var/run/libvirt/libvirt-sock: connect: no such file or directory

To Reproduce

Bring up an OCP cluster (4.15 nightly) with the following steps:

$ git clone https://github.com/openshift-metal3/dev-scripts && \
cd dev-scripts/
# make
$ diff config_example.sh config_peri.sh
12c12
< export CI_TOKEN=''
---
> export CI_TOKEN='xxxxxxxx'
36,37c36,37
< #
< #export OPENSHIFT_RELEASE_STREAM=4.15
---
> 
> export OPENSHIFT_RELEASE_STREAM=4.15
227c227
< #export IP_STACK=v4
---
> export IP_STACK=v4
294c294
< #export NETWORK_TYPE="OVNKubernetes"
---
> export NETWORK_TYPE="OVNKubernetes"
$ cat /etc/os-release 
NAME="Red Hat Enterprise Linux"
VERSION="9.4 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.4 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.4
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.

Expected/observed behavior

level=debug msg=[INFO] running Terraform command: /home/peri/dev-scripts/ocp/ostest/terraform/bin/terraform init -no-color -input=false -backend=true -get=true -upgrade=false -plugin-dir=/home/peri/dev-scripts/ocp/ostest/terraform/plugins
level=debug
level=debug msg=Initializing the backend...
level=debug
level=debug msg=Initializing provider plugins...
level=debug msg=- Finding latest version of openshift/local/ironic...
level=debug msg=- Finding latest version of openshift/local/libvirt...
level=debug msg=- Installing openshift/local/ironic v1.0.0...
level=debug msg=- Installed openshift/local/ironic v1.0.0 (unauthenticated)
level=debug msg=- Installing openshift/local/libvirt v1.0.0...
level=debug msg=- Installed openshift/local/libvirt v1.0.0 (unauthenticated)
level=debug
level=debug msg=Terraform has created a lock file .terraform.lock.hcl to record the provider
level=debug msg=selections it made above. Include this file in your version control repository
level=debug msg=so that Terraform can guarantee to make the same selections by default when
level=debug msg=you run "terraform init" in the future.
level=debug
level=debug
level=debug msg=Warning: Incomplete lock file information for providers
level=debug
level=debug msg=Due to your customized provider installation methods, Terraform was forced to
level=debug msg=calculate lock file checksums locally for the following providers:
level=debug msg=  - openshift/local/ironic
level=debug msg=  - openshift/local/libvirt
level=debug
level=debug msg=The current .terraform.lock.hcl file only includes checksums for linux_amd64,
level=debug msg=so Terraform running on another platform will fail to install these
level=debug msg=providers.
level=debug
level=debug msg=To calculate additional checksums for another platform, run:
level=debug msg=  terraform providers lock -platform=linux_amd64
level=debug msg=(where linux_amd64 is the platform to generate)
level=debug
level=debug msg=Terraform has been successfully initialized!
level=debug msg=[INFO] running Terraform command: /home/peri/dev-scripts/ocp/ostest/terraform/bin/terraform apply -no-color -auto-approve -input=false -var-file=/tmp/openshift-install-bootstrap-3795376676/terraform.tfvars.json -var-file=/tmp/openshift-install-bootstrap-3795376676/terraform.platform.auto.tfvars.json -lock=true -parallelism=10 -refresh=true
level=error
level=error msg=Error: failed to dial libvirt: dial unix /var/run/libvirt/libvirt-sock: connect: no such file or directory
level=error
level=error msg=  with provider["openshift/local/libvirt"],
level=error msg=  on main.tf line 1, in provider "libvirt":
level=error msg=   1: provider "libvirt" {
level=error
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failure applying terraform for "bootstrap" stage: error applying Terraform configs: failed to apply Terraform: exit status 1
level=error
level=error msg=Error: failed to dial libvirt: dial unix /var/run/libvirt/libvirt-sock: connect: no such file or directory
level=error
level=error msg=  with provider["openshift/local/libvirt"],
level=error msg=  on main.tf line 1, in provider "libvirt":
level=error msg=   1: provider "libvirt" {
level=error
level=error
+(utils.sh:1): create_cluster(): auth_template_and_removetmp
+(utils.sh:866): auth_template_and_removetmp(): echo 4
+(utils.sh:867): auth_template_and_removetmp(): generate_auth_template
+(utils.sh:327): generate_auth_template(): set +x
E0502 06:48:12.764378   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:15.836414   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:18.908310   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:21.980273   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:25.052182   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
Unable to connect to the server: dial tcp 192.168.111.5:6443: connect: no route to host

Additional context

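The following change to the host configuration script (02_configure_host.sh) fixes the problem. It restarts libvirtd.service unconditionally at the end of manage_libvirtd(), after the existing case statement.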

$ git diff
diff --git a/02_configure_host.sh b/02_configure_host.sh
index 4f1ef60..f40d14f 100755
--- a/02_configure_host.sh
+++ b/02_configure_host.sh
@@ -31,6 +31,7 @@ manage_libvirtd() {
           sudo systemctl restart libvirtd.service
         ;;
 esac
+sudo systemctl restart libvirtd.service
 }
 
 # Generate user ssh key
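
A narrower variant of this workaround might restart libvirtd only when the socket named in the error is actually missing. The sketch below is an assumption, not code from dev-scripts: the function names `libvirt_socket_present` and `ensure_libvirt_socket` are hypothetical, and the default path is simply copied from the error message above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of dev-scripts).

# Succeeds only if a UNIX socket exists at the given path.
# The default path is the one reported in the "failed to dial libvirt" error.
libvirt_socket_present() {
    local sock="${1:-/var/run/libvirt/libvirt-sock}"
    # -S is true only when the path exists and is a socket.
    [[ -S "$sock" ]]
}

# Prints the socket state and returns non-zero when the socket is missing,
# so the caller can decide whether to restart the daemon.
ensure_libvirt_socket() {
    local sock="${1:-/var/run/libvirt/libvirt-sock}"
    if libvirt_socket_present "$sock"; then
        echo "present: $sock"
    else
        echo "missing: $sock"
        return 1
    fi
}
```

Inside manage_libvirtd() this could be used as `ensure_libvirt_socket || sudo systemctl restart libvirtd.service`, which keeps the restart from the diff above but skips it when the socket already exists. Whether that is preferable to the unconditional restart has not been verified on RHEL 9.4.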