pkg/destroy/bootstrap: Remove bootstrap from DNS for libvirt #614
Conversation
Without this, round-robin clients will fail when they hit the bootstrap DNS entry (after the bootstrap node stops serving its control plane).

The implementation is a bit awkward; I'd have preferred the AWS approach, with:

```hcl
resource "aws_lb_target_group_attachment" "bootstrap" {
  count = "${var.target_group_arns_length}"

  target_group_arn = "${var.target_group_arns[count.index]}"
  target_id        = "${aws_instance.bootstrap.private_ip}"
}
```

in the bootstrap module. But the libvirt host entries are only available as a subsection of a `libvirt_network` resource (because the whole network is defined in a single XML object, including the DNS entries [1]). So instead I've added an additional variable which we can tweak to disable the bootstrap entry. The default value for the new variable includes the bootstrap entry for the initial cluster `apply` call; on destroy I override it via an `*.auto.tfvars` file (which Terraform loads automatically [2]) to remove the bootstrap entry.

[1]: https://libvirt.org/formatnetwork.html#elementsAddress
[2]: https://www.terraform.io/docs/configuration/variables.html

CC @sjenning
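The variable-toggle approach described above might look roughly like the following. This is a hedged sketch, not the actual installer code: the variable name `bootstrap_dns`, the resource names, and the IPs are all illustrative assumptions.

```hcl
# Sketch only: a boolean variable gates the bootstrap DNS entry.
# All names and addresses here are illustrative, not the installer's.
variable "bootstrap_dns" {
  description = "Whether to include the bootstrap node in cluster DNS."
  default     = true
}

resource "libvirt_network" "tectonic_net" {
  name   = "wking"
  mode   = "nat"
  domain = "installer.testing"

  dns {
    local_only = true

    # While bootstrap_dns is true, both the bootstrap and master IPs
    # answer for the API name; flipping it to false would drop the
    # bootstrap entry (the gating mechanism itself is elided here).
    hosts {
      hostname = "wking-api"
      ip       = "192.168.126.10" # bootstrap; removed on destroy
    }
    hosts {
      hostname = "wking-api"
      ip       = "192.168.126.11" # master
    }
  }
}
```

On bootstrap destroy, writing a file such as `disable-bootstrap-dns.auto.tfvars` (a hypothetical name) containing `bootstrap_dns = false` next to the configuration is enough for Terraform to pick up the override on the next apply, since any `*.auto.tfvars` file is loaded automatically.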
Logs from a run of this look like:

```
$ openshift-install --dir=wking --log-level=debug create cluster
...
libvirt_network.tectonic_net: Destroying... (ID: d5b38c5b-6105-4b57-b0b6-448bb1f4e87c)
libvirt_network.tectonic_net: Destruction complete after 5s
libvirt_network.tectonic_net: Creating...
  addresses.#:            "" => "1"
  addresses.0:            "" => "192.168.126.0/24"
  autostart:              "" => "true"
  bridge:                 "" => "tt0"
  dns.#:                  "" => "1"
  dns.0.hosts.#:          "" => "3"
  dns.0.hosts.0.hostname: "" => "wking-api"
  dns.0.hosts.0.ip:       "" => "192.168.126.11"
  dns.0.hosts.1.hostname: "" => "wking-etcd-0"
  dns.0.hosts.1.ip:       "" => "192.168.126.11"
  dns.0.hosts.2.hostname: "" => "wking"
  dns.0.hosts.2.ip:       "" => "192.168.126.50"
  dns.0.local_only:       "" => "true"
  dns.0.srvs.#:           "" => "1"
  dns.0.srvs.0.domain:    "" => "wking.installer.testing"
  dns.0.srvs.0.port:      "" => "2380"
  dns.0.srvs.0.protocol:  "" => "tcp"
  dns.0.srvs.0.service:   "" => "etcd-server-ssl"
  dns.0.srvs.0.target:    "" => "wking-etcd-0.installer.testing"
  dns.0.srvs.0.weight:    "" => "10"
  domain:                 "" => "installer.testing"
  mode:                   "" => "nat"
  name:                   "" => "wking"
libvirt_network.tectonic_net: Creation complete after 6s (ID: 787999bb-5eed-44fa-aa0e-4c440cecce4a)
...
module.bootstrap.libvirt_domain.bootstrap: Destroying... (ID: e6849cb6-f146-444d-aa63-edf9c70f4a86)
module.bootstrap.libvirt_domain.bootstrap: Destruction complete after 1s
module.bootstrap.libvirt_volume.bootstrap: Destroying... (ID: /home/trking/VirtualMachines/bootstrap)
module.bootstrap.libvirt_ignition.bootstrap: Destroying... (ID: /home/trking/VirtualMachines/bootstrap.ign;5be087d6-cfd0-ed41-e644-dedb336a089d)
module.bootstrap.libvirt_volume.bootstrap: Destruction complete after 0s
module.bootstrap.libvirt_ignition.bootstrap: Destruction complete after 0s

Destroy complete! Resources: 3 destroyed.
```

And you can confirm the cleaned-up DNS entries with:

```
$ dig wking-api.installer.testing +short
192.168.126.11
```
ok, while this gives us the correct DNS entries, it seems to clear the master IP address. After a successful auto-bootstrap-removal:

```
$ dig wking-api.installer.testing +short
192.168.126.11
$ ping -c1 192.168.126.11
PING 192.168.126.11 (192.168.126.11) 56(84) bytes of data.
From 192.168.126.1 icmp_seq=1 Destination Host Unreachable

--- 192.168.126.11 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
```

The Terraform state still claims IP addresses for the master:

```
$ terraform state show libvirt_domain.master | grep address
network_interface.0.addresses.# = 1
network_interface.0.addresses.0 = 192.168.126.11
```

But:

```
$ virsh -c "${OPENSHIFT_INSTALL_LIBVIRT_URI}" domifaddr master0
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
```

I'm not sure what's going on there yet.
Ah, perhaps this is what's going on:
We may be able to update the logic from @eparis' dmacvicar/terraform-provider-libvirt#382 to allow for
I've filed dmacvicar/terraform-provider-libvirt#468 for this and will see if I can work up a fix.
Fix in flight with dmacvicar/terraform-provider-libvirt#469, after which, the post-fix state looks like:

```
$ virsh -c "${OPENSHIFT_INSTALL_LIBVIRT_URI}" domifaddr master0
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 vnet1      f2:0f:d9:25:11:aa    ipv4         192.168.126.11/24
$ virsh -c "${OPENSHIFT_INSTALL_LIBVIRT_URI}" net-dumpxml wking | grep -A9 '<dns>'
  <dns>
    <srv service='etcd-server-ssl' protocol='tcp' domain='wking.installer.testing' target='wking-etcd-0.installer.testing' port='2380' weight='10'/>
    <host ip='192.168.126.11'>
      <hostname>wking-api</hostname>
      <hostname>wking-etcd-0</hostname>
    </host>
    <host ip='192.168.126.50'>
      <hostname>wking</hostname>
    </host>
  </dns>
$ dig wking-api.installer.testing +short
192.168.126.11
```

@sjenning, can you try again with a fresh plugin build from that upstream PR?

```
$ ~/.terraform.d/plugins/terraform-provider-libvirt -version
/home/trking/.terraform.d/plugins/terraform-provider-libvirt ef44231327555f33fb4311de442c287bc0d7fbf6
Compiled against library: libvirt 3.9.0
Using library: libvirt 3.9.0
Running hypervisor: QEMU 2.9.0
Running against daemon: 3.9.0
```
/hold
I'm fine waiting on upstream. If they take more than a few days, we can revisit temporarily pointing at a fork.
👍 sounds good, unless @crawford recommends going with the fork sooner.
/hold cancel

dmacvicar/terraform-provider-libvirt#469 landed :).
Testing locally. Automatic bootstrap deletion was stuck and had to be invoked manually.

```
$ dig adahiya-0-api.tt.testing +noall +answer

; <<>> DiG 9.11.4-P1-RedHat-9.11.4-5.P1.fc28 <<>> adahiya-0-api.tt.testing +noall +answer
;; global options: +cmd
adahiya-0-api.tt.testing. 0 IN A 192.168.126.11
$ oc get nodes
NAME                       STATUS    ROLES     AGE       VERSION
adahiya-0-master-0         Ready     master    18m       v1.11.0+d4cacc0
adahiya-0-worker-0-tb78z   Ready     worker    9m        v1.11.0+d4cacc0
```

/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
Just did an install this morning that seemed to be successful, but AFAICS my master and worker didn't have IP addresses; looking in the logs:

I did have an