pkg/destroy/bootstrap: Separate load-balancer target teardown #1148

Closed
9 changes: 6 additions & 3 deletions cmd/openshift-install/create.go
@@ -229,7 +229,7 @@ func destroyBootstrap(ctx context.Context, config *rest.Config, directory string
events := client.CoreV1().Events("kube-system")

eventTimeout := 30 * time.Minute
logrus.Infof("Waiting up to %v for the bootstrap-complete event...", eventTimeout)
logrus.Infof("Waiting up to %v for the bootstrap-success event...", eventTimeout)
eventContext, cancel := context.WithTimeout(ctx, eventTimeout)
defer cancel()
_, err = Until(
@@ -268,13 +268,16 @@ func destroyBootstrap(ctx context.Context, config *rest.Config, directory string
}

logrus.Debugf("added %s: %s", event.Name, event.Message)
return event.Name == "bootstrap-complete", nil
return event.Name == "bootstrap-success", nil
},
)
if err != nil {
return errors.Wrap(err, "waiting for bootstrap-complete")
return errors.Wrap(err, "waiting for bootstrap-success")
}

logrus.Info("Waiting 30 seconds for the production control-plane to enter the load balancer")
Contributor:
Here you have to send a tear-down event. Otherwise no draining will happen.

Member Author:
> Here you have to send a tear-down event. Otherwise no draining will happen.

This isn't for draining. This is waiting for the production control plane to enter the load balancer. The 10-second sleep in the destroy logic is for draining.

+ time.Sleep(30*time.Second)

logrus.Info("Destroying the bootstrap resources...")
return destroybootstrap.Destroy(rootOpts.dir)
}
4 changes: 2 additions & 2 deletions data/data/aws/main.tf
@@ -18,8 +18,8 @@ module "bootstrap" {
iam_role = "${var.aws_master_iam_role_name}"
ignition = "${var.ignition_bootstrap}"
subnet_id = "${module.vpc.master_subnet_ids[0]}"
- target_group_arns = "${module.vpc.aws_lb_target_group_arns}"
- target_group_arns_length = "${module.vpc.aws_lb_target_group_arns_length}"
+ target_group_arns = "${var.bootstrap_load_balancer_targets ? module.vpc.aws_lb_target_group_arns : []}"
+ target_group_arns_length = "${var.bootstrap_load_balancer_targets ? module.vpc.aws_lb_target_group_arns_length : 0}"
vpc_id = "${module.vpc.vpc_id}"
vpc_security_group_ids = "${list(module.vpc.master_sg_id)}"

6 changes: 5 additions & 1 deletion data/data/bootstrap/files/usr/local/bin/bootkube.sh.template
@@ -211,7 +211,11 @@ podman run \
--volume /etc/kubernetes:/etc/kubernetes:z \
--network=host \
"${CLUSTER_BOOTSTRAP_IMAGE}" \
- start --asset-dir=/assets --required-pods openshift-kube-apiserver/openshift-kube-apiserver,openshift-kube-scheduler/openshift-kube-scheduler,openshift-kube-controller-manager/openshift-kube-controller-manager,openshift-cluster-version/cluster-version-operator
+ start \
+ --asset-dir=/assets \
+ --strict \
+ --tear-down-event run-forever \
+ --required-pods openshift-kube-apiserver/openshift-kube-apiserver,openshift-kube-scheduler/openshift-kube-scheduler,openshift-kube-controller-manager/openshift-kube-controller-manager,openshift-cluster-version/cluster-version-operator

# Workaround for https://github.com/opencontainers/runc/pull/1807
touch /opt/openshift/.bootkube.done
71 changes: 0 additions & 71 deletions data/data/bootstrap/files/usr/local/bin/openshift.sh

This file was deleted.

38 changes: 0 additions & 38 deletions data/data/bootstrap/files/usr/local/bin/report-progress.sh

This file was deleted.

12 changes: 0 additions & 12 deletions data/data/bootstrap/systemd/units/openshift.service

This file was deleted.

14 changes: 0 additions & 14 deletions data/data/bootstrap/systemd/units/progress.service

This file was deleted.

5 changes: 5 additions & 0 deletions data/data/config.tf
@@ -2,6 +2,11 @@ terraform {
required_version = ">= 0.10.7"
}

variable "bootstrap_load_balancer_targets" {
default = true
description = "Whether to include load-balancer targets for the bootstrap machine."
}

variable "machine_cidr" {
type = "string"

2 changes: 1 addition & 1 deletion data/data/libvirt/main.tf
@@ -90,7 +90,7 @@ resource "libvirt_domain" "master" {
}

data "libvirt_network_dns_host_template" "bootstrap" {
- count = "${var.bootstrap_dns ? 1 : 0}"
+ count = "${var.bootstrap_load_balancer_targets ? 1 : 0}"
ip = "${var.libvirt_bootstrap_ip}"
hostname = "${var.cluster_name}-api"
}
5 changes: 0 additions & 5 deletions data/data/libvirt/variables-libvirt.tf
@@ -1,8 +1,3 @@
variable "bootstrap_dns" {
default = true
description = "Whether to include DNS entries for the bootstrap node or not."
}

variable "libvirt_uri" {
type = "string"
description = "libvirt connection URI"
2 changes: 1 addition & 1 deletion docs/design/assetgeneration.md
@@ -76,7 +76,7 @@ $ openshift-install create install-config
# Generate install-config.yaml

$ openshift-install create manifests
- # Generate manifests/ and openshift/ dir, also remove install-config.yaml
+ # Generate manifests/ and post-pod-manifests/ directories, also remove install-config.yaml
```

## Target generation
2 changes: 1 addition & 1 deletion docs/user/aws/install.md
@@ -23,7 +23,7 @@ Step 3: Download the Installer.
[~]$ openshift-install-linux-amd64 create cluster
INFO Waiting up to 30m0s for the Kubernetes API...
INFO API v1.11.0+85a0623 up
- INFO Waiting up to 30m0s for the bootstrap-complete event...
+ INFO Waiting up to 30m0s for the bootstrap-success event...
INFO Destroying the bootstrap resources...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
2 changes: 1 addition & 1 deletion pkg/asset/ignition/bootstrap/bootstrap.go
@@ -221,7 +221,7 @@ func (a *Bootstrap) addStorageFiles(base string, uri string, templateData *boots

func (a *Bootstrap) addSystemdUnits(uri string, templateData *bootstrapTemplateData) (err error) {
enabled := map[string]struct{}{
"progress.service": {},
"bootkube.service": {},
"kubelet.service": {},
"systemd-journal-gatewayd.socket": {},
}
2 changes: 1 addition & 1 deletion pkg/asset/manifests/openshift.go
@@ -23,7 +23,7 @@ import (
)

const (
- openshiftManifestDir = "openshift"
+ openshiftManifestDir = "post-pod-manifests"
)

var (
30 changes: 15 additions & 15 deletions pkg/destroy/bootstrap/bootstrap.go
@@ -6,6 +6,7 @@ import (
"io/ioutil"
"os"
"path/filepath"
"time"

"github.com/openshift/installer/pkg/asset/cluster"
"github.com/openshift/installer/pkg/terraform"
@@ -24,17 +25,12 @@ func Destroy(dir string) (err error) {
return errors.New("no platform configured in metadata")
}

- copyNames := []string{terraform.StateFileName, cluster.TfVarsFileName}
-
- if platform == "libvirt" {
- err = ioutil.WriteFile(filepath.Join(dir, "disable-bootstrap.tfvars"), []byte(`{
- "bootstrap_dns": false
+ err = ioutil.WriteFile(filepath.Join(dir, "disable-bootstrap-load-balancer-targets.tfvars"), []byte(`{
+ "bootstrap_load_balancer_targets": false
}
`), 0666)
- if err != nil {
- return err
- }
- copyNames = append(copyNames, "disable-bootstrap.tfvars")
+ if err != nil {
+ return err
}

tempDir, err := ioutil.TempDir("", "openshift-install-")
@@ -43,20 +39,24 @@
}
defer os.RemoveAll(tempDir)

- for _, filename := range copyNames {
+ for _, filename := range []string{
+ terraform.StateFileName,
+ cluster.TfVarsFileName,
+ "disable-bootstrap-load-balancer-targets.tfvars",
+ } {
err = copy(filepath.Join(dir, filename), filepath.Join(tempDir, filename))
if err != nil {
return errors.Wrapf(err, "failed to copy %s to the temporary directory", filename)
}
}

if platform == "libvirt" {
_, err = terraform.Apply(tempDir, platform, fmt.Sprintf("-var-file=%s", filepath.Join(tempDir, "disable-bootstrap.tfvars")))
if err != nil {
return errors.Wrap(err, "Terraform apply")
}
_, err = terraform.Apply(tempDir, platform, fmt.Sprintf("-var-file=%s", filepath.Join(tempDir, "disable-bootstrap-load-balancer-targets.tfvars")))
if err != nil {
return errors.Wrap(err, "Terraform apply")
}

+ time.Sleep(10 * time.Second)
Contributor:
What is this?

Member Author:
> What is this?

This is:

> 1. Wait 10 seconds for requests to the bootstrap machine to drain out.

from my topic post.

Member Author:
From the AWS network load-balancer docs:

> Connection draining ensures that in-flight requests complete before existing connections are closed. The initial state of a deregistering target is draining. By default, the state of a deregistering target changes to unused after 300 seconds. To change the amount of time that Elastic Load Balancing waits before changing the state to unused, update the deregistration delay value. We recommend that you specify a value of at least 120 seconds to ensure that requests are completed.

And from here:

> Deregistering a target removes it from your target group, but does not affect the target otherwise. The load balancer stops routing requests to a target as soon as it is deregistered. The target enters the draining state until in-flight requests have completed.

The Terraform attachment-deletion logic is here, and while it fires a deregister request, it does not wait around for draining to complete. I don't see any issues in the provider repository about waiting for the unused state, but we could push something like that if we wanted more finesse here than a 10-second cross-platform sleep. I'm also fine just saying "we know who our consumers are at this point, and none of them will keep an open request going for more than 10 seconds", which is how I have it now. Thoughts?
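For illustration only (not part of this PR), here is a minimal sketch of what that "more finesse" could look like: after deregistration, poll the target groups until the bootstrap instance is no longer in the draining state, instead of sleeping a fixed 10 seconds. It assumes the aws-sdk-go elbv2 client; the target-group ARNs and instance ID would have to come from the installer's metadata and are hypothetical placeholders here.

```go
package lbwait

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/elbv2"
)

// waitForDrain polls DescribeTargetHealth until the given instance is no
// longer reported as "draining" in any of the target groups (i.e. it has
// reached "unused" or is no longer registered), or until the timeout expires.
func waitForDrain(client *elbv2.ELBV2, targetGroupARNs []string, instanceID string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		draining := false
		for _, arn := range targetGroupARNs {
			out, err := client.DescribeTargetHealth(&elbv2.DescribeTargetHealthInput{
				TargetGroupArn: aws.String(arn),
				Targets:        []*elbv2.TargetDescription{{Id: aws.String(instanceID)}},
			})
			if err != nil {
				return err
			}
			for _, desc := range out.TargetHealthDescriptions {
				if aws.StringValue(desc.TargetHealth.State) == elbv2.TargetHealthStateEnumDraining {
					draining = true
				}
			}
		}
		if !draining {
			return nil // drained (unused) or never registered
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for %s to finish draining", instanceID)
		}
		time.Sleep(5 * time.Second)
	}
}
```

How long such a wait takes is bounded by the target groups' deregistration delay, so it would probably only pay off alongside a lower deregistration_delay on the bootstrap target groups; the 10-second sleep remains the simpler cross-platform option.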

Member Author:
I've pushed 5ee7650 -> a349ed1 working the above into the commit message.


err = terraform.Destroy(tempDir, platform, "-target=module.bootstrap")
if err != nil {
return errors.Wrap(err, "Terraform destroy")