data/bootstrap/files/usr/local/bin/bootkube.sh.template: Set --tear-down-event #1147
Conversation
In master there's a bit of a window during the bootstrap-teardown dance:

1. cluster-bootstrap sees the requested pods.
2. cluster-bootstrap shuts itself down.
3. openshift.sh pushes the OpenShift-specific manifests.
4. report-progress.sh pushes bootstrap-complete.
5. The installer sees bootstrap-complete and removes the bootstrap resources, including the bootstrap load-balancer targets.
6. Subsequent Kubernetes API traffic hits the production control plane.

That leaves a fairly large window from 3 through 5 where Kubernetes API requests could be routed to the bootstrap machine and dropped, because it no longer has anything listening on 6443.

With this commit, we take advantage of openshift/cluster-bootstrap@d07548e3 (Add --tear-down-event flag to delay tear down, 2019-01-24, openshift/cluster-bootstrap#9) to drop step 2 (waiting for an event we never send). That leaves the bootstrap control plane running until we destroy that machine. We take advantage of openshift/cluster-bootstrap@180599bc (pkg/start/asset: Add support for post-pod-manifests, 2019-01-29, openshift/cluster-bootstrap#13) to replace our previous openshift.sh (with a minor change to the manifest directory). And we take advantage of openshift/cluster-bootstrap@e5095848 (Create bootstrap-success event before tear down, 2019-01-24, openshift/cluster-bootstrap#9) to replace our previous report-progress.sh (with a minor change to the event name).

We also set --strict, because we want to fail fast for these resources. The user is unlikely to scrape them out of the installer state and push them by hand if we fail to push them from the bootstrap node.

With these changes, the new transition is:

1. cluster-bootstrap sees the requested pods.
2. cluster-bootstrap pushes the OpenShift-specific manifests.
3. cluster-bootstrap pushes bootstrap-success.
4. The installer sees bootstrap-success and removes the bootstrap resources, including the bootstrap load-balancer targets.
5. Subsequent Kubernetes API traffic hits the production control plane.

There's still a small window for lost Kubernetes API traffic:

* The Terraform tear-down could remove the bootstrap machine before it removes the bootstrap load-balancer target, leaving the target pointing into empty space.
* Bootstrap teardown does not allow existing client connections to drain after removing the load-balancer target before removing the bootstrap machine.

Both of these could be addressed by:

1. Remove the bootstrap load-balancer targets.
2. Wait 30 seconds (healthy_threshold * interval for our aws_lb_target_group [1]) for the load balancer to notice the production control-plane targets are live. This assumes the post-pod manifests are all pushed in zero seconds, so it's overly conservative, but waiting an extra 30 seconds isn't a large cost.
3. Remove the remaining bootstrap resources, including the bootstrap machine.

But even without that delay, this commit reduces the window compared to what we have in master. I'll land the delay in follow-up work.

[1]: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
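The three-step delayed teardown described above could be sketched roughly as follows. This is not installer code: the helper functions are stubs, and the health-check numbers (healthy_threshold of 3, interval of 10 seconds) are assumed aws_lb_target_group settings that add up to the 30-second figure mentioned in the description.

```shell
#!/bin/sh
# Rough sketch of the proposed delayed bootstrap teardown; illustrative only.

HEALTHY_THRESHOLD=3  # consecutive successful health checks (assumed setting)
INTERVAL=10          # seconds between health checks (assumed setting)
DRAIN_SECONDS=$((HEALTHY_THRESHOLD * INTERVAL))

remove_bootstrap_lb_targets() {
  echo "1. removing bootstrap load-balancer targets"  # stub
}

remove_remaining_bootstrap_resources() {
  echo "3. removing remaining bootstrap resources, including the machine"  # stub
}

remove_bootstrap_lb_targets
echo "2. waiting ${DRAIN_SECONDS}s for production targets to be marked healthy"
# sleep "${DRAIN_SECONDS}"  # elided so the sketch runs instantly
remove_remaining_bootstrap_resources
```

The ordering is the point: the load-balancer targets go first, the machine goes last, and the wait in between covers the time the health checker needs to mark the production control-plane targets healthy.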
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: wking. The full list of commands accepted by this bot can be found here; the pull request process is described here.
this is probably not going to work: openshift/cluster-bootstrap#13 (comment)
```
   start \
       --asset-dir=/assets \
       --strict \
       --tear-down-event run-forever \
```
I am not convinced this is a good idea. We do not solve the load-balancer problem, we only shift it from being ELB-related to being kube-proxy-related. All clients using in-cluster connections via the service IP will fall over (in the best case) when the bootstrap machine is terminated. In the worst case, they will have pending watch TCP connections to a machine which is not there anymore.
@squeed can you comment about that? How do TCP connections to service IPs behave if the target machine disappears?
I am against this PR. It takes the ELB out of the failure equation, but at the same time introduces a period of uncertainty with service IPs. This is bad. Let's solve the problem with proper synchronization instead of fixing some issues while creating new ones for other teams. A correct way would be:
Agree with @sttts. This change can potentially make other components unstable and introduce even more flakes. I think the proper solution would be what Stefan described: cluster-bootstrap should tear down the bootstrap control plane when it receives an event from the installer saying "I acknowledged bootstrap-success, removed whatever needed to be removed (the ELB), and am asking you to tear down the control plane." Then bootstrap can send another event saying "bootstrap-ready-to-shutdown", and then the installer can delete the bootstrap node.
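The handshake proposed in these two comments could be sketched as follows. Only the ordering comes from the thread: `emit` and `wait_for` are hypothetical stand-ins for however events would actually be published and watched (e.g. Kubernetes Events), and the `tear-down-requested` name is invented for illustration (only `bootstrap-success` and `bootstrap-ready-to-shutdown` are named above).

```shell
#!/bin/sh
# Illustrative event handshake; emit/wait_for are hypothetical stubs.
emit()     { echo "emit: $1"; }
wait_for() { echo "wait: $1"; }

# bootstrap-node side:
emit bootstrap-success            # control plane is up; installer may act
wait_for tear-down-requested      # installer has removed the ELB target
# ...cluster-bootstrap TERMs the bootstrap control plane and drains here...
emit bootstrap-ready-to-shutdown  # installer may now delete the machine
```

The design point is that each side only acts after the other has confirmed its half of the teardown, so no client-facing resource disappears while traffic can still reach it.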
I still don't see how a hard shutdown of the bootstrap machine would be more disruptive than a production-master reboot (e.g. for a RHCOS update). But once some form of openshift/cluster-bootstrap#13 (which we all agree we want?) has landed, we can use CI testing for this PR to explore these different proposals.
What is a hard shutdown? Switching off the machine like "power off"? Or is it a controlled shutdown where every process gets a TERM signal and some time to terminate, and only then a kill? If it's the latter, we should be fine. If it is the pure removal of the machine, it's not good. We fought with those cases in other contexts like etcd, where connections got stuck (timeouts around 30 minutes); only very recent etcd clients got that under control. Killing the bootstrap machine would bring us into the same situation again (not necessarily with etcd, of course, but with client connections to the apiserver), and we want to avoid that. In short: kube-apiserver needs the TERM signal. Everything else is bad.
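The distinction matters because a trapped TERM gives a process a window to clean up before exiting, while a hard power-off skips that window entirely. A minimal shell illustration of the mechanism (nothing here is OpenShift-specific):

```shell
#!/bin/sh
# A trapped TERM lets the process run cleanup before exiting; a hard
# power-off skips all of this, leaving clients with stuck connections.
got_term=0
trap 'got_term=1' TERM

kill -TERM $$   # simulate a controlled shutdown signal to ourselves

# The trap runs before the next command executes, so cleanup fits here:
if [ "$got_term" -eq 1 ]; then
  echo "TERM received: close listeners, drain connections, then exit"
fi
```

A real server would use this window to stop accepting new connections and finish in-flight requests, which is exactly what a machine deletion without TERM denies it.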
Also note that a RHCOS update will be preceded by a draining of the node. That way every service can do whatever it has to do for graceful termination.
Is this just the current state of affairs, or is there some technical reason this will always be the case? We must be able to handle hard reboots and networking outages.
Handling an unexpected condition like a machine shutdown is different from handling a controlled shutdown. In the example Stefan provided, a hard shutdown is tolerated in the sense that traffic is routed, dropped, and well-written clients can retry. However, given an appropriate signal, we can avoid unnecessary blips by driving traffic away and quiescing over a few seconds: allowing short-lived calls to complete, removing leases and endpoints in a controlled way, and letting load balancers see the target rejecting traffic before it disappears entirely. We aren't saying we can't handle being shut down. We are saying that it is objectively more disruptive, in a way that appears to be unnecessary.
@wking: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@wking: The following tests failed. Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
The installer needs to continue to be a READER of the API. Any dependency of cluster bootstrapping on the installer having access to the API to push things is a no-go. /close
@abhinavdahiya: Closed this PR.
CC @sttts
We'll need openshift/cluster-bootstrap#13 for this, so...
/hold