
cvo: Release our leader lease when we are gracefully terminated #87

Merged: 1 commit into openshift:master on Jan 18, 2019

Conversation

smarterclayton
Contributor

On an upgrade the CVO is always restarted but we don't release our
lease, which causes the new CVO pod to have to wait. We should
release our lease on exit.

Kube 1.14 (PR 71490) contains a new flag on the leader lease that allows
the caller to step down gracefully. Because backporting that change
to client-go is complicated, we emulate the logic instead. When that
code is available we can simplify.

This also makes install about 90s faster.
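
A minimal sketch of the emulated step-down, assuming the client-go v9-era ConfigMapLock API shown in the diff excerpts below (the package, function name, and logging are illustrative, not the exact code in this PR):

  package start

  import (
    "log"

    "k8s.io/client-go/tools/leaderelection/resourcelock"
  )

  // releaseLeaderLease emulates the Kube 1.14 ReleaseOnCancel behavior on shutdown:
  // clear the holder identity so newer watchers treat the lease as unowned, then
  // delete the ConfigMap so today's watchers re-elect immediately instead of
  // waiting for the lease to expire.
  func releaseLeaderLease(lock *resourcelock.ConfigMapLock) {
    if err := lock.Update(resourcelock.LeaderElectionRecord{}); err != nil {
      log.Printf("unable to clear the leader election record: %v", err)
      return
    }
    if err := lock.Client.ConfigMaps(lock.ConfigMapMeta.Namespace).Delete(lock.ConfigMapMeta.Name, nil); err != nil {
      log.Printf("unable to delete the leader election config map: %v", err)
    }
  }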

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 17, 2019
@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 17, 2019
cmd/start.go Outdated
time.Sleep(100 * time.Millisecond)
// if we still hold the leader lease, clear the owner identity (other lease watchers
// still have to wait for expiration) like the new ReleaseOnCancel code will do.
if err := lock.Update(resourcelock.LeaderElectionRecord{}); err == nil {
Contributor

Resetting to empty loses the https://github.com/kubernetes/client-go/blob/v9.0.0/tools/leaderelection/resourcelock/interface.go#L42. Is there a way we can step down without completely losing that information?

Contributor

other lease watchers still have to wait for expiration

If we reset the https://github.com/kubernetes/client-go/blob/v9.0.0/tools/leaderelection/resourcelock/interface.go#L39-L41, wouldn't the next candidate immediately try to re-elect?

Contributor Author

No, see the upstream PR: clients wait until the lease time is clear. They can't step down gracefully today; that lands with my PR in 1.14.

cmd/start.go Outdated
// still have to wait for expiration) like the new ReleaseOnCancel code will do.
if err := lock.Update(resourcelock.LeaderElectionRecord{}); err == nil {
	// if we successfully clear the owner identity, we can safely delete the record
	if err := lock.Client.ConfigMaps(lock.ConfigMapMeta.Namespace).Delete(lock.ConfigMapMeta.Name, nil); err != nil {
Contributor

Why delete the config map at all?

Contributor Author

Deleting the config map is the only way to trigger a new election today.

Contributor

Hmm, okay

@smarterclayton
Contributor Author

/hold

I need to refactor this to be testable in an integration test (cmd/ is too full anyway)

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 17, 2019
@smarterclayton
Contributor Author

I took your feedback from the previous PR and refactored the start command to have reusable code, then had the integration test use it and cut the internal dependencies. It should be cleaner, and start is noticeably simpler (well, to me).

@smarterclayton
Contributor Author

I also added an integration test for step down.

@@ -452,3 +452,8 @@ func (optr *Operator) currentVersion() configv1.Update {
		Payload: optr.releaseImage,
	}
}

// SetSyncWorkerForTesting updates the sync worker for whitebox testing.
func (optr *Operator) SetSyncWorkerForTesting(worker ConfigSyncWorker) {
Contributor

Hmm...

Contributor Author

I contemplated having start lazily configure the worker, but this is a reasonable taint.

@abhinavdahiya
Contributor

Will take a look. 👍


// createResourceLock initializes the lock.
func createResourceLock(cb *ClientBuilder, namespace, name string) (*resourcelock.ConfigMapLock, error) {
recorder := record.NewBroadcaster().NewRecorder(runtime.NewScheme(), v1.EventSource{Component: namespace})
Contributor

Question: shouldn't runtime.NewScheme() be kubernetes.Scheme?

Contributor Author

I'm not 100% sure; it doesn't actually matter for events (we fall back to a generic path if an object isn't in the scheme).

Contributor Author

Actually, we didn't have a sink, so we weren't actually recording events. I fixed that here, and made sure that each place in the CVO where we log events sends them to our namespace rather than the default namespace (which is where events go if you pass ""). I also added an integration test to verify we receive a leader-election event, which is a good proxy for catching us if we accidentally break this in the future.
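
For reference, wiring a recorder to a sink scoped to the operator's namespace looks roughly like this with the client-go record package (newEventRecorder and its arguments are placeholder names, not this PR's exact code):

  import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/kubernetes/scheme"
    typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
    "k8s.io/client-go/tools/record"
  )

  // newEventRecorder attaches the broadcaster to an API sink so events are actually
  // persisted, and records them into the operator's namespace rather than "".
  func newEventRecorder(client kubernetes.Interface, namespace string) record.EventRecorder {
    broadcaster := record.NewBroadcaster()
    broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
      Interface: client.CoreV1().Events(namespace),
    })
    return broadcaster.NewRecorder(scheme.Scheme, corev1.EventSource{Component: namespace})
  }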

}
t.Logf("lock deleted in %s", endTime.Sub(startTime))

select {
Contributor

I like this ;)
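
For orientation, the wait in that step-down test has roughly this shape (a sketch only, using the pre-context Get signature of the client-go vendored at the time; client, namespace, and lockName are placeholders, and the time, metav1, apierrors, and wait imports are elided):

  startTime := time.Now()
  // Wait for the outgoing operator to release the lease by deleting its
  // leader-election ConfigMap; PollImmediate checks once right away and then
  // every 100ms until the 30s timeout.
  err := wait.PollImmediate(100*time.Millisecond, 30*time.Second, func() (bool, error) {
    _, err := client.CoreV1().ConfigMaps(namespace).Get(lockName, metav1.GetOptions{})
    if apierrors.IsNotFound(err) {
      return true, nil
    }
    return false, err
  })
  if err != nil {
    t.Fatalf("lease was not released: %v", err)
  }
  t.Logf("lock deleted in %s", time.Since(startTime))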

@abhinavdahiya
Contributor

/approve

I'll rebase #78 on this and add a test for rotation.

@abhinavdahiya
Contributor

Also, can you either:

  • squash this into a single commit, or
  • split it into 2 commits (move to pkg/start, and leader step-down on exit)?

@smarterclayton
Contributor Author

Will squash - teasing them apart is harder now

On an upgrade the CVO is always restarted but we don't release our
lease, which causes the new CVO pod to have to wait. We should
release our lease on exit.

Kube 1.14 (PR 71490) contains a new flag on the leader lease that allows
the caller to step down gracefully. Because backporting that change
to client-go is complicated, we emulate the logic instead. When that
code is available we can simplify.

cmd: Refactor how the CVO is started so the integration test is consistent

Move the integration test logic into a new package and reuse startup
logic so that we have a much cleaner start command than before and so
that we are testing what we run in the command. Remove rootOpts and
startOpts and replace them with nested operations.

Add a test that verifies we send leader election events.
@smarterclayton
Contributor Author

Rebased, squashed, and fixed the event thing.

@abhinavdahiya
Contributor

/lgtm
/hold cancel

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jan 18, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@smarterclayton
Contributor Author

/retest

1 similar comment
@smarterclayton
Contributor Author

/retest

@openshift-merge-robot openshift-merge-robot merged commit 303ea4a into openshift:master Jan 18, 2019
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 15, 2020
We've had it since 2b81f47 (cvo: Release our leader lease when we
are gracefully terminated, 2019-01-16, openshift#87), but it's redundant
vs. "ListenAddr is not an empty string".

I'm also switching to:

  o.ListenAddr != ""

instead of:

  len(o.ListenAddr) > 0

because it seems slightly easier to understand, but obviously either
will work.
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 4, 2020
So the incoming cluster-version operator doesn't need to wait for the
outgoing operator's lease to expire, which can take a while [1]:

  I0802 10:06:01.056591       1 leaderelection.go:243] attempting to acquire leader lease  openshift-cluster-version/version...
  ...
  I0802 10:07:42.632719       1 leaderelection.go:253] successfully acquired lease openshift-cluster-version/version

and time out the:

  Cluster did not acknowledge request to upgrade in a reasonable time

testcase [2].  Using ReleaseOnCancel has been the plan since
2b81f47 (cvo: Release our leader lease when we are gracefully
terminated, 2019-01-16, openshift#87).  I'm not clear on why it (sometimes?)
doesn't work today.

The discrepancy between the "exit after 2s no matter what" comment and
the 5s After dates back to dbedb7a (cvo: When the CVO restarts,
perform one final sync to write status, 2019-04-27, openshift#179), which
bumped the After from 2s to 5s, but forgot to bump the comment.  I'm
removing that code here in favor of the two-minute timeout from
b30aa0e (pkg/cvo/metrics: Graceful server shutdown, 2020-04-15, openshift#349).
We still exit immediately on a second TERM, for folks who get
impatient waiting for the graceful timeout.

I'm also pushing the Background() initialization all the way up to the
command-line handler, to make it more obvious that the context is
scoped to the whole 'start' invocation.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25365/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1289853267223777280/artifacts/e2e-gcp-upgrade/pods/openshift-cluster-version_cluster-version-operator-5b6ff896c6-57ppb_cluster-version-operator.log
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1843505#c7
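
For comparison, once a 1.14-level client-go is vendored the emulation can collapse into the upstream option; roughly (a sketch with placeholder durations and callbacks, not the CVO's actual configuration):

  leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
    Lock:          lock,
    LeaseDuration: 90 * time.Second,
    RenewDeadline: 45 * time.Second,
    RetryPeriod:   30 * time.Second,
    // ReleaseOnCancel clears the holder identity when ctx is canceled, so the
    // incoming operator does not have to wait out the remaining lease duration.
    ReleaseOnCancel: true,
    Callbacks: leaderelection.LeaderCallbacks{
      OnStartedLeading: func(ctx context.Context) { /* run the operator */ },
      OnStoppedLeading: func() { /* begin graceful shutdown */ },
    },
  })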
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 4, 2020
So the incoming cluster-version operator doesn't need to wait for the
outgoing operator's lease to expire, which can take a while [1]:

  I0802 10:06:01.056591       1 leaderelection.go:243] attempting to acquire leader lease  openshift-cluster-version/version...
  ...
  I0802 10:07:42.632719       1 leaderelection.go:253] successfully acquired lease openshift-cluster-version/version

and time out the:

  Cluster did not acknowledge request to upgrade in a reasonable time

testcase [2].  Using ReleaseOnCancel has been the plan since
2b81f47 (cvo: Release our leader lease when we are gracefully
terminated, 2019-01-16, openshift#87).  I'm not clear on why it (sometimes?)
doesn't work today.

The discrepancy between the "exit after 2s no matter what" comment and
the 5s After dates back to dbedb7a (cvo: When the CVO restarts,
perform one final sync to write status, 2019-04-27, openshift#179), which
bumped the After from 2s to 5s, but forgot to bump the comment.  I'm
removing that code here in favor of the two-minute timeout from
b30aa0e (pkg/cvo/metrics: Graceful server shutdown, 2020-04-15, openshift#349).
We still exit immediately on a second TERM, for folks who get
impatient waiting for the graceful timeout.

Also set terminationGracePeriodSeconds 130 to extend the default 30s
[3], to give the container the full two-minute graceful timeout window
before the kubelet steps in with a KILL.

Also push the Background() initialization all the way up to the
command-line handler, to make it more obvious that the context is
scoped to the whole 'start' invocation.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25365/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1289853267223777280/artifacts/e2e-gcp-upgrade/pods/openshift-cluster-version_cluster-version-operator-5b6ff896c6-57ppb_cluster-version-operator.log
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1843505#c7
[3]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 4, 2020
So the incoming cluster-version operator doesn't need to wait for the
outgoing operator's lease to expire, which can take a while [1]:

  I0802 10:06:01.056591       1 leaderelection.go:243] attempting to acquire leader lease  openshift-cluster-version/version...
  ...
  I0802 10:07:42.632719       1 leaderelection.go:253] successfully acquired lease openshift-cluster-version/version

and time out the:

  Cluster did not acknowledge request to upgrade in a reasonable time

testcase [2].  Using ReleaseOnCancel has been the plan since
2b81f47 (cvo: Release our leader lease when we are gracefully
terminated, 2019-01-16, openshift#87).  I'm not clear on why it (sometimes?)
doesn't work today.

The discrepancy between the "exit after 2s no matter what" comment and
the 5s After dates back to dbedb7a (cvo: When the CVO restarts,
perform one final sync to write status, 2019-04-27, openshift#179), which
bumped the After from 2s to 5s, but forgot to bump the comment.  I'm
removing that code here in favor of the two-minute timeout from
b30aa0e (pkg/cvo/metrics: Graceful server shutdown, 2020-04-15, openshift#349).
We still exit immediately on a second TERM, for folks who get
impatient waiting for the graceful timeout.

Also drop an unnecessary runCancel() from the "shutting down" branch
of the error collector.  I'd added it in b30aa0e, but you can only
ever get into the "shutting down" branch if runCancel has already been
called.

Also add some logging to the error collector, so it's easier to see
where we are in the collection process from the operator logs.
Although because we only collect errors at the moment, it's not
obvious from the logs exactly what has been collected.

Also set terminationGracePeriodSeconds 130 to extend the default 30s
[3], to give the container the full two-minute graceful timeout window
before the kubelet steps in with a KILL.

Also push the Background() initialization all the way up to the
command-line handler, to make it more obvious that the context is
scoped to the whole 'start' invocation.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25365/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1289853267223777280/artifacts/e2e-gcp-upgrade/pods/openshift-cluster-version_cluster-version-operator-5b6ff896c6-57ppb_cluster-version-operator.log
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1843505#c7
[3]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core
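
The "exit immediately on a second TERM" behavior these commit messages describe is the usual two-stage signal pattern; a minimal sketch, assuming runCancel is the cancel function for the run context (imports of log, os, os/signal, and syscall elided):

  ch := make(chan os.Signal, 2)
  signal.Notify(ch, os.Interrupt, syscall.SIGTERM)
  go func() {
    <-ch // first TERM/INT: start the graceful shutdown
    log.Print("shutdown signal received, canceling the run context")
    runCancel()
    <-ch // second TERM/INT: stop waiting and exit right away
    log.Print("second shutdown signal received, exiting immediately")
    os.Exit(1)
  }()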
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 5, 2020
So the incoming cluster-version operator doesn't need to wait for the
outgoing operator's lease to expire, which can take a while [1]:

  I0802 10:06:01.056591       1 leaderelection.go:243] attempting to acquire leader lease  openshift-cluster-version/version...
  ...
  I0802 10:07:42.632719       1 leaderelection.go:253] successfully acquired lease openshift-cluster-version/version

and time out the:

  Cluster did not acknowledge request to upgrade in a reasonable time

testcase [2].  Using ReleaseOnCancel has been the plan since
2b81f47 (cvo: Release our leader lease when we are gracefully
terminated, 2019-01-16, openshift#87).  I'm not clear on why it (sometimes?)
doesn't work today.

The discrepancy between the "exit after 2s no matter what" comment and
the 5s After dates back to dbedb7a (cvo: When the CVO restarts,
perform one final sync to write status, 2019-04-27, openshift#179), which
bumped the After from 2s to 5s, but forgot to bump the comment.  I'm
removing that code here in favor of the two-minute timeout from
b30aa0e (pkg/cvo/metrics: Graceful server shutdown, 2020-04-15, openshift#349).
We still exit immediately on a second TERM, for folks who get
impatient waiting for the graceful timeout.

Also drop an unnecessary runCancel() from the "shutting down" branch
of the error collector.  I'd added it in b30aa0e, but you can only
ever get into the "shutting down" branch if runCancel has already been
called.  And fix the scoping for the shutdownTimer variable so we
don't clear it on each for-loop iteration (oops :p, bug from
b30aa0e).

Also add some logging to the error collector, so it's easier to see
where we are in the collection process from the operator logs.
Although because we only collect errors at the moment, it's not
obvious from the logs exactly what has been collected.

Also set terminationGracePeriodSeconds 130 to extend the default 30s
[3], to give the container the full two-minute graceful timeout window
before the kubelet steps in with a KILL.

Also push the Background() initialization all the way up to the
command-line handler, to make it more obvious that the context is
scoped to the whole 'start' invocation.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25365/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1289853267223777280/artifacts/e2e-gcp-upgrade/pods/openshift-cluster-version_cluster-version-operator-5b6ff896c6-57ppb_cluster-version-operator.log
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1843505#c7
[3]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 5, 2020
So the incoming cluster-version operator doesn't need to wait for the
outgoing operator's lease to expire, which can take a while [1]:

  I0802 10:06:01.056591       1 leaderelection.go:243] attempting to acquire leader lease  openshift-cluster-version/version...
  ...
  I0802 10:07:42.632719       1 leaderelection.go:253] successfully acquired lease openshift-cluster-version/version

and time out the:

  Cluster did not acknowledge request to upgrade in a reasonable time

testcase [2].  Using ReleaseOnCancel has been the plan since
2b81f47 (cvo: Release our leader lease when we are gracefully
terminated, 2019-01-16, openshift#87).  I'm not clear on why it (sometimes?)
doesn't work today.

The discrepancy between the "exit after 2s no matter what" comment and
the 5s After dates back to dbedb7a (cvo: When the CVO restarts,
perform one final sync to write status, 2019-04-27, openshift#179), which
bumped the After from 2s to 5s, but forgot to bump the comment.  I'm
removing that code here in favor of the two-minute timeout from
b30aa0e (pkg/cvo/metrics: Graceful server shutdown, 2020-04-15, openshift#349).
We still exit immediately on a second TERM, for folks who get
impatient waiting for the graceful timeout.

Decouple shutdownContext from the context passed into Options.run, to
allow TestIntegrationCVO_gracefulStepDown to request a graceful
shutdown.  And remove Context.Start(), inlining the logic in
Options.run so we can count and reap the goroutines it used to launch.
This also allows us to be more targeted with the context for each
goroutines:

* Informers are now launched before the lease controller, so they're
  up and running by the time we acquire the lease.  They remain
  running until the main operator CVO.Run() exits, after which we shut
  them down.  Having informers running before we have a lease is
  somewhat expensive in terms of API traffic, but we should rarely
  have two CVO pods competing for leadership since we transitioned to
  the Recreate Deployment strategy in 078686d
  (install/0000_00_cluster-version-operator_03_deployment: Set
  'strategy: Recreate', 2019-03-20, openshift#140) and 5d8a527
  (install/0000_00_cluster-version-operator_03_deployment: Fix
  Recreate strategy, 2019-04-03, openshift#155).

* The metrics server also continues to run until CVO.Run() exits,
  where previously we began gracefully shutting it down at the same
  time we started shutting down CVO.Run().  This ensures we are around
  and publishing any last-minute CVO.Run() changes.

* Leader election also continues to run until CVO.Run() exits.  We
  don't want to release the lease while we're still controlling
  things.

* CVO.Run() and AutoUpdate.Run() both stop immediately when the
  passed-in context is canceled or we call runCancel internally
  (because of a TERM, error from a goroutine, or loss of leadership).
  These are the only two goroutines that are actually writing to the
  API servers, so we want to shut them down as quickly as possible.

Drop an unnecessary runCancel() from the "shutting down" branch of the
error collector.  I'd added it in b30aa0e, but you can only ever
get into the "shutting down" branch if runCancel has already been
called.  And fix the scoping for the shutdownTimer variable so we
don't clear it on each for-loop iteration (oops :p, bug from
b30aa0e).

Add some logging to the error collector, so it's easier to see where
we are in the collection process from the operator logs.  Also start
logging collected goroutines by name, so we can figure out which may
still be outstanding.

Set terminationGracePeriodSeconds 130 to extend the default 30s [3],
to give the container the full two-minute graceful timeout window
before the kubelet steps in with a KILL.

Push the Background() initialization all the way up to the
command-line handler, to make it more obvious that the context is
scoped to the whole 'start' invocation.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25365/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1289853267223777280/artifacts/e2e-gcp-upgrade/pods/openshift-cluster-version_cluster-version-operator-5b6ff896c6-57ppb_cluster-version-operator.log
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1843505#c7
[3]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core
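
A sketch of the "count and reap the goroutines" bookkeeping described above; the asyncResult type, launch helper, and channel names are illustrative rather than the exact code in Options.run:

  type asyncResult struct {
    name string
    err  error
  }

  resultChannel := make(chan asyncResult)
  resultChannelCount := 0
  launch := func(name string, fn func(ctx context.Context) error) {
    resultChannelCount++
    go func() {
      resultChannel <- asyncResult{name: name, err: fn(runContext)}
    }()
  }

  // launch("metrics server", ...), launch("leader controller", ...), etc.

  // Reap: block until every launched goroutine reports back, logging each by
  // name so the operator logs show which goroutines are still outstanding.
  for resultChannelCount > 0 {
    result := <-resultChannel
    resultChannelCount--
    if result.err != nil {
      log.Printf("goroutine %s failed: %v", result.name, result.err)
    }
    log.Printf("collected goroutine %s (%d still outstanding)", result.name, resultChannelCount)
  }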
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 5, 2020
So the incoming cluster-version operator doesn't need to wait for the
outgoing operator's lease to expire, which can take a while [1]:

  I0802 10:06:01.056591       1 leaderelection.go:243] attempting to acquire leader lease  openshift-cluster-version/version...
  ...
  I0802 10:07:42.632719       1 leaderelection.go:253] successfully acquired lease openshift-cluster-version/version

and time out the:

  Cluster did not acknowledge request to upgrade in a reasonable time

testcase [2].  Using ReleaseOnCancel has been the plan since
2b81f47 (cvo: Release our leader lease when we are gracefully
terminated, 2019-01-16, openshift#87).  I'm not clear on why it (sometimes?)
doesn't work today.

The discrepancy between the "exit after 2s no matter what" comment and
the 5s After dates back to dbedb7a (cvo: When the CVO restarts,
perform one final sync to write status, 2019-04-27, openshift#179), which
bumped the After from 2s to 5s, but forgot to bump the comment.  I'm
removing that code here in favor of the two-minute timeout from
b30aa0e (pkg/cvo/metrics: Graceful server shutdown, 2020-04-15, openshift#349).
We still exit immediately on a second TERM, for folks who get
impatient waiting for the graceful timeout.

Decouple shutdownContext from the context passed into Options.run, to
allow TestIntegrationCVO_gracefulStepDown to request a graceful
shutdown.  And remove Context.Start(), inlining the logic in
Options.run so we can count and reap the goroutines it used to launch.
This also allows us to be more targeted with the context for each
goroutines:

* Informers are now launched before the lease controller, so they're
  up and running by the time we acquire the lease.  They remain
  running until the main operator CVO.Run() exits, after which we shut
  them down.  Having informers running before we have a lease is
  somewhat expensive in terms of API traffic, but we should rarely
  have two CVO pods competing for leadership since we transitioned to
  the Recreate Deployment strategy in 078686d
  (install/0000_00_cluster-version-operator_03_deployment: Set
  'strategy: Recreate', 2019-03-20, openshift#140) and 5d8a527
  (install/0000_00_cluster-version-operator_03_deployment: Fix
  Recreate strategy, 2019-04-03, openshift#155).  I don't see a way to block on
  their internal goroutine's completion, but maybe informers will grow
  an API for that in the future.

* The metrics server also continues to run until CVO.Run() exits,
  where previously we began gracefully shutting it down at the same
  time we started shutting down CVO.Run().  This ensures we are around
  and publishing any last-minute CVO.Run() changes.

* Leader election also continues to run until CVO.Run() exits.  We
  don't want to release the lease while we're still controlling
  things.

* CVO.Run() and AutoUpdate.Run() both stop immediately when the
  passed-in context is canceled or we call runCancel internally
  (because of a TERM, error from a goroutine, or loss of leadership).
  These are the only two goroutines that are actually writing to the
  API servers, so we want to shut them down as quickly as possible.

Drop an unnecessary runCancel() from the "shutting down" branch of the
error collector.  I'd added it in b30aa0e, but you can only ever
get into the "shutting down" branch if runCancel has already been
called.  And fix the scoping for the shutdownTimer variable so we
don't clear it on each for-loop iteration (oops :p, bug from
b30aa0e).

Add some logging to the error collector, so it's easier to see where
we are in the collection process from the operator logs.  Also start
logging collected goroutines by name, so we can figure out which may
still be outstanding.

Set terminationGracePeriodSeconds 130 to extend the default 30s [3],
to give the container the full two-minute graceful timeout window
before the kubelet steps in with a KILL.

Push the Background() initialization all the way up to the
command-line handler, to make it more obvious that the context is
scoped to the whole 'start' invocation.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25365/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1289853267223777280/artifacts/e2e-gcp-upgrade/pods/openshift-cluster-version_cluster-version-operator-5b6ff896c6-57ppb_cluster-version-operator.log
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1843505#c7
[3]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core

squash! pkg/start: Release leader lease on graceful shutdown
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 5, 2020
…ap lock release

From the godocs:

  $ grep -A5 '// HolderIdentity' vendor/k8s.io/client-go/tools/leaderelection/resourcelock/interface.go
    // HolderIdentity is the ID that owns the lease. If empty, no one owns this lease and
    // all callers may acquire. Versions of this library prior to Kubernetes 1.14 will not
    // attempt to acquire leases with empty identities and will wait for the full lease
    // interval to expire before attempting to reacquire. This value is set to empty when
    // a client voluntarily steps down.
    HolderIdentity       string      `json:"holderIdentity"`

The previous assumption that the release would involve ConfigMap
deletion was born with the test in 2b81f47 (cvo: Release our leader
lease when we are gracefully terminated, 2019-01-16, openshift#87).
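
With that, a step-down check can assert on the holder identity stored in the lock ConfigMap's annotation instead of waiting for deletion; a sketch assuming a context-aware client-go Get and placeholder client, namespace, and lockName:

  cm, err := client.CoreV1().ConfigMaps(namespace).Get(ctx, lockName, metav1.GetOptions{})
  if err != nil {
    t.Fatal(err)
  }
  var record resourcelock.LeaderElectionRecord
  raw := cm.Annotations[resourcelock.LeaderElectionRecordAnnotationKey]
  if err := json.Unmarshal([]byte(raw), &record); err != nil {
    t.Fatal(err)
  }
  // An empty HolderIdentity means the previous leader stepped down voluntarily.
  if record.HolderIdentity != "" {
    t.Fatalf("lease is still held by %q", record.HolderIdentity)
  }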
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 25, 2020
So the incoming cluster-version operator doesn't need to wait for the
outgoing operator's lease to expire, which can take a while [1]:

  I0802 10:06:01.056591       1 leaderelection.go:243] attempting to acquire leader lease  openshift-cluster-version/version...
  ...
  I0802 10:07:42.632719       1 leaderelection.go:253] successfully acquired lease openshift-cluster-version/version

and time out the:

  Cluster did not acknowledge request to upgrade in a reasonable time

testcase [2].  Using ReleaseOnCancel has been the plan since
2b81f47 (cvo: Release our leader lease when we are gracefully
terminated, 2019-01-16, openshift#87).  I'm not clear on why it (sometimes?)
doesn't work today.

The discrepancy between the "exit after 2s no matter what" comment and
the 5s After dates back to dbedb7a (cvo: When the CVO restarts,
perform one final sync to write status, 2019-04-27, openshift#179), which
bumped the After from 2s to 5s, but forgot to bump the comment.  I'm
removing that code here in favor of the two-minute timeout from
b30aa0e (pkg/cvo/metrics: Graceful server shutdown, 2020-04-15, openshift#349).
We still exit immediately on a second TERM, for folks who get
impatient waiting for the graceful timeout.

Decouple shutdownContext from the context passed into Options.run, to
allow TestIntegrationCVO_gracefulStepDown to request a graceful
shutdown.  And remove Context.Start(), inlining the logic in
Options.run so we can count and reap the goroutines it used to launch.
This also allows us to be more targeted with the context for each
goroutines:

* Informers are now launched before the lease controller, so they're
  up and running by the time we acquire the lease.  They remain
  running until the main operator CVO.Run() exits, after which we shut
  them down.  Having informers running before we have a lease is
  somewhat expensive in terms of API traffic, but we should rarely
  have two CVO pods competing for leadership since we transitioned to
  the Recreate Deployment strategy in 078686d
  (install/0000_00_cluster-version-operator_03_deployment: Set
  'strategy: Recreate', 2019-03-20, openshift#140) and 5d8a527
  (install/0000_00_cluster-version-operator_03_deployment: Fix
  Recreate strategy, 2019-04-03, openshift#155).  I don't see a way to block on
  their internal goroutine's completion, but maybe informers will grow
  an API for that in the future.

* The metrics server also continues to run until CVO.Run() exits,
  where previously we began gracefully shutting it down at the same
  time we started shutting down CVO.Run().  This ensures we are around
  and publishing any last-minute CVO.Run() changes.

* Leader election also continues to run until CVO.Run() exits.  We
  don't want to release the lease while we're still controlling
  things.

* CVO.Run() and AutoUpdate.Run() both stop immediately when the
  passed-in context is canceled or we call runCancel internally
  (because of a TERM, error from a goroutine, or loss of leadership).
  These are the only two goroutines that are actually writing to the
  API servers, so we want to shut them down as quickly as possible.
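
Put together, the lifetimes above condense to something like this
sketch (all names hypothetical; informers and error details are elided
to comments):

  package sketch

  import (
    "context"
    "log"
  )

  // run is a hypothetical condensation of Options.run: runCtx governs
  // the goroutines that write to the API servers, while shutdownCtx
  // keeps the metrics server and lease controller (and informers,
  // elided here) alive until the main operator loop has fully exited.
  // In the real flow the TERM handler also triggers runCancel.
  func run(ctx context.Context, operator, autoUpdate, metrics, lease func(context.Context) error) error {
    shutdownCtx, shutdownCancel := context.WithCancel(ctx)
    defer shutdownCancel()
    runCtx, runCancel := context.WithCancel(shutdownCtx)
    defer runCancel()

    type result struct {
      name string
      err  error
    }
    results := make(chan result)
    launch := func(name string, c context.Context, fn func(context.Context) error) {
      go func() { results <- result{name: name, err: fn(c)} }()
    }

    launch("metrics server", shutdownCtx, metrics)
    launch("leader election", shutdownCtx, lease)
    launch("main operator", runCtx, operator)
    launch("auto-update", runCtx, autoUpdate)

    var firstErr error
    for remaining := 4; remaining > 0; remaining-- {
      r := <-results
      log.Printf("collected goroutine %q, %d still outstanding", r.name, remaining-1)
      if r.err != nil && firstErr == nil {
        firstErr = r.err
      }
      runCancel() // any exit or error tears down the API-writing goroutines...
      if r.name == "main operator" {
        shutdownCancel() // ...and once the operator loop is done, everything else too
      }
    }
    return firstErr
  }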

Drop an unnecessary runCancel() from the "shutting down" branch of the
error collector.  I'd added it in b30aa0e, but you can only ever
get into the "shutting down" branch if runCancel has already been
called.  And fix the scoping for the shutdownTimer variable so we
don't clear it on each for-loop iteration (oops :p, bug from
b30aa0e).
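
The scoping fix itself is small; the point is that the timer has to
live outside the loop (illustrative shape, not the literal collector):

  package sketch

  import "time"

  // collect reaps goroutine errors until the channel closes or the
  // graceful window expires.  Declaring shutdownTimer outside the loop
  // is the fix: inside the loop it would be reset to nil on every pass
  // and the deadline would never fire.
  func collect(errs <-chan error, shuttingDown <-chan struct{}) {
    var shutdownTimer *time.Timer
    for {
      if shutdownTimer == nil {
        select {
        case <-shuttingDown:
          shutdownTimer = time.NewTimer(2 * time.Minute)
        default:
        }
      }
      // A nil channel blocks forever, so the timeout case is inert
      // until shutdown has actually begun.
      var timeout <-chan time.Time
      if shutdownTimer != nil {
        timeout = shutdownTimer.C
      }
      select {
      case _, ok := <-errs:
        if !ok {
          return // all goroutines collected
        }
      case <-timeout:
        return // graceful window expired
      }
    }
  }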

Add some logging to the error collector, so it's easier to see where
we are in the collection process from the operator logs.  Also start
logging collected goroutines by name, so we can figure out which may
still be outstanding.

Set terminationGracePeriodSeconds 130 to extend the default 30s [3],
to give the container the full two-minute graceful timeout window
before the kubelet steps in with a KILL.

Push the Background() initialization all the way up to the
command-line handler, to make it more obvious that the context is
scoped to the whole 'start' invocation.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25365/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1289853267223777280/artifacts/e2e-gcp-upgrade/pods/openshift-cluster-version_cluster-version-operator-5b6ff896c6-57ppb_cluster-version-operator.log
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1843505#c7
[3]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core

squash! pkg/start: Release leader lease on graceful shutdown
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 25, 2020
…ap lock release

From the godocs:

  $ grep -A5 '// HolderIdentity' vendor/k8s.io/client-go/tools/leaderelection/resourcelock/interface.go
    // HolderIdentity is the ID that owns the lease. If empty, no one owns this lease and
    // all callers may acquire. Versions of this library prior to Kubernetes 1.14 will not
    // attempt to acquire leases with empty identities and will wait for the full lease
    // interval to expire before attempting to reacquire. This value is set to empty when
    // a client voluntarily steps down.
    HolderIdentity       string      `json:"holderIdentity"`

The previous assumption that the release would involve ConfigMap
deletion was born with the test in 2b81f47 (cvo: Release our leader
lease when we are gracefully terminated, 2019-01-16, openshift#87).
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 25, 2020
So the incoming cluster-version operator doesn't need to wait for the
outgoing operator's lease to expire, which can take a while [1]:

  I0802 10:06:01.056591       1 leaderelection.go:243] attempting to acquire leader lease  openshift-cluster-version/version...
  ...
  I0802 10:07:42.632719       1 leaderelection.go:253] successfully acquired lease openshift-cluster-version/version

and time out the:

  Cluster did not acknowledge request to upgrade in a reasonable time

testcase [2].  Using ReleaseOnCancel has been the plan since
2b81f47 (cvo: Release our leader lease when we are gracefully
terminated, 2019-01-16, openshift#87).  I'm not clear on why it (sometimes?)
doesn't work today.

The discrepancy between the "exit after 2s no matter what" comment and
the 5s After dates back to dbedb7a (cvo: When the CVO restarts,
perform one final sync to write status, 2019-04-27, openshift#179), which
bumped the After from 2s to 5s, but forgot to bump the comment.  I'm
removing that code here in favor of the two-minute timeout from
b30aa0e (pkg/cvo/metrics: Graceful server shutdown, 2020-04-15, openshift#349).
We still exit immediately on a second TERM, for folks who get
impatient waiting for the graceful timeout.

Decouple shutdownContext from the context passed into Options.run, to
allow TestIntegrationCVO_gracefulStepDown to request a graceful
shutdown.  And remove Context.Start(), inlining the logic in
Options.run so we can count and reap the goroutines it used to launch.
This also allows us to be more targeted with the context for each
goroutine:

* Informers are now launched before the lease controller, so they're
  up and running by the time we acquire the lease.  They remain
  running until the main operator CVO.Run() exits, after which we shut
  them down.  Having informers running before we have a lease is
  somewhat expensive in terms of API traffic, but we should rarely
  have two CVO pods competing for leadership since we transitioned to
  the Recreate Deployment strategy in 078686d
  (install/0000_00_cluster-version-operator_03_deployment: Set
  'strategy: Recreate', 2019-03-20, openshift#140) and 5d8a527
  (install/0000_00_cluster-version-operator_03_deployment: Fix
  Recreate strategy, 2019-04-03, openshift#155).  I don't see a way to block on
  their internal goroutine's completion, but maybe informers will grow
  an API for that in the future.

* The metrics server also continues to run until CVO.Run() exits,
  where previously we began gracefully shutting it down at the same
  time we started shutting down CVO.Run().  This ensures we are around
  and publishing any last-minute CVO.Run() changes.

* Leader election also continues to run until CVO.Run() exits.  We
  don't want to release the lease while we're still controlling
  things.

* CVO.Run() and AutoUpdate.Run() both stop immediately when the
  passed-in context is canceled or we call runCancel internally
  (because of a TERM, error from a goroutine, or loss of leadership).
  These are the only two goroutines that are actually writing to the
  API servers, so we want to shut them down as quickly as possible.

Drop an unnecessary runCancel() from the "shutting down" branch of the
error collector.  I'd added it in b30aa0e, but you can only ever
get into the "shutting down" branch if runCancel has already been
called.  And fix the scoping for the shutdownTimer variable so we
don't clear it on each for-loop iteration (oops :p, bug from
b30aa0e).

Add some logging to the error collector, so it's easier to see where
we are in the collection process from the operator logs.  Also start
logging collected goroutines by name, so we can figure out which may
still be outstanding.

Set terminationGracePeriodSeconds 130 to extend the default 30s [3],
to give the container the full two-minute graceful timeout window
before the kubelet steps in with a KILL.

Push the Background() initialization all the way up to the
command-line handler, to make it more obvious that the context is
scoped to the whole 'start' invocation.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25365/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1289853267223777280/artifacts/e2e-gcp-upgrade/pods/openshift-cluster-version_cluster-version-operator-5b6ff896c6-57ppb_cluster-version-operator.log
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1843505#c7
[3]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core

squash! pkg/start: Release leader lease on graceful shutdown
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 25, 2020
…ap lock release

From the godocs:

  $ grep -A5 '// HolderIdentity' vendor/k8s.io/client-go/tools/leaderelection/resourcelock/interface.go
    // HolderIdentity is the ID that owns the lease. If empty, no one owns this lease and
    // all callers may acquire. Versions of this library prior to Kubernetes 1.14 will not
    // attempt to acquire leases with empty identities and will wait for the full lease
    // interval to expire before attempting to reacquire. This value is set to empty when
    // a client voluntarily steps down.
    HolderIdentity       string      `json:"holderIdentity"`

The previous assumption that the release would involve ConfigMap
deletion was born with the test in 2b81f47 (cvo: Release our leader
lease when we are gracefully terminated, 2019-01-16, openshift#87).
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 27, 2020
We've had it since 2b81f47 (cvo: Release our leader lease when we
are gracefully terminated, 2019-01-16, openshift#87), but it's redundant
vs. "ListenAddr is not an empty string".

I'm also switching to:

  o.ListenAddr != ""

instead of:

  len(o.ListenAddr) > 0

because it seems slightly easier to understand, but obviously either
will work.

Cherry-picked from 07e5809 (openshift#349), around conflicts due to the lack
of TLS metrics in 4.5.
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 27, 2020
So the incoming cluster-version operator doesn't need to wait for the
outgoing operator's lease to expire, which can take a while [1]:

  I0802 10:06:01.056591       1 leaderelection.go:243] attempting to acquire leader lease  openshift-cluster-version/version...
  ...
  I0802 10:07:42.632719       1 leaderelection.go:253] successfully acquired lease openshift-cluster-version/version

and time out the:

  Cluster did not acknowledge request to upgrade in a reasonable time

testcase [2].  Using ReleaseOnCancel has been the plan since
2b81f47 (cvo: Release our leader lease when we are gracefully
terminated, 2019-01-16, openshift#87).  I'm not clear on why it (sometimes?)
doesn't work today.

The discrepancy between the "exit after 2s no matter what" comment and
the 5s After dates back to dbedb7a (cvo: When the CVO restarts,
perform one final sync to write status, 2019-04-27, openshift#179), which
bumped the After from 2s to 5s, but forgot to bump the comment.  I'm
removing that code here in favor of the two-minute timeout from
b30aa0e (pkg/cvo/metrics: Graceful server shutdown, 2020-04-15, openshift#349).
We still exit immediately on a second TERM, for folks who get
impatient waiting for the graceful timeout.

Decouple shutdownContext from the context passed into Options.run, to
allow TestIntegrationCVO_gracefulStepDown to request a graceful
shutdown.  And remove Context.Start(), inlining the logic in
Options.run so we can count and reap the goroutines it used to launch.
This also allows us to be more targeted with the context for each
goroutine:

* Informers are now launched before the lease controller, so they're
  up and running by the time we acquire the lease.  They remain
  running until the main operator CVO.Run() exits, after which we shut
  them down.  Having informers running before we have a lease is
  somewhat expensive in terms of API traffic, but we should rarely
  have two CVO pods competing for leadership since we transitioned to
  the Recreate Deployment strategy in 078686d
  (install/0000_00_cluster-version-operator_03_deployment: Set
  'strategy: Recreate', 2019-03-20, openshift#140) and 5d8a527
  (install/0000_00_cluster-version-operator_03_deployment: Fix
  Recreate strategy, 2019-04-03, openshift#155).  I don't see a way to block on
  their internal goroutine's completion, but maybe informers will grow
  an API for that in the future.

* The metrics server also continues to run until CVO.Run() exits,
  where previously we began gracefully shutting it down at the same
  time we started shutting down CVO.Run().  This ensures we are around
  and publishing any last-minute CVO.Run() changes.

* Leader election also continues to run until CVO.Run() exits.  We
  don't want to release the lease while we're still controlling
  things.

* CVO.Run() and AutoUpdate.Run() both stop immediately when the
  passed-in context is canceled or we call runCancel internally
  (because of a TERM, error from a goroutine, or loss of leadership).
  These are the only two goroutines that are actually writing to the
  API servers, so we want to shut them down as quickly as possible.

Drop an unnecessary runCancel() from the "shutting down" branch of the
error collector.  I'd added it in b30aa0e, but you can only ever
get into the "shutting down" branch if runCancel has already been
called.  And fix the scoping for the shutdownTimer variable so we
don't clear it on each for-loop iteration (oops :p, bug from
b30aa0e).

Add some logging to the error collector, so it's easier to see where
we are in the collection process from the operator logs.  Also start
logging collected goroutines by name, so we can figure out which may
still be outstanding.

Set terminationGracePeriodSeconds 130 to extend the default 30s [3],
to give the container the full two-minute graceful timeout window
before the kubelet steps in with a KILL.

Push the Background() initialization all the way up to the
command-line handler, to make it more obvious that the context is
scoped to the whole 'start' invocation.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/25365/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1289853267223777280/artifacts/e2e-gcp-upgrade/pods/openshift-cluster-version_cluster-version-operator-5b6ff896c6-57ppb_cluster-version-operator.log
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1843505#c7
[3]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#podspec-v1-core

Cherry picked from cc1921d (openshift#424), around conflicts due to the lack
of TLS metrics and the Context pivots in 4.5.
wking added a commit to wking/cluster-version-operator that referenced this pull request Aug 27, 2020
…ap lock release

From the godocs:

  $ grep -A5 '// HolderIdentity' vendor/k8s.io/client-go/tools/leaderelection/resourcelock/interface.go
    // HolderIdentity is the ID that owns the lease. If empty, no one owns this lease and
    // all callers may acquire. Versions of this library prior to Kubernetes 1.14 will not
    // attempt to acquire leases with empty identities and will wait for the full lease
    // interval to expire before attempting to reacquire. This value is set to empty when
    // a client voluntarily steps down.
    HolderIdentity       string      `json:"holderIdentity"`

The previous assumption that the release would involve ConfigMap
deletion was born with the test in 2b81f47 (cvo: Release our leader
lease when we are gracefully terminated, 2019-01-16, openshift#87).

Cherry picked from dd09c3f (openshift#424), around conflicts due to the lack
of Context pivots in 4.5.