allow failed discovery on initial quota controller start #67433

Merged

deads2k
Contributor

@deads2k deads2k commented Aug 15, 2018

Fixes #65005

Aggregated API servers now correctly return 503s on discovery endpoints for groups that cannot be reached. As a result, the kube-controller-manager process is now sensitive to discovery failures in the quota controller. This change tolerates discovery failures during the quota replenishment controller's initial resource discovery.

@liggitt suspects that races similar to those he found in GC last release exist here too, but this pull doesn't make that better or worse.

@kubernetes/sig-api-machinery-bugs

kube-controller-manager can now start the quota controller when discovery results can only be partially determined.
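
A minimal sketch of the behavior described above, assuming simplified stand-in types rather than the real controller code. The names `resource`, `discoveryFunc`, `quotaController`, `newQuotaController`, and `partialDiscovery` are hypothetical and exist only for illustration. The sketch tolerates any discovery error at startup, which may be broader than what the merged change does.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a simplified stand-in for schema.GroupVersionResource.
type resource struct {
	Group, Version, Resource string
}

// discoveryFunc is a hypothetical discovery callback. It may return a
// partial result together with an error.
type discoveryFunc func() (map[resource]struct{}, error)

// quotaController is a toy model, not the real ResourceQuotaController.
type quotaController struct {
	monitored map[resource]struct{}
}

// newQuotaController tolerates a discovery error during initial setup:
// it reports the error and starts with whatever resources were returned,
// leaving the rest to a later sync.
func newQuotaController(discover discoveryFunc) *quotaController {
	resources, err := discover()
	if err != nil {
		fmt.Println("initial discovery failed, continuing:", err)
	}
	qc := &quotaController{monitored: map[resource]struct{}{}}
	for r := range resources {
		qc.monitored[r] = struct{}{}
	}
	return qc
}

// partialDiscovery simulates one reachable group and one unreachable
// aggregated API group.
func partialDiscovery() (map[resource]struct{}, error) {
	found := map[resource]struct{}{
		{Group: "", Version: "v1", Resource: "pods"}: {},
	}
	return found, errors.New("unable to retrieve metrics.k8s.io/v1beta1: 503")
}

func main() {
	qc := newQuotaController(partialDiscovery)
	fmt.Println("monitored resources:", len(qc.monitored))
}
```

The controller in this sketch starts with the one resource that was discoverable instead of failing construction outright.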

@k8s-ci-robot added the sig/api-machinery, size/S, kind/bug, and do-not-merge/release-note-label-needed labels Aug 15, 2018
@k8s-ci-robot added the cncf-cla: yes and approved labels Aug 15, 2018
@@ -161,10 +161,11 @@ func NewResourceQuotaController(options *ResourceQuotaControllerOptions) (*Resou

rq.quotaMonitor = qm

-// do initial quota monitor setup
+// do initial quota monitor setup. If we have a discovery failure here, it's ok. We'll discover more resources when
+// a later sync happens.
resources, err := GetQuotableResources(options.DiscoveryFunc)
if err != nil {
Member

should this only tolerate IsGroupDiscoveryFailedError errors?
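
One way to narrow the tolerance to group discovery failures, sketched with a hypothetical `groupDiscoveryFailed` error type modelled loosely on client-go's `ErrGroupDiscoveryFailed`. The helper names and the `errors.As` check are assumptions of this sketch and are not necessarily how `discovery.IsGroupDiscoveryFailedError` is implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// groupDiscoveryFailed is a hypothetical error type that records which
// group/versions could not be discovered.
type groupDiscoveryFailed struct {
	Groups map[string]error // keyed by "group/version"
}

func (e *groupDiscoveryFailed) Error() string {
	return fmt.Sprintf("unable to retrieve %d group/version(s)", len(e.Groups))
}

// isGroupDiscoveryFailed reports whether err, or anything it wraps, is a
// partial discovery failure.
func isGroupDiscoveryFailed(err error) bool {
	var target *groupDiscoveryFailed
	return errors.As(err, &target)
}

// checkInitialDiscovery returns nil for tolerated errors and the error
// itself for everything else.
func checkInitialDiscovery(err error) error {
	if err == nil || isGroupDiscoveryFailed(err) {
		return nil
	}
	return err
}

// Sample errors used by main below.
var (
	errPartial = &groupDiscoveryFailed{Groups: map[string]error{
		"metrics.k8s.io/v1beta1": errors.New("503 Service Unavailable"),
	}}
	errFatal = errors.New("connection refused")
)

func main() {
	fmt.Println("partial failure:", checkInitialDiscovery(errPartial))
	fmt.Println("other failure:", checkInitialDiscovery(errFatal))
}
```

With this shape, only failures that still leave a usable partial result are swallowed; any other error continues to abort startup.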

@@ -421,7 +422,9 @@ func (rq *ResourceQuotaController) Sync(discoveryFunc NamespacedResourcesFunc, p
// Something has changed, so track the new state and perform a sync.
oldResources := make(map[schema.GroupVersionResource]struct{})
wait.Until(func() {
-// Get the current resource list from discovery.
+// Get the current resource list from discovery. A failure here will (should?) prevent updates to the sync list.
Member

Is tolerating partial discovery on start but not on resync what we want? I expected them to match. ErrGroupDiscoveryFailed errors expose which group/versions failed, if we want to use that info

Member

even without inspecting the details of the error, we could do something like this as a simple start. letting the quota controller continue running without using available resource info from discovery doesn't seem great.

// Get the current resource list from discovery.
newResources, err := GetQuotableResources(discoveryFunc)
if err != nil {
	utilruntime.HandleError(err)

	if discovery.IsGroupDiscoveryFailedError(err) && len(newResources) > 0 {
		// In partial discovery cases, don't remove any existing informers, just add new ones
		for k, v := range oldResources {
			newResources[k] = v
		}
	} else {
		// short circuit in non-discovery error cases or if discovery returned zero resources
		return
	}
}
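
The merge rule in the suggestion above can be exercised in isolation. The following is a self-contained sketch under stated assumptions: `resource`, `mergeOnPartial`, and the sentinel `errPartialDiscovery` are hypothetical, and the sentinel stands in for the `discovery.IsGroupDiscoveryFailedError` check used in the suggestion.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a simplified stand-in for schema.GroupVersionResource.
type resource struct {
	Group, Version, Resource string
}

// errPartialDiscovery is a hypothetical sentinel that marks a partial
// discovery failure in this sketch.
var errPartialDiscovery = errors.New("partial discovery")

// mergeOnPartial returns the resource set to sync against and whether
// the sync should proceed. On a partial failure with a non-empty result
// it keeps every previously known resource and adds the new ones.
func mergeOnPartial(oldRes, newRes map[resource]struct{}, err error) (map[resource]struct{}, bool) {
	if err == nil {
		return newRes, true
	}
	if errors.Is(err, errPartialDiscovery) && len(newRes) > 0 {
		merged := make(map[resource]struct{}, len(oldRes)+len(newRes))
		for k := range newRes {
			merged[k] = struct{}{}
		}
		for k := range oldRes {
			merged[k] = struct{}{}
		}
		return merged, true
	}
	// Short circuit on other errors or when discovery returned nothing.
	return nil, false
}

func main() {
	pods := resource{Group: "", Version: "v1", Resource: "pods"}
	widgets := resource{Group: "example.com", Version: "v1", Resource: "widgets"}
	deployments := resource{Group: "apps", Version: "v1", Resource: "deployments"}

	oldRes := map[resource]struct{}{pods: {}, widgets: {}}
	// widgets is missing because its group could not be reached.
	newRes := map[resource]struct{}{pods: {}, deployments: {}}

	merged, ok := mergeOnPartial(oldRes, newRes, errPartialDiscovery)
	fmt.Println(ok, len(merged)) // prints "true 3"
}
```

The resource from the unreachable group stays in the set, so its informer would not be torn down only because its API server was briefly unavailable.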

@fedebongio
Contributor

/assign @yliaog

@deads2k force-pushed the controller-02-quotadiscovery branch from dbbfb7d to 4c8e9de on August 17, 2018 15:43
@deads2k
Contributor Author

deads2k commented Aug 17, 2018

@liggitt comments addressed.

@deads2k
Contributor Author

deads2k commented Aug 17, 2018

/retest
/release-note-none

@k8s-ci-robot added the release-note-none label and removed the do-not-merge/release-note-label-needed label Aug 17, 2018
@yliaog
Contributor

yliaog commented Aug 17, 2018

/lgtm

@k8s-ci-robot added the lgtm label Aug 17, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, yliaog

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 8b52ca1 into kubernetes:master Aug 17, 2018
@nilebox

nilebox commented Aug 20, 2018

@liggitt can you cherry-pick this change to release-1.11 to unblock #67154 please?

@nilebox

nilebox commented Aug 20, 2018

or @deads2k: cherry-pick to release-1.11 and release-1.10 is needed, see above.

@liggitt
Member

liggitt commented Aug 20, 2018

added to #67155 and #67154

k8s-github-robot pushed a commit that referenced this pull request Aug 21, 2018
…2-upstream-release-1.10

Automatic merge from submit-queue.

Automated cherry pick of #66932 #67433: Include unavailable API services in discovery response, allow failed discovery on initial quota controller start

Cherry pick of #66932 on release-1.10.

#66932: Include unavailable API services in discovery response
#67433: allow failed discovery on initial quota controller start

```release-note
kube-apiserver now includes all registered API groups in discovery, including registered extension API group/versions for unavailable extension API servers.
kube-controller-manager can now start the quota controller when discovery results can only be partially determined.
```
k8s-github-robot pushed a commit that referenced this pull request Aug 21, 2018
…2-upstream-release-1.11

Automatic merge from submit-queue.

Automated cherry pick of #66932 #67433: Include unavailable API services in discovery response, allow failed discovery on initial quota controller start

Cherry pick of #66932 #67433 on release-1.11.

#66932: Include unavailable API services in discovery response
#67433: allow failed discovery on initial quota controller start

```release-note
kube-apiserver now includes all registered API groups in discovery, including registered extension API group/versions for unavailable extension API servers.
kube-controller-manager can now start the quota controller when discovery results can only be partially determined.
```
@k8s-ci-robot added the release-note label and removed the release-note-none label Aug 27, 2018
k8s-github-robot pushed a commit that referenced this pull request Aug 31, 2018
…33-upstream-release-1.9

Automatic merge from submit-queue.

allow failed discovery on initial quota controller start

Cherry pick of #67433 on release-1.9.

#67433: allow failed discovery on initial quota controller start

kube-controller-manager can now start the quota controller when discovery results can only be partially determined.

```release-note
NONE
```
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.