Handle polling errors and update status appropriately #2368

nilebox · 2018-09-28T03:50:25Z

This PR is a

Feature
Bug Fix
Documentation

What this PR does / why we need it:
When the broker returns any error in the last_operation response, the controller just retries without updating the status. As a result, during provisioning the instance keeps reporting that it is still provisioning, as if nothing were wrong:

  conditions:
  - lastTransitionTime: 2018-09-20T07:10:16Z
    message: The instance is being provisioned asynchronously
    reason: Provisioning
    status: "False"
    type: Ready

Which issue(s) this PR fixes

Fixes #2369

Please leave this checklist in the PR comment so that maintainers can ensure a good PR.

Merge Checklist:

- New feature
  - Tests
  - Documentation
- SVCat CLI flag
- Server Flag for config
  - Chart changes
  - removing a flag by marking deprecated and hiding to avoid
    breaking the chart release and existing clients who provide a
    flag that will get an error when they try to update

nilebox · 2018-09-28T04:01:21Z

pkg/controller/controller_instance.go

 			return c.processServiceInstancePollingFailureRetryTimeout(instance, readyCond)
 		}

-		return c.continuePollingServiceInstance(instance)
+		if httpErr, ok := osb.IsHTTPError(err); ok {
+			if isRetriableHTTPStatus(httpErr.StatusCode) {


I was on the fence about whether to keep retrying after getting a 400 BadRequest.
It doesn't make sense to retry polling until the issue is fixed on the broker side, though.

It seems okay to do svcat touch instance ... after fixing the issue on the broker side?

I agree that a 400 is a terminal error that should stop polling, so the user can fix the problem and then use touch to retry it. We are working on a new command, svcat set instance --param foo=bar, that will let you fix an instance and touch it in a single command, but it's not merged yet.
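
To make the retry decision discussed above concrete, here is a rough sketch. The body of isRetriableHTTPStatus is an assumption based on this thread (400 is terminal, anything else keeps polling); the diff only shows the call site, not the helper itself. pollingAction is a hypothetical name used only for this example.

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetriableHTTPStatus is an assumed implementation of the helper named
// in the diff: a 400 Bad Request is treated as terminal, and every other
// status is treated as worth retrying.
func isRetriableHTTPStatus(statusCode int) bool {
	return statusCode != http.StatusBadRequest
}

// pollingAction is a hypothetical helper that names the next step the
// controller would take for a given broker response status.
func pollingAction(statusCode int) string {
	if isRetriableHTTPStatus(statusCode) {
		return "retry"
	}
	return "fail"
}

func main() {
	codes := []int{
		http.StatusBadRequest,
		http.StatusForbidden,
		http.StatusInternalServerError,
	}
	for _, code := range codes {
		fmt.Printf("%d -> %s\n", code, pollingAction(code))
	}
}
```

Under this assumption, only the 400 case stops polling; the user would then fix the broker-side problem and run svcat touch instance to trigger a new reconciliation.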

nilebox · 2018-09-28T04:04:38Z

pkg/controller/controller_instance_test.go

-	).msg("Status: 403; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>")
-	if err := checkEvents(events, expectedEvent.stringArr()); err != nil {
+	).msg("Status: 400; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>")
+	// Event is sent twice: one for Ready condition and one for Failed


The reason we see 2 events is that we send separate events for the Ready and Failed conditions in https://github.com/kubernetes-incubator/service-catalog/blob/621d4e22f5fd6ad137d350e8122b59e01d0c3845/pkg/controller/controller_instance.go#L2710-L2716

It may be better to send only the Failed one, but that is out of scope for this PR.
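
A small sketch of why the test sees the same event twice, assuming each condition update records its own event. fakeRecorder, instance, setCondition, markTerminalFailure and the reason string are all hypothetical stand-ins for this example, not the controller's real types or functions.

```go
package main

import "fmt"

// fakeRecorder collects events in memory. It is a hypothetical stand-in
// for the event recorder used by the controller.
type fakeRecorder struct {
	events []string
}

func (r *fakeRecorder) Event(eventType, reason, message string) {
	r.events = append(r.events, fmt.Sprintf("%s %s %s", eventType, reason, message))
}

// instance holds condition statuses keyed by condition type.
type instance struct {
	conditions map[string]string
}

// setCondition updates one condition and records one event for it.
func setCondition(r *fakeRecorder, inst *instance, condType, status, reason, message string) {
	inst.conditions[condType] = status
	r.Event("Warning", reason, message)
}

// markTerminalFailure sets Ready=False and Failed=True. Because each
// update records its own event, the same event is emitted twice.
func markTerminalFailure(r *fakeRecorder, inst *instance, reason, message string) {
	setCondition(r, inst, "Ready", "False", reason, message)
	setCondition(r, inst, "Failed", "True", reason, message)
}

func main() {
	rec := &fakeRecorder{}
	inst := &instance{conditions: map[string]string{}}
	markTerminalFailure(rec, inst, "ErrorPollingLastOperation", "Status: 400")
	for _, e := range rec.events {
		fmt.Println(e)
	}
}
```

Sending only the Failed event would mean skipping the recorder call for the Ready update, which is the follow-up left out of this PR.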

artkoshelev · 2018-09-28T04:51:20Z

pkg/controller/controller_instance.go

@@ -1103,13 +1110,18 @@ func isServiceInstanceProcessedAlready(instance *v1beta1.ServiceInstance) bool {
 // processServiceInstancePollingFailureRetryTimeout marks the instance as having
 // failed polling due to its reconciliation retry duration expiring
 func (c *controller) processServiceInstancePollingFailureRetryTimeout(instance *v1beta1.ServiceInstance, readyCond *v1beta1.ServiceInstanceCondition) error {
+	msg := "Stopping reconciliation retries because too much time has elapsed"


send the event here?

The event is being sent inside processServiceInstancePollingTerminalFailure method invoked 2 lines below.

Or rather in functions invoked by processServiceInstancePollingTerminalFailure... :)

it all looks very spaghetti :-\

@artkoshelev You're welcome to submit a PR to refactor this for readability! 😀 Nile did the right thing by sticking with fixing just the bug in this PR.

carolynvs · 2018-09-28T13:27:26Z

pkg/controller/controller_instance.go

 			return c.processServiceInstancePollingFailureRetryTimeout(instance, readyCond)
 		}

-		return c.continuePollingServiceInstance(instance)
+		if httpErr, ok := osb.IsHTTPError(err); ok {
+			if isRetriableHTTPStatus(httpErr.StatusCode) {


I agree that a 400 is a terminal error that should stop polling, so the user can fix the problem and then use touch to retry it. We are working on a new command, svcat set instance --param foo=bar, that will let you fix an instance and touch it in a single command, but it's not merged yet.

carolynvs · 2018-09-28T13:28:24Z

pkg/controller/controller_instance.go

 		reason := errorPollingLastOperationReason
 		message := fmt.Sprintf("Error polling last operation: %v", err)
-		glog.V(4).Info(pcb.Message(message))


This log message was removed, but any polling error messages should continue to be logged.

I will bring it back.

carolynvs · 2018-09-28T13:30:55Z

pkg/controller/controller_instance.go

@@ -1103,13 +1110,18 @@ func isServiceInstanceProcessedAlready(instance *v1beta1.ServiceInstance) bool {
 // processServiceInstancePollingFailureRetryTimeout marks the instance as having
 // failed polling due to its reconciliation retry duration expiring
 func (c *controller) processServiceInstancePollingFailureRetryTimeout(instance *v1beta1.ServiceInstance, readyCond *v1beta1.ServiceInstanceCondition) error {
+	msg := "Stopping reconciliation retries because too much time has elapsed"


@artkoshelev You're welcome to submit a PR to refactor this for readability! 😀 Nile did the right thing by sticking with fixing just the bug in this PR.

carolynvs

/approve

k8s-ci-robot · 2018-09-29T13:51:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carolynvs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [carolynvs]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jboyd01 · 2018-10-01T13:27:25Z

/lgtm

…ired#2368) * Handle polling errors and update status appropriately * Added back the log message about polling error

Handle polling errors and update status appropriately

979d5bb

k8s-ci-robot requested review from duglin and jeremyrickard September 28, 2018 03:50

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 28, 2018

nilebox commented Sep 28, 2018

artkoshelev reviewed Sep 28, 2018

carolynvs reviewed Sep 28, 2018

Added back the log message about polling error

5da58fd

carolynvs approved these changes Sep 29, 2018

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 29, 2018

k8s-ci-robot assigned jboyd01 Oct 1, 2018

k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 1, 2018

k8s-ci-robot merged commit b1a1c45 into kubernetes-retired:master Oct 1, 2018

cblecker unassigned jboyd01 Jun 4, 2019
