Handle polling errors and update status appropriately #2368
Conversation
return c.processServiceInstancePollingFailureRetryTimeout(instance, readyCond)
}

return c.continuePollingServiceInstance(instance)
if httpErr, ok := osb.IsHTTPError(err); ok {
if isRetriableHTTPStatus(httpErr.StatusCode) {
I was on the fence about whether to keep retrying after getting a 400 BadRequest.
It doesn't make sense to retry polling until the issue is fixed on the broker side, though.
It seems okay to run svcat touch instance ... after fixing the issue on the broker side?
I agree that a 400 is a terminal error that should stop polling, so the user can fix the problem and then use touch to retry it. We are working on a new command, svcat set instance --param foo=bar, that will let you fix an instance and touch it in a single command, but it's not merged yet.
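As a rough illustration of the rule agreed on in this thread, a status check could look like the sketch below. The function name isRetriableStatus is a hypothetical stand-in for the PR's isRetriableHTTPStatus, whose body is not shown here. Only the treatment of 400 comes from the discussion; treating every other status as retriable is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetriableStatus is a hypothetical stand-in for isRetriableHTTPStatus.
// From the thread: 400 Bad Request is terminal and should stop polling.
// Assumption: any other HTTP status is considered worth retrying.
func isRetriableStatus(code int) bool {
	return code != http.StatusBadRequest
}

func main() {
	codes := []int{
		http.StatusBadRequest,
		http.StatusForbidden,
		http.StatusInternalServerError,
	}
	for _, code := range codes {
		fmt.Printf("status %d retriable: %t\n", code, isRetriableStatus(code))
	}
}
```

With a rule like this, a broker that keeps answering 400 stops being polled, and the user retries explicitly (for example with svcat touch instance) once the broker side is fixed.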
).msg("Status: 403; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>")
if err := checkEvents(events, expectedEvent.stringArr()); err != nil {
).msg("Status: 400; ErrorMessage: <nil>; Description: <nil>; ResponseError: <nil>")
// Event is sent twice: one for Ready condition and one for Failed
The reason we see 2 events is that we send separate events for the Ready and Failed conditions in https://github.com/kubernetes-incubator/service-catalog/blob/621d4e22f5fd6ad137d350e8122b59e01d0c3845/pkg/controller/controller_instance.go#L2710-L2716
It may be better to send only the Failed one, but that is out of scope for this PR.
@@ -1103,13 +1110,18 @@ func isServiceInstanceProcessedAlready(instance *v1beta1.ServiceInstance) bool {
// processServiceInstancePollingFailureRetryTimeout marks the instance as having
// failed polling due to its reconciliation retry duration expiring
func (c *controller) processServiceInstancePollingFailureRetryTimeout(instance *v1beta1.ServiceInstance, readyCond *v1beta1.ServiceInstanceCondition) error {
msg := "Stopping reconciliation retries because too much time has elapsed"
send the event here?
The event is sent inside the processServiceInstancePollingTerminalFailure method, which is invoked 2 lines below.
Or rather in functions invoked by processServiceInstancePollingTerminalFailure... :)
it all looks very spaghetti :-\
@artkoshelev You're welcome to submit a PR to refactor this for readability! 😀 Nile did the right thing by sticking with fixing just the bug in this PR.
reason := errorPollingLastOperationReason
message := fmt.Sprintf("Error polling last operation: %v", err)
glog.V(4).Info(pcb.Message(message))
This log message was removed, but any polling error messages should continue to be logged.
I will bring it back.
done.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: carolynvs
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
…ired#2368) * Handle polling errors and update status appropriately * Added back the log message about polling error
This PR is a
What this PR does / why we need it:
When the broker returns any error in the last_operation response, we just retry without updating the status. As a result, in the case of provisioning, the instance keeps reporting that it is still provisioning, as if nothing is wrong.
Which issue(s) this PR fixes
Fixes #2369
Please leave this checklist in the PR comment so that maintainers can ensure a good PR.
Merge Checklist:
breaking the chart release and existing clients who provide a
flag that will get an error when they try to update