get-last-operation indefinitely returns error if broker is restarted during service provisioning #7

Closed
gberche-orange opened this issue Dec 9, 2019 · 2 comments

gberche-orange commented Dec 9, 2019

Expected behavior

As an osb-cmdb operator, in order to operate osb-cmdb on Cloud Foundry without causing downtime, I need osb-cmdb to preserve service during Diego cell evacuations.

Observed behavior

osb-cmdb does not comply with the twelve-factor app methodology:

Following a restart of osb-cmdb, asynchronous service provisioning hangs or fails:

  • The osb-cmdb get-last-operation endpoint systematically returns an error (presumably a 500 status code)
  • The OSB client (e.g. CF) keeps polling the service instance being provisioned. Each poll triggers an error log entry, which fails the osb-cmdb smoke test

Root cause

  • The osb-cmdb get-last-operation endpoint returns the content of the InMemoryServiceInstanceStateRepository, which was cleared during the last broker restart
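
The failure mode can be reproduced outside the broker with a minimal sketch. The class below is a hypothetical stand-in, not the SCAB InMemoryServiceInstanceStateRepository itself; it only mimics the behavior visible in the stack trace (an IllegalArgumentException for an unknown service instance ID). A broker restart is modeled by creating a fresh instance with an empty map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an in-memory state repository (not the SCAB class).
class InMemoryStateSketch {
    private final Map<String, String> states = new ConcurrentHashMap<>();

    void saveState(String serviceInstanceId, String state) {
        states.put(serviceInstanceId, state);
    }

    String getState(String serviceInstanceId) {
        String state = states.get(serviceInstanceId);
        if (state == null) {
            throw new IllegalArgumentException("Unknown service instance ID " + serviceInstanceId);
        }
        return state;
    }

    // Returns true if a lookup made after a simulated restart fails.
    static boolean stateIsLostAfterRestart(String serviceInstanceId) {
        InMemoryStateSketch beforeRestart = new InMemoryStateSketch();
        beforeRestart.saveState(serviceInstanceId, "in progress");

        // A new instance models the restarted process: its map starts empty.
        InMemoryStateSketch afterRestart = new InMemoryStateSketch();
        try {
            afterRestart.getState(serviceInstanceId);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }
}
```

Because the platform keeps polling with the same service instance ID, every poll after the restart hits the same exception until the instance is deleted.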

Possible fixes

  • Use a durable ServiceInstanceStateRepository implementation (e.g. backed by a MySQL database) instead of InMemoryServiceInstanceStateRepository
  • Override the default SCAB implementation of get last operation to look up the status of the backing service instance(s) (along with any backing application(s))
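
As a rough illustration of the first option, the sketch below persists state outside the process so that it survives a restart. It is not a SCAB ServiceInstanceStateRepository implementation: the class name and methods are hypothetical, and a local properties file stands in for the MySQL database suggested above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical durable state store; a file stands in for a real database.
class DurableStateSketch {
    private final Path file;

    DurableStateSketch(Path file) {
        this.file = file;
    }

    synchronized void saveState(String serviceInstanceId, String state) {
        Properties props = load();
        props.setProperty(serviceInstanceId, state);
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "service instance states");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized String getState(String serviceInstanceId) {
        String state = load().getProperty(serviceInstanceId);
        if (state == null) {
            throw new IllegalArgumentException("Unknown service instance ID " + serviceInstanceId);
        }
        return state;
    }

    // Reads the whole file on each call; acceptable for a sketch only.
    private Properties load() {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    // Returns true if state written by one instance is visible to a later one.
    static boolean survivesRestart() {
        try {
            Path file = Files.createTempFile("state-sketch", ".properties");
            try {
                new DurableStateSketch(file).saveState("a005a22e", "in progress");
                // A second instance models the restarted broker process.
                return "in progress".equals(new DurableStateSketch(file).getState("a005a22e"));
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On Cloud Foundry the container filesystem is ephemeral, so a real fix along these lines would need an external store such as a bound database service.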

Workaround

  • Have the OSB client issue an unprovision request for the faulty service instance guid. If the client is CF, this is equivalent to cf curl -X DELETE /v2/service_instances/a005a22e-3684-423a-ad25-c5ad63ce0ca1

Affected release

Reproduced on version x.y

gberche-orange commented Feb 13, 2020

Associated symptom

ERROR 20 --- [or-http-epoll-4] s.c.ServiceBrokerWebFluxExceptionHandler : Unknown exception handled:
 java.lang.IllegalArgumentException: Unknown service instance ID a005a22e-3684-423a-ad25-c5ad63ce0ca1
 	at org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$null$3(InMemoryServiceInstanceStateRepository.java:47) ~[spring-cloud-app-broker-core-1.0.4.BUILD-SNAPSHOT.jar!/:1.0.4


 	org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$null$3(InMemoryServiceInstanceStateRepository.java:47)
 Error has been observed by the following operator(s):
 	|_	Mono.error ⇢ org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$null$3(InMemoryServiceInstanceStateRepository.java:47)
 	|_	Mono.defer ⇢ org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.lambda$getState$4(InMemoryServiceInstanceStateRepository.java:42)
 	|_	Mono.flatMap ⇢ org.springframework.cloud.appbroker.state.InMemoryServiceInstanceStateRepository.getState(InMemoryServiceInstanceStateRepository.java:42)
 	|_	Mono.doOnError ⇢ org.springframework.cloud.appbroker.service.WorkflowServiceInstanceService.getLastOperation(WorkflowServiceInstanceService.java:210)
 	|_	Mono.map ⇢ org.springframework.cloud.appbroker.service.WorkflowServiceInstanceService.getLastOperation(WorkflowServiceInstanceService.java:211)
 	|_	Flux.then ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.getLastOperation(ServiceInstanceEventService.java:69)
 	|_	Flux.then ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.lambda$getLastOperation$2(ServiceInstanceEventService.java:71)
 	|_	Mono.onErrorResume ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.getLastOperation(ServiceInstanceEventService.java:70)
 	|_	Mono.flatMap ⇢ org.springframework.cloud.servicebroker.service.ServiceInstanceEventService.getLastOperation(ServiceInstanceEventService.java:72)
 	|_	Mono.doOnRequest ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.lambda$getServiceInstanceLastOperation$15(ServiceInstanceController.java:169)
 	|_	Mono.doOnSuccess ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.lambda$getServiceInstanceLastOperation$15(ServiceInstanceController.java:170)
 	|_	Mono.flatMap ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.getServiceInstanceLastOperation(ServiceInstanceController.java:168)
 	|_	Mono.map ⇢ org.springframework.cloud.servicebroker.controller.ServiceInstanceController.getServiceInstanceLastOperation(ServiceInstanceController.java:173)
 	|_	Mono.flatMap ⇢ org.springframework.web.reactive.result.method.annotation.ResponseEntityResultHandler.handleResult(ResponseEntityResultHandler.java:130)

To get traces of the associated service instance guid:

# lookup smoke test service instance if it still exists 
cf curl /v2/service_instances/a005a22e-3684-423a-ad25-c5ad63ce0ca1

      "last_operation": {
         "type": "create",
         "state": "in progress",
         "description": "create service instance started",
         "updated_at": "2020-02-13T11:24:39Z",
         "created_at": "2020-02-13T11:23:35Z"
      },


# when the smoke test service instance no longer exists, look at audit events
cf curl '/v2/events?q=actee:a005a22e-3684-423a-ad25-c5ad63ce0ca1'

# if an osb-cmdb backend service instance
cf curl "/v3/service_instances?label_selector=backing_service_instance_guid==a005a22e-3684-423a-ad25-c5ad63ce0ca1"

Check the restart history of the broker

cf events osb-cmdb-broker-1
Getting events for app osb-cmdb-broker-1 in org system_domain / space osb-cmdb-broker-1 as me...

time                          event                      actor    description
2020-02-13T12:41:18.00+0100   audit.app.droplet.create   coa-cf
2020-02-13T12:40:58.00+0100   audit.app.update           coa-cf   state: STARTED
2020-02-13T12:40:58.00+0100   audit.app.build.create     coa-cf
2020-02-13T12:40:57.00+0100   audit.app.update           coa-cf   state: STOPPED
2020-02-13T12:40:47.00+0100   audit.app.upload-bits      coa-cf
2020-02-13T12:40:45.00+0100   audit.app.update           coa-cf   instances: 1, memory: 2048, environment_json: [PRIVATE DATA HIDDEN]
2020-02-13T12:25:09.00+0100   audit.app.droplet.create   coa-cf
2020-02-13T12:24:48.00+0100   audit.app.update           coa-cf   state: STARTED
2020-02-13T12:24:48.00+0100   audit.app.build.create     coa-cf
2020-02-13T12:24:48.00+0100   audit.app.update           coa-cf   state: STOPPED
2020-02-13T12:24:38.00+0100   audit.app.upload-bits      coa-cf
2020-02-13T12:24:36.00+0100   audit.app.update           coa-cf   instances: 1, memory: 2048, environment_json: [PRIVATE DATA HIDDEN]

@gberche-orange gberche-orange changed the title osb-cmdb is not complying to 12 factors apps: breaks if restarts during service provisionning osb-cmdb is not complying to 12 factors apps: get-last-operation returns error if restarted during service provisionning Feb 20, 2020
@gberche-orange gberche-orange changed the title osb-cmdb is not complying to 12 factors apps: get-last-operation returns error if restarted during service provisionning get-last-operation indefinidely returns error if broker is restarted during service provisionning Feb 20, 2020
@gberche-orange gberche-orange mentioned this issue May 6, 2020
6 tasks
@gberche-orange

Fixed in v1.0.0: osb-cmdb now relies on the OSB API operation state to maintain its state, and no longer keeps state in the broker's RAM.
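
For context, the OSB API lets a broker return an opaque operation string in its asynchronous provisioning response, and the platform sends that string back on each last-operation poll. The sketch below shows one way a broker could carry what it needs in that string instead of in memory. The class, field names, and the "type:guid" format are assumptions made for illustration, not the encoding osb-cmdb actually uses.

```java
// Hypothetical codec for the OSB "operation" string; the format is an assumption.
final class OperationStateSketch {
    final String operationType;              // e.g. "create"
    final String backingServiceInstanceGuid; // instance to query when polled

    OperationStateSketch(String operationType, String backingServiceInstanceGuid) {
        this.operationType = operationType;
        this.backingServiceInstanceGuid = backingServiceInstanceGuid;
    }

    // Value the broker would return in the async provisioning response.
    String encode() {
        return operationType + ":" + backingServiceInstanceGuid;
    }

    // Rebuilds the state from the operation parameter of a last-operation poll.
    static OperationStateSketch decode(String operation) {
        int sep = operation == null ? -1 : operation.indexOf(':');
        if (sep <= 0 || sep == operation.length() - 1) {
            throw new IllegalArgumentException("Malformed operation: " + operation);
        }
        return new OperationStateSketch(
                operation.substring(0, sep),
                operation.substring(sep + 1));
    }
}
```

With this approach a restarted broker needs no local record: each poll carries enough information to look up the status of the backing service instance.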
