Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support async binding operations #137

Closed
bmelville opened this issue Feb 16, 2017 · 72 comments
Closed

Support async binding operations #137

bmelville opened this issue Feb 16, 2017 · 72 comments
Assignees

Comments

@bmelville
Copy link
Contributor

bmelville commented Feb 16, 2017

Binding to an instance may encapsulate a long running workflow, similar to creating an instance. This might include provisioning new user accounts in a service, setting up firewalls, etc. It seems like the Bind API would benefit from the same last_operation model that Provision uses, and I'd like to get feedback from others who may have more context around this.

Latest proposal: https://github.com/mattmcneeney/servicebroker/blob/async-bindings/spec.md#binding

@ccemeraldeyes
Copy link

+1. I have already run into a scenario where I needed async binding (Google CloudSQL).

@angarg12
Copy link
Contributor

We also had the situation where bindings need to return information, potentially asynchronously.

This raises the question of whether it would be better to implement the last_operation function, or a fully fledged GET method on bindings.

Would a GET operation cover your use cases?

@RamakrishnanArun
Copy link

Would that be adding additional information in the response to the GET /last_operation?

@avade
Copy link
Contributor

avade commented Feb 17, 2017 via email

@avade
Copy link
Contributor

avade commented Feb 23, 2017

Unless there is any further discussion I would suggest that @bmelville creates the proposed spec changes on a branch. This can then be shared with the cf-dev mailing list.

I would support this feature.

Any concerns @shalako?

@avade
Copy link
Contributor

avade commented Mar 2, 2017

@vaikas-google will you be picking up drafting the spec changes in a branch?

@vaikas
Copy link
Contributor

vaikas commented Mar 2, 2017 via email

@avade
Copy link
Contributor

avade commented Mar 16, 2017

It is also interesting the spec doesn't include timeout information.

Currently, Cloud Foundry times out binding requests after 60 seconds by default. This is configurable by the operator on a global basis, meaning it applies to all brokers.

https://github.com/cloudfoundry/cloud_controller_ng/blob/master/bosh/jobs/cloud_controller_ng/spec#L570

@pmorie does service catalog have any request timeout logic that you are enforcing?

@mattmcneeney
Copy link
Contributor

I've created a branch that adds async binding and unbinding to the spec: View proposed spec (or changes).

Please provide any feedback you have on this change. The spec changes are relatively straightforward, making use of the workflows already in place for provisioning, updating and deprovisioning. Thanks!

@mattmcneeney
Copy link
Contributor

Following yesterday's call with the working group, I've updated the branch with an updated spec containing a last operation endpoint for service bindings.

Let us know your thoughts, thanks!

@gberche-orange
Copy link
Contributor

@mattmcneeney I understand that the synchronous binding endpoint PUT /v2/service_instances/:instance_id/service_bindings/:binding_id returns the credentials and there is no GET endpoint to retrieve credentials afterwards.

However, I don't see the credentials being returned by the last operation GET /v2/service_instances/:instance_id/service_bindings/:binding_id/last_operation endpoint

c1f610a#commitcomment-21437274

How would platform clients fetch the binding credentials ?

@mattmcneeney
Copy link
Contributor

Good spot @gberche-orange, it should be in the asynchronous response too.

I've updated the spec so that credentials (and other information) can be included in the last service binding operation endpoint response.

@avade
Copy link
Contributor

avade commented Mar 22, 2017

@mattmcneeney as we discussed, this seems like the simplest path forward for brokers.

It does mean brokers that support async responses would need to hold on to the credentials / be able to retrieve them until "last_operation" is called.

@mattmcneeney
Copy link
Contributor

Thanks for the feedback on this so far everyone. Does anyone else have any further thoughts on the proposed spec?

@gberche-orange
Copy link
Contributor

thanks @mattmcneeney for the update to the GET /v2/service_instances/:instance_id/service_bindings/:binding_id/last_operation endpoint response.

I wonder whether the specification should mandate that the last_operation endpoint only returns once the credentials, i.e. within the response where "state": "succeeded" was returned.

This would make it consistent with the current synchronous behavior for which there is currently no GET endpoint to retrieve credentials, possibly strengthening security by reducing credentials undesired retransmits over the wire, and reducing the duration into which broker should manage credentials state.

This would be a more defensive wording than the current

Returning "state": "succeeded" or "state": "failed" will cause the platform to cease polling.

@mattmcneeney
Copy link
Contributor

mattmcneeney commented Mar 28, 2017

An alternative solution would be to add an additional GET endpoint that would return the binding information. The platform talking to the broker could handle permissions on who could actually see any information returned in that call (such as credentials).

Additional logic around how long service brokers have to keep credentials for (ie. until the last_operation endpoint has been polled and a 'succeeded' response was returned) seems complex, but you're right that it would work more similarly to the synchronous behaviour.


Edit: There may be overlap with issue Fetching current input parameters values (for a service or a binding) regarding the alternative solution mentioned above.

@gberche-orange
Copy link
Contributor

I don't have strong a opinion on either approach.

Potential drawbacks I can think of for new GET /v2/service_instances/:instance_id/service_bindings/:binding_id new endpoint:

  • It seems that adding this new GET endpoint for credentials further pollutes the API specs for brokers that would only need to support support synchronous bindings.
  • client callers might get confused in expecting that the GET /v2/service_instances/:instance_id/service_bindings/:binding_id new endpoint would also be returning synchronously created binding credentials

@mattmcneeney
Copy link
Contributor

I agree with those drawbacks @gberche-orange.

Since this overlaps with issue #159 regarding Fetching current input parameters values (for a service or a binding), we can wait to see what everyone's thoughts are on that issue before making a decision here.

@duglin
Copy link
Contributor

duglin commented Apr 6, 2017

For those who want to see a diff, I think this is the link:
master...avade:async-binding

@duglin
Copy link
Contributor

duglin commented Apr 6, 2017

Couple of comments:

  • master...avade:async-binding#diff-958e7270f96f5407d7d980f500805b1bR328 talks about bindings instead of provisioning.
  • why are service_id and plan_id needed on: master...avade:async-binding#diff-958e7270f96f5407d7d980f500805b1bR350 ? The text says they should be there but I'm not clear on why and what happens if they don't match the previous binding request? If they're optional it seems like we should just remove them since the broker is expected to work w/o them.
  • same question for "operation". Are brokers expected to support returning the status of a bind() and unbind() at the same time? Or just the last one invoked? It seems like "operation" only makes sense if the platform can query for a previous command, but then of course that makes the phase "last_operation" kind of misleading.
  • for the previous 2 comments, if the answer is "they're there just for consistency and not for technical reasons" then ok - I can live with that, but then we may want to consider deprecating them.
  • on master...avade:async-binding#diff-958e7270f96f5407d7d980f500805b1bR367 the text implies they should keep polling until they get a 200, but it seems to me that they should stop on other errors too, like a 404.
  • Along those lines, do we need to add text to say that brokers should be able to return a 200 on this new API immediately once the binding() response is sent. Meaning, waiting until the async op completes before this URL is valid would not be appropriate.
  • master...avade:async-binding#diff-958e7270f96f5407d7d980f500805b1bR685 add a period before "Additionally".
  • there are spots where we talk about "broker clients" ( master...avade:async-binding#diff-958e7270f96f5407d7d980f500805b1bR701 ) and spots where we say "platform" - we may want to be consistent. This might be an issue for the spec itself and not this change though. Just noticed it.

@mattmcneeney
Copy link
Contributor

Thanks for the feedback @duglin. In response to the above:

  • The reference to binding was a mistake. Fixed.
  • The service_id, plan_id and operation were left in for consistency. I'm not sure how they are used currently for the existing last_operation endpoint. Anyone?
  • Agreed that the existing and new endpoints should clarify to stop polling on an error also. Fixed.
  • I'm not sure I understand do we need to add text to say that brokers should be able to return a 200.... Brokers supporting async bindings should return a 202.
  • Period added before "Additionally"
  • I agree with the inconsistent broker client vs platform. I've fixed it in this branch so that we only refer to platform, but happy to move to another branch if preferred (or use broker client instead; platform made more sense to me from a CF perspective).

@mattmcneeney
Copy link
Contributor

Updated spec: link

@duglin
Copy link
Contributor

duglin commented Apr 6, 2017

@mattmcneeney re: "I'm not sure I understand do we need to add text to say that brokers should be able to return a 200.... Brokers supporting async bindings should return a 202."... right now a broker should return a 202 in response to the provision() and bind() operations, if they're doing to do it asynchronously. However, what's not clear is when the URL for the "last_operation" because valid/alive for the new resource. I think its implied that it should be alive immediately but the spec doesn't say so. Which means, its possible for someone to implement it in such a way that the URL will return a 404 until the resource is fully ready. For example, if they simply accept the incoming request and stick it in a queue, then return a 202. They may not do the work to make the polling endpoint available until something starts to processes that entry the queue.

The reason I thought about this is because of a previous comment about error responses to the async polling causing the platform to stop polling. Well, if they stop on a 404 and the broker is kind of goofy and returns a 404 until the resource is ready then the platform will stop prematurely.

I think simply adding text along the lines of:

Once a broker returns a 202 in response to a provision or binding request, the corresponding
`last_operation` endpoint should be ready to respond to polling requests immediately.

should cover it.

@mattmcneeney
Copy link
Contributor

Aha, I understand. Good spot, the spec was definitely a bit loose. I'll add the above note to the spec for all of the 202 responses.

@mattmcneeney
Copy link
Contributor

mattmcneeney commented Apr 6, 2017

@duglin How's this?
(Whole spec)

@duglin
Copy link
Contributor

duglin commented Apr 6, 2017

@mattmcneeney looks good - thanks!

@vaikas
Copy link
Contributor

vaikas commented Sep 12, 2017

@duglin Regarding handing the credentials (or any other binding related return values) right up front kind of defeats the purpose of the async provisioning. If they take any time to create, then why even bother with the async request?

@vaikas
Copy link
Contributor

vaikas commented Sep 12, 2017

@duglin I also think that the binding should be gettable resource and in that case last operation polling should only be used to tell whether the requested resource (binding) was created successfully or if it failed, polluting that response with the resulting resource information seems wonky to me. It's a single GET so the overhead for doing that is minimal and I thought we wanted to make Bindings gettable so why wouldn't we return it there for consistency? By consistency I mean if bindings are gettable resource and hence useful for reconciling differences then wouldn't it make more sense to just have one place to fetch the credentials?

@duglin
Copy link
Contributor

duglin commented Sep 12, 2017

@vaikas-google I agree that returning it on the 202 makes little sense, but having the spec say it SHOULD be there is going to be problematic - because SHOULD is a soft-MUST. I'd like to get @shalako's take on why it isn't a MUST to better understand the original intent - that might provide some insight into whether we can change it to a MAY instead - which I think is the best we can do on that.

My reasons for allowing the creds to flow back on the final last_operation response is that it just feels like a waste of effort to force all platforms to issue a second request every time. Also not everyone will support GET so we'll need to get the creds for an async request w/o it. Which means there will be some scenarios where the last_op response will be the only way to get the creds - so for consistency it seems like it should be in there regardless of whether GET is supported or not.

@vaikas
Copy link
Contributor

vaikas commented Sep 12, 2017

Seems like since there are no brokers that support async ops for bindings (this is for bindings only, right?), it seems that we have a bit of an opportunity to create a good flow that makes sense. So, one could say that in order to support async bindings, you have to do couple of things:

  1. 202 that points to polling that tells the caller when to stop polling [either success or failure]
  2. once 202 returns success, call GET

So, broker writer must support 202 + last_operation + GET to get async bindings support.

@duglin
Copy link
Contributor

duglin commented Sep 12, 2017

That is certainly an option, but then we're inconsistent with async provisioning. I'm also not sure we can require GET since that mandates stateful brokers. I suspect that might be why the 202 is supposed to return the creds, but that's just a guess.

We should discuss this tomorrow... although, @mattmcneeney may not be on the call.

@mattmcneeney
Copy link
Contributor

mattmcneeney commented Sep 12, 2017 via email

@duglin
Copy link
Contributor

duglin commented Sep 12, 2017

I'm not following the "how long" question. If we're going to mandate GET then there's no real data to hold on to for last_op, at least in the successful case since last_op will know the operation was successful since the broker should have all of the info needed.

Yes, returning no other info on last_op is consistent with async provision, but I would like to suggest that we allow creds on the async provision last_op response to be consistent with our new async binding. I say this because I can imagine a case where a stateless broker can't support GET in general but is willing to hold on to the state long enough to return the last_op response- like we expect them to do today for async provision. And in that case they can probably hold on to the creds too for that short amount of time.

At the f2f we opted to not introduce the more correct set of URLs because we didn't want to be inconsistent or offer more than one way to do something. By having async bindings look so different from async provision it feels like we're back-tracking on that consistency decision and leaving the spec in kind of a weird state.

@mattmcneeney
Copy link
Contributor

mattmcneeney commented Sep 12, 2017 via email

@vaikas
Copy link
Contributor

vaikas commented Sep 12, 2017

I would think GET endpoint should return the binding data for as long as the resource (binding) is around.

@kibbles-n-bytes
Copy link
Contributor

kibbles-n-bytes commented Sep 12, 2017

I'm not too concerned with making the async binding flow exactly match the async provision flow. If we're talking ideal design, I could even see the argument that the async provision response should NOT return the dashboard URL, as that is also something that might take a while to get set up correctly, and its presence is more an artifact of not having GETs in the spec at that time.

Ultimately, in my opinion the GET is the most consistent and clear way to get the credentials back. I view credentials as out of scope of what an operation resource is meant to do, which is a generic resource that just reports on the status of an ongoing/completed process. And supporting credentials being passed back with the 202 but not being accessible is just a headache for the platform and I suspect not a common use case. If you're going to support binding asynchronously then I think it's perfectly fine to say you have to be stateful at that point.

@bmelville
Copy link
Contributor Author

Doug, are you suggesting that last_op return credentials only once? That seems scary from the perspective of idempotency and could easily lead to lost data.

I will also +1 Kibbe's suggestion that in a true async provisioning world, even dashboard URL may not be available until the instance has been provisioned. Any async operation must allow for the results of that operation to be made available only after it completes.

@duglin
Copy link
Contributor

duglin commented Sep 13, 2017

@bmelville no, I'm suggesting that async bindings should have a similar UX as async provisioning to provide consistency between the two. Whether last_op can be called again, even after its said the op is done, is not clear in the spec and I would be fine with it returning the same data over and over until the broker decides to "not remember" that operation any more.

Let me try to be a bit more clear since I probably wasn't in the past....

  • I think consistency between the two async ops is important.
  • while async bindings are new, and therefore technically we have the freedom to do what is architecturally right, from an UX perspective if we can get the same semantic results via a less clean method that's consistent with async provisioning I think we should. I'm more concerned about presenting an inconsistent API in the v2 time frame. We can "do it right" in v3.
  • while the spec says the async response (the 202) SHOULD return the same result as if it was a sync op, we have some wiggle room here to add something like "... SHOULD be the same as if the broker were serving the request synchronously - but if the broker does not have this information available immediately, then it MAY choose to return it on the "successful" last_operation response".
  • I think returning it on the final last_op is critical because that's the only required endpoint a broker MUST support if it supports async provisioning. So that's the same guaranteed method I think we should offer for async binding.
  • While I agree that returning the creds on the GET is ideal, we have to assume that GET may not be supported by everyone - even if they do support async bindings. I view these two features as independent things and even a stateless broker (that can't support GET) might still support returning creds in the final last_op response. Therefore, if we can't mandate GET on async provisioing, we shouldn't mandate it on async bindings - this leaves us with only the last_op response as the message flow to send this data - when GET is not supported.
  • GET (for both) should be optional and should return the full resource in question.

That's at least my current thinking on it.

@vaikas
Copy link
Contributor

vaikas commented Sep 13, 2017

My issue is with the first two points. Why is it important?
I don't understand from the UX perspective why this has any bearing. The platform should be able to provide the UX that basically goes like this for bindings:

  • I started something (initial post -> 202)
  • provide status updates (by polling the last operation)
  • provide results (once the polling completes, platform does a get on the resource).

and for service instances:

  • I started something (initial post -> 202 -> stash the URL away but not show it)
  • provide status updates (by polling the last operation)
  • provide results (once the polling completes, show the URL).

I think we're getting distracted by the dashboard URL because apparently it can be precomputed ahead of time, but has no bearing on the actual resource (Service Instance) being created and mixing that with the result of the operation is making things more complicated.

I'd argue again that baking returning information from the last_operation is a bad idea because then you:

  • lose all the benefits of schemas that a binding response can have [or make it more complicated]
  • last_response response is not some sort of union
  • it's unclean from semantics point

So, your second to last point is what I'd like to focus on. We're designing a new experience for async bindings. If you are saying that the broker can return the credentials (multiple times, mind you since we can't assume that things don't fall on the floor) in the last_operation, it clearly has to hang onto them somehow (or be able to get a mapping from last operation to them and if that's the case, then they could more than likely return or compute the mapping from the serviceinstance/binding as well in order to compute the value for last_operation), and I have a hard time seeing how you could return them in the last_operation but not in the get response. I'm sure such a system could be built, but I'm not sure how realistic it is.
I feel like there's a lot of assumptions baked in there about what can't be done that's leading us down a path of complicated and unclean API :)

@mattmcneeney
Copy link
Contributor

I missed last week's call in which you may have discussed this, but I agree with many of the points above. Summarising my thoughts:

  • We voted at the face to face to favour consistency of the last_operation endpoints over the 'best' workflow
  • However, since async bindings is a new feature that no brokers currently support, I believe we can design this with a bit of flexibility over how 'consistent' we are with async service instance provisioning
  • If a broker decides to support this new feature, I would like to say that the broker therefore MUST support the GET service bindings endpoint
  • It is unrealistic to state that an async binding request should return the same information immediately as for a sync binding request. This defeats the point of offering async bindings and the immediate return of dashboard_url is a bit of a red herring here (as that URL doesn't actually have to be immediately available)

I'm going to make a change to the spec regarding my last point here, and it would be great to discuss this again on next week's call?

@mattmcneeney
Copy link
Contributor

Branch updated: https://github.com/mattmcneeney/servicebroker/blob/async-bindings/spec.md
Shout if anyone else wants write access.

@bmelville
Copy link
Contributor Author

bmelville commented Sep 15, 2017

I agree with your points, except for dashboard_url being a red herring. The URL may be specific to a dynamic IP or port, which may not be known until provisioned in the async process. I think this can be captured as a separate issue.

Edit: #170 already exists.

@vaikas
Copy link
Contributor

vaikas commented Sep 15, 2017 via email

@kibbles-n-bytes
Copy link
Contributor

kibbles-n-bytes commented Sep 28, 2017

A couple of nits on the status codes for binding (here):

  • We should include MUST be returned if the broker only supports asynchronous binding and the request did not include ?accepts_incomplete=true. for a 422 Unprocessable Entity response code.
  • For 200 OK, we may want to have something analogous to "is fully provisioned" from the instance case.

@mattmcneeney
Copy link
Contributor

Good points @kibbles-n-bytes - branch updated!

How do the interested folks feel about this branch now?

Spec: https://github.com/mattmcneeney/servicebroker/blob/async-bindings/spec.md

Diff against get-endpoints: mattmcneeney/servicebroker@get-endpoints...mattmcneeney:async-bindings

@kibbles-n-bytes
Copy link
Contributor

Thanks @mattmcneeney!

We should also document 422 Unprocessable Entity in the unbind response status codes (here)

@mattmcneeney
Copy link
Contributor

Good point. That made me re-check the 422 description for Bindings. I only had one error described regarding the required app_guid field, so I've updated that too. Thanks @kibbles-n-bytes !

@duglin
Copy link
Contributor

duglin commented Nov 7, 2017

Can we close this in favor of #334 ?

@mattmcneeney
Copy link
Contributor

@duglin fine by me as long as folks are happy to continue discussions on the PR

@jmrodri
Copy link

jmrodri commented Nov 7, 2017

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests