Improve versioning documentation #320

Open
SimonLuckenuik opened this issue May 24, 2018 · 15 comments

@SimonLuckenuik

I just went through this document: https://docs.microsoft.com/en-us/azure/azure-functions/durable-functions-versioning and the documentation is very light. Every solution except function naming sounds like data will be lost or left pending.

What is the proper strategy for versioning? That seems to be a complex topic and the documentation is very light with no complex example. The mitigation strategies suggested are probably not applicable to most people: "Do nothing" and "Stop all in-flight instances". The goal of creating a stateful workflow is to have it running for a long time, so probably doing nothing and stopping all instances is not appropriate.

Side-by-side deployments:

  • changing the HubName impact is not clear
  • changing the storage accounts sounds like data will be lost
  • function renaming: what will be the behavior if I keep the same orchestration function but add or remove functions with new names?

Could you please elaborate on versioning with specific examples and tutorials/samples?

This is a very important topic, and I expect that figuring out what is happening in case of an improper versioning will be difficult to track / detect.

Thanks!
Simon

@SimonLuckenuik
Author

Related documentation issue I just raised: https://github.com/MicrosoftDocs/azure-docs/issues/9152

@cgillum
Member

cgillum commented May 28, 2018

Is it safe to summarize this ask as a detailed "step-by-step guide for side-by-side deployments"? @TsuyoshiUshio this might be a good topic for you.

@marcduiker
Contributor

I have a great interest in this topic as well since I want to explain this to clients. @TsuyoshiUshio please let me know if I can help with writing or coding concepts. Your blog is already a good starting point I think.

@SimonLuckenuik
Author

@cgillum, that sounds like what I am looking for! I am also looking for "Dos and Don'ts/Best Practices/Common Pitfalls" to minimize the impact of new versions or to prevent breaking changes entirely. Example: if activity inputs and outputs are modeled as complex types instead of value types, signature changes are less likely to be breaking.
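
To illustrate the point about complex types, here is a minimal sketch. It assumes a hypothetical `ChargeRequest` activity input that gained a `Currency` property in a later version. Durable Functions serializes inputs with Json.NET; the sketch uses `System.Text.Json` only to stay dependency-free, on the assumption that both leave a missing property at its default value.

```csharp
using System;
using System.Text.Json;

// Hypothetical activity input. Currency was added after instances were already running.
public class ChargeRequest
{
    public string OrderId { get; set; } = "";
    public decimal Amount { get; set; }

    // New property. Payloads recorded by the older version lack it,
    // so it keeps this default instead of failing deserialization.
    public string Currency { get; set; } = "USD";
}

public static class InputCompatibility
{
    // Deserializes a recorded payload into the current ChargeRequest shape.
    public static ChargeRequest Read(string json) =>
        JsonSerializer.Deserialize<ChargeRequest>(json)
            ?? throw new ArgumentException("Payload was JSON null.", nameof(json));
}
```

A payload written before the change, such as `{"OrderId":"42","Amount":9.5}`, still reads cleanly and picks up the default currency. Had the activity taken a bare `string` or `decimal`, adding a second value would have required a new signature.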

@cgillum
Member

cgillum commented May 30, 2018

Good point on best practices (or "tips and tricks" as I sometimes like to think of them). I agree that we need all of this.

@cgillum
Member

cgillum commented May 30, 2018

Related documentation issue: #184

@TsuyoshiUshio
Collaborator

TsuyoshiUshio commented Jun 8, 2018

Hi @marcduiker @SimonLuckenuik ,

I've run two hackfests related to this topic, and I'd be happy to contribute to the Durable Functions DevOps documentation.

So far I have written one blog post. Its solution, which uses Event Grid notifications, suits customers with long-running processes.

I've also submitted a PR that lets us query the status of all instances. It helps check whether any instances are still working before a deployment. This feature is going to be merged in the next release.

#323

The usage is something like this.

OrchestrationClient

[FunctionName("GetAllStatus")]
public static async Task Run(
    [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post")]HttpRequestMessage req,
    [OrchestrationClient] DurableOrchestrationClient client,
    TraceWriter log)
{
    var statuses = await client.GetStatusAsync(); // You can pass CancellationToken as a parameter.
    // do something based on the returned statuses
}

REST API

For Functions 1.0, the request format is as follows:

GET /admin/extensions/DurableTaskExtension/instances/?taskHub={taskHub}&connection={connection}&code={systemKey}

The Functions 2.0 format has all the same parameters but a slightly different URL prefix:

GET /runtime/webhooks/DurableTaskExtension/instances/?taskHub={taskHub}&connection={connection}&code={systemKey}&showHistory={showHistory}&showHistoryOutput={showHistoryOutput}
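
As a rough sketch of how a deployment script might use this response, the helper below counts instances that are still in flight. It assumes the response body is a JSON array whose items carry a `runtimeStatus` string, and it treats `Running`, `Pending` and `ContinuedAsNew` as in flight. The `InFlightCounter` name and that choice of statuses are assumptions of this example, not part of the API.

```csharp
using System;
using System.Linq;
using System.Text.Json;

public static class InFlightCounter
{
    // Statuses treated as "still in flight" (an assumption of this sketch).
    private static readonly string[] InFlight = { "Running", "Pending", "ContinuedAsNew" };

    // Counts array items whose runtimeStatus is one of the in-flight values.
    public static int CountInFlight(string responseJson)
    {
        using JsonDocument doc = JsonDocument.Parse(responseJson);
        return doc.RootElement.EnumerateArray()
            .Count(item => item.TryGetProperty("runtimeStatus", out JsonElement status)
                           && status.ValueKind == JsonValueKind.String
                           && InFlight.Contains(status.GetString()));
    }
}
```

A result of zero would suggest the task hub is idle at the moment of the query. New instances can still start right afterwards unless new requests are blocked.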

@SimonLuckenuik
Author

Thank you for the input @TsuyoshiUshio.

Do I understand correctly that the proposed "safe" solution is to make sure that no Activity is running in the system before deploying again? This is hard to enforce. Say I have an Orchestrator with a timer: could that timer fire at any time during the deployment and break everything? For a high volume scenario, it might be very difficult to make that work...

In the article, there is nothing about the status of the Orchestrator while upgrading. Can an Orchestrator be "Running" during the upgrade (no activity executing, but some activities remaining in the workflow)? If I change the HubName, as suggested, I am assuming that any Orchestrator still "running" will be lost forever, since the durable framework will use different storage for its metadata?

How long is the "long running process" that you are referring to? Depending on your answer to the questions above, if the Orchestrator cannot be "running", then this approach is not well suited to any Orchestrator with more than a few seconds (at most minutes) of execution time; otherwise I would need to wait a few days for some orchestrations to complete.

@SimonLuckenuik
Author

SimonLuckenuik commented Jun 15, 2018

Other suggestions:

Maybe adding something about Deployment Slot usage would be interesting. What happens if the staging slot runs concurrently with the prod slot?

Is there any way to disable Durable Functions in the slot, to prevent old code from executing concurrently with newer code in the prod slot?

@cgillum considering that this is out of preview, I guess that some customers are using it in Production. What common DevOps approaches have you heard of for Durable Functions that prevent any issue (100% certain that everything is executed and nothing is lost)?

@cgillum
Member

cgillum commented Jun 15, 2018

@TsuyoshiUshio can speak to that better than I can since he is working directly with some of these customers in Japan, but right now the main approach being used is the Azure Event Grid integration to track orchestration lifecycle across multiple task hubs (which is described above). We've also spoken to customers that have less aggressive requirements, and for them we're creating a REST API that can enumerate the list of all orchestrations in a task hub as a simpler (though less scalable) solution.

@TsuyoshiUshio is going to put together comprehensive walkthrough documentation which outlines some of the end-to-end mechanisms for implementing DevOps with Durable Functions, and it will also cover these versioning scenarios.

@TsuyoshiUshio
Collaborator

Sorry for the late reply. Yes, I'll try to do it, so your use case is very welcome.
@SimonLuckenuik In your case, we may need a new feature to stop accepting new requests. Let's keep discussing on #184. :) If Durable Functions has no such feature, I'd be happy to contribute an implementation.

@TsuyoshiUshio
Collaborator

Actually, upgrading the app is fine as long as you don't change the orchestrator or activity function interfaces. If you do change one of them, however, you need to make sure there are no in-flight instances. The orchestrator replays according to the history in the storage table, so if you change the orchestrator or activity function interfaces, the stored records will no longer match the new version.

In short, for a safe deployment, we need to make sure:

  1. There are no in-flight instances.
  2. If there are in-flight instances, you can keep them via a deployment slot. (However, please check that there are no working instances on the deployment slot.)

We can check whether the orchestrator / activity function interfaces have changed, but the pipeline might become complex. To achieve 1, we need to wait for the current in-flight instances to finish, and we also need to stop accepting new requests. If you want to stop an instance in the middle of its execution and your functions are idempotent throughout the orchestration, you might just stop it and replay it. (there is no future for these processes). This is my basic idea.
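
The "wait for in-flight instances to finish" step could be sketched as a polling gate in a deployment pipeline. This is only an illustration: `DeploymentGate`, `WaitForDrainAsync` and the `countInFlight` delegate are hypothetical names. The delegate stands in for whatever query reports the number of working instances, and the sketch assumes new requests have already been blocked by some other means.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class DeploymentGate
{
    // Polls countInFlight until it reports zero or the timeout would be exceeded.
    // Returns true when no instances remain, false when the gate gives up.
    public static async Task<bool> WaitForDrainAsync(
        Func<Task<int>> countInFlight,
        TimeSpan pollInterval,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            if (await countInFlight() == 0)
            {
                return true;
            }

            // Stop if the next poll would land past the deadline.
            if (elapsed.Elapsed + pollInterval > timeout)
            {
                return false;
            }

            await Task.Delay(pollInterval, cancellationToken);
        }
    }
}
```

A pipeline would deploy only when the gate returns `true`. On `false` it can either abort or fall back to a side-by-side deployment, which is the trade-off raised earlier for orchestrations that run for days.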

@ConnorMcMahon
Contributor

@SimonLuckenuik

This issue has been open for a while, and I just want to make sure that we understand what the ask here is.

It seems like the ask is to add more explicit/concrete instructions on how to implement side-by-side versioning, and to make our versioning docs a bit clearer that this is the most highly recommended scenario?

ConnorMcMahon changed the title from "Enhanced versioning documentation required" to "Improve side-by-side versioning documentation" on Mar 16, 2021
@ConnorMcMahon
Contributor

One thing to note here is that we need to convey the pros/cons of each approach (with more concrete exceptions). It's also worth noting that the introduction of entities changes the calculus of having separate task hubs for side-by-side deployments...

ConnorMcMahon added this to the High Priority milestone on Jul 7, 2021
ConnorMcMahon changed the title from "Improve side-by-side versioning documentation" to "Improve versioning documentation" on Jul 15, 2021
@conreaux

Bumping this topic - Could the RideSharing sample be used as inspiration for versioning discussions? It's a concrete example with potentially frequently-running orchestrations and entities containing application-critical state.
I'm currently at a loss as to how to update an application with live orchestration and entity functions without blocking the client from submitting new orchestrations and then letting in-flight orchestrations run to completion, while maintaining a single representation of entities (i.e., not a side-by-side deployment).

6 participants