Improve versioning documentation #320

SimonLuckenuik · 2018-05-24T19:16:21Z

I just went through this document: https://docs.microsoft.com/en-us/azure/azure-functions/durable-functions-versioning and the documentation is very light. All solutions sound like data is being lost/pending, except for function naming.

What is the proper strategy for versioning? That seems to be a complex topic and the documentation is very light with no complex example. The mitigation strategies suggested are probably not applicable to most people: "Do nothing" and "Stop all in-flight instances". The goal of creating a stateful workflow is to have it running for a long time, so probably doing nothing and stopping all instances is not appropriate.

Side-by-side deployments:

The impact of changing the HubName is not clear.
Changing the storage account sounds like data will be lost.
Function renaming: what will the behavior be if I keep the same orchestration function but add or remove functions with new names?

Could you please elaborate on versioning with specific examples and tutorials/samples?

This is a very important topic, and I expect that figuring out what is happening in case of an improper versioning will be difficult to track / detect.

Thanks!
Simon

SimonLuckenuik · 2018-05-24T19:41:21Z

Related documentation issue I just raised: https://github.com/MicrosoftDocs/azure-docs/issues/9152

cgillum · 2018-05-28T00:41:24Z

Is it safe to summarize this ask as a detailed "step-by-step guide for side-by-side deployments"? @TsuyoshiUshio this might be a good topic for you.

marcduiker · 2018-05-28T09:46:02Z

I have a great interest in this topic as well since I want to explain this to clients. @TsuyoshiUshio please let me know if I can help with writing or coding concepts. Your blog is already a good starting point I think.

SimonLuckenuik · 2018-05-28T17:37:01Z

@cgillum, sounds like what I am looking for! I am also looking for "Dos and Don'ts/Best Practices/Common Pitfalls" to minimize the impact of those versions or completely prevent breaking changes. Example: if we manage activity inputs and outputs with complex entities instead of value types, it is easier to avoid breaking changes when signatures change.
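
A minimal sketch of the complex-entity idea, assuming the Durable Functions 1.x C# API (`DurableOrchestrationContext`, `[OrchestrationTrigger]`, `[ActivityTrigger]`). The function names `ProcessOrder` and `ShipOrder` and the types `ShipOrderInput` and `ShipOrderResult` are hypothetical. Because inputs are stored as JSON, a property added later (here `Priority`) falls back to its default when an older payload is deserialized; removing or renaming a property would still be a breaking change.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

// Hypothetical input type. Priority was added in a later release; payloads
// serialized before that release lack it and fall back to the default value.
public class ShipOrderInput
{
    public string OrderId { get; set; }
    public string Priority { get; set; } = "normal";
}

// Hypothetical output type; new properties can be added here the same way.
public class ShipOrderResult
{
    public bool Shipped { get; set; }
}

public static class OrderFunctions
{
    [FunctionName("ProcessOrder")]
    public static async Task<ShipOrderResult> ProcessOrder(
        [OrchestrationTrigger] DurableOrchestrationContext context)
    {
        // The activity signature stays (ShipOrderInput) -> ShipOrderResult
        // even when fields are added to either type.
        var input = context.GetInput<ShipOrderInput>();
        return await context.CallActivityAsync<ShipOrderResult>("ShipOrder", input);
    }

    [FunctionName("ShipOrder")]
    public static ShipOrderResult ShipOrder([ActivityTrigger] ShipOrderInput input)
    {
        // Placeholder logic: report success when an order id is present.
        return new ShipOrderResult { Shipped = !string.IsNullOrEmpty(input.OrderId) };
    }
}
```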

cgillum · 2018-05-30T23:00:35Z

Good point on best practices (or "tips and tricks" as I sometimes like to think of them). I agree that we need all of this.

cgillum · 2018-05-30T23:05:36Z

TsuyoshiUshio · 2018-06-08 (edited)

OrchestrationClient

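using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Azure.WebJobs.Host;

[FunctionName("GetAllStatus")]
public static async Task Run(
    [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post")]HttpRequestMessage req,
    [OrchestrationClient] DurableOrchestrationClient client,
    TraceWriter log)
{
    // Returns the status of every instance in the task hub.
    var statuses = await client.GetStatusAsync(); // You can pass a CancellationToken as a parameter.
    // Do something based on the returned statuses.
}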

REST API

For Functions 1.0, the request format is as follows:

GET /admin/extensions/DurableTaskExtension/instances/?taskHub={taskHub}&connection={connection}&code={systemKey}

The Functions 2.0 format has all the same parameters but a slightly different URL prefix:

GET /runtime/webhooks/DurableTaskExtension/instances/?taskHub={taskHub}&connection={connection}&code={systemKey}&showHistory={showHistory}&showHistoryOutput={showHistoryOutput}
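
As a rough sketch, the two URL formats above could be built per task hub like this, for example to query an old and a new hub in a side-by-side deployment. The class and method names are hypothetical, and the base URL, task hub, connection, and system key values are caller-supplied placeholders; only the parameters common to both formats are included.

```csharp
using System;

public static class TaskHubUrls
{
    // Builds the "list all instances" URL in the two formats quoted above.
    public static string BuildInstancesQuery(
        string baseUrl, bool functionsV2, string taskHub, string connection, string systemKey)
    {
        // Functions 1.0 uses the /admin/extensions prefix; 2.0 uses /runtime/webhooks.
        string prefix = functionsV2
            ? "/runtime/webhooks/DurableTaskExtension/instances/"
            : "/admin/extensions/DurableTaskExtension/instances/";

        // Escape each value so special characters do not break the query string.
        return baseUrl.TrimEnd('/') + prefix
            + "?taskHub=" + Uri.EscapeDataString(taskHub)
            + "&connection=" + Uri.EscapeDataString(connection)
            + "&code=" + Uri.EscapeDataString(systemKey);
    }
}
```

Calling it once per task hub name (for example a hypothetical "MyHubV1" and "MyHubV2") gives one status query per deployed version.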

SimonLuckenuik · 2018-06-11T15:18:34Z

Thank you for the input @TsuyoshiUshio.

Do I understand correctly that the proposed "safe" solution is to make sure that no Activity is running in the system before deploying again? This is a bit hard to enforce. Let's say that I have an Orchestrator with a timer: at any time that timer could trigger while deploying and break everything? For a high-volume scenario, it might be very difficult to make that work...

In the article, there is nothing about the status of the Orchestrator while upgrading. Can an Orchestrator be "Running" during the upgrade (no activity executing, but some activities remaining in the workflow)? If I change the HubName, as suggested, I am assuming that any Orchestrator still "Running" will be lost forever (the durable framework will use different storage for its metadata)?

How long is the "long running process" that you are referring to? Depending on your answers to the questions above, if the Orchestrator cannot be "Running", then this is not well suited to any Orchestrator involving more than a few seconds (at most minutes) of execution time; otherwise I would need to wait a few days for some orchestrations to complete.

SimonLuckenuik · 2018-06-15T19:11:09Z

Other suggestions:

Maybe adding something about Deployment Slot usage would be interesting. What happens if the staging slot runs concurrently with the prod slot?

Is there any way to disable durable functions in the slot, to prevent old code from being executed concurrently with newer code in the prod slot?

@cgillum considering that this is out of preview, I guess that some customers are using it in production. What common DevOps scenarios have you heard of for Durable Functions that prevent any issue (100% safe that everything is executed and nothing is lost)?

cgillum · 2018-06-15T20:05:58Z

@TsuyoshiUshio can speak to that better than I can since he is working directly with some of these customers in Japan, but right now the main approach being used is the Azure Event Grid integration to track orchestration lifecycle across multiple task hubs (which is described above). We've also spoken to customers that have less aggressive requirements, and for them we're creating a REST API that can enumerate the list of all orchestrations in a task hub as a simpler (though less scalable) solution.

@TsuyoshiUshio is going to put together comprehensive walkthrough documentation which outlines some of the end-to-end mechanisms for implementing DevOps with Durable Functions, and it will also cover these versioning scenarios.

TsuyoshiUshio · 2018-06-20T14:37:12Z

Sorry for the late reply. Yes, I'll try to do it, so your use case is very welcome.
@SimonLuckenuik In your case, we might need a new feature to stop accepting new requests. Let's keep discussing on #184. :) If there is no such feature in Durable Functions, I'd be happy to contribute an implementation.

TsuyoshiUshio · 2018-06-20T14:48:31Z

Actually, if you want to upgrade the app, it is OK as long as you don't change the orchestrator or activity function interfaces. However, if you change one of these, you need to make sure there are no on-the-fly instances, because the orchestrator replays according to the storage table. If you change the orchestrator or activity function interfaces, the records in the storage table will not match the new version.

In short, for a safe deployment, we need to make sure that:

1. There are no on-the-fly instances.
2. If there are on-the-fly instances, you can keep them via a deployment slot. (However, please check that there are no working instances on the deployment slot.)

We can check whether the orchestrator / activity function interfaces have changed; however, the pipeline might become complex. To achieve 1, we need to wait for the current on-the-fly instances to finish, and we also need to stop accepting new requests. If you want to stop an instance in the middle of its execution and your functions are idempotent throughout the orchestration, you might just stop it and replay it. (there is no future for these processes). This is my basic idea.
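
One way a pipeline might check for on-the-fly instances before deploying, sketched with the same Durable Functions 1.x client API as the `GetAllStatus` snippet earlier in the thread. The function name `CanDeploy`, the class name, and the choice of which statuses count as in flight (Pending, Running, ContinuedAsNew) are assumptions, not an official recommendation.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class DeploymentGate
{
    // Statuses treated as "on-the-fly" (an assumption; adjust to your own definition).
    public static bool IsInFlight(OrchestrationRuntimeStatus status) =>
        status == OrchestrationRuntimeStatus.Pending ||
        status == OrchestrationRuntimeStatus.Running ||
        status == OrchestrationRuntimeStatus.ContinuedAsNew;

    [FunctionName("CanDeploy")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequestMessage req,
        [OrchestrationClient] DurableOrchestrationClient client)
    {
        // Enumerate every instance in the task hub and count the unfinished ones.
        IList<DurableOrchestrationStatus> statuses = await client.GetStatusAsync();
        int inFlight = statuses.Count(s => IsInFlight(s.RuntimeStatus));

        // 200 when nothing is in flight; 409 tells the pipeline to wait and retry.
        var code = inFlight == 0 ? HttpStatusCode.OK : HttpStatusCode.Conflict;
        return new HttpResponseMessage(code)
        {
            Content = new StringContent($"{inFlight} in-flight instance(s)")
        };
    }
}
```

This only reports the current state: a new instance could still start between the check and the deployment, so it does not replace a way to stop accepting new requests.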

ConnorMcMahon · 2021-03-16T19:26:57Z

@SimonLuckenuik

This issue has been open for a while, and I just want to make sure that we understand what the ask here is.

It seems like the ask is to add more explicit/concrete instructions on how to implement side-by-side versioning, and to make our versioning docs a bit clearer that this is the most highly recommended scenario?

ConnorMcMahon · 2021-07-07T00:21:11Z

One thing to note here is that we need to convey the pros/cons of each approach (with more concrete exceptions). It's also worth noting that the introduction of entities changes the calculus of having separate task hubs for side-by-side deployments...

conreaux · 2023-07-18T19:49:03Z

Bumping this topic - Could the RideSharing sample be used as inspiration for versioning discussions? It's a concrete example with potentially frequently-running orchestrations and entities containing application-critical state.
I'm currently at a loss as to how to update an application with live orchestration and entity functions without blocking the client from submitting new orchestrations and then letting in-flight orchestrations run to completion, while maintaining a single representation of entities (i.e., not a side-by-side deployment).

cgillum added the documentation label May 30, 2018

SimonLuckenuik mentioned this issue Jun 20, 2018

DevOps guide for versioning #184

Closed

ConnorMcMahon changed the title ~~Enhanced versioning documentation required~~ Improve side-by-side versioning documentation Mar 16, 2021

ConnorMcMahon mentioned this issue Jul 7, 2021

Entities running new and old code after release #1873

Closed

ConnorMcMahon added this to the High Priority milestone Jul 7, 2021

ConnorMcMahon changed the title ~~Improve side-by-side versioning documentation~~ Improve versioning documentation Jul 15, 2021
