Add support for atomic StartNewAsync that would execute only if instance is not already running (for singleton orchestrator support) #367
I like this feature idea. Out of curiosity, what is the scenario motivating it? For Logic Apps, before we implemented singleton workflows, the motivation was often around timer triggers (set a recurrence of every 1 minute, but one instance may take 1 minute and 5 seconds to complete). Instead of a cascade of overlapping instances, singleton provides a way to ensure only one job is running at a time. Not sure if you had other triggers in mind too. That said, I do think this makes sense, and I imagine it would also require some type of locking so that two instances couldn't be created at the same time.
I've received requests for this from a variety of sources, internal and external, and it's an adoption blocker for some. We should definitely implement this. (Singleton support in Azure Functions would be awesome too because it makes this less of an issue, but I won't block on it.) @SimonLuckenuik, the one caveat is that several people expect to be able to "recreate" their named instances after they have already started. The most common case is when a singleton instance has failed or was terminated. For your use case, would it be acceptable to fail instance creation only if it is in a running state? Here is what I'm currently thinking:
Would that behavior work?
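For illustration, a minimal caller-side sketch of those semantics; the exception type and failure signal here are assumptions, not a committed design:

```csharp
// Hypothetical semantics: creation is rejected atomically only when the named
// instance is currently running; completed/failed/terminated instances can be
// recreated under the same instanceId.
try
{
    await starter.StartNewAsync("MyOrchestration", instanceId, input);
}
catch (InvalidOperationException) // assumed failure signal, for illustration only
{
    // An instance with this instanceId is already running, so the start
    // request was refused instead of silently replacing it.
}
```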
@jeffhollan, my scenario is the following:
@cgillum, what you are suggesting should be sufficient for my needs. Some comments below. There are currently two overloads of StartNewAsync: (A) one where the instanceId is generated, and (B) one where it is provided by the caller. I am assuming the behavior you are describing applies only to (B).
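For reference, the two overloads being distinguished are presumably these (v1 DurableOrchestrationClient shapes):

```csharp
// (A) the instanceId is generated by the framework and returned to the caller.
Task<string> StartNewAsync(string orchestratorFunctionName, object input);

// (B) the instanceId is supplied by the caller, which enables named/singleton instances.
Task<string> StartNewAsync(string orchestratorFunctionName, string instanceId, object input);
```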
Concerning the caveat: personally, I don't like the current behavior of (B), where starting something will in fact, in some cases, stop an existing instance and restart it. I would rather have the user explicitly terminate the workflow before being able to restart it.
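A minimal sketch of that explicit terminate-then-restart flow, using the existing client APIs (note that TerminateAsync only enqueues the termination, so a real implementation would need to poll until the status actually changes):

```csharp
// Explicit restart: terminate the running instance first, then start a fresh
// one under the same instanceId, instead of letting StartNewAsync replace it.
var status = await starter.GetStatusAsync(instanceId);
if (status != null && status.RuntimeStatus == OrchestrationRuntimeStatus.Running)
{
    await starter.TerminateAsync(instanceId, "Restart requested by caller");
    // In practice: poll GetStatusAsync here until RuntimeStatus == Terminated.
}
await starter.StartNewAsync("MyOrchestration", instanceId, input);
```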
I just want to update this issue to clarify what happens today and what I think we need to do for the final implementation:
FYI, the Singleton attribute is working in Azure Functions, with specific considerations around billing: Azure/azure-functions-host#912 (comment). We use that as a workaround where we start the orchestrator; for now, we accept being double-billed for it.
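A sketch of what that workaround could look like; the trigger, binding, and function names are illustrative, and on the Consumption plan the [Singleton] lock is what incurs the extra billing mentioned above:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class SingletonStarter
{
    // [Singleton] serializes executions of this starter function, so the
    // status check and StartNewAsync cannot interleave across two requests.
    [Singleton]
    [FunctionName("SingletonStarter")]
    public static async Task Run(
        [QueueTrigger("start-requests")] string userId,
        [OrchestrationClient] DurableOrchestrationClient starter)
    {
        var status = await starter.GetStatusAsync(userId);
        if (status == null || status.RuntimeStatus != OrchestrationRuntimeStatus.Running)
        {
            await starter.StartNewAsync("UserWorkflow", userId, userId);
        }
    }
}
```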
Good day. I'm having an issue with a Durable Function being stuck in Pending status on Azure (I don't get this issue when running locally). My function also follows the singleton pattern. Any status update on this issue? Anything I can do to clean up my durable function state without blowing everything up?
We finally got around to fixing the race conditions associated with the check/start pattern in Azure/durabletask#528. This includes handling the case where multiple instances of the same app race with each other to create an instance. There won't be any new APIs, but issues such as getting stuck in the Pending state or having duplicate executions should no longer be observed starting in the v2.5.0 extension release. Closing this issue as fixed.
Sorry to resurrect this, but I'm trying to figure out how these interactions work. Our use case is a "windowed aggregator". Given an identifier (say, an employee ID), when a request is received, the orchestration starter should:
Internally, the orchestrator will do two things on startup:
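A hedged reconstruction of that startup shape (function, event, and activity names here are hypothetical; the "all events received?" check is modeled as an activity call, since an orchestrator shouldn't read the Azure Table directly):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class WindowedAggregator
{
    [FunctionName("WindowedAggregator")]
    public static async Task Run([OrchestrationTrigger] DurableOrchestrationContext context)
    {
        var deadline = context.CurrentUtcDateTime.AddHours(1);
        using (var cts = new CancellationTokenSource())
        {
            // (1) a durable timer bounding the aggregation window
            Task window = context.CreateTimer(deadline, cts.Token);

            while (true)
            {
                // (2) wait for the next aggregation event
                Task<string> evt = context.WaitForExternalEvent<string>("new-dependency");
                if (await Task.WhenAny(window, evt) == window)
                {
                    break; // the window elapsed before all expected events arrived
                }

                // Hypothetical activity consulting the externally defined expectations.
                bool complete = await context.CallActivityAsync<bool>("CheckAllEventsReceived", evt.Result);
                if (complete)
                {
                    cts.Cancel(); // release the durable timer so the instance can finish
                    return;       // (2) completes and the orchestration exits successfully
                }
            }
        }
    }
}
```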
When it receives an event, it processes the event and checks whether it has received all events it expects to have received within the hour. (This is defined externally in an Azure Table.) If it has, 2 completes and the orchestration exits successfully. If it has not, it waits for more events, or until 1 finishes. Since the orchestrator is internally queueing events, we don't have to worry about race conditions internally.

However, I'm seeing some results that suggest to me that the HTTP starter endpoint is racing. It appears that if two requests are received simultaneously, they will both start the orchestrator (one overwriting the other). If one of them has already made it to the "send an event" line when the other reaches "start the orchestrator", the orchestrator is replaced, swallowing the event and resulting in the eventual failure of the orchestrator.

It seems like a function similar to the one proposed above would still be extremely useful for a case like this. Instead of:

```csharp
var existingInstance = await starter.GetStatusAsync(subId);
if (existingInstance == null
    || existingInstance.RuntimeStatus == OrchestrationRuntimeStatus.Completed
    || existingInstance.RuntimeStatus == OrchestrationRuntimeStatus.Failed
    || existingInstance.RuntimeStatus == OrchestrationRuntimeStatus.Terminated)
{
    await starter.StartNewAsync("subscription-orchestration", subId, subId);
}
await starter.RaiseEventAsync(subId, "new-dependency", newDependency);
```

we could have:

```csharp
await starter.StartNewButNotReplaceAsync(id); // idempotent during an orchestrator's lifespan
await starter.RaiseEventAsync(id, "name", input);
```

The problem, it seems to me, is that if the stars align just so, it's possible to overwrite an existing instance after it has already received events, with no possible way to prevent it. Consider a super-contrived example:

```csharp
// Part 1
var existingInstance = await starter.GetStatusAsync(subId);
await Task.Delay(10000);
// check status, start, and send event
```

```csharp
// Part 2
var existingInstance = await starter.GetStatusAsync(subId);
// check status, start, and send event
```

It seems obvious to me that with the delay there (which could be any sort of blocking network operation), Part 1 would simply replace the orchestrator that had already processed the event from Part 2. Am I wrong about this? Is there some other behavior I'm experiencing?
Could we add an atomic StartNewAsyncIfNotRunningOrCompletedOrInError (name to rework ;-))? If the instance already exists, keep it running, ignore the start request, and notify the caller that it was already running.
Currently, we have to check the status and then, if the instance is not running, start it, which is not concurrency-friendly at all (two concurrent requests could easily both start the workflow).
I really want something like that for singleton orchestrator scenarios (for example, a singleton workflow started per user identifier).
Another way would be to have support for Singleton (Azure/azure-functions-host#912) on the Consumption plan, so that I can safely check the status and call StartNewAsync in the same concurrency-safe scope.
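For concreteness, a hypothetical shape for the requested API (name and return convention to be reworked, as noted above):

```csharp
// Hypothetical: atomically starts the instance unless one is already active.
// Returns true if a new instance was started, false if it was already running,
// so the caller is notified without a separate, racy status check.
Task<bool> StartNewIfNotRunningAsync(string orchestratorFunctionName, string instanceId, object input);
```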