Add support for atomic StartNewAsync that would execute only if instance is not already running (for singleton orchestrator support) #367
I like this feature idea. Out of curiosity, what is the scenario motivating it? For Logic Apps, before we implemented singleton workflows, the motivation was often around timer triggers (set a recurrence of every 1 minute, but one instance may take 1 minute and 5 seconds to complete). Instead of a cascade of overlapping instances, singleton provides a way to ensure only one job is running at a time. Not sure if you had other triggers in mind too. That said, I do think this makes sense, and I imagine it would also require some type of locking so that two instances couldn't be created at the same time.
I've received requests for this from a variety of sources, internal and external, and it's an adoption blocker for some. We should definitely implement this. (Singleton support in Azure Functions would be awesome too because it makes this less of an issue, but I won't block on it.) @SimonLuckenuik, the one caveat is that several people expect to be able to "recreate" their named instances after they have already started. The most common case is when a singleton instance has failed or was terminated. For your use case, would it be acceptable to fail instance creation only if it is in a running state? Here is what I'm currently thinking:
Would that behavior work?
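For illustration, a minimal caller-side sketch of those semantics; the exception type and failure signal here are assumptions, not a committed design:

```csharp
// Hypothetical semantics: creation is rejected atomically only when the named
// instance is currently running; completed/failed/terminated instances can be
// recreated under the same instanceId.
try
{
    await starter.StartNewAsync("MyOrchestration", instanceId, input);
}
catch (InvalidOperationException) // assumed failure signal, for illustration only
{
    // An instance with this instanceId is already running, so the start
    // request was refused instead of silently replacing it.
}
```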
@jeffhollan, my scenario is the following:
@cgillum, what you are suggesting should be sufficient for my needs. Some comments below. There are currently two overloads of StartNewAsync: (A) one where the instanceId is generated, and (B) one where it is provided by the caller. I am assuming the behavior you are describing applies only to (B).
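For reference, the two overloads being distinguished are presumably these (v1 DurableOrchestrationClient shapes):

```csharp
// (A) the instanceId is generated by the framework and returned to the caller.
Task<string> StartNewAsync(string orchestratorFunctionName, object input);

// (B) the instanceId is supplied by the caller, which enables named/singleton instances.
Task<string> StartNewAsync(string orchestratorFunctionName, string instanceId, object input);
```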
Concerning the caveat: personally, I don't like the current behavior of (B), where starting something will in fact, in some cases, stop an existing instance and restart it. I would rather have the user explicitly terminate the workflow before being able to restart it.
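A minimal sketch of that explicit terminate-then-restart flow, using the existing client APIs (note that TerminateAsync only enqueues the termination, so a real implementation would need to poll until the status actually changes):

```csharp
// Explicit restart: terminate the running instance first, then start a fresh
// one under the same instanceId, instead of letting StartNewAsync replace it.
var status = await starter.GetStatusAsync(instanceId);
if (status != null && status.RuntimeStatus == OrchestrationRuntimeStatus.Running)
{
    await starter.TerminateAsync(instanceId, "Restart requested by caller");
    // In practice: poll GetStatusAsync here until RuntimeStatus == Terminated.
}
await starter.StartNewAsync("MyOrchestration", instanceId, input);
```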
I just want to update this issue to clarify what happens today and what I think we need to do for the final implementation:
FYI, the Singleton attribute is working in Azure Functions, with specific considerations around billing: Azure/azure-functions-host#912 (comment). We use that as a workaround where we start the orchestrator; for now, we accept being double-billed for it.
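A sketch of what that workaround could look like; the trigger, binding, and function names are illustrative, and on the Consumption plan the [Singleton] lock is what incurs the extra billing mentioned above:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class SingletonStarter
{
    // [Singleton] serializes executions of this starter function, so the
    // status check and StartNewAsync cannot interleave across two requests.
    [Singleton]
    [FunctionName("SingletonStarter")]
    public static async Task Run(
        [QueueTrigger("start-requests")] string userId,
        [OrchestrationClient] DurableOrchestrationClient starter)
    {
        var status = await starter.GetStatusAsync(userId);
        if (status == null || status.RuntimeStatus != OrchestrationRuntimeStatus.Running)
        {
            await starter.StartNewAsync("UserWorkflow", userId, userId);
        }
    }
}
```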
Good day. I'm having an issue with a Durable Function being stuck in Pending status on Azure (I don't get this issue when running locally). My function also follows the singleton pattern. Any status update on this issue? Anything I can do to clean up my durable function state without blowing everything up?
We finally got around to fixing the race conditions associated with the check/start pattern in Azure/durabletask#528. This includes handling the case where multiple instances of the same app race with each other to create an instance. There won't be any new APIs, but issues such as getting stuck in the Pending state or having duplicate executions should no longer be observed starting in the v2.5.0 extension release. Closing this issue as fixed.
Sorry to resurrect this, but I'm trying to figure out how these interactions work. Our use case is a "windowed aggregator". Given an identifier (say, an employee ID), when a request is received, the orchestration starter should:
Internally, the orchestrator will do two things on startup:
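A hedged reconstruction of that startup shape (function, event, and activity names here are hypothetical; the "all events received?" check is modeled as an activity call, since an orchestrator shouldn't read the Azure Table directly):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class WindowedAggregator
{
    [FunctionName("WindowedAggregator")]
    public static async Task Run([OrchestrationTrigger] DurableOrchestrationContext context)
    {
        var deadline = context.CurrentUtcDateTime.AddHours(1);
        using (var cts = new CancellationTokenSource())
        {
            // (1) a durable timer bounding the aggregation window
            Task window = context.CreateTimer(deadline, cts.Token);

            while (true)
            {
                // (2) wait for the next aggregation event
                Task<string> evt = context.WaitForExternalEvent<string>("new-dependency");
                if (await Task.WhenAny(window, evt) == window)
                {
                    break; // the window elapsed before all expected events arrived
                }

                // Hypothetical activity consulting the externally defined expectations.
                bool complete = await context.CallActivityAsync<bool>("CheckAllEventsReceived", evt.Result);
                if (complete)
                {
                    cts.Cancel(); // release the durable timer so the instance can finish
                    return;       // (2) completes and the orchestration exits successfully
                }
            }
        }
    }
}
```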
When it receives an event, it processes the event and checks whether it has received all events it expects to have received within the hour. (This is defined externally in an Azure Table.) If it has, 2 completes and the orchestration exits successfully. If it has not, it waits for more events, or until 1 finishes. Since the orchestrator is internally queueing events, we don't have to worry about race conditions internally.

However, I'm seeing some results that suggest to me that the HTTP starter endpoint is racing. It appears that if two requests are received simultaneously, they will both start the orchestrator (one overwriting the other). If one of them has already made it to the "send an event" line when the other reaches "start the orchestrator", the orchestrator is replaced, swallowing the event and resulting in the eventual failure of the orchestrator.

It seems like a function similar to the one proposed above would still be extremely useful for a case like this. Instead of:

```csharp
var existingInstance = await starter.GetStatusAsync(subId);
if (existingInstance == null
    || existingInstance.RuntimeStatus == OrchestrationRuntimeStatus.Completed
    || existingInstance.RuntimeStatus == OrchestrationRuntimeStatus.Failed
    || existingInstance.RuntimeStatus == OrchestrationRuntimeStatus.Terminated)
{
    await starter.StartNewAsync("subscription-orchestration", subId, subId);
}
await starter.RaiseEventAsync(subId, "new-dependency", newDependency);
```

we could have:

```csharp
await starter.StartNewButNotReplaceAsync(id); // idempotent during an orchestrator's lifespan
await starter.RaiseEventAsync(id, "name", input);
```

The problem, it seems to me, is that if the stars align just so, it's possible to overwrite an existing instance after it has already received events, with no possible way to prevent it. Consider a super-contrived example:

```csharp
// Part 1
var existingInstance = await starter.GetStatusAsync(subId);
await Task.Delay(10000);
// check status, start, and send event
```

```csharp
// Part 2
var existingInstance = await starter.GetStatusAsync(subId);
// check status, start, and send event
```

It seems obvious to me that with the delay there (which could be any sort of blocking network operation), Part 1 would simply replace the orchestrator that had already processed the event from Part 2. Am I wrong about this? Is there some other behavior I'm experiencing?
Could we add an atomic StartNewAsyncIfNotRunningOrCompletedOrInError (name to rework ;-))? If the instance already exists, keep it running, ignore the start request, and notify the caller that it was already running.
Currently, we have to check the status and then, if the instance is not running, start it, which is not concurrency-friendly at all (two concurrent requests could easily both start the workflow).
I really want something like that for singleton orchestrator scenarios (for example, a singleton workflow started per user identifier).
Another way would be to have support for Singleton (Azure/azure-functions-host#912) on the Consumption plan, so that I can safely check the status and call StartNewAsync in the same concurrency-safe scope.
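For concreteness, a hypothetical shape for the requested API (name and return convention to be reworked, as noted above):

```csharp
// Hypothetical: atomically starts the instance unless one is already active.
// Returns true if a new instance was started, false if it was already running,
// so the caller is notified without a separate, racy status check.
Task<bool> StartNewIfNotRunningAsync(string orchestratorFunctionName, string instanceId, object input);
```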