Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Out of proc still limited to < 7d for timers #1910

Open
drub0y opened this issue Aug 6, 2021 · 11 comments
Open

Out of proc still limited to < 7d for timers #1910

drub0y opened this issue Aug 6, 2021 · 11 comments
Labels
Enhancement Feature requests.

Comments

@drub0y
Copy link

drub0y commented Aug 6, 2021

We're working on a TypeScript (hence node) based durable orchestration where we expected to be able to call createTimer with dates greater than 7d out. We've been doing this ever since #1390 was completed in C# so we just expected it to work, but we run into the following stack trace:

Function 'myFunction (Orchestrator)' failed with an error. Reason: System.ArgumentException: The Azure Storage provider supports a maximum of 6 days for time-based delays (Parameter 'fireAt')
[2021-08-06T22:15:51.743Z]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableOrchestrationContext.ThrowIfInvalidTimerLengthForStorageProvider(DateTime fireAt) in D:\a\r1\a\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\ContextImplementations\DurableOrchestrationContext.cs:line 397
[2021-08-06T22:15:52.069Z]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcOrchestrationShim.InvokeAPIFromAction(AsyncAction action) in D:\a\r1\a\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\OutOfProcOrchestrationShim.cs:line 125
[2021-08-06T22:15:52.166Z]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcOrchestrationShim.ProcessAsyncActionsV1(AsyncAction[][] actions) in D:\a\r1\a\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\OutOfProcOrchestrationShim.cs:line 249
[2021-08-06T22:15:52.175Z]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcOrchestrationShim.ReplayOOProcOrchestration(AsyncAction[][] actions, SchemaVersion schema) in D:\a\r1\a\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\OutOfProcOrchestrationShim.cs:line 225
[2021-08-06T22:15:52.200Z]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcOrchestrationShim.ScheduleDurableTaskEvents(OrchestrationInvocationResult result) in D:\a\r1\a\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\OutOfProcOrchestrationShim.cs:line 84
[2021-08-06T22:15:52.211Z]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcOrchestrationShim.HandleDurableTaskReplay(OrchestrationInvocationResult executionJson) in D:\a\r1\a\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\OutOfProcOrchestrationShim.cs:line 64
[2021-08-06T22:15:52.221Z]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.TaskOrchestrationShim.TraceAndReplay(Object result, Exception ex) in D:\a\r1\a\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\TaskOrchestrationShim.cs:line 233
[2021-08-06T22:15:52.229Z]    at Microsoft.Azure.WebJobs.Extensions.DurableTask.TaskOrchestrationShim.InvokeUserCodeAndHandleResults(RegisteredFunctionInfo orchestratorInfo, OrchestrationContext innerContext) in D:\a\r1\a\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\Listener\TaskOrchestrationShim.cs:line 155. IsReplay: False. State: Failed. HubName: MyTaskHub. AppName: . SlotName: . ExtensionVersion: 2.5.0. SequenceNumber: 13. TaskEventId: -1

This was, as you can imagine, shocking, but then I tracked it down to here:

if (ctx != null)
{
ctx.ThrowIfInvalidTimerLengthForStorageProvider(action.FireAt);
}

Which calls here:

// We now have built in long-timer support for C#, but in some scenarios, such as out-of-proc,
// we may still need to enforce this limitations until the solution works there as well.
internal void ThrowIfInvalidTimerLengthForStorageProvider(DateTime fireAt)
{
this.ThrowIfInvalidAccess();
if (!this.durabilityProvider.ValidateDelayTime(fireAt.Subtract(this.InnerContext.CurrentUtcDateTime), out string errorMessage))
{
throw new ArgumentException(errorMessage, nameof(fireAt));
}
}

The comment is pretty clear about this being here for possible "out of proc" issues, but... is there really an issue or was it just forgotten about? I'm actually amazed it's nearly a year later since #1390 and nobody else has run into this or at least not reported it.

@ghost ghost added the Needs: Triage 🔍 label Aug 6, 2021
@cgillum
Copy link
Member

cgillum commented Aug 6, 2021

Pinging @davidmrdavid and @bachuv for this one. It's been a while, so I don't quite remember what the limitations for OOProc were at the time, or whether they still apply.

@davidmrdavid
Copy link
Contributor

davidmrdavid commented Aug 9, 2021

Hi @drub0y,

Thanks for reaching out! Indeed, we've known about this limitation for some time now. If I recall correctly, long-running timers are managed by the durable-extension and durable-task framework by splitting them into a sequence of smaller timers. I believe this trick made it particularly hard to deal with long-running timers in our current out-of-process orchestration replay algorithm, as it required careful reasoning of timer-fired events in our history array. As a result, I remember we decided to prioritize revamping our replay algorithm first, which should facilitate incorporating this feature, as well as provide some long-needed correctness and performance improvements to the SDK.

For the past few months, we've been working on doing just that, and we're nearing the release of a revamped replay algorithm for our out-of-proc SDKs. After those are out, we should be unblocked to resume work on this feature. In the meantime, I believe that manually sequencing timers should allow for a long-running timer-like experience.

@drub0y
Copy link
Author

drub0y commented Aug 9, 2021

Well, I'm glad to hear it's being tracked, but it's an unfortunate discovery for us in the meantime as we'll have to go back to writing our own timeout loops into our orchestrations now. 😞

@ConnorMcMahon ConnorMcMahon added Enhancement Feature requests. and removed Needs: Triage 🔍 labels Aug 16, 2021
@ConnorMcMahon
Copy link
Contributor

As a note, we have updated our docs in the meantime to reflect that long-running timers only work on C# currently.

@wesleysmitthe
Copy link

Any updating about the long-running using node or python?

@davidmrdavid
Copy link
Contributor

Hi @wesleysmitthe. I believe NodeJS already supports this (@hossam-nasr can you please confirm that the current released Extension supports this).

The work in Python has not been prioritized yet.

@danielniccoli
Copy link

danielniccoli commented May 15, 2024

I'm running into the same issue (Durable Functions / Python). A maximum of 6 days for WaitForExternalEvent is not enough for our use-cases. We need up to 30 days. In #1538 it was said that it's just a bug and longer time-outs are possible. This issue suggests that it is not a bug. I'm confused, but also unable to continue my work due to this limitation.

@davidmrdavid What is the status of this?

@davidmrdavid
Copy link
Contributor

davidmrdavid commented May 23, 2024

Hey @danielniccoli: this is definitely implemented at the Durable Extension level (this repo) but it's possible there's a gap in Python. Can you tell me how you're pairing timers and external events? As far as I know, the external API in Python does not support a timer parameter (https://github.com/Azure/azure-functions-durable-python/blob/3f0e168b9d5b0b3ea06de878586aa6436dd6a3a1/azure/durable_functions/models/DurableOrchestrationContext.py#L572) so it sounds to me like you're scheduling an explicit timer in addition to the external event? A small code snippet would help.

@davidmrdavid
Copy link
Contributor

In the meantime, a manual workaround is always to split your timer into several smaller but sequential ones

@danielniccoli
Copy link

danielniccoli commented May 23, 2024

@davidmrdavid

As far as I know, the external API in Python does not support a timer parameter

It does, see Timers in Durable Functions (Azure Functions).

this is definitely implemented at the Durable Extension level (this repo)

I don't know enough to make an assessment, but doesn't the error indicate that it's a problem with the underlying code?

System.ArgumentException: The Azure Storage provider supports a maximum of 6 days for time-based delays (Parameter 'fireAt') does not look like a gap in Python, but rather in some C# code.

Below the code snippet. We're using external events for an approval flow. At a certain point, the orchestration waits for approval of two people, the owner and the deputy. If they do not approve withing 21 days, the request is considered "declined" and the orchestration will exit.

...

due_date = context.current_utc_datetime + timedelta(days=21)
context.set_custom_status(
    {
        "status": "Waiting for approval",
        "details": {"dueDate": due_date.isoformat()},
    }
)
due_date_task = context.create_timer(due_date)
owner_approval_event_task = context.wait_for_external_event("OwnerApprovalEvent")
deputy_approval_event_task = context.wait_for_external_event("DeputyApprovalEvent")
owner_approval_winning_task = yield context.task_any(
    [owner_approval_event_task, due_date_task]
)
deputy_approval_winning_task = yield context.task_any(
    [deputy_approval_event_task, due_date_task]
)

if (
    owner_approval_winning_task == due_date_task
    or deputy_approval_winning_task == due_date_task
):
    # TODO: Not all approvals have been given before deadline
    # TODO: Complete the rochestration with status: unapproved
    context.set_custom_status(
        {
            "status": "Declined",
            "details": "Approvals were not given",
        }
    )
    return

...

Splitting it into several sequential timers could be done, but it's not something I'd like to do. I consider workarounds to be resolvable withing a certain time period. Since this ticket exists for almost 3 years, it feels more like a definitive solution 😬

@davidmrdavid
Copy link
Contributor

Hi @danielniccoli,

I took a look and I see the issue now - the Python SDK is not yet onboarded to the protocol that allows for automatic long timer creation. The basic idea for how to implement that is in this PR (Azure/azure-functions-durable-js#340), which is moderately short, but it is not something we have the bandwidth to immediately prioritize relative to other work. In the meantime, I would recommend to continue using the workaround. I apologize this isn't implemented yet, I realize it is frustrating on your end. My next ~3 weeks do seem rather busy, but if you ping me again after those 3 weeks, I may have sufficient bandwidth to re-prioritize this. Alternatively, we do routinely accept open source contributions in case the JS-based PR above is sufficient guidance (the two codebase are purposely almost identical).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Feature requests.
Projects
None yet
Development

No branches or pull requests

6 participants