Background
I tested my Azure Durable Functions app with a dedicated Durable Task Scheduler (DTS) instance as the backend. During a load test I found unexpected errors related to activity results being lost; see the description and the attached reproduction project below. Under normal load the application worked as expected.
Description
- A given activity writes an entity to a Table storage account and then logs a success message.
- The following error is logged (IDs redacted):
TaskActivityDispatcher-ID: Unhandled exception with work item WORK_ITEM_ID': Grpc.Core.RpcException: Status(StatusCode="NotFound", Detail="Work item 'WORK_ITEM_ID' not found")
at Microsoft.DurableTask.AzureManagedBackend.AzureManagedOrchestrationService.<>c__DisplayClass63_0.<<CompleteTaskActivityWorkItemAsync>b__0>d.MoveNext() in /_/src/SDK/Microsoft.DurableTask.AzureManagedBackend/AzureManagedOrchestrationService.cs:line 1006
--- End of stack trace from previous location ---
at Microsoft.DurableTask.AzureManagedBackend.AzureManagedOrchestrationService.ExecuteWithRetryAsync(Func`1 action, String operationName, Object request, Func`2 summarizeGrpcRequestFunction) in /_/src/SDK/Microsoft.DurableTask.AzureManagedBackend/AzureManagedOrchestrationService.cs:line 1546
at Microsoft.DurableTask.AzureManagedBackend.AzureManagedOrchestrationService.CompleteTaskActivityWorkItemAsync(TaskActivityWorkItem workItem, TaskMessage responseMessage) in /_/src/SDK/Microsoft.DurableTask.AzureManagedBackend/AzureManagedOrchestrationService.cs:line 1005
at DurableTask.Core.TaskActivityDispatcher.OnProcessWorkItemAsync(TaskActivityWorkItem workItem) in /_/src/DurableTask.Core/TaskActivityDispatcher.cs:line 279
at DurableTask.Core.TaskActivityDispatcher.OnProcessWorkItemAsync(TaskActivityWorkItem workItem) in /_/src/DurableTask.Core/TaskActivityDispatcher.cs:line 302
at DurableTask.Core.WorkItemDispatcher`1.ProcessWorkItemAsync(WorkItemDispatcherContext context, Object workItemObj) in /_/src/DurableTask.Core/WorkItemDispatcher.cs:line 373
Backing off for 1 seconds until 5 successful operations
- The following warning is logged (IDs redacted):
Abandoning activity work item for [ACTIVITY_NAME#4] of orchestration 'ORCHESTRATION_ID' with completion token COMPLETION_TOKEN.
- The orchestration retries and calls the same activity again.
- The activity then fails to write to Table storage because the entity already exists (status code 409, error code EntityAlreadyExists).
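To illustrate the last step, here is a minimal sketch of how the activity's write could tolerate being executed twice. It does not address the lost completion itself; it only shows where the 409 surfaces. It assumes the activity uses the `Azure.Data.Tables` client, which the report does not state, and the class and method names (`IdempotentTableWrite`, `AddIfAbsentAsync`) are hypothetical. It also assumes that an already existing entity was written by the earlier attempt of the same activity.

```csharp
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;

// Hypothetical helper, not part of the reproduction project.
public static class IdempotentTableWrite
{
    // Returns true if the entity was inserted, false if it already existed.
    public static async Task<bool> AddIfAbsentAsync(TableClient table, TableEntity entity)
    {
        try
        {
            // Insert-only write: fails if the partition key / row key pair exists.
            await table.AddEntityAsync(entity);
            return true;
        }
        catch (RequestFailedException ex)
            when (ex.Status == 409 && ex.ErrorCode == "EntityAlreadyExists")
        {
            // Assumption: the entity was written by a previous attempt of this
            // activity whose completion was not recorded, so the retry is a no-op.
            return false;
        }
    }
}
```

If overwriting is acceptable, `TableClient.UpsertEntityAsync` would be an alternative to catching the conflict.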
Context
App Service Plan: P1v3, 3 instances
Function app: Linux, .NET 8 isolated
AzureFunctionsJobHost__extensions__durableTask__maxConcurrentActivityFunctions: 30
AzureFunctionsJobHost__extensions__durableTask__maxConcurrentOrchestratorFunctions: 20
AzureFunctionsJobHost__extensions__durableTask__storageProvider__partitionCount: 4 (note: we don't know if this applies to DTS)
DTS: Dedicated SKU, 1 capacity unit
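For anyone reproducing this with host.json instead of app settings, the three `AzureFunctionsJobHost__extensions__durableTask__*` values above should correspond to the fragment below. This is a sketch of the equivalent configuration, not the file from our project, and it omits the DTS storage provider settings. `partitionCount` is an Azure Storage provider setting, and as noted above we don't know whether it has any effect with DTS.

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentActivityFunctions": 30,
      "maxConcurrentOrchestratorFunctions": 20,
      "storageProvider": {
        "partitionCount": 4
      }
    }
  }
}
```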
As mentioned, this behaviour did not occur under normal load, and we have never observed it with the Azure Storage backend. Attached is a minimal project with which I could reproduce the conflicts during the retried Table storage writes, although I have not yet managed to get the same logs. I could only reproduce the behaviour after adding durable entities to this project.