Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebJob host shut down unexpectedly resulting in ThreadAbortException in functions #1296

Closed
oshevnin opened this issue Mar 13, 2017 · 2 comments

Comments

@oshevnin
Copy link

Our queue triggered C# functions periodically fail with ThreadAbortedException which basically means something wrong hapenned with WebJobs host itself and it is restarting.

Further research of detailed text logs revealed the issue with function logging the execution time results. For some reason (!) function start time bacame greater than the function finish time. That caused an unexpected error in the host and resulted in host restart and termination of all currently running function instances.
Here is the detailed log from Kudu:

2017-03-11T21:54:15.445 Function started (Id=c5f11d50-7710-4578-bac9-b7c91bc938bf)
--CUT--
2017-03-11T21:54:15.928 Executed 'Functions.FunctionName' (Succeeded, Id=0185ef53-1e74-4898-aa97-19718653afa5)
2017-03-11T21:54:15.957 A ScriptHost error has occurred
2017-03-11T21:54:15.957 System.InvalidOperationException: ######## 145051648
2017-03-11T21:54:15.957 ######## 145051648
2017-03-11T21:54:15.957 Time: 00:00:00.8806987 Items
2017-03-11T21:54:15.976 Function completed (Success, Id=9f5c5bbe-8a4d-4cb4-881a-b884ad034f48)
2017-03-11T21:54:15.976 Executed 'Functions.FunctionName' (Succeeded, Id=9f5c5bbe-8a4d-4cb4-881a-b884ad034f48)
2017-03-11T21:54:11.906 A ScriptHost error has occurred
2017-03-11T21:54:11.906 System.InvalidOperationException: ######## 145055744
2017-03-11T21:54:11.938 ######## 145055744
2017-03-11T21:54:11.938 Time: 00:00:00.6590790 Items
2017-03-11T21:54:11.954 Function completed (Success, Id=c5f11d50-7710-4578-bac9-b7c91bc938bf)
2017-03-11T21:54:11.999 An unhandled exception has occurred. Host is shutting down.
2017-03-11T21:54:11.999 System.InvalidOperationException: End Time must be greater than start time
at Microsoft.Azure.WebJobs.Logging.FunctionInstanceLogItem.Validate()
at Microsoft.Azure.WebJobs.Logging.LogWriter.AddAsync(FunctionInstanceLogItem item, CancellationToken cancellationToken)
at Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.FastLogger.d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.d__10.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.WebJobs.Host.Executors.TriggeredFunctionExecutor`1.d__6.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.WebJobs.Host.Queues.Listeners.QueueTriggerExecutor.d__2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.WebJobs.Host.Queues.Listeners.QueueListener.d__27.MoveNext()
2017-03-11T21:54:12.049 A ScriptHost error has occurred
2017-03-11T21:54:12.049 System.InvalidOperationException: ######## 145125376

As you can see log timing jumps back from 21:54:15.976 to 2017-03-11T21:54:11.906 which causes function c5f11d50-7710-4578-bac9-b7c91bc938bf start time to be greater than finish time and critical exception thrown at:
https://github.com/Azure/azure-webjobs-sdk/blob/314ca4dad27ffd58b530fd05ef6040bf9fc58446/src/Microsoft.Azure.WebJobs.Logging/FunctionInstanceLogItem.cs#L159

Please investigate the root cause of the time shift (bug in function logging? time service updating host clocks?) and fix the issue with host restarting in this scenario.

@christopheranderson
Copy link
Contributor

We need to harden the error handling to not bring down the host. As far as investigating the time shift, that may be hard, but should give it a shot.

@christopheranderson christopheranderson added this to the March 2017 milestone Mar 13, 2017
@paulbatum paulbatum modified the milestones: March 2017, April 2017 Apr 3, 2017
@paulbatum paulbatum modified the milestones: April 2017, May 2017 May 1, 2017
@MikeStall
Copy link
Contributor

It looks like there's a clock adjustment on the machine:

2017-03-11T21:54:15.445 Function started (Id=c5f11d50-7710-4578-bac9-b7c91bc938bf)
...
2017-03-11T21:54:11.954 Function completed (Success, Id=c5f11d50-7710-4578-bac9-b7c91bc938bf)

The start is coming 4 seconds 'after' the complete. The same guid means the same function instance and same process.

  1. short term, the logging check should be resilient to this case.
  2. longer term, we should use stopwatch (and not timestamp diffs) to calculate duration.

MikeStall added a commit to Azure/azure-webjobs-sdk that referenced this issue May 17, 2017
alrod pushed a commit to alrod/azure-functions-servicebus-extension that referenced this issue Jul 3, 2019
alrod pushed a commit to alrod/azure-functions-servicebus-extension that referenced this issue Jul 5, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Jan 2, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants