Allow a function to handle execution timeout gracefully and prevent process restart #2153

Open
paulbatum opened this issue Nov 22, 2017 · 12 comments

Comments

@paulbatum
Member

paulbatum commented Nov 22, 2017

So the consumption plan for functions has a default execution timeout of 5 minutes. It's not great to allow your functions to hit this timeout because when it happens, your entire process ends up getting restarted (because it's the only way to force the execution to stop - task.Abort() does not exist). This will be disruptive to other long-running functions in the same process.

The challenge is that as a function author who is aware of this, there's not much you can do to address it. In the case of C#, you could update your function to take a CancellationToken and check the state of that token (or pass it into async APIs your function is calling). However, even if you do this, today the system will still terminate the process (because it does not check to see if your function actually honored the cancellation request).

So, this work item tracks the idea of making the timeout mechanism smarter. It would do the following:

  1. If the function does not take a cancellation token, no change in behavior (the process will get killed)
  2. If the function takes a cancellation token, signal the token once the timeout has expired and then wait some period of time (say 5 seconds) and then check to see if the function task has completed. If the task has completed then we assume the timeout was handled gracefully and do not kill the process.
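
The two rules above can be sketched in a language-neutral way. This is only an illustration under stated assumptions, not the Functions host implementation: `run_with_timeout`, `cooperative`, and `stubborn` are hypothetical names, an `asyncio.Event` stands in for a CancellationToken, and cancelling the task stands in for restarting the process.

```python
import asyncio
import inspect


async def run_with_timeout(func, timeout, grace=5.0):
    """Hypothetical host-side helper (not the real Functions host API).

    Returns "completed", "graceful", or "kill".
    """
    # Rule 1 vs. rule 2: does the function accept a cancellation signal?
    takes_token = len(inspect.signature(func).parameters) == 1
    cancel = asyncio.Event()  # stands in for a CancellationToken
    task = asyncio.create_task(func(cancel) if takes_token else func())

    done, _ = await asyncio.wait({task}, timeout=timeout)
    if done:
        return "completed"  # finished before the execution timeout

    if takes_token:
        cancel.set()  # signal the token once the timeout has expired
        done, _ = await asyncio.wait({task}, timeout=grace)
        if done:
            return "graceful"  # assume the timeout was handled; keep the process

    task.cancel()  # stand-in for restarting the process
    return "kill"


async def cooperative(cancel):
    # Checks the signal between small units of work, then returns.
    while not cancel.is_set():
        await asyncio.sleep(0.01)


async def stubborn():
    # Takes no token, so it can only be stopped the hard way.
    await asyncio.sleep(3600)
```

With a short timeout, `cooperative` would end in the "graceful" branch and `stubborn` in the "kill" branch; the grace period is a parameter precisely because the right value (5 seconds or longer) is still open for discussion.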

In order for this approach to work for multiple languages, we need a way to support the equivalent of cancellation tokens for out-of-proc languages, which is tracked by #2152.

@oshevnin

Any progress with this issue?

It's currently not possible to gracefully shut down running function instances when something exceptional happens, such as the timeout being exceeded or the host being stopped or restarted. Even handling the cancellation token doesn't help, as mentioned above.

Waiting for 5 seconds can be too aggressive. Is there a reason not to wait 5/10 minutes (the max function duration) after the cancellation token is signaled before hard-stopping the instance? That way all active functions would finish their work gracefully even if their cancellation token handling is not perfect.

Cross-referencing Azure/Azure-Functions#866 since it may be related as well.

@fabiocav

@kreaton

kreaton commented Aug 6, 2019

When a timeout occurs in a Function App, it appears from my testing (C#, V2) that any logging to Application Insights also goes away, i.e. it is not possible to trace what happened in the function before the timeout. The timeout exception itself also doesn't seem to be logged to Application Insights and thus cannot be monitored.

@ishepherd

This is, tbh, hella dumb and renders the CancellationTokens rather useless. If the CancellationToken is signalled, it's already too late to save your process.

Any chance? @jeffhollan @eduardolaureano

Any chance you could be nice to the .NET guys by doing this before #2152 :)

@mciprijanovic

I have a case closely related to this topic, and I haven't been able to find an appropriate answer for a long time. I have a function with an EventHubTrigger. Messages from the event hub are pulled in batches. According to the documentation, those messages are checkpointed when the function ends. This means that if the process stops for any reason (host stop, restart...), the function cannot checkpoint the received messages, and the next time the function starts it will pick up the same messages that were already received but not checkpointed.
Based on everything mentioned here, I have no way to stop the function gracefully, that is, to tell it to stop after the current invocation ends and the checkpoint occurs but before it starts again and pulls a new set of messages, so I must handle possible duplicates in my code.
Is this true, or is there a solution for a controlled shutdown?

@paulbatum
Member Author

When processing any Event Hubs workload, you need to write your code to allow for duplicates, because Event Hubs does not provide "at most once" guarantees. Even if you were able to handle the shutdown case correctly, there are other cases where your code might need to handle duplicates, for example, if a partition lease is lost.
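
As a rough illustration of what "allow for duplicates" can look like, here is a minimal sketch that skips events it has already handled. The event shape (a dict with `partition` and `sequence_number` keys), the `process_batch` helper, and the in-memory `seen` set are all assumptions made for the example; a real function would need a durable store, since an in-memory set is lost on exactly the process restart discussed in this issue.

```python
def process_batch(events, seen, handle):
    """Handle each event at most once per (partition, sequence number).

    events: iterable of dicts with "partition" and "sequence_number"
            keys (an assumed shape, not an SDK type).
    seen:   set of keys already handled (hypothetical; in practice this
            would be backed by durable storage).
    handle: callable invoked once per new event.
    Returns the number of events actually handled.
    """
    handled = 0
    for event in events:
        key = (event["partition"], event["sequence_number"])
        if key in seen:
            continue  # redelivered after a restart or a lost lease
        handle(event)
        seen.add(key)  # record only after the handler succeeded
        handled += 1
    return handled
```

Recording the key only after `handle` returns means a crash mid-handler leads to a retry rather than a silently dropped event, so the handler itself should still be safe to run twice.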

@gkindov

gkindov commented Dec 10, 2020

Hi, not sure if this is the right place to ask, but is there a function-app-level host shutdown event I can intercept so I can clean up static resources used by the whole app? For example, I need to call Serilog.Log.CloseAndFlush(). I can't find anything in the docs, only the Startup event where I register the logger. Thanks.

@ishepherd

@gkindov I don't think so. I suggest asking the folks in the Azure Functions Discord. https://discord.gg/YEQPcCsY

@derekrprice

I have an issue related to @kreaton's. We use an external performance and error monitoring tool that needs to close gracefully in order to log to a remote server. When the function is killed with prejudice, it never gets a chance to log the performance information that it has recorded so we don't have any traces to use to track down what is causing the runs to take so long.

@alonfirestein

Any chance that this issue was resolved by anyone?
Is it possible to catch and handle the execution timeout gracefully instead of having the process killed or restarted?
In my case, an unlimited timeout or retries aren't an effective option, so any answer would be appreciated.

@duncanthescot

There should definitely be an event which fires before the timeout so processes can be shut down gracefully.

@borislavml

Does anybody know something about that mysterious event that fires before the timeout? We really need this in our functions!

@ChristianPardun

Has progress been made on this issue? How can timeouts be handled appropriately? Is there a C# event or delegate to use when a timeout occurs? Thank you very much for your help.
