[WIP] Request timeout for async workers #1730
Thank you so much for picking this up! You're going to make a lot of people very happy :).
|
@tilgovi great point. It would be ideal if we had the native timeout mechanism for each worker type, but we don't really have expertise in any worker type other than gevent, so implementing this feature for all workers turns it into the sort of thing that probably won't get tackled anytime soon. How would you feel if we implemented this just for gevent and then let other contributors add the feature for their favourite worker type? |
Sure! I can jump in and help with eventlet and maybe others. |
@tilgovi just pushed a new implementation overwriting I tried to implement the eventlet part but for some reason it's not working properly.
I read in the eventlet docs that you cannot time out CPU-only operations with this class, so that's maybe where the problem is (Note: my test is a flask endpoint that uses |
I think the same is true for gevent as eventlet. If you do not yield to the event loop the timeouts don't complete.
I wonder how well the signals approach would work. It's possible that code could even mask the signals. Or more likely, I think CPython delivers signals to Python code only at certain moments; this timeout would not work if code is in a long running call in a C extension module.
Maybe we should take a step back and figure out what we really want from this feature. When I look at a server like NGINX I note that there is no request timeout. There are only timeouts around I/O. There are timeouts for receiving and parsing the request (headers, body) and for reading and writing data to the socket. For the proxy module, there are timeouts for sending and receiving data from the upstream server. There is no general request timeout.
I am not completely convinced that we should implement such a feature. Do you have a specific use case in mind for yourself that can help us reason through this? |
That's true. (Well, they complete, but they don't interrupt.)
That's also true. gevent goes out of its way to try to ensure that signals get raised in a timely manner. Other C code may not be so considerate. (And even in gevent in certain cases it really is just "best effort" for timely signal delivery.) |
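To illustrate the point made above (a cooperative timeout can only interrupt code that yields to the event loop), here is a minimal sketch. It uses the standard library's `asyncio` as a stand-in for gevent/eventlet, purely so the example has no third-party dependencies; that substitution and all function names are assumptions made for illustration, not anything from this PR.

```python
import asyncio
import time


async def cooperative():
    # Yields to the event loop, so the timeout can cancel it.
    await asyncio.sleep(0.5)
    return "finished"


async def blocking():
    # Never yields: the loop cannot run the timeout while this sleeps.
    time.sleep(0.5)
    return "finished"


async def run_with_timeout(coro, seconds):
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return "timed out"


def demo():
    first = asyncio.run(run_with_timeout(cooperative(), 0.1))
    second = asyncio.run(run_with_timeout(blocking(), 0.1))
    return first, second
```

Under these assumptions, the cooperative call is cut off after roughly 0.1 s, while the blocking call runs for its full 0.5 s and returns normally even though its deadline has long passed. The same limitation would apply to CPU-bound code or a long call inside a C extension.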
@tilgovi I get where you're coming from, and framing the nginx proxy timeouts around IO is pretty bang on. This is exactly what I want from gunicorn as well, where it waits on slow IO. Specific use case: we're currently facing gunicorn worker exhaustion issues during heavy traffic. Nginx times out on slow requests and is ready to accept new requests, but all of the gunicorn workers are still chugging along. We need to somehow tell these workers nobody cares about them anymore. Tackling the slow things inside a request is pretty obvious, but I want to address the gunicorn vulnerability here as well. We're pretty open to any approach you guys have in mind! |
I'd just like to add another use-case which is similar to the above, in case more user stories are required to justify this feature. In Django, a very easy mistake is to write an ORM query that will unexpectedly take a long time to complete (say a 10m query, in a pathological case of loading a full DB table with lots of joins). If you're using a sync worker, then gunicorn will time out in This poses two problems:
It's true that if the greenlet is blocked on a C extension, then you can't get the cancel signal through. But there are many cases where the worker is doing work that does yield to the event loop (say hitting other backends as an API gateway). And even if you're using a C extension (like an ORM wrapper), I believe we'd still have the option of adding a I may just be missing an obvious way to fix these issues; if there's another approach for handling this in the sync worker case I'd love to know it. Thanks! |
Thanks, @paultiplady. It's definitely possible for a user to wrap a timeout function around the application as WSGI middleware. However, I'm not sure it can be done in an event loop agnostic way. For that reason, we may want to have support in Gunicorn. I also think there's an argument for just having something to clear up confusion about how |
Bumping this PR. Without this PR, how do you time out an async worker request? This seems like a large deficiency in gunicorn + gevent. |
@yoobles if you know what event loop you're using you can implement your own timeouts. That is the most robust solution because it lets your application handle cancellation signals and perform any necessary cleanup so that timeouts don't cause clients to simply retry and overwhelm an already slow system. If you read the discussion, I hope you'll agree that we haven't yet come to a consensus about what general purpose timeout would be appropriate to implement, or how it would work. If you have something to contribute to that discussion, it would be very appreciated. |
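As a rough sketch of the "implement your own timeouts" approach, the middleware below wraps a WSGI app and leaves the event-loop-specific part to the caller. The class name, the `timeout_factory` and `timeout_error` parameters, and the 504 response are all hypothetical choices made for this example. With a gevent worker, `gevent.Timeout` could plausibly be passed for both parameters, since it works as a context manager and is itself the exception raised on expiry.

```python
class TimeoutMiddleware:
    """Run each request under a caller-supplied timeout (illustrative only).

    timeout_factory(seconds) must return a context manager that raises
    timeout_error inside the block once the deadline passes.
    """

    def __init__(self, app, seconds, timeout_factory, timeout_error):
        self.app = app
        self.seconds = seconds
        self.timeout_factory = timeout_factory
        self.timeout_error = timeout_error

    def __call__(self, environ, start_response):
        captured = {}

        def buffered_start_response(status, headers, exc_info=None):
            # Hold the status line back until the body is fully produced,
            # so a late timeout can still send a clean error response.
            # Simplification: the legacy WSGI write() callable is not returned.
            captured["args"] = (status, headers)

        try:
            with self.timeout_factory(self.seconds):
                result = self.app(environ, buffered_start_response)
                try:
                    # Consume the body inside the block so the deadline
                    # covers response generation too (this disables streaming).
                    body = list(result)
                finally:
                    close = getattr(result, "close", None)
                    if close is not None:
                        close()
        except self.timeout_error:
            start_response("504 Gateway Timeout",
                           [("Content-Type", "text/plain")])
            return [b"request timed out\n"]
        start_response(*captured["args"])
        return body
```

Hypothetical usage with gevent would look like `TimeoutMiddleware(app, 30, gevent.Timeout, gevent.Timeout)`. As discussed in the thread, this only interrupts requests that yield to the event loop, and the application gets no chance to clean up unless it handles the exception itself.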
I implemented a middleware based off the gevent timer code here that times out requests but I believe that overall this should be handled in gunicorn. I don't have enough knowledge about event loops to say what the right approach is but uwsgi seems to support this with Even if the SIGALRM approach doesn't handle every application, I think having a starting point is superior to letting requests potentially run unbounded. |
> Work in progress
Problem
Currently `--timeout` and `--graceful-timeout` act like a request timeout on sync workers but not on async workers, which means there is currently no option to "kill" a request that takes too long in an async worker.
Proposed solution
Add a `--request-timeout` option that defines after how many seconds the request should time out. Set up a signal alarm in the `handle_request()` method and then have a handler function to handle the timeout.
The code works but the timeout handling obviously needs to be cleaner. I was wondering if this kind of implementation could work.
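For readers following along, the general shape of a signal-alarm timeout can be sketched as below. This is not the PR's code: `handle_request_with_timeout`, `RequestTimeout`, and the returned status string are hypothetical names chosen for illustration, and the sketch assumes a Unix platform with the handler installed from the main thread.

```python
import signal
import time


class RequestTimeout(Exception):
    """Raised inside the request handler when the alarm fires."""


def _on_alarm(signum, frame):
    # CPython runs this in the main thread between bytecode instructions,
    # so it cannot interrupt a long call inside a C extension.
    raise RequestTimeout()


def handle_request_with_timeout(handler, request_timeout):
    """Run handler() but abort it after request_timeout seconds.

    `handler` stands in for the body of the worker's handle_request() and
    `request_timeout` for the proposed --request-timeout value.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, request_timeout)
    try:
        return handler()
    except RequestTimeout:
        return "504 Gateway Timeout"
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

A process has only one `SIGALRM` timer, so a worker that serves many concurrent requests would need per-request bookkeeping on top of this. That is one reason the sketch should be read as a starting point only.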
Feel free to share your opinion and ideas for implementations!
Related issue: #1658