Description
When using the Uvicorn process manager, requests are assigned to Uvicorn workers randomly. This causes issues, including unbalanced queue wait times and a max concurrency limit that does not work as expected (each process applies the limit independently).
Possible solutions
NGINX
See #1298
Gunicorn + Uvicorn workers
Consider switching to Gunicorn + Uvicorn workers. Gunicorn is a more full-featured process manager than Uvicorn's built-in one, and may balance requests across processes better.
Blockers for switching to Gunicorn
Currently, two features of the Uvicorn process manager are not supported by Gunicorn:
--limit-concurrency is used to respond with 503s when the user-specified concurrency limit is reached. Uvicorn currently says: "Gunicorn provides a different set of configuration options to Uvicorn, so some options such as --limit-concurrency are not yet supported when running with Gunicorn." See also the Uvicorn changelog.
It is not currently possible to configure how many threads are used by the Uvicorn worker.
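To illustrate why a per-process limit behaves differently from a replica-wide one, here is a minimal simulation sketch. It assumes every request stays in flight for the whole run (none complete), and the worker count, limit, and function name are hypothetical values chosen for illustration, not taken from this project.

```python
import random


def count_rejections(assignments, num_workers, per_worker_limit):
    """Count requests that would get a 503 when each worker enforces its own limit.

    `assignments` is a list of worker indexes, one per incoming request.
    Simplifying assumption: no request finishes during the simulation.
    """
    in_flight = [0] * num_workers
    rejected = 0
    for worker in assignments:
        if in_flight[worker] >= per_worker_limit:
            rejected += 1  # this worker is full, even if others are idle
        else:
            in_flight[worker] += 1
    return rejected, in_flight


NUM_WORKERS = 4          # hypothetical
PER_WORKER_LIMIT = 4     # hypothetical per-process limit
TOTAL_REQUESTS = NUM_WORKERS * PER_WORKER_LIMIT  # exactly the replica-wide capacity

rng = random.Random(0)
random_assignment = [rng.randrange(NUM_WORKERS) for _ in range(TOTAL_REQUESTS)]
balanced_assignment = [i % NUM_WORKERS for i in range(TOTAL_REQUESTS)]

random_rejected, random_load = count_rejections(
    random_assignment, NUM_WORKERS, PER_WORKER_LIMIT
)
balanced_rejected, balanced_load = count_rejections(
    balanced_assignment, NUM_WORKERS, PER_WORKER_LIMIT
)

print("random:  ", random_rejected, "rejected, load per worker", random_load)
print("balanced:", balanced_rejected, "rejected, load per worker", balanced_load)
```

With balanced assignment, the replica accepts requests up to its full capacity. With random assignment, a worker that happens to receive more than its share starts rejecting while other workers still have headroom, so the effective limit can end up below the configured total.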
Add request forwarder sidecar container
A sidecar container could receive all requests, count in-flight requests, manage max_replica_concurrency, and forward them to the application container. Within the application container, we'd use FastAPI on Gunicorn with Uvicorn workers (for worker queue fairness), and configure unlimited backlog and limit-concurrency. Knative does something similar to this.
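The counting part of such a sidecar could look roughly like the sketch below. This is only an illustration of the idea, not a proposed implementation: `ConcurrencyGate`, `handle`, and the `forward` callable are hypothetical names, the actual proxying to the application container is replaced by a stand-in coroutine, and responses are reduced to bare status codes.

```python
import asyncio


class ConcurrencyGate:
    """Tracks in-flight requests for one replica and rejects the overflow."""

    def __init__(self, max_replica_concurrency: int) -> None:
        self.max_replica_concurrency = max_replica_concurrency
        self.in_flight = 0

    async def handle(self, forward):
        # Reject immediately once the replica-wide limit is reached.
        if self.in_flight >= self.max_replica_concurrency:
            return 503
        self.in_flight += 1
        try:
            # `forward` stands in for proxying to the application container.
            return await forward()
        finally:
            self.in_flight -= 1


async def demo(limit: int = 2, total: int = 5):
    gate = ConcurrencyGate(limit)
    release = asyncio.Event()

    async def forward():
        # Simulated slow upstream: blocks until the demo releases it.
        await release.wait()
        return 200

    tasks = [asyncio.create_task(gate.handle(forward)) for _ in range(total)]
    await asyncio.sleep(0)  # let every task run up to its first await
    release.set()
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the counter lives in a single process in front of all workers, the limit applies to the replica as a whole rather than to each worker separately, which is the property the per-process limit lacks.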