Closed
Description
Motivation
NGINX is better suited for this task. For example, it can route each request to the process with the fewest active requests (Uvicorn does random assignment, which leads to imbalance). It can also enforce a maximum concurrency.
It would also make it easier to switch the app server if necessary (e.g. to aiohttp).
Would resolve #839
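To illustrate the imbalance point, here is a toy simulation comparing random assignment with fewest-active-requests assignment. It is only a sketch under stated assumptions: it does not model Uvicorn or NGINX internals, and the worker count, arrival rate, request durations, and all function names are hypothetical. In NGINX, the corresponding policy would be the `least_conn` upstream method, with the `max_conns` server parameter as a per-worker cap.

```python
import random


def simulate(pick_worker, num_workers=4, num_requests=1000, seed=0):
    """Return the peak number of in-flight requests seen on any one worker.

    Toy model: one request arrives per tick and runs for 1 to 6 ticks.
    """
    duration_rng = random.Random(seed)      # same durations for every policy
    choice_rng = random.Random(seed + 1)    # only used by the random policy
    durations = [duration_rng.randint(1, 6) for _ in range(num_requests)]

    in_flight = [0] * num_workers
    running = []  # (finish_tick, worker) for requests not yet finished
    peak = 0
    for tick, duration in enumerate(durations):
        # Retire requests that have finished by this tick.
        still_running = []
        for finish_tick, worker in running:
            if finish_tick <= tick:
                in_flight[worker] -= 1
            else:
                still_running.append((finish_tick, worker))
        running = still_running

        # Assign the new request according to the policy under test.
        worker = pick_worker(in_flight, choice_rng)
        in_flight[worker] += 1
        running.append((tick + duration, worker))
        peak = max(peak, max(in_flight))
    return peak


def pick_random(in_flight, rng):
    """Random assignment: ignore current load entirely."""
    return rng.randrange(len(in_flight))


def pick_least_busy(in_flight, rng):
    """Fewest-active-requests assignment: choose the least loaded worker."""
    return min(range(len(in_flight)), key=in_flight.__getitem__)


random_peak = simulate(pick_random)
least_busy_peak = simulate(pick_least_busy)
print("peak in-flight per worker (random):    ", random_peak)
print("peak in-flight per worker (least busy):", least_busy_peak)
```

In this model at most six requests overlap, so the least-busy policy never puts more than two on a single worker, while random assignment can stack more on one worker and leave others idle.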
Open questions
- Can NGINX track in-flight requests? (Here is some info from DataDog.) This might not be what we want, though: for autoscaling purposes, a request should probably still count as "in-flight" while `post_predict()` is running, even if the response has already been sent.