Seldon does not work with Gunicorn async workers #2499
Comments
Hi @pdstrnadJC, thanks for the bug report. I am just curious: what exact advantages would you see in using Gunicorn's async workers? Most prediction tasks are CPU-bound.
Hi @RafalSkolasinski, so before calling the model's predict function I'm making an HTTP request to get some data that will be passed to the model. I was hoping that by using the async workers, Seldon could do other work while it waits for the response to come back. Initially I was using a Seldon TRANSFORMER to make the request, and I had set up the inference graph so that the TRANSFORMER is called before the MODEL. That could be an option again, but I preferred to keep the graph as simple as possible.
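For context, a TRANSFORMER of the kind described above might look roughly like the sketch below, assuming the Python wrapper's `transform_input` hook; the class name, URL, and merge logic are made up for illustration. The point is the blocking `requests.get` call, which ties up a sync worker for the whole round trip:

```python
import requests

class EnrichTransformer:
    """Sketch of a Seldon TRANSFORMER component (names are illustrative)."""

    def transform_input(self, X, features_names=None):
        # Blocking HTTP call: under sync Gunicorn workers the worker sits
        # idle for the whole round trip; async workers could serve other
        # requests while this waits.
        resp = requests.get("http://feature-service.local/extra-data")
        resp.raise_for_status()
        # Hand the enriched payload on to the MODEL step.
        return {"data": X, "extra": resp.json()}
```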
Hey @pdstrnadJC, in general both `gevent` and `eventlet` […]. After running a couple of small experiments, it seems that `gevent` […]. On the other hand, it seems that `eventlet` works once the standard library is monkey-patched first:

```python
import eventlet
eventlet.monkey_patch()
```

After doing that, I was able to set the `eventlet` worker class and get successful predictions back. It's also worth mentioning that our current plan is to eventually introduce our new language wrapper, MLServer, which has been built with async support in mind. Feel free to re-open if you've got any further questions!
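To illustrate why the patch has to come first (a standalone sketch using a plain Flask app, not Seldon's actual microservice.py): `eventlet.monkey_patch()` replaces blocking stdlib primitives such as `socket`, so it must run before any module that uses them is imported.

```python
# app.py -- standalone sketch; the app and route are hypothetical.
import eventlet
eventlet.monkey_patch()  # patch socket/threading before anything imports them

from flask import Flask  # imported only after the patch is in place

app = Flask(__name__)

@app.route("/ping")
def ping():
    return "pong"

# Run with an async worker class, e.g.:
#   gunicorn -k eventlet app:app
```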
Describe the bug
There are actually two issues, closely related, so I'm posting them in one bug report.
Seldon does not provide a way to use Gunicorn's async workers. At first glance it seemed that setting the `-k` or `--worker-class` flag in the `GUNICORN_CMD_ARGS` environment variable would do the trick, but Seldon doesn't consider that variable when deriving the config it passes to Gunicorn (see `load_config()` in `StandaloneApplication`; for comparison, this is how Gunicorn's `BaseApplication` does it).

Since I wanted to test with async workers locally regardless of the above, I cloned the Seldon repo and hard-coded the worker class in microservice.py.
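For reference, the custom-application pattern from Gunicorn's documentation, which `StandaloneApplication` appears to follow, looks roughly like the sketch below (a generic reconstruction, not Seldon's actual code). A `load_config()` written this way applies only the options dict passed in code, so nothing from `GUNICORN_CMD_ARGS` is ever consulted:

```python
from gunicorn.app.base import BaseApplication

class StandaloneApplication(BaseApplication):
    """Generic Gunicorn custom-application pattern (sketch)."""

    def __init__(self, app, options=None):
        self.options = options or {}
        self.application = app
        super().__init__()

    def load_config(self):
        # Only explicit options reach Gunicorn's config. GUNICORN_CMD_ARGS
        # is parsed by Gunicorn's own CLI application class, not here, so
        # -k/--worker-class set via the environment variable has no effect.
        for key, value in self.options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)

    def load(self):
        return self.application
```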
When I try to run Seldon locally it starts fine, but once it tries to handle a prediction request (the `POST /api/v1.0/predictions` endpoint) it returns a 500 and I see this error in the logs:

[…]

This actually seems to be a Python issue? Many workarounds I found (one example) suggested calling `join()` on the `mp.Process` object, but you're doing that already. I'm not a Python expert or power user, so I'm not sure what the options are!
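The `join()` workaround referred to above follows this generic pattern (a self-contained sketch with a made-up worker function, not Seldon's code):

```python
import multiprocessing as mp

def _work(payload):
    # Stand-in for whatever the child process does per request.
    print(f"processed {payload}")

def run_in_subprocess(payload):
    p = mp.Process(target=_work, args=(payload,))
    p.start()
    p.join()  # the commonly suggested fix: reap the child explicitly

if __name__ == "__main__":
    run_in_subprocess({"data": [1, 2, 3]})
```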
To reproduce

Install `gunicorn[eventlet]` or `gunicorn[gevent]`, depending on which worker class you decide to use.

Expected behaviour
I would expect the prediction request to return a 200.
Environment
I've only run this locally so far; it hasn't made it to our k8s cluster yet. Locally I ran on macOS or in a Docker container based on python:3.7-slim.