
Seldon does not work with Gunicorn async workers #2499

Closed
pdstrnadJC opened this issue Sep 29, 2020 · 3 comments
@pdstrnadJC

Describe the bug

There are actually two closely related issues, so I'm posting them in one bug report.

  1. Seldon does not provide a way to use Gunicorn's async workers. At first glance it seemed that setting the -k or --worker-class flag in the GUNICORN_CMD_ARGS environment variable would do the trick, but Seldon doesn't consider that variable when deriving the config it passes to Gunicorn (see load_config() in StandaloneApplication; for comparison, this is how Gunicorn's BaseApplication does it).

  2. Since I wanted to test with async workers locally regardless of the above, I cloned the Seldon repo and hard-coded the worker class in microservice.py.

        def rest_prediction_server():
            options = {
                "bind": "%s:%s" % ("0.0.0.0", port),
                # ...
                "worker_class": "eventlet",  # or "gevent"
            }

When I run Seldon locally it starts fine, but once it handles a prediction request (the POST /api/v1.0/predictions endpoint) it returns a 500 and I see this error in the logs:

2020-09-28 22:43:54,870 - seldon_core.wrapper:log_exception:1892 - ERROR:  Exception on /api/v1.0/predictions [POST]
Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/lib/python3.7/multiprocessing/managers.py", line 811, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

This actually seems to be a Python issue. Many workarounds I found (one example) suggested calling join() on the mp.Process object, but you're already doing that. I'm not a Python expert or power user, so I'm not sure what the options are!
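On point 1, here is a minimal sketch of how the flags in GUNICORN_CMD_ARGS could be parsed and merged over defaults before handing them to Gunicorn. The parser is illustrative only (it handles just `-k value` / `--key value` pairs, not Gunicorn's full CLI grammar), and the function and variable names are my own, not Seldon's:

```python
import os
import shlex

def parse_gunicorn_cmd_args(env_value):
    """Parse GUNICORN_CMD_ARGS-style flags into a Gunicorn options dict.

    Illustrative only: handles `-k value` and `--key value` pairs,
    not Gunicorn's full CLI grammar.
    """
    aliases = {"-k": "worker_class", "-w": "workers", "-b": "bind"}
    tokens = shlex.split(env_value)
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in aliases and i + 1 < len(tokens):
            options[aliases[token]] = tokens[i + 1]
            i += 2
        elif token.startswith("--") and i + 1 < len(tokens):
            options[token[2:].replace("-", "_")] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return options

# Merge env-provided flags over defaults (defaults here are illustrative)
defaults = {"bind": "0.0.0.0:9000", "workers": 1}
env_opts = parse_gunicorn_cmd_args(os.environ.get("GUNICORN_CMD_ARGS", ""))
merged = {**defaults, **env_opts}
```

Gunicorn's own BaseApplication handles this properly through its config machinery; the point is only that StandaloneApplication would need an equivalent step before GUNICORN_CMD_ARGS could take effect.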

To reproduce

  1. Edit microservice.py as described above.
  2. Update requirements.txt to install either gunicorn[eventlet] or gunicorn[gevent] - depending on which worker class you decide to use.
  3. Start Seldon and send a request to the prediction endpoint.

Expected behaviour

I would expect the prediction request to return a 200.

Environment

I've only run this locally so far; it hasn't made it to our k8s cluster yet. Locally I ran on macOS or in a Docker container based on python:3.7-slim.

@pdstrnadJC pdstrnadJC added bug triage Needs to be triaged and prioritised accordingly labels Sep 29, 2020
@axsaucedo axsaucedo added this to the 1.4 milestone Sep 29, 2020
@adriangonz adriangonz self-assigned this Sep 29, 2020
@RafalSkolasinski
Contributor

Hi @pdstrnadJC, thanks for the bug report.

I am just curious: what exact advantages would you see from using Gunicorn's async workers? Most prediction tasks are CPU-bound.

@pdstrnadJC
Author

Hi @RafalSkolasinski, before calling the model's predict function I'm making an HTTP request to get some data that will be passed to the model. I was hoping that by using the async workers, Seldon could do other work while it waits for the response to come back. Initially I was using a Seldon TRANSFORMER to make the request, and I had set up the inference graph so that the TRANSFORMER is called before the MODEL; that could be an option again, but I preferred to keep the graph as simple as possible.
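To illustrate the kind of win I'm after: when the pre-predict fetch is blocking I/O, what matters is overlapping the waits, not CPU. The sketch below uses threads as a stand-in for greenlets, and time.sleep as a stand-in for the HTTP call; all names are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_features(request_id):
    # Stand-in for the blocking HTTP call made before predict()
    time.sleep(0.1)
    return {"request_id": request_id, "features": [1.0, 2.0]}

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch_features, range(4)))
elapsed = time.monotonic() - start
# The four 0.1 s waits overlap, so this takes ~0.1 s rather than ~0.4 s
```

Gunicorn's gevent/eventlet worker classes get the same overlap inside a single worker process by switching greenlets whenever a request blocks on I/O.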

@ukclivecox ukclivecox removed the triage Needs to be triaged and prioritised accordingly label Oct 1, 2020
@adriangonz
Contributor

Hey @pdstrnadJC, in general both gevent and eventlet are a bit fiddly to use. Both require you to monkeypatch your code. This gets even fiddlier when you consider that we leverage multiprocessing to run a couple of separate servers at once within the same Python container.

After running a couple of small experiments, it seems that gevent just doesn't play well with multiprocessing at all. You can check this issue, where the conclusion is plainly that you can't use both at the same time.

On the other hand, it seems that eventlet works a bit better alongside multiprocessing. The only extra thing I had to do was to explicitly add the monkeypatch call to the top of python/seldon_core/microservice.py:

import eventlet
eventlet.monkey_patch()

After doing that, I was able to set worker_class: eventlet and it seemed to work: I was able to send a couple of requests and get the responses back. Note that this doesn't mean there couldn't be other issues downstream. For example, I did notice that the process seems to hang whenever you try to exit it. There is an open issue keeping track of all incompatibilities between eventlet and multiprocessing: eventlet/eventlet#210

It's also worth mentioning that our current plan is to eventually introduce our new language wrapper, MLServer, which has been built with asyncio in mind. We've recently introduced early support for the SKLearn and XGBoost pre-packaged servers (we'll be adding some examples soon). You can also check an example of how to add custom inference logic with MLServer. Keep in mind that this is still considered an incubating project, although it could still be worth evaluating for your use case.

Feel free to re-open if you've got any further questions!
