
renaming of mxnet-model-server in sagemaker-inference package 1.5.3 causing entrypoint with command serve to fail #88

Open
RZachLamberty opened this issue Dec 4, 2020 · 1 comment


@RZachLamberty

Describe the bug
sagemaker-inference recently (10/15) released v1.5.3, which included a commit renaming the model server artifact and command from mxnet-model-server to multi-model-server.

All containers defined in this repository install sagemaker-inference as a dependency of this repo itself, via lines such as

RUN pip install --no-cache-dir "sagemaker-pytorch-inference<2"

and this repo's setup.py has an install_requires entry that includes sagemaker-inference>=1.3.1. As a result, sagemaker-inference==1.5.3 gets installed.
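The version drift is mechanical: a bare lower bound admits every later release, so pip picks up the renaming release automatically. A minimal sketch of that resolution (the `available` list and highest-version selection are simplifying assumptions, not pip's actual resolver):

```python
# Why the unpinned lower bound picks up 1.5.3: any release >= 1.3.1 satisfies
# the requirement, so the newest one wins at install time.
def version_key(v):
    """Turn a dotted version string into a comparable tuple, e.g. (1, 5, 3)."""
    return tuple(int(part) for part in v.split("."))

# Hypothetical set of releases on the index at install time.
available = ["1.3.1", "1.5.2", "1.5.3"]

# Simplified resolution: take the highest version satisfying the lower bound.
chosen = max(
    (v for v in available if version_key(v) >= version_key("1.3.1")),
    key=version_key,
)
```

Here `chosen` is "1.5.3", the release that renamed the server command.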

So while the Dockerfile's CMD value (which calls mxnet-model-server directly) will succeed, attempts to use the ENTRYPOINT with serve as an argument will fail with this message:

Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 22, in <module>
    serving.main()
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_serving_container/serving.py", line 39, in main
    _start_model_server()
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/conda/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_serving_container/serving.py", line 35, in _start_model_server
    model_server.start_model_server(handler_service=HANDLER_SERVICE)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/model_server.py", line 94, in start_model_server
    subprocess.Popen(multi_model_server_cmd)
  File "/opt/conda/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/opt/conda/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'multi-model-server': 'multi-model-server'
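For context, the final frame boils down to subprocess.Popen failing to exec a binary that is not on PATH. A minimal sketch of that failure mode (the binary name below is a stand-in for illustration, not the real command):

```python
import subprocess

# Minimal reproduction of the traceback's failure mode: Popen raises
# FileNotFoundError when argv[0] does not resolve to an executable on PATH.
# In sagemaker-inference 1.5.3 the launched command became
# "multi-model-server", which these images do not install.
def try_start(cmd):
    """Return None on success, or the missing executable's name on failure."""
    try:
        subprocess.Popen(cmd)
        return None
    except FileNotFoundError as exc:
        return exc.filename

missing = try_start(["no-such-binary-for-demo"])
```

`missing` here is "no-such-binary-for-demo", mirroring the 'multi-model-server' name in the FileNotFoundError above.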

To reproduce

  1. build any container
  2. mount a model and inference.py (e.g. half_plus_three) into /opt/ml/model
  3. docker run [tag name] serve

Expected behavior
The model server serves the mounted model / inference.py.

System information
A description of your system. Please provide:

  • Toolkit version: 2.0.5, but should apply to all versions
  • Framework version: 1.4, but should apply to all versions
  • Python version: 3.7
  • CPU or GPU: cpu, but should apply to both
  • Custom Docker image (Y/N): N
@saifvazir

Hi @RZachLamberty, I stumbled upon your issue here. I was trying to create a custom Docker image and hit a similar issue. Installing multi-model-server (pip install multi-model-server) did away with it. You can give it a try :)
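Following up on that suggestion, a minimal sketch of the idea (assumption: the fix is simply making the renamed binary resolvable on PATH; the helper below is a hypothetical pre-flight check, not part of the toolkit):

```python
import shutil

# After `pip install multi-model-server` (the commenter's workaround), the
# command that sagemaker-inference >= 1.5.3 execs should resolve on PATH.
# This check mirrors what the launcher implicitly requires before Popen.
def server_binary_available(name="multi-model-server"):
    """Return True if `name` resolves to an executable on PATH."""
    return shutil.which(name) is not None
```

Pinning sagemaker-inference<1.5.3 in the image is an alternative workaround, on the assumption that older releases still exec mxnet-model-server.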
