Use nvidia-container-toolkit instead of nvidia-docker2 to expose GPUs in Cortex local #1223
Comments
Hi, is nvidia-docker2 still supported? I have Docker version < 19.03 and nvidia-docker2 installed, and I would like to leverage the GPU without having to upgrade Docker. Currently, with v0.20, it doesn't seem like the GPU is made available inside the service. Can you please clarify?
@dakshvar22 yes, it should fall back on nvidia-docker2 if nvidia-container-toolkit is not found. What is the error message that you see when you try?
I don't see any error, but the GPU isn't visible inside the container and hence not used by the inference API.
@dakshvar22 are you running an example from the cortex repo (if so, which one?), or your own API (if so, which predictor type?)? Also, what is the base image you're using for the API container, or are you using the default?
I am using my own API, which is just a Python Predictor API, since my model is trained with the fairseq library. I am using the default GPU Docker image mentioned in the docs for the Python Predictor API.
@dakshvar22 do you mind sharing your Dockerfile and your predictor implementation? For example, the Dockerfile might just show:

```dockerfile
FROM cortexlabs/python-predictor-gpu-slim:0.20.0-cuda10.1
RUN ...  # install your dependencies
```

And your predictor might look like:

```python
class PythonPredictor:
    def __init__(self, config):
        print(is_gpu_visible())  # replace is_gpu_visible() with the appropriate function call

    def predict(self, payload):
        return "ok"
```
@dakshvar22 In addition to the information requested by @deliahu, it would also be helpful if you can share the output.
Description
Cortex local currently relies on setting up a Docker runtime with nvidia-docker2 to access GPUs. This method is deprecated as of Docker version 19.03. For Docker versions >= 19.03, GPUs should be accessible via the `--gpus all` flag after installing nvidia-container-toolkit (https://github.com/NVIDIA/nvidia-docker#quickstart). If possible, support both ways of exposing GPUs to Cortex local.