Dockerfile for GPU container. Fix for installing GPU version of MXNet #403

strawberrypie · 2019-10-21T09:11:00Z

Description of changes:

- Fixes a bug in setup.py that didn't work with the current version of requirements
- Adds Dockerfile.gpu for building a GPU-enabled Docker image

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jaheba

Awesome, thanks!

Do you have any experience running gluon-ts on GPU instances?

Dockerfile.gpu

jaheba · 2019-10-21T09:30:57Z

setup.py

+ re.subn(
+     pattern=mxnet_old,
+     repl=mxnet_new,
+     string=line.rstrip(),
+     count=1,
+ )[0]


How does re.subn help over just str.replace here? We should maybe think about making this more robust.

But more importantly, I don't really like what we are doing here. Are there always compatible releases of mxnet and mxnet-cu92mkl? And should we be more explicit about which version we install? However, we should probably discuss this in another issue.

Well, I was using the substitution with a regex like mxnet[><=]?=, but it looks like a simple substitution is enough. Will change it to str.replace.
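For reference, here is a minimal sketch of the two approaches discussed in this thread. The names mxnet_old and mxnet_new come from the diff above, but their values, the helper function names, and the sample requirement lines are assumptions made for illustration; this is not the code that was merged.

```python
import re

# Assumed values: the diff only shows the variable names, not their contents.
mxnet_old = "mxnet"
mxnet_new = "mxnet-cu92mkl"


def swap_with_replace(line: str) -> str:
    """Swap the first occurrence of the package name with str.replace."""
    return line.rstrip().replace(mxnet_old, mxnet_new, 1)


def swap_with_regex(line: str) -> str:
    """Swap only when the line starts with the bare package name, followed
    by a version specifier or the end of the line."""
    pattern = r"^" + re.escape(mxnet_old) + r"(?=\s*(?:[><=!~]=?|$))"
    return re.sub(pattern, mxnet_new, line.rstrip(), count=1)


if __name__ == "__main__":
    # Hypothetical requirement lines, not taken from the repository.
    for line in ["mxnet>=1.3.1\n", "mxnet-mkl==1.4.1\n", "numpy~=1.14\n"]:
        print(repr(swap_with_replace(line)), repr(swap_with_regex(line)))
```

Both helpers agree on a plain `mxnet>=...` line. They differ on a line such as `mxnet-mkl==1.4.1`: the plain replacement rewrites the prefix of a different package, while the anchored pattern leaves it alone. Whether that extra robustness matters depends on what the requirements file actually contains.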

Regarding the choice of mxnet vs mxnet-cu92mkl — I think that we should do the same thing as with MXNet releases — separate MXNet, MXNet + CUDA, MXNet + MKL versions and Docker images.
Personally, I would prefer to have a GPU version that could seamlessly switch to CPU if there are 0 GPUs found; that's how PyTorch works by default. @jaheba do you know if MXNet can work the same way? Currently, the GPU image fails on my machine, which has no Nvidia GPU (MacBook Pro), with the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/gluonts/shell/__main__.py", line 27, in <module>
    from gluonts.model.estimator import Estimator
  File "/usr/local/lib/python3.7/dist-packages/gluonts/model/estimator.py", line 19, in <module>
    from mxnet.gluon import HybridBlock
  File "/usr/local/lib/python3.7/dist-packages/mxnet/__init__.py", line 24, in <module>
    from .context import Context, current_context, cpu, gpu, cpu_pinned
  File "/usr/local/lib/python3.7/dist-packages/mxnet/context.py", line 24, in <module>
    from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
  File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 213, in <module>
    _LIB = _load_lib()
  File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 204, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
  File "/usr/lib/python3.7/ctypes/__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcuda.so.1: cannot open shared object file: No such file or directory

> Personally, I would prefer to have a GPU version that could seamlessly switch to CPU if there are 0 GPUs found; that's how PyTorch works by default. @jaheba do you know if MXNet can work the same way? Currently, the GPU image fails on my machine, which has no Nvidia GPU (MacBook Pro), with the following error:

Yes, I agree. Maybe @szha can help us out here.
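The traceback above fails while loading libcuda.so.1, before any MXNet code can decide which device to use. As a rough sketch of the kind of check a launcher script could run beforehand, the snippet below probes for the driver library with ctypes. The function names and the device labels are hypothetical, and the probe does not make a CUDA build of MXNet importable on a machine without the driver; it only reports the situation early.

```python
import ctypes


def cuda_driver_available(lib_name: str = "libcuda.so.1") -> bool:
    """Return True if the CUDA driver library can be loaded.

    This mirrors the ctypes.CDLL call that raises OSError in the traceback.
    """
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        return False
    return True


def choose_device(driver_ok: bool, gpu_count: int) -> str:
    """Pick "gpu" only when the driver loads and at least one GPU is visible.

    gpu_count has to come from elsewhere (assumption: a framework call or the
    container runtime); this helper only encodes the fallback rule.
    """
    return "gpu" if driver_ok and gpu_count > 0 else "cpu"


if __name__ == "__main__":
    driver_ok = cuda_driver_available()
    # Assumed count of 1 when the driver is present, purely for the demo.
    print(choose_device(driver_ok, gpu_count=1 if driver_ok else 0))
```

A true "switch to CPU when 0 GPUs are found" behaviour would still need an MXNet build that imports without the driver, which is the open question in this thread.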

strawberrypie · 2019-10-21T12:52:56Z

I've just started to experiment with GPU instances. I'm using DeepAR in my project and using p2.xlarge with this change is just a bit faster than c5.4xlarge. Sagemaker shows that only 20% of GPU is used. @jaheba can you suggest something to speed it up? Or should I create an issue to investigate GPU usage?

jaheba

Sorry, should have mentioned these before.

But otherwise looks really good to me 👍

Dockerfile.gpu

setup.py

Co-Authored-By: Jasper Schulz <jasper.b.schulz@googlemail.com>

jaheba · 2019-10-21T12:58:59Z

> I've just started to experiment with GPU instances. I'm using DeepAR in my project and using p2.xlarge with this change is just a bit faster than c5.4xlarge. Sagemaker shows that only 20% of GPU is used. @jaheba can you suggest something to speed it up? Or should I create an issue to investigate GPU usage?

We've also seen no real performance benefit from using GPUs with SageMaker DeepAR. There might be some performance increase when large batch sizes are used.

However, other models (e.g. Wavenet) should benefit much more from using GPUs than DeepAR does.

/cc @vafl

jaheba

Thanks, again!

jaheba · 2019-10-21T13:00:56Z

@strawberrypie oh, I think having a dedicated issue regarding GPUs would be great, thanks.

…awslabs#403) * Dockerfile for GPU container. Fix for installing GPU version of MXNet * Typo fix. Replacing requirement without regex.

strawberrypie added 4 commits October 9, 2019 19:04

Merge pull request #1 from awslabs/master

7eb9353

Changes from upstream

Merge pull request #2 from awslabs/master

1712228

Changes from upstream

Merge branch 'master' of https://github.com/awslabs/gluon-ts into ups…

a47b861

…tream-master

Dockerfile for GPU container. Fix for installing GPU version of MXNet

fc30e08

jaheba suggested changes Oct 21, 2019

strawberrypie added 2 commits October 21, 2019 15:49

Typo fix. Replacing requirement without regex.

770dcf3

Merge branch 'master' into docker-gpu

e642930

jaheba suggested changes Oct 21, 2019

strawberrypie and others added 2 commits October 21, 2019 15:54

Update Dockerfile.gpu

80f0fe8

Co-Authored-By: Jasper Schulz <jasper.b.schulz@googlemail.com>

Update setup.py

0d4b18a

Co-Authored-By: Jasper Schulz <jasper.b.schulz@googlemail.com>

jaheba approved these changes Oct 21, 2019

jaheba merged commit a894aee into awslabs:master Oct 21, 2019
