Dockerfile for GPU container. Fix for installing GPU version of MXNet #403
Conversation
Changes from upstream
Awesome, thanks!
Do you have any experience running gluon-ts on GPU instances?
setup.py (Outdated)
```python
re.subn(
    pattern=mxnet_old,
    repl=mxnet_new,
    string=line.rstrip(),
    count=1,
)[0]
```
How does `re.subn` help over just `str.replace` here? We should maybe think about making this more robust.

But more importantly, I don't really like what we are doing here. Are there always compatible releases between `mxnet` and `mxnet-cu92mkl`? And should we be more explicit about which version we install? We should probably discuss this in another issue, though.
Well, I was using the substitution with a regex like `mxnet[><=]?=`, but it looks like a simple substitution is enough. Will change it to `str.replace`.
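The switch to `str.replace` mentioned above could look roughly like this. This is a hypothetical helper for illustration, not the PR's exact code; the package names mirror the ones discussed in this thread, and the third argument to `str.replace` plays the role of `count=1` in the original `re.subn` call:

```python
def to_gpu_requirement(line, cpu_pkg="mxnet", gpu_pkg="mxnet-cu92mkl"):
    """Swap the CPU MXNet package name for the GPU flavour in a single
    requirements line, keeping any version specifier intact.

    Hypothetical sketch: replaces at most one occurrence, so a line like
    'mxnet>=1.4.0' becomes 'mxnet-cu92mkl>=1.4.0' and unrelated
    requirements pass through unchanged.
    """
    return line.rstrip().replace(cpu_pkg, gpu_pkg, 1)
```

Because `str.replace` matches a literal substring, this stays simple but is less robust than a pattern anchored at the start of the line; a requirement whose name merely contains "mxnet" would also be rewritten.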
Regarding the choice of `mxnet` vs `mxnet-cu92mkl`: I think we should do the same thing as the MXNet releases do and provide separate MXNet, MXNet + CUDA, and MXNet + MKL versions and Docker images.
Personally, I would prefer to have a GPU version that could seamlessly switch to CPU if no GPUs are found; that's how PyTorch works by default. @jaheba, do you know if MXNet can work the same way? Currently, the GPU image fails on my device without an Nvidia GPU (MacBook Pro) and throws the following error:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.7/dist-packages/gluonts/shell/__main__.py", line 27, in <module>
from gluonts.model.estimator import Estimator
File "/usr/local/lib/python3.7/dist-packages/gluonts/model/estimator.py", line 19, in <module>
from mxnet.gluon import HybridBlock
File "/usr/local/lib/python3.7/dist-packages/mxnet/__init__.py", line 24, in <module>
from .context import Context, current_context, cpu, gpu, cpu_pinned
File "/usr/local/lib/python3.7/dist-packages/mxnet/context.py", line 24, in <module>
from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 213, in <module>
_LIB = _load_lib()
File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 204, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
File "/usr/lib/python3.7/ctypes/__init__.py", line 356, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcuda.so.1: cannot open shared object file: No such file or directory
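The PyTorch-style fallback described above could be sketched as follows. This is an illustration of the pattern only, not how MXNet actually behaves: with MXNet the GPU count would come from `mx.context.num_gpus()`, but in the traceback above `import mxnet` itself fails because the CUDA-linked wheel cannot load `libcuda.so.1`, so a real fallback would have to happen at install time (choosing the wheel) rather than at runtime. The `num_gpus_fn` callable here is an assumed stand-in to keep the sketch library-agnostic:

```python
def choose_context(num_gpus_fn):
    """Pick a device string with graceful CPU fallback.

    num_gpus_fn is an injected callable standing in for something like
    mx.context.num_gpus(); it may raise OSError when the CUDA runtime
    libraries are missing, as in the traceback above.
    """
    try:
        n = num_gpus_fn()
    except OSError:  # e.g. "libcuda.so.1: cannot open shared object file"
        n = 0
    return "gpu(0)" if n > 0 else "cpu()"
```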
> Personally, I would prefer to have a GPU version that could seamlessly switch to CPU if there are 0 GPUs found — that's how PyTorch works by default. @jaheba do you know if MXNet can work the same way? Currently, the GPU image fails to work on my device without the Nvidia GPU (Macbook Pro), throws the following error:
Yes, I agree. Maybe @szha can help us out here.
I've just started to experiment with GPU instances. I'm using DeepAR in my project and using …
Sorry, I should have mentioned these before. But otherwise this looks really good to me 👍
Co-Authored-By: Jasper Schulz <jasper.b.schulz@googlemail.com>
We've seen no real performance benefit from using GPUs with SageMaker DeepAR either. There might be some performance increase with large batch sizes. However, other models (e.g. WaveNet) should benefit much more from GPUs than DeepAR does. /cc @vafl
Thanks, again!
@strawberrypie oh, I think having a dedicated issue regarding GPUs would be great, thanks.
…awslabs#403)

* Dockerfile for GPU container. Fix for installing GPU version of MXNet
* Typo fix. Replacing requirement without regex.
Description of changes:

* Fixed the requirement substitution in `setup.py` that didn't work with the current version of requirements
* Added `Dockerfile.gpu` for building a GPU-enabled Docker image

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.