This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Running the MXNet-Horovod example incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py on MXNet 1.8 built with CUDA 11.0 under Python 3.7 causes a segmentation fault. The segfault occurs after the example script has finished.
The same script runs without error on MXNet 1.8 with CUDA 10.2 under Python 3.7, and on MXNet 1.8 with CUDA 11.0 under Python 3.6.
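To check which cell of that version matrix a given machine falls into, a small helper along these lines could print the relevant details. This is only a sketch: the function name environment_report is hypothetical, and it assumes mx.runtime.Features (available since MXNet 1.5) to report whether the MXNet build has CUDA enabled. It does not report the CUDA toolkit version itself.

```python
import platform
import sys


def environment_report():
    """Collect the details that differ between the working and failing setups.

    Hypothetical helper. MXNet is imported lazily so the report still works
    on a machine where MXNet is missing or fails to load.
    """
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "mxnet": None,
        "mxnet_cuda": None,
    }
    try:
        import mxnet as mx
    except (ImportError, OSError):
        # OSError covers a libmxnet.so that cannot be loaded (e.g. missing CUDA libs)
        return report
    report["mxnet"] = mx.__version__
    # Compile-time feature flags of the installed MXNet build
    report["mxnet_cuda"] = mx.runtime.Features().is_enabled("CUDA")
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it with the same python3 and LD_LIBRARY_PATH used for the example keeps the report consistent with the failing run.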
To Reproduce
Steps to reproduce
1. Launch an EC2 p3.8xlarge GPU instance with the Deep Learning AMI (DLAMI) ami-02440419a5afe47ab.
2. Build MXNet 1.8 with CUDA 11.0 (mx1.8-cu110) from source.
3. Install Horovod: python3 -m pip install horovod
4. Run the example to reproduce the error:
       LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH python3 \
           incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py
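The final reproduction step can also be driven from Python so that the way the process exits is reported explicitly. This is a sketch under assumptions: the helper names build_env, describe_exit and run_example are hypothetical, the paths are the ones from the steps above, and the standard CPython option -X faulthandler is added so the interpreter can dump a Python-level traceback when it receives SIGSEGV, which may help show where the crash happens.

```python
import os
import signal
import subprocess
import sys

# Paths taken from the reproduction steps above
CUDA_LIB_DIR = "/usr/local/cuda-11.0/lib64"
EXAMPLE = "incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py"


def build_env(cuda_lib_dir, base_env=None):
    """Return a copy of the environment with cuda_lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = f"{cuda_lib_dir}:{existing}" if existing else cuda_lib_dir
    return env


def describe_exit(returncode):
    """Translate a subprocess return code into a short status string."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def run_example(script=EXAMPLE, cuda_lib_dir=CUDA_LIB_DIR):
    """Run the example with the current interpreter and report how it exited."""
    # -X faulthandler dumps the Python stack of all threads to stderr on SIGSEGV
    cmd = [sys.executable, "-X", "faulthandler", script]
    proc = subprocess.run(cmd, env=build_env(cuda_lib_dir))
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    print(run_example())
```

Launched with python3 from the directory that contains incubator-mxnet, it should report "killed by SIGSEGV" on the failing setup and "exited normally" on the working ones.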