I first ran the code with its default config on my server, but I later noticed that training was actually running on my CPU, and `nvidia-smi` returned an error.

After that, I found on Docker Hub that I can use the GPU inside the container by passing `--gpus all` to `docker run`, i.e. replacing

`docker run --rm -m4g -v /path/to/data:/mnt/data -it ratsql`

with

`docker run --rm --gpus all -m4g -v /path/to/data:/mnt/data -it ratsql`

`nvidia-smi` then worked inside the container, but when I trained the model, it failed with an error like:

"the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331"

Searching online, I found that CUDA 11+ is said to be necessary for RTX 30xx GPUs. I then changed the base image in the Dockerfile to `pytorch/pytorch:1.5-cuda10.1-cudnn7-devel` and rebuilt the image, but the same error occurred again.

I wonder whether I can train the model with a GPU in Docker. Kindly help me resolve this issue. Any help will be really appreciated.
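Since the error points at a CUDA build too old for an RTX 30xx card, one thing I tried sketching is swapping the base image for a CUDA 11 build. This is only an assumption on my part, not the repo's official setup; the tag below is one of the official PyTorch images on Docker Hub, and the project may need pinned dependency versions that differ from it:

```dockerfile
# Hypothetical sketch: use a PyTorch image built against CUDA 11,
# which RTX 30xx (compute capability 8.6) GPUs require.
FROM pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel

# ...rest of the original Dockerfile (dependency install, code copy) unchanged...
```

The container would still need to be started with `--gpus all` so the NVIDIA runtime exposes the device.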
The way I understand your issue, you should first check whether PyTorch recognises your CUDA device. Try this in a terminal: `python3 -c "import torch; assert(torch.cuda.is_available())"`. What is the output?
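To get a bit more detail than a bare assertion, here is a small diagnostic sketch (the `cuda_report` helper is hypothetical, not part of the repo) that reports the installed PyTorch version, the CUDA version it was built against, and the detected device, which helps spot a CUDA-version/GPU-architecture mismatch like the one above:

```python
# Hypothetical diagnostic helper: summarize PyTorch's view of CUDA.
def cuda_report():
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this wheel was built with
        # (None for CPU-only builds); a build too old for the GPU shows here.
        return "CUDA not available (torch %s, built for CUDA %s)" % (
            torch.__version__, torch.version.cuda)
    return "CUDA OK: %s (torch %s, built for CUDA %s)" % (
        torch.cuda.get_device_name(0), torch.__version__, torch.version.cuda)

if __name__ == "__main__":
    print(cuda_report())
```

If this prints a CUDA version below 11 while the host has an RTX 30xx card, the container's PyTorch build is the likely culprit.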