I first ran the code with its default config on my server, but I later noticed that training was actually running on my CPU, and `nvidia-smi` returned an error.

After that, I found on Docker Hub that I can use the GPU inside the container by passing `--gpus all` to `docker run`, i.e. replacing

`docker run --rm -m4g -v /path/to/data:/mnt/data -it ratsql`

with

`docker run --rm --gpus all -m4g -v /path/to/data:/mnt/data -it ratsql`

`nvidia-smi` then worked inside the container, but when I trained the model, it failed with an error like:

"the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331"

Searching online, I found that CUDA 11+ is said to be necessary for RTX 30xx GPUs. I then changed the base image in the Dockerfile to `pytorch/pytorch:1.5-cuda10.1-cudnn7-devel` and rebuilt the image, but the same error occurred again.

I wonder whether I can train the model with a GPU in Docker. Kindly help me resolve this issue. Any help will be really appreciated.
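Since the error points at a CUDA build too old for an RTX 30xx card, one thing I tried sketching is swapping the base image for a CUDA 11 build. This is only an assumption on my part, not the repo's official setup; the tag below is one of the official PyTorch images on Docker Hub, and the project may need pinned dependency versions that differ from it:

```dockerfile
# Hypothetical sketch: use a PyTorch image built against CUDA 11,
# which RTX 30xx (compute capability 8.6) GPUs require.
FROM pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel

# ...rest of the original Dockerfile (dependency install, code copy) unchanged...
```

The container would still need to be started with `--gpus all` so the NVIDIA runtime exposes the device.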
The way I understand your issue, you should first check whether PyTorch recognises your CUDA device. Try this in a terminal: `python3 -c "import torch; assert(torch.cuda.is_available())"`. What is the output?
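To get a bit more detail than a bare assertion, here is a small diagnostic sketch (the `cuda_report` helper is hypothetical, not part of the repo) that reports the installed PyTorch version, the CUDA version it was built against, and the detected device, which helps spot a CUDA-version/GPU-architecture mismatch like the one above:

```python
# Hypothetical diagnostic helper: summarize PyTorch's view of CUDA.
def cuda_report():
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this wheel was built with
        # (None for CPU-only builds); a build too old for the GPU shows here.
        return "CUDA not available (torch %s, built for CUDA %s)" % (
            torch.__version__, torch.version.cuda)
    return "CUDA OK: %s (torch %s, built for CUDA %s)" % (
        torch.cuda.get_device_name(0), torch.__version__, torch.version.cuda)

if __name__ == "__main__":
    print(cuda_report())
```

If this prints a CUDA version below 11 while the host has an RTX 30xx card, the container's PyTorch build is the likely culprit.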