
cuda not recognized #78

Open
davidhoover opened this issue Oct 10, 2023 · 4 comments

@davidhoover

After installing v1.0.5, I am getting a message saying CUDA is not recognized:

/opt/mamba/envs/model_angelo/lib/python3.10/site-packages/torch/amp/autocast_mode.py:204: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling

I am running this on an NVIDIA V100 node, capable of handling recent CUDA libraries and drivers. Is there anything that can be done to force PyTorch to recognize CUDA and the GPU devices available on the node?
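For what it's worth, here is a quick check that should distinguish a CPU-only build from a driver problem; if torch.version.cuda prints None, I assume the package itself was built without CUDA:

python -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())'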

@jamaliki
Collaborator

Hi,

What may work is to install PyTorch with GPU support in the same conda environment, like the following. Please let me know if you have further issues.

python -m pip install torch torchvision torchaudio

If you continue having issues, you can also try the following conda command:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
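If that also fails, another thing worth trying (I have not tested this on your system) is pointing pip at the CUDA 11.8 wheel index explicitly, so it cannot resolve to a CPU-only wheel:

python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118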

@davidhoover
Author

That did not work, but I figured it out. Could you be more specific in your installation instructions? This worked:

conda install pytorch==2.0.0=py3.10_cuda11.8_cudnn8.7.0_0 torchvision torchaudio cudatoolkit pytorch-cuda=11.8 -c pytorch -c conda-forge -c nvidia

@jamaliki
Collaborator

Wow @davidhoover, thanks so much for figuring this out. May I ask where you found this? I want to see whether the pytorch==2.0.0... pin is the important bit, or whether it's adding -c conda-forge and cudatoolkit explicitly.

@davidhoover
Author

I'm not sure whether cudatoolkit needs to be explicit. I figured this out by trial and error, guided by running this test on a GPU node:

python -c 'import torch; print(torch.cuda.is_available())'

If the result is False, then model-angelo will not utilize GPUs. If True, then we're all good.
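Once it returns True, a follow-up check (same idea, just more verbose) lists the devices PyTorch can actually see:

python -c 'import torch; print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])'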
