PyTorch for ROCm is overwritten by PyTorch for CUDA #37

Open
ghost opened this issue Jun 6, 2024 · 4 comments

@ghost

ghost commented Jun 6, 2024

System Info

If I follow the installation guide in the README, lion-pytorch is installed (see requirements-dev.txt). However, installing lion-pytorch uninstalls PyTorch for ROCm (e.g., 2.4.0.dev20240520+rocm6.1) and installs PyTorch for CUDA (e.g., 2.3.1+cu121). Is there a way to avoid the overwrite?

A workaround is to install lion-pytorch first, then reinstall PyTorch for ROCm manually. However, this may confuse users. If there is no way to avoid the overwrite, I will open a PR to add an explanation to the README.

Reproduction

  • Original PyTorch for ROCm version (e.g., 2.4.0.dev20240520+rocm6.1)
python -c 'import torch; print(torch.__version__)'
  • Install lion-pytorch
git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
pip install -r requirements-dev.txt
  • PyTorch is replaced by CUDA version (e.g., 2.3.1+cu121)
python -c 'import torch; print(torch.__version__)'
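
The before/after comparison in the steps above can also be scripted. The following is a minimal sketch that classifies a PyTorch version string by the suffix after `+`, assuming the `+rocm…` / `+cu…` naming seen in the versions quoted in this issue. The function name `torch_build_flavor` is hypothetical and is not part of PyTorch or bitsandbytes.

```python
import re


def torch_build_flavor(version: str) -> str:
    """Classify a PyTorch version string by the label after '+'.

    Assumption: wheels are tagged like '+rocm6.1', '+cu121' or '+cpu',
    as in the versions quoted in this issue. Anything else is 'unknown'.
    """
    _, _, local = version.partition("+")
    if local.startswith("rocm"):
        return "rocm"
    if re.fullmatch(r"cu\d+", local):
        return "cuda"
    if local == "cpu":
        return "cpu"
    return "unknown"


# The two versions reported in this issue:
print(torch_build_flavor("2.4.0.dev20240520+rocm6.1"))  # rocm
print(torch_build_flavor("2.3.1+cu121"))                # cuda
```

In a real environment the input would be `torch.__version__`, checked once before and once after `pip install -r requirements-dev.txt`.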

Expected behavior

Keep the original PyTorch for ROCm if possible. If not, we should at least add a note reminding users to reinstall PyTorch for ROCm.

@pnunna93
Collaborator

Please use the same pip/python version for the pytorch and lion-pytorch installations.
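
One way to check this is to compare the running interpreter with the `pip` found on `PATH`, and to invoke pip as `python -m pip` so that it is tied to that interpreter. This is a rough sketch only; the helper name `pip_install_command` is hypothetical, and it is an assumption that a pip/python mismatch is the cause in this case.

```python
import shutil
import sys


def pip_install_command(*args: str) -> list[str]:
    """Build a pip install command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", *args]


if __name__ == "__main__":
    # If these two paths point into different environments, packages installed
    # with bare `pip` may land somewhere other than where `python` looks.
    print("python:     ", sys.executable)
    print("pip on PATH:", shutil.which("pip"))
    print("suggested:  ", " ".join(pip_install_command("-r", "requirements-dev.txt")))
```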

@ghost
Author

ghost commented Jun 11, 2024

After it is replaced by PyTorch for CUDA, I need to reinstall PyTorch for ROCm like this:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1/

It would be better to give users a heads-up about this in the README. When installing bitsandbytes for ROCm, I didn't notice that my PyTorch for ROCm had been replaced by PyTorch for CUDA, so my model training code stopped working, and I wasted a lot of time debugging it.
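
To catch this kind of silent replacement early, a training script could fail fast when the installed PyTorch is not a ROCm build. The sketch below relies on `torch.version.hip`, which is `None` on non-ROCm builds. The helper `require_rocm_build` and the stand-in objects are hypothetical and are used here only so the example runs without PyTorch installed; the `hip="6.1"` value is a placeholder, not a real version string.

```python
from types import SimpleNamespace


def require_rocm_build(torch_module) -> str:
    """Return the HIP version of a ROCm build, or raise if the build is not ROCm."""
    hip = getattr(torch_module.version, "hip", None)
    if hip is None:
        raise RuntimeError(
            f"Expected a ROCm build of PyTorch but found {torch_module.__version__}; "
            "reinstall PyTorch for ROCm."
        )
    return hip


# Stand-ins that mimic only the attributes used above (not real torch modules).
fake_rocm = SimpleNamespace(
    __version__="2.4.0.dev20240520+rocm6.1",
    version=SimpleNamespace(hip="6.1", cuda=None),
)
fake_cuda = SimpleNamespace(
    __version__="2.3.1+cu121",
    version=SimpleNamespace(hip=None, cuda="12.1"),
)

print(require_rocm_build(fake_rocm))
try:
    require_rocm_build(fake_cuda)
except RuntimeError as err:
    print("error:", err)
```

In a real script the call would be `import torch; require_rocm_build(torch)` near the top, before any model code runs.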

@pnunna93
Collaborator

@taka-nscc, please check that your environment doesn't have multiple python/pip versions. To be sure, you can create a container from one of our PyTorch Docker images and install inside it.
docker pull rocm/pytorch:latest
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:latest

@ghost
Author

ghost commented Jun 18, 2024

This is the procedure I used to set up my environment.

# Pull and run the latest Docker image of PyTorch for ROCm
docker pull rocm/pytorch:latest
docker run -itd -v /home/work:/root --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host rocm/pytorch:latest

# Check PyTorch version of the container
python -c 'import torch; print(torch.__version__)' # -> 2.4.0.dev20240520+rocm6.0

# Install bitsandbytes
git clone --recurse https://github.com/ROCm/bitsandbytes
cd bitsandbytes
git checkout rocm_enabled
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S .
make
pip install .

# After installing bitsandbytes, PyTorch has been replaced by the CUDA build
python -c 'import torch; print(torch.__version__)' # -> 2.3.1+cu121

It would be user-friendly to add an explanation of this to the README so that users know they need to reinstall PyTorch manually.
