Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No access to rocminfo in a production environment - ability to manually set GPU arch. #53

Open
isaranto opened this issue Dec 9, 2024 · 0 comments

Comments

@isaranto
Copy link

isaranto commented Dec 9, 2024

System Info

Working on a kubernetes deployment with debian + pytorch 2.4.0 + ROCm 6.1.
The deployment is using the multiple backend alpha release available in the parent bitsandbytes repo.

Reproduction

Trying to load a model with bitsandbytes fails because there is no access to rocminfo.

def get_rocm_gpu_arch() -> str:
    logger = logging.getLogger(__name__)
    try:
        if torch.version.hip:
            result = subprocess.run(["rocminfo"], capture_output=True, text=True)
            match = re.search(r"Name:\s+gfx([a-zA-Z\d]+)", result.stdout)
ERROR:bitsandbytes.cuda_specs:Could not detect ROCm GPU architecture: [Errno 2] No such file or directory: 'rocminfo'
WARNING:bitsandbytes.cuda_specs:
ROCm GPU architecture detection failed despite ROCm being available.

result = subprocess.run(["rocminfo"], capture_output=True, text=True)

Expected behavior

I would prefer if I could set the architecture via an environment variable and rocminfo would be the fallback option if the env var is not set.
Here is the related cope snippet.
Happy to work on this if other people feel it is a good workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant