
pytorch: safetensors library hardcodes using CUDA if only device index is provided #499

Closed
dvrogozh opened this issue Jul 13, 2024 · 3 comments

Comments

@dvrogozh

In relation to:

The safetensors library hardcodes returning a CUDA device if only a device index is provided. This causes runtime errors when running Hugging Face models with pipeline(device_map="auto"), as noted in huggingface/transformers#31941 (see that issue for repro steps). The hardcoding happens here:

} else if let Ok(number) = ob.extract::<usize>() {
    Ok(Device::Cuda(number))
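
For illustration, a minimal sketch of the kind of call that hits this branch from Python on a machine without CUDA (the file path is a placeholder; see huggingface/transformers#31941 for the original repro through pipeline(device_map="auto")):

from safetensors.torch import load_file

# Passing a bare integer index reaches the branch above: safetensors resolves
# it to "cuda:0" unconditionally, which fails at runtime on a CUDA-less
# machine (for example one whose accelerator is XPU).
tensors = load_file("model.safetensors", device=0)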

A possible solution might be to return whatever device torch.device(N) returns. Note, however, that this will work for non-CUDA devices only after the corresponding change on the PyTorch side is merged (pytorch/pytorch#129119, see the follow-up below).
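
As a rough sketch of that proposal (the actual binding code lives on the Rust side), this is the difference in what torch.device(N) resolves to before and after that PyTorch change:

import torch

# PyTorch <= 2.4: a bare index is always interpreted as a CUDA device,
#   torch.device(0) -> device(type='cuda', index=0)
# PyTorch >= 2.5 (with the change referenced above): the index is resolved
# against the current accelerator, e.g. on an XPU-only machine
#   torch.device(0) -> device(type='xpu', index=0)
dev = torch.device(0)
print(dev.type, dev.index)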

CC: @faaany @muellerzr @SunMarc @guangyey

@dvrogozh
Author

FYI, pytorch/pytorch#129119 got merged, so the solution I outlined above should now be possible.

dvrogozh added a commit to dvrogozh/safetensors that referenced this issue Jul 15, 2024
Fixes: huggingface#499
Fixes: huggingface/transformers#31941

In some cases only a device index is given when querying a device. In that case both PyTorch and Safetensors returned 'cuda:N' by default. This causes runtime failures if the user actually runs on a non-CUDA device and does not have CUDA at all. This was recently addressed on the PyTorch side by [1]: starting from PyTorch 2.5, calling 'torch.device(N)' returns the current device instead of a CUDA device.

This commit makes a similar change to Safetensors. If only a device index is given, Safetensors queries and returns the device by calling 'torch.device(N)'. This change is backward compatible, since that call returns 'cuda:N' on PyTorch <= 2.4, which matches the previous Safetensors behavior.

See [1]: pytorch/pytorch#129119
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
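
A Python-level sketch of the resolution this commit describes (the actual change is in the safetensors Rust bindings; resolve_device below is purely illustrative):

import torch

def resolve_device(index: int) -> str:
    # Forward the bare index to torch.device() and use whatever device PyTorch
    # reports instead of assuming CUDA: 'cuda:N' on PyTorch <= 2.4 (the old
    # safetensors behavior), the current accelerator's device on >= 2.5.
    dev = torch.device(index)
    return f"{dev.type}:{dev.index}"
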
@dvrogozh
Author

I have implemented a fix for this issue as I see it. Please help review #500.

dvrogozh added a commit to dvrogozh/safetensors that referenced this issue Jul 25, 2024

dvrogozh added a commit to dvrogozh/safetensors that referenced this issue Jul 30, 2024

dvrogozh added a commit to dvrogozh/safetensors that referenced this issue Jul 31, 2024
@Narsil
Collaborator

Narsil commented Aug 1, 2024

Closed by #509

@Narsil Narsil closed this as completed Aug 1, 2024