optimizer CPU offload doesn't work outside of CUDA #958
Comments
Yes, the CPU offload optimizer only works on CUDA, and there is little point in supporting other devices. Perhaps we should make that clearer 🤔
I'm not too familiar with other hardware, so I'm not sure how this works with AMD and Intel GPUs.
PyTorch isn't taking much advantage of MPS in a unified-memory way, and I agree there isn't a whole lot of point to it on MPS other than being able to run the same code and ensure a consistent experience. For ROCm, I'm almost certain it just masquerades as CUDA and is invisible. Intel and other systems like the Ascend NPU rely on the XPU or NPU extensions in PyTorch, and TPUs require XLA.
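For reference, a rough sketch of how those backends surface in a PyTorch build (attribute availability varies by version; torch.xpu in particular only exists in recent releases):

```python
import torch

# ROCm builds still report torch.cuda.is_available() == True;
# they set torch.version.hip instead of torch.version.cuda.
if torch.cuda.is_available():
    backend = "rocm/hip" if torch.version.hip else "cuda"
elif getattr(torch, "xpu", None) is not None and torch.xpu.is_available():
    backend = "xpu"  # Intel GPUs
elif torch.backends.mps.is_available():
    backend = "mps"  # Apple unified memory
else:
    backend = "cpu"

print(backend)
```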
Yeah, agree with @gau-nernst. I'd maybe just prioritize sanity-checking whether the offloader works on an AMD GPU.
This (CPU offload on AMD) is on our roadmap, stay tuned.
@petrex Sounds awesome! Actually, our single-GPU CPU-offload optimizer only relies on PyTorch's CUDA stream API, so as long as PyTorch ROCm supports those APIs, it should work on AMD GPUs too! I don't have access to AMD GPUs, so I can't test it myself.
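For the record, here is a minimal sketch of the stream pattern involved (illustrative only, not torchao's actual code; param, cpu_buf, and copy_stream are made-up names). A side stream overlaps the device-to-host parameter copy with compute, which is why ROCm, which exposes the same torch.cuda API, should just work:

```python
import torch

param = torch.randn(1024, 1024, device="cuda")
# Pinned host buffer so the async copy can truly overlap with compute.
cpu_buf = torch.empty(param.shape, dtype=param.dtype, pin_memory=True)

copy_stream = torch.cuda.Stream()
# Make the side stream wait for work already queued on the default stream.
copy_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(copy_stream):
    cpu_buf.copy_(param, non_blocking=True)

# ... the optimizer step would then run on the CPU against cpu_buf ...

# Before reusing param on the default stream, sync with the copy stream.
torch.cuda.current_stream().wait_stream(copy_stream)
```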
The CPUOffloadOptimizer class is very clever, but it relies heavily on CUDA streams, which aren't available without a CUDA device. It should fall back to torch.cpu.Stream and torch.cpu.current_stream instead. Additionally, pinning should be conditional, e.g. pin_memory=True if torch.cuda.is_available() else False, since MPS is a unified-memory architecture.