optimizer CPU offload doesn't work outside of CUDA #958
Comments
Yes, the CPU offload optimizer only works on CUDA, and there is little point in supporting other devices. Perhaps we should make that clearer 🤔
I'm not too familiar with other hardware, so I'm not sure how this works with AMD and Intel GPUs.
PyTorch isn't taking much advantage of MPS in a unified-memory way, and I agree there isn't a whole lot of point to it on MPS other than being able to run the same code and ensure a consistent experience. For ROCm, I'm almost certain it just masquerades as CUDA and is invisible. Intel and other systems like the Ascend NPU rely on the XPU or NPU extensions in PyTorch, and TPUs require XLA.
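For reference, a rough sketch of how those backends surface in a PyTorch build (attribute availability varies by version; torch.xpu in particular only exists in recent releases):

```python
import torch

# ROCm builds still report torch.cuda.is_available() == True;
# they set torch.version.hip instead of torch.version.cuda.
if torch.cuda.is_available():
    backend = "rocm/hip" if torch.version.hip else "cuda"
elif getattr(torch, "xpu", None) is not None and torch.xpu.is_available():
    backend = "xpu"  # Intel GPUs
elif torch.backends.mps.is_available():
    backend = "mps"  # Apple unified memory
else:
    backend = "cpu"

print(backend)
```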
Yeah, agree with @gau-nernst. I'd maybe just prioritize sanity-checking whether the offloader works on an AMD GPU.
This (CPU offload on AMD) is on our roadmap, stay tuned.
@petrex Sounds awesome! Actually, our single-GPU CPU-offload optimizer only relies on PyTorch's CUDA stream API, so as long as PyTorch ROCm supports those APIs, it should work on AMD GPUs too! I don't have access to AMD GPUs, so I can't test it myself.
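For the record, here is a minimal sketch of the stream pattern involved (illustrative only, not torchao's actual code; param, cpu_buf, and copy_stream are made-up names). A side stream overlaps the device-to-host parameter copy with compute, which is why ROCm, which exposes the same torch.cuda API, should just work:

```python
import torch

param = torch.randn(1024, 1024, device="cuda")
# Pinned host buffer so the async copy can truly overlap with compute.
cpu_buf = torch.empty(param.shape, dtype=param.dtype, pin_memory=True)

copy_stream = torch.cuda.Stream()
# Make the side stream wait for work already queued on the default stream.
copy_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(copy_stream):
    cpu_buf.copy_(param, non_blocking=True)

# ... the optimizer step would then run on the CPU against cpu_buf ...

# Before reusing param on the default stream, sync with the copy stream.
torch.cuda.current_stream().wait_stream(copy_stream)
```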
The CPUOffloadOptimizer class is very clever, but it relies heavily on CUDA streams, which aren't available without a CUDA device. It should fall back to torch.cpu.Stream and torch.cpu.current_stream instead. Additionally, pinning should be conditional, e.g. pin_memory=True if torch.cuda.is_available() else False, since MPS is a unified-memory architecture.