Hi,
I'm looking to use the ZeRO-Offload feature to train a 10B-parameter model on a single GPU. I have been able to train models using ZeRO-2, but when I enable the cpu-optimizer flag the job fails with the following error:
"ModuleNotFoundError: No module named 'deepspeed.ops.adam.cpu_adam_op'"
I'm not sure why this is happening, although I do see a recent change that disables the default installation: https://github.com/microsoft/DeepSpeed/pull/450/files/19c51251f1f6d32099fe321911316eeacaa9ed26
Is there something users need to do to enable the installation of this?
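In case it helps with diagnosing, here is a minimal sketch of how I would check whether the extension module from the error message can be located in my environment. The helper name `module_available` is just illustrative and is not part of DeepSpeed; it only uses Python's standard `importlib`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located.

    Note: for dotted names, parent packages are imported as a side effect.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError covers a missing parent package (e.g. deepspeed itself);
        # ValueError covers a module whose __spec__ is unset.
        return False


if __name__ == "__main__":
    # Module name copied from the error message above.
    print(module_available("deepspeed.ops.adam.cpu_adam_op"))
```

If this prints `False` while `module_available("deepspeed")` is `True`, that would suggest the package itself is installed but this particular op was not built.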
Appreciate the help.