Is your feature request related to a problem? Please describe.
This issue is related to #5620 and #6011. When a DeepSpeed model is initialized for ZeRO-3 inference (with the DeepSpeedZeRoOffload module handling parameter partitioning, for example), the model cannot be moved to the CPU, either via torch.nn.Module.to() or with the new offload_states API.
Describe the solution you'd like
Either extend #6011 to support offloading a model configured for ZeRO-3 inference, or add a new API that supports this.
Thanks