[REQUEST] Moving a trainable model with an optimiser between GPU and CPU #5620

@kfertakis

Description

Is your feature request related to a problem? Please describe.
When a DeepSpeed model is initialised with an optimiser, the `torch.nn.Module.to()` functionality for moving the model between devices breaks: the optimiser holds references to the model parameters, so GPU memory is not freed when, for example, trying to move the model to the CPU.

Describe the solution you'd like
Functionality similar to `torch.nn.Module.to()` that moves both the model and the optimiser between devices and de-allocates the previously occupied memory.
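Outside of DeepSpeed's engine, a plain-PyTorch sketch of what such functionality could look like is below. The helper name `move_model_and_optimizer` is hypothetical; the key point is that `Module.to()` does not touch optimiser state tensors (e.g. Adam's `exp_avg`/`exp_avg_sq`), so those must be moved explicitly before the old device's memory can be released.

```python
import torch


def move_model_and_optimizer(model, optimizer, device):
    """Move a model and its optimiser state to `device` (hypothetical helper)."""
    model.to(device)
    # Optimiser state tensors are not moved by Module.to(), so move them here.
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                state[key] = value.to(device)
    # Ask the caching allocator to return freed blocks to the GPU.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

With ZeRO-partitioned optimiser state this would be more involved, but the sketch illustrates the requested `.to()`-like semantics.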

Describe alternatives you've considered
The alternative is to destroy the model instance and recreate it from a checkpoint, but this has a much higher time cost.
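The checkpoint round-trip alternative can be sketched as follows; the `model_factory`/`optim_factory` parameters are illustrative, not a DeepSpeed API:

```python
import os
import tempfile

import torch


def recreate_from_checkpoint(model_factory, optim_factory, model, optimizer, device):
    """Save state, drop the old instances, and rebuild fresh ones on `device`."""
    path = os.path.join(tempfile.gettempdir(), "offload_ckpt.pt")
    torch.save({"model": model.state_dict(),
                "optim": optimizer.state_dict()}, path)
    del model, optimizer          # drop the old references...
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # ...so cached GPU blocks can be released
    new_model = model_factory().to(device)
    new_optimizer = optim_factory(new_model.parameters())
    ckpt = torch.load(path, map_location=device)
    new_model.load_state_dict(ckpt["model"])
    new_optimizer.load_state_dict(ckpt["optim"])
    return new_model, new_optimizer
```

The serialise/deserialise round trip and re-running model construction are what make this path much slower than an in-place device move.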

Labels: enhancement (New feature or request)