OOM issue with self._prevent_trainer_and_dataloaders_deepcopy()
#12516
Unanswered · MGheini asked this question in code help: RL / MetaLearning · 0 comments
Hi,
I'm working on meta-learning code where I'm implementing the MAML algorithm. You might not be fully familiar with the algorithm, but for my purposes it's enough to know that each round of optimization consists of an inner loop and an outer loop, and that in the inner loop I need to copy the model. I expected two snippets to behave in the same way: one that copies the model without `self._prevent_trainer_and_dataloaders_deepcopy()` vs. one that does the copy with it. However, the first one runs fine, while the second one results in a CUDA OOM.
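To give a rough idea of the structure (this is a simplified, first-order sketch rather than my actual code; `MAMLModule`, the placeholder `nn.Linear` learner, the losses, and the number of inner steps are all made up, and the commented-out lines only indicate where `_prevent_trainer_and_dataloaders_deepcopy()` enters, assuming it wraps the copy as a context manager):

```python
import copy

import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class MAMLModule(pl.LightningModule):
    """Simplified first-order sketch of the inner/outer loop; not the real task code."""

    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(8, 1)  # placeholder base learner

    def training_step(self, batch, batch_idx):
        support, query = batch  # one task: support set for adaptation, query set for the outer loss

        # Variant 1: deep-copy only the wrapped nn.Module. Memory stays stable across rounds.
        learner = copy.deepcopy(self.model)

        # Variant 2 (the one that OOMs for me): copy while excluding trainer/dataloader
        # references, assuming _prevent_trainer_and_dataloaders_deepcopy() wraps the copy.
        # with self._prevent_trainer_and_dataloaders_deepcopy():
        #     learner = copy.deepcopy(self).model

        # Inner loop: adapt the copy on the support set.
        inner_opt = torch.optim.SGD(learner.parameters(), lr=0.01)
        for _ in range(3):  # placeholder number of inner steps
            inner_loss = F.mse_loss(
                learner(support), torch.zeros(support.size(0), 1, device=support.device)
            )
            inner_opt.zero_grad()
            inner_loss.backward()
            inner_opt.step()

        # Outer loss: evaluate the adapted copy on the query set.
        return F.mse_loss(
            learner(query), torch.zeros(query.size(0), 1, device=query.device)
        )

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```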
I print out `torch.cuda.memory_allocated()` at the beginning of the outer loop at each round, and while it seems pretty stable in the first case, it keeps increasing with `._prevent_trainer_and_dataloaders_deepcopy()`.
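The check itself is nothing more than something along these lines (the placement and names like `num_outer_rounds` are placeholders, not my exact loop):

```python
import torch

num_outer_rounds = 10  # placeholder

for outer_round in range(num_outer_rounds):
    # Roughly flat in the first case, grows every round in the second.
    print(f"outer round {outer_round}: "
          f"{torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")
    ...  # inner-loop copy, adaptation, and the outer update happen here
```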
I have not been able to pin down the root of the problem. Can you please give me some pointers? Thanks a lot!