OOM issue with self._prevent_trainer_and_dataloaders_deepcopy()
#12516
Unanswered · MGheini asked this question in code help: RL / MetaLearning · 0 comments
Hi,
I'm working on meta-learning code where I'm implementing the MAML algorithm. You might not be fully familiar with the algorithm, but for my purposes it's enough to know that each round of optimization consists of an inner loop and an outer loop, and that in the inner loop I need to copy the model. I expected two snippets to behave in the same way: one that copies the model without `self._prevent_trainer_and_dataloaders_deepcopy()` vs. one that does the copy with it. However, the first one runs fine, while the second one results in a CUDA OOM.
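To give a rough idea of the structure (this is a simplified, first-order sketch rather than my actual code; `MAMLModule`, the placeholder `nn.Linear` learner, the losses, and the number of inner steps are all made up, and the commented-out lines only indicate where `_prevent_trainer_and_dataloaders_deepcopy()` enters, assuming it wraps the copy as a context manager):

```python
import copy

import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class MAMLModule(pl.LightningModule):
    """Simplified first-order sketch of the inner/outer loop; not the real task code."""

    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(8, 1)  # placeholder base learner

    def training_step(self, batch, batch_idx):
        support, query = batch  # one task: support set for adaptation, query set for the outer loss

        # Variant 1: deep-copy only the wrapped nn.Module. Memory stays stable across rounds.
        learner = copy.deepcopy(self.model)

        # Variant 2 (the one that OOMs for me): copy while excluding trainer/dataloader
        # references, assuming _prevent_trainer_and_dataloaders_deepcopy() wraps the copy.
        # with self._prevent_trainer_and_dataloaders_deepcopy():
        #     learner = copy.deepcopy(self).model

        # Inner loop: adapt the copy on the support set.
        inner_opt = torch.optim.SGD(learner.parameters(), lr=0.01)
        for _ in range(3):  # placeholder number of inner steps
            inner_loss = F.mse_loss(
                learner(support), torch.zeros(support.size(0), 1, device=support.device)
            )
            inner_opt.zero_grad()
            inner_loss.backward()
            inner_opt.step()

        # Outer loss: evaluate the adapted copy on the query set.
        return F.mse_loss(
            learner(query), torch.zeros(query.size(0), 1, device=query.device)
        )

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```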
I print out `torch.cuda.memory_allocated()` at the beginning of the outer loop at each round, and while it seems pretty stable in the first case, it keeps increasing with `._prevent_trainer_and_dataloaders_deepcopy()`.
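The check itself is nothing more than something along these lines (the placement and names like `num_outer_rounds` are placeholders, not my exact loop):

```python
import torch

num_outer_rounds = 10  # placeholder

for outer_round in range(num_outer_rounds):
    # Roughly flat in the first case, grows every round in the second.
    print(f"outer round {outer_round}: "
          f"{torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")
    ...  # inner-loop copy, adaptation, and the outer update happen here
```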
I have not been able to pin down the root of the problem. Can you please give me some pointers? Thanks a lot!