Eliminate init_cuda_buffer and dummy_batch #3412
[Let's put this off until we implement sharded_ddp and others; unsure how they're going to work together yet. #3415]
In early implementations of distributed training, it was necessary to do a "dummy pass" if a worker OOMed, in order to ensure that it correctly synced with others. You can see this code here:
- ParlAI/parlai/core/torch_generator_agent.py, lines 751 to 773 at 67433e3
- ParlAI/parlai/core/torch_generator_agent.py, lines 625 to 648 at 67433e3
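The pattern being described looks roughly like the sketch below. This is a hypothetical illustration rather than the actual code at the lines linked above; the field names `text_vec`/`label_vec`, the `criterion`, and the helper signature are all assumptions:

```python
import torch


def train_step(ddp_model, batch, dummy_batch, criterion, optimizer):
    # Hypothetical sketch of the dummy-pass pattern, not ParlAI's code.
    optimizer.zero_grad()
    try:
        loss = criterion(ddp_model(batch["text_vec"]), batch["label_vec"])
    except RuntimeError as err:
        if "out of memory" not in str(err):
            raise
        torch.cuda.empty_cache()
        # Dummy pass: run a tiny synthetic batch so this worker still
        # participates in DDP's gradient all-reduce; scale the loss to
        # zero so the parameters are unaffected.
        loss = criterion(
            ddp_model(dummy_batch["text_vec"]), dummy_batch["label_vec"]
        ) * 0.0
    loss.backward()
    optimizer.step()
```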
This is slightly annoying as a user: any time you add a custom field to the batch, you need to implement `_dummy_batch` yourself. DDP allows for a manual sync (see `DDP.join()`) which has the same effect as this method. We should switch to that, and then remove all instances of `_init_cuda_buffer` and `_dummy_batch` from all our codebases.
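For reference, the documented use of `DDP.join()` is training with uneven inputs across ranks: ranks that finish early shadow the collective calls of ranks still training, so no dummy forward/backward pass is needed to keep gradient syncs aligned. A minimal, self-contained sketch (CPU, gloo backend, toy model; how this maps onto ParlAI's OOM handling is the open question above):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def demo(rank: int, world_size: int):
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(nn.Linear(8, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Deliberately uneven workloads: rank 0 trains on one extra batch.
    num_batches = 5 if rank == 0 else 4
    batches = [torch.randn(16, 8) for _ in range(num_batches)]

    # Inside join(), ranks that run out of batches shadow the collective
    # calls issued by ranks that are still training, so the gradient
    # all-reduces stay aligned without any dummy batch.
    with model.join():
        for batch in batches:
            optimizer.zero_grad()
            loss = model(batch).sum()
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(demo, args=(world_size,), nprocs=world_size)
```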