This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Add support for fairscale sharded ddp #3415

Closed
stephenroller opened this issue Jan 26, 2021 · 1 comment

stephenroller commented Jan 26, 2021

Let's add support for Fairscale's Sharded DDP.

Right now we hard-code the use of PyTorch's DDP, but let's generalize this:

self.model = torch.nn.parallel.DistributedDataParallel(
    self.model, device_ids=device_ids, broadcast_buffers=False
)
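
For reference, fairscale's sharded DDP is wrapped differently: the optimizer state is sharded via fairscale's OSS wrapper, and the model is then wrapped in ShardedDataParallel instead of torch's DistributedDataParallel. A minimal sketch based on fairscale's documented usage (the dummy model and optimizer settings are just for illustration):

import torch
from fairscale.optim.oss import OSS
from fairscale.nn.data_parallel import ShardedDataParallel

# assumes torch.distributed is already initialized, as in our distributed setup
model = torch.nn.Linear(8, 8).cuda()  # stand-in for the real ParlAI model
# OSS shards the optimizer state across ranks; ShardedDataParallel then handles
# gradient reduction in place of torch's DistributedDataParallel
optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=1e-3)
model = ShardedDataParallel(model, optimizer)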

Create a new "distributed" folder inside parlai/nn. Inside parlai/nn/distributed/__init__.py, create a helper class which finds and instantiates the "right" version of data parallel. Something like

class DistributedFactory:
    @classmethod
    def add_cmdline_args(cls, parser, partial_opt):
        # add --distributed-method here, as well as moving the options from
        # https://github.com/facebookresearch/ParlAI/blob/67433e376fc361dee5aa045cb6bb2b68d3faa478/parlai/core/params.py#L760-L768
        ...

    @classmethod
    def factory(cls, model: torch.nn.Module, opt: Opt):
        # based on opt['distributed_method'], instantiate a DDP or ShardedDDP object
        ...
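
For concreteness, here is a rough sketch of what the factory dispatch could look like. The fairscale import is real, but the option values and the extra optimizer argument are only illustrative; note that ShardedDataParallel also needs the optimizer wrapped in fairscale's OSS, so the real helper will have to thread the optimizer through as well.

import torch
from parlai.core.opt import Opt


class DistributedFactory:
    @classmethod
    def factory(cls, model: torch.nn.Module, opt: Opt, optimizer=None, device_ids=None):
        method = opt.get('distributed_method', 'ddp')
        if method == 'ddp':
            # current behavior: plain PyTorch DDP
            return torch.nn.parallel.DistributedDataParallel(
                model, device_ids=device_ids, broadcast_buffers=False
            )
        elif method == 'sharded_ddp':
            # fairscale's sharded DDP; `optimizer` must already be a
            # fairscale.optim.OSS instance wrapping the base optimizer
            from fairscale.nn.data_parallel import ShardedDataParallel
            return ShardedDataParallel(model, optimizer)
        else:
            raise ValueError(f'Unknown --distributed-method: {method}')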

Upgrade TGA/TCA/TRA (TorchGeneratorAgent / TorchClassifierAgent / TorchRankerAgent) to use this helper.

It would also be nice to use a @register_distributed pattern (see how we do it for Agents, Teachers, and scripts presently), so that we can add internal-only approaches.
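
That registry could mirror how agents, teachers, and scripts are registered today; a minimal sketch (all names here are illustrative, not an existing API):

DISTRIBUTED_REGISTRY = {}


def register_distributed(name):
    """
    Register a data-parallel wrapper under a short name, so internal-only
    implementations can be added without modifying the factory itself.
    """
    def _register(cls):
        DISTRIBUTED_REGISTRY[name] = cls
        return cls
    return _register


@register_distributed('ddp')
class TorchDDPWrapper:
    ...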

@stephenroller (Contributor, Author)

Implemented in #3740.
