Hi,
Thanks for the amazing framework. I have a question about the purpose of the all_gather_list function, which gathers tensors across GPUs. When training with DDP, the gradients are already synchronized before the parameter update, so why is this step needed? Is it just to collate the loss, the number of correct predictions, or the rank (during evaluation)? If so, couldn't one gather all of those after computing the loss, instead of first exchanging the question and context representations and then proceeding from there?
Thanks!
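For context, here is a minimal sketch (not the repository's actual code) of the pattern the question refers to: with in-batch negatives, each GPU gathers the question and context embeddings from all other ranks before computing the loss, so the similarity matrix covers the global batch rather than only the local one. Gathering just the per-GPU losses afterwards would not give each rank access to the other ranks' negatives. Function and variable names below are illustrative assumptions.

```python
import torch
import torch.distributed as dist

def gather_embeddings(local_emb: torch.Tensor) -> torch.Tensor:
    """All-gather per-GPU embeddings so each rank sees the global batch."""
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(local_emb) for _ in range(world_size)]
    dist.all_gather(gathered, local_emb)
    # all_gather does not propagate gradients into the received copies,
    # so keep the local tensor (with its autograd graph) in its own slot.
    gathered[dist.get_rank()] = local_emb
    return torch.cat(gathered, dim=0)

# q_local, c_local: (B, d) question / context embeddings on this GPU.
# After gathering, q_all and c_all are (B * world_size, d), so the
# similarity matrix (and therefore the loss) uses cross-GPU negatives:
#   q_all = gather_embeddings(q_local)
#   c_all = gather_embeddings(c_local)
#   scores = q_all @ c_all.T   # loss computed over the enlarged batch
```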