Iterations completing out of order (possibly) in ddp with torchelastic? #3403
Labels
bug
Something isn't working
distributed
Generic distributed-related topic
help wanted
Open to be worked on
waiting on author
Waiting on user action, correction, or update
This might be a bug, or it might be expected behavior. I'm running PyTorch Lightning with torchelastic and DDP, and I'm noticing that iterations are being logged out of order (below, iteration 632 precedes iteration 574). This could be due to delays in parallel writing, or perhaps just an issue in logging. Is this expected behavior?
Running with 6 GPUs in DDP.
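One common cause of out-of-order progress lines is that each DDP worker writes to the same stream independently, so output from different ranks interleaves. A minimal sketch of a workaround, assuming torchelastic's convention of setting the `RANK` environment variable per worker (the `log_if_rank_zero` helper is hypothetical, for illustration only, not part of PyTorch Lightning):

```python
import os
import sys


def log_if_rank_zero(message, stream=None):
    """Print only from the global rank-0 worker.

    Torchelastic exports a RANK environment variable for each worker
    process; gating output on it keeps the other 5 workers from
    interleaving their own (possibly delayed) progress lines.
    Returns True if the message was actually printed.
    """
    rank = int(os.environ.get("RANK", "0"))  # default to 0 outside DDP
    if rank == 0:
        print(message, file=stream or sys.stdout)
        return True
    return False
```

If the interleaving is purely cosmetic, each worker still steps through its iterations in order; only the merged log stream looks shuffled.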