-
Notifications
You must be signed in to change notification settings - Fork 939
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[fix] multinode jobs after recent lightning update (#921)
Summary: After Sasha's update of pytorch lightning on MMF master, it broke MMF codebase for multinode job. The root problem to PR Lightning-AI/pytorch-lightning#6802. The assumption that SLURM_PROCID points to worker rank is wrong as some frameworks launch their own processes later using multiprocessing spawn and have ntasks_per_node=1 set. This means that first node will have procid = 0, second node will have procid = 1 set and so on. Now, since this is used in prepare_data masking in LightningDataModule, this leads to it running on all workers on first node and thus causing inconsistencies. Now, this leads to prepare_data being called on all workers on first node instead of rank zero. Specifically, the barrier call in prepare_data, is called on first node workers but not on others leading to block later on. This PR fixes this by ensuring on our side that we only call prepare_data on rank zero. Furthermore, this can cause further confusion, we remove sync barrier calls from download as well. Users are now supposed to handle is_master checks on their own. Pull Request resolved: #921 Test Plan: Tested in multinode settings. Reviewed By: vedanuj Differential Revision: D28156855 Pulled By: apsdehal fbshipit-source-id: 4e0dd5317e15153f558d34c6951a89299602454f
- Loading branch information
1 parent
1cc6b08
commit 8a01258
Showing
4 changed files
with
33 additions
and
15 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters