Akoumparouli/mcore microbatch calculator fix (#10780)
* move tests/lightning/{,_}io
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add microbatch calculator context manager
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* use microbatch calculator context manager
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add on_load_checkpoint test to ValidateModelRestoration; use ctx manager to reconfigure microbatch calculator; update save/restore path; add cleanup step at the end
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* remove unused var
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Apply isort and black reformatting
  Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Showing 8 changed files with 246 additions and 163 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,27 @@
import contextlib


# @akoumparouli: use a context manager that saves/restores gbs/mbs when using
# reconfigure_num_microbatches_calculator to avoid interference between tests.
@contextlib.contextmanager
def reconfigure_num_microbatches_calculator_manager(*args, **kwargs):
    import megatron.core.num_microbatches_calculator as mb_calc

    # Store current mbs, gbs values
    if mb_calc._GLOBAL_NUM_MICROBATCHES_CALCULATOR is not None:
        _mbs = mb_calc.get_micro_batch_size()
        _gbs = mb_calc.get_current_global_batch_size()

        # use user's settings
        mb_calc.reconfigure_num_microbatches_calculator(*args, **kwargs)
    else:
        _mbs, _gbs = 1, 1

    try:
        # run user's code
        yield
        # @akoumparouli: no catch
    finally:
        # restore old mbs, gbs
        if mb_calc._GLOBAL_NUM_MICROBATCHES_CALCULATOR is not None:
            mb_calc.reconfigure_num_microbatches_calculator(0, None, _gbs, _mbs, data_parallel_size=1)
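
For reviewers who want to see the save/restore behavior without installing Megatron, here is a minimal sketch of the same pattern. It is not the code from this commit: `fake_calc`, `reconfigure`, and `reconfigure_manager` are hypothetical stand-ins, and the parameter names of `reconfigure` are assumed from the positional order of the call in the hunk above. It only illustrates that the previous gbs/mbs values come back even when the body of the `with` block raises.

```python
import contextlib
import types

# Hypothetical stand-in for the global calculator state in
# megatron.core.num_microbatches_calculator. state=None means "not initialized".
fake_calc = types.SimpleNamespace(state=None)


def reconfigure(rank, rampup_batch_size, global_batch_size, micro_batch_size, data_parallel_size):
    # Stand-in for reconfigure_num_microbatches_calculator: only gbs/mbs are recorded.
    fake_calc.state = {"gbs": global_batch_size, "mbs": micro_batch_size}


@contextlib.contextmanager
def reconfigure_manager(*args, **kwargs):
    # Save the current values and apply the caller's settings, if initialized.
    if fake_calc.state is not None:
        _mbs = fake_calc.state["mbs"]
        _gbs = fake_calc.state["gbs"]
        reconfigure(*args, **kwargs)
    else:
        _mbs, _gbs = 1, 1

    try:
        yield
    finally:
        # Restore the saved values, whether or not the body raised.
        if fake_calc.state is not None:
            reconfigure(0, None, _gbs, _mbs, data_parallel_size=1)


# Baseline configuration that a later test is assumed to rely on.
reconfigure(0, None, 8, 2, data_parallel_size=1)

try:
    with reconfigure_manager(0, None, 32, 4, data_parallel_size=1):
        inside = dict(fake_calc.state)  # temporary settings are active here
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass

after = dict(fake_calc.state)  # baseline settings are back
```

As in the hunk above, the exception is not caught inside the manager. The `finally` clause restores the values and the error still propagates to the caller.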