Fix for megatron_amp_O2 + MegatronGPTSFTModel #7910

vysarge · 2023-11-18T03:01:46Z

Currently, MegatronGPTSFTModel inherits sharded_state_dict from NLPAdapterModelMixin, but restoring weights from the distributed checkpoint expects MegatronGPTModel's version of sharded_state_dict to have been used. This mismatch results in:

  File "/root/llm/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_peft_tuning.py", line 63, in main
    model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)
  File "/root/llm/NeMo/nemo/collections/nlp/models/nlp_model.py", line 465, in restore_from
    return super().restore_from(
  File "/root/llm/NeMo/nemo/core/classes/modelPT.py", line 442, in restore_from
    instance = cls._save_restore_connector.restore_from(
  File "/root/llm/NeMo/nemo/collections/nlp/parts/nlp_overrides.py", line 743, in restore_from
    checkpoint = dist_checkpointing.load(
  File "/opt/Megatron-LM/megatron/core/dist_checkpointing/serialization.py", line 72, in load
    sharded_objects, sharded_state_dict = load_sharded_objects(sharded_state_dict, checkpoint_dir)
  File "/opt/Megatron-LM/megatron/core/dist_checkpointing/serialization.py", line 123, in load_sharded_objects
    return dict_list_map_inplace(load_sharded_object, sharded_objects), sharded_state_dict
  File "/opt/Megatron-LM/megatron/core/dist_checkpointing/dict_utils.py", line 155, in dict_list_map_inplace
    x[k] = dict_list_map_inplace(f, v)
  File "/opt/Megatron-LM/megatron/core/dist_checkpointing/dict_utils.py", line 155, in dict_list_map_inplace
    x[k] = dict_list_map_inplace(f, v)
  File "/opt/Megatron-LM/megatron/core/dist_checkpointing/dict_utils.py", line 159, in dict_list_map_inplace
    return f(x)
  File "/opt/Megatron-LM/megatron/core/dist_checkpointing/serialization.py", line 120, in load_sharded_object
    loaded_obj = torch.load(load_path)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 988, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 437, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 418, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/pretrained_models/Llama-2-70b-bf16-mcore/model_weights/model.module.decoder.layers.self_attention.linear_proj._extra_state/shard_0_80.pt'

This change overrides sharded_state_dict in MegatronGPTSFTModel so that it calls MegatronGPTModel's implementation instead of the mixin's. MegatronT5SFTModel may also need changes to reconcile it with this behavior.
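The underlying issue is Python's method resolution order: when the mixin is listed before MegatronGPTModel in the base-class list, the mixin's sharded_state_dict wins. The sketch below uses hypothetical stand-in classes (not the real NeMo classes) and assumes a `sharded_state_dict(self, prefix="")` signature; it only illustrates how an explicit override can delegate past the mixin.

```python
from typing import Any, Dict


class MegatronGPTModel:
    """Hypothetical stand-in for the base model class (not the real NeMo class)."""

    def sharded_state_dict(self, prefix: str = "") -> Dict[str, Any]:
        # Stand-in for the implementation whose key layout the checkpoint expects.
        return {"source": "MegatronGPTModel", "prefix": prefix}


class NLPAdapterModelMixin:
    """Hypothetical stand-in for the adapter mixin (not the real NeMo class)."""

    def sharded_state_dict(self, prefix: str = "") -> Dict[str, Any]:
        return {"source": "NLPAdapterModelMixin", "prefix": prefix}


class SFTModelWithoutOverride(NLPAdapterModelMixin, MegatronGPTModel):
    """Mixin listed first, so its sharded_state_dict is found first in the MRO."""


class SFTModelWithOverride(NLPAdapterModelMixin, MegatronGPTModel):
    def sharded_state_dict(self, prefix: str = "") -> Dict[str, Any]:
        # Bypass the mixin and delegate explicitly to MegatronGPTModel.
        return MegatronGPTModel.sharded_state_dict(self, prefix=prefix)


if __name__ == "__main__":
    print(SFTModelWithoutOverride().sharded_state_dict()["source"])  # NLPAdapterModelMixin
    print(SFTModelWithOverride().sharded_state_dict()["source"])     # MegatronGPTModel
```

Calling the base class explicitly makes the intent obvious; `super(NLPAdapterModelMixin, self).sharded_state_dict(prefix)` would also skip the mixin, but depends on the exact MRO.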

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

github-actions · 2023-12-03T01:46:46Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

vysarge · 2023-12-05T01:21:17Z

Closing in favor of more-compatible change on the NLPAdapterModelMixin side in #7971.

Override NLPAdapterModelMixin sharded_state_dict for MegatronGPTSFTModel

4575f67

Signed-off-by: Valerie Sarge <vsarge@nvidia.com>

github-actions bot added the NLP label Nov 18, 2023

github-actions bot added the stale label Dec 3, 2023

vysarge closed this Dec 5, 2023

vysarge deleted the vsarge/sft_O2 branch March 12, 2024 22:36
