Finetune GPT2 model with TP=2 #8649
AnirudhVIyer started this conversation in General
Replies: 1 comment
-
Please try to convert a TP=1 …
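For reference, repartitioning a .nemo checkpoint from TP=1 to TP=2 is usually done with NeMo's checkpoint-partitioning script. Below is a minimal sketch only; it assumes the script path and flag names (megatron_change_num_partitions.py, --model_file, --target_file, and the (target_)tensor/pipeline parallel size flags) match your NeMo version, and the checkpoint filenames are hypothetical.

    # Minimal sketch: repartition a TP=1 .nemo checkpoint to TP=2 before finetuning.
    # Assumption: the NeMo repo ships examples/nlp/language_modeling/megatron_change_num_partitions.py
    # with the flags used below; verify the script and flag names in your NeMo version.
    import subprocess

    subprocess.run(
        [
            "python",
            "examples/nlp/language_modeling/megatron_change_num_partitions.py",
            "--model_file", "gpt_tp1.nemo",              # hypothetical input: TP=1 checkpoint
            "--target_file", "gpt_tp2.nemo",             # hypothetical output: TP=2 checkpoint
            "--tensor_model_parallel_size", "1",
            "--target_tensor_model_parallel_size", "2",
            "--pipeline_model_parallel_size", "1",
            "--target_pipeline_model_parallel_size", "1",
        ],
        check=True,
    )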
-
I am trying to finetune a GPT model with TP=2 and SP=True.
I have converted the TP=1 .nemo file to a TP=2 .nemo file. However, there is an issue when I run the finetuning script: an error occurs when I try to load the .nemo model.
File "/scratch/avi2011/nemo_proj/fine_tuning/megatron_gpt_finetuning.py", line 68, in main model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer) File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/models/nlp_model.py", line 465, in restore_from return super().restore_from( File "/usr/local/lib/python3.10/dist-packages/nemo/core/classes/modelPT.py", line 442, in restore_from instance = cls._save_restore_connector.restore_from( File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/parts/nlp_overrides.py", line 700, in restore_from loaded_params = super().load_config_and_state_dict( File "/usr/local/lib/python3.10/dist-packages/nemo/core/connectors/save_restore_connector.py", line 169, in load_config_and_state_dict state_dict = self._load_state_dict_from_disk(model_weights, map_location=map_location) File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/parts/nlp_overrides.py", line 663, in _load_state_dict_from_disk raise ValueError(f'Expected {model_weights} to be a file or directory.') ValueError: Expected /state/partition1/job-44049844/tmp0m2inrp1/model_weights.ckpt to be a file or directory.
Any suggestions?
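One way to narrow this down (a diagnostic sketch, not from the thread): a .nemo file is a tar archive, so listing its members shows whether the TP=2 conversion actually produced per-rank weight folders (e.g. mp_rank_00/model_weights.ckpt and mp_rank_01/model_weights.ckpt) rather than a single flat model_weights.ckpt. The archive name below is hypothetical.

    # Diagnostic sketch: inspect the converted .nemo archive (a plain tar file).
    # Assumption: for TP>1, NeMo stores weights under per-rank folders such as
    # mp_rank_00/ and mp_rank_01/; the archive path below is hypothetical.
    import tarfile

    with tarfile.open("gpt_tp2.nemo", "r:*") as archive:
        for name in archive.getnames():
            print(name)

If the weights do sit under mp_rank_00/ and mp_rank_01/, the finetuning run also has to be launched with tensor_model_parallel_size=2 and two GPUs so that each rank resolves its own model_weights.ckpt path; a mismatch between the checkpoint layout and the launch configuration can surface as exactly this "Expected … model_weights.ckpt to be a file or directory" error.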