Megatron GPT model finetuning #6210

MaximumEntropy · 2023-03-15T21:39:54Z

What does this PR do ?

Adds the ability to fine-tune Megatron GPT Models.

Collection: NLP

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)


examples/nlp/language_modeling/tuning/megatron_gpt_sft.py

yidong72

Looks good. Left some comments.

yidong72 · 2023-04-04T22:24:15Z

examples/nlp/language_modeling/tuning/megatron_gpt_sft.py

+    return model
+
+
+def load_from_checkpoint_dir(cls, cfg, trainer, modify_confg_fn):


Alternatively, we could move this into a utility function. Loading from a checkpoint directory is done in many other places.

yidong72 · 2023-04-04T22:27:48Z

nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py

+            text = self.prompt_template.replace('{input}', original_context).replace('{output}', output)
+
+        if self.separate_prompt_and_response_with_newline and self.prompt_template is None:
+            text = context + '\n' + output


Should we use user-provided separators?

I think the prompt_template should cover this case, right?
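To make the point about separators concrete, here is a minimal standalone sketch of the text-building logic in the diff above. It is not the dataset class from this PR: `build_sft_text` and its parameter names are hypothetical, and the single-space fallback is an assumption made only for this example. It illustrates why a user-provided separator can be expressed through the template alone.

```python
from typing import Optional


def build_sft_text(
    context: str,
    output: str,
    prompt_template: Optional[str] = None,
    separate_with_newline: bool = False,
) -> str:
    """Join a prompt and its response into one training string (illustrative only)."""
    if prompt_template is not None:
        # The template controls every separator, so the newline flag is not consulted.
        return prompt_template.replace('{input}', context).replace('{output}', output)
    if separate_with_newline:
        # No template: fall back to the fixed newline separator from the diff.
        return context + '\n' + output
    # Assumption for this sketch only: a single space when neither option is set.
    return context + ' ' + output


# A custom separator is written directly into the template.
print(build_sft_text("Translate: bonjour", "hello", prompt_template="{input} ### {output}"))
```

With this shape, any separator a user wants (a newline, `###`, a role tag) lives in `prompt_template`, and the newline flag only matters when no template is given.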

yidong72 · 2023-04-04T22:28:47Z

nemo/collections/nlp/data/language_modeling/megatron/gpt_sft_dataset.py

+        if self.prompt_template is not None:
+            import ipdb
+
+            ipdb.set_trace()


Remove the debug statement?

yidong72 · 2023-04-04T22:33:37Z

scripts/nlp_language_modeling/niv2/preprocess_niv2.py

+from argparse import ArgumentParser
+from multiprocessing import Pool
+
+from sacremoses import MosesDetokenizer


Is it part of the plan to release the NIV and T0 data preprocessing scripts? We would like others to be able to SFT GPT with the same instruction dataset.


ericharper

LGTM. Thanks!

* copy from sft_from_gpt
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Changed tokenization and example
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* maybe remove (got from upstream)
* Eval metrics while finetuning
* Add missing args
* Add arg
* Fix
* Fix
* Wrap in try except
* Try fix
* Fix
* Add separate validation and test batch sizes
* Fix
* Fix
* Fix
* Add assert
* Fix
* Fix checkpoint name
* Explict sampling args
* Update t0 script
* Add niv2 script
* Change workers
* Fix labels
* Ignore download
* Minor fixes
* Add dist opt support
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Minor
* Allow skipping validation
* Fix tokenization and padding to max batch
* Adds several configurable flags for Megatron GPT models (NVIDIA#5991)
  * Initial
  * Multiple fixes
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Fix
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Add to CI test
  * Fix
  * check position embs for gpt prompt learning (Signed-off-by: Adi Renduchintala <adithya.r@gmail.com>)
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Update args
  * Disable tts unit test
  * Fix
  * Fix
  * Empty
  * Update Jenkinsfile: Changed optimizer for GPT training from 'fused_adam' to 'distributed_fused_adam'. (Signed-off-by: khcs <khcs@users.noreply.github.com>)
  * update config to to use correct key (Signed-off-by: ericharper <complex451@gmail.com>)
  * revert Jenkinsfile back to fused_adam (Signed-off-by: ericharper <complex451@gmail.com>)

  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Signed-off-by: Adi Renduchintala <adithya.r@gmail.com>
  Signed-off-by: khcs <khcs@users.noreply.github.com>
  Signed-off-by: ericharper <complex451@gmail.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
  Co-authored-by: khcs <khcs@users.noreply.github.com>
  Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
  Co-authored-by: ericharper <complex451@gmail.com>
* Fast glu activations (NVIDIA#6058)
  * fast glu activations
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Fix
  * Fix
  * Clean up activation list
  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Explicitly check for united embeddings when logging params (NVIDIA#6085)
  * Explicitly check for united embeddings
  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Option for model extracted dir
* Fix
* Fix
* Add index mapping dir
* Assistant prompt
* Fix
* Remove ipdb
* Fix
* Override dropout
* Change sampler
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Roll back again
* Revert TTS
* Reset TTS
* Revert further
* Revert more to main
* Fix Test DS
* Address PR comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Add the option to provide a prompt template via fstrings
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Add CI test
* fix ci test
* Fix CI test
* Minor
* Fix CI
* Fix CI
* Fix
* Fix CI
* Fix workers issue
* Fix workers

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Adi Renduchintala <adithya.r@gmail.com>
Signed-off-by: khcs <khcs@users.noreply.github.com>
Signed-off-by: ericharper <complex451@gmail.com>
Co-authored-by: soares-f <soarescmsa@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>

soares-f and others added 30 commits December 19, 2022 06:42

copy from sft_from_gpt

b63dcee

[pre-commit.ci] auto fixes from pre-commit.com hooks

d05d632

for more information, see https://pre-commit.ci

Changed tokenization and example

0cb5907

Merge branch 'GPT_SFT' of https://github.com/soares-f/NeMo into GPT_SFT

e57114c

[pre-commit.ci] auto fixes from pre-commit.com hooks

0785902

for more information, see https://pre-commit.ci

maybe remove (got from upstream)

8f11a14

merge and commit

b2dd38d

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Eval metrics while finetuning

e8f1924

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Add missing args

b49d37b

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Add arg

2de9931

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix

7636372

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix

6b30660

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Wrap in try except

2e9ab6c

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Try fix

7f5eba1

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix

4387574

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Add separate validation and test batch sizes

8bdeff4

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix

983f6e3

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix

78ab97f

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix

6e19953

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Add assert

63c81fe

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix

63d6489

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix checkpoint name

ed45634

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Explict sampling args

19c1a1c

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Update t0 script

7fa203f

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Add niv2 script

1258436

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Change workers

3651097

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'main' of github.com:NVIDIA/NeMo into sandeepsub/gpt_sft

406f773

Fix labels

102c2a3

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'main' of github.com:NVIDIA/NeMo into sandeepsub/gpt_sft

54b9a77

Ignore download

50f7160

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

pre-commit-ci bot and others added 3 commits April 3, 2023 23:36

[pre-commit.ci] auto fixes from pre-commit.com hooks

7224f67

for more information, see https://pre-commit.ci

Add CI test

9a971d7

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'sandeepsub/gpt_sft_stable_rebase_main' of github.com:NV…

63a187f

…IDIA/NeMo into sandeepsub/gpt_sft_stable_rebase_main

github-actions bot added the CI label Apr 4, 2023

MaximumEntropy added 7 commits April 4, 2023 09:50

fix ci test

3fdf1d4

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'main' into sandeepsub/gpt_sft_stable_rebase_main

86efeee

Fix CI test

2f3efd2

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'sandeepsub/gpt_sft_stable_rebase_main' of github.com:NV…

78513c6

…IDIA/NeMo into sandeepsub/gpt_sft_stable_rebase_main

Merge branch 'main' into sandeepsub/gpt_sft_stable_rebase_main

a06f6cc

Minor

dea00db

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'sandeepsub/gpt_sft_stable_rebase_main' of github.com:NV…

1d96011

…IDIA/NeMo into sandeepsub/gpt_sft_stable_rebase_main

github-advanced-security bot found potential problems Apr 4, 2023


MaximumEntropy requested a review from yidong72 April 4, 2023 21:23

yidong72 reviewed Apr 4, 2023

MaximumEntropy added 9 commits April 4, 2023 21:25

Fix CI

d6d9837

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix CI

7749ede

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'main' into sandeepsub/gpt_sft_stable_rebase_main

7fd0c85

Fix

ad69891

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'sandeepsub/gpt_sft_stable_rebase_main' of github.com:NV…

5df8955

…IDIA/NeMo into sandeepsub/gpt_sft_stable_rebase_main

Merge branch 'main' into sandeepsub/gpt_sft_stable_rebase_main

791402f

Fix CI

6c003b0

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix workers issue

d99e276

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'main' into sandeepsub/gpt_sft_stable_rebase_main

9951a19

okuchaiev requested a review from arendu April 6, 2023 21:32

MaximumEntropy added 2 commits April 6, 2023 14:41

Fix workers

6443e69

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'main' into sandeepsub/gpt_sft_stable_rebase_main

7062845

ericharper approved these changes Apr 6, 2023

ericharper merged commit 714eded into main Apr 6, 2023

ericharper deleted the sandeepsub/gpt_sft_stable_rebase_main branch April 6, 2023 23:08
