
Mistral 7b conversion script #8052

Merged
merged 8 commits into from
Jan 25, 2024

Conversation

akoumpa
Member

@akoumpa akoumpa commented Dec 19, 2023

What does this PR do?

Adds conversion script for mistral-7b to NeMo.

Collection: nlp/language_modeling

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
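For illustration, a hypothetical invocation is sketched below. It mirrors the interface of the sibling HF conversion scripts quoted later in this thread; the script path, flags, and precision value are assumptions, not excerpts from this PR's diff.

# Hypothetical usage; the script name and flags follow the pattern of the other
# convert_hf_*_to_nemo.py scripts and are not taken from this PR.
python scripts/nlp_language_modeling/convert_hf_mistral_7b_to_nemo.py \
    --in-file=/path/to/mistral-7b-hf \
    --out-file=/path/to/mistral-7b.nemo \
    --precision=bf16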

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch 3 times, most recently from 57d578c to 91ac647 on December 19, 2023 18:36
@akoumpa akoumpa changed the title Mistral 7b support/akoumparouli Mistral 7b support Dec 19, 2023
@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch 10 times, most recently from a8516bf to c1b0eb0 on December 20, 2023 00:43
@github-actions github-actions bot added the NLP label Dec 20, 2023
@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch from 4a4e725 to 29c543a on December 26, 2023 23:10
@janekl
Collaborator

janekl commented Dec 27, 2023

Could you please keep the conversion script consistent with the other ones? Specifically, it would be best to keep these aligned across scripts:

  • defining a new convert_state_dict function
  • config handling (override_model_dict etc.)
  • script parameters.

The Llama2 conversion script has a slightly different flavour, but I believe we should follow the "majority" style for new scripts (and align Llama2 at some point).
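For illustration, a hypothetical convert_state_dict following the convention described above might look like the sketch below; the key-renaming rule and argument names are assumptions, not taken from this PR or from the existing scripts.

from collections import OrderedDict

def convert_state_dict(hf_state_dict, nemo_config):
    """Sketch: rename HF tensors into the layout the NeMo model expects."""
    nemo_state_dict = OrderedDict()
    for hf_key, tensor in hf_state_dict.items():
        # Purely illustrative renaming rule; real conversion scripts also fuse
        # QKV and gate/up projections and handle embedding/output layers separately.
        nemo_key = hf_key.replace('model.layers.', 'model.decoder.layers.')
        nemo_state_dict[nemo_key] = tensor
    return nemo_state_dict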

for key in keys:
    checkpoint['state_dict'][key.replace('model.', 'model.module.', 1)] = checkpoint['state_dict'].pop(key)

model = load_model(MegatronGPTModel, checkpoint, strict=False, trainer=trainer)
Collaborator

The load_model function is a bit convoluted; can you avoid using it? I recommend instantiating the model with something like

model = MegatronGPTModel(cfg, trainer)
missing_keys, unexpected_keys = model.load_state_dict(hf_state_dict, strict=False)

see also #7977.

Member Author

Do you know whether model.load_state_dict with strict=False verifies that any weights were loaded at all? I'm not implying that load_model performs this check; rather, I want to understand whether this is something we care about doing at this stage of the script.

Collaborator

That PR #7977 has been merged. Could you please replace load_model with load_state_dict_helper? The former is really cluttered and unnecessarily saves two tokenizers; the one stored in tokenizer.tokenizer_model is not needed.
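A minimal sketch of the suggested replacement is below, assuming load_state_dict_helper is importable from NeMo's NLP utilities as introduced in #7977; the import path, argument order, and the nemo_config/converted_state_dict names are assumptions.

# Assumed import path and signature; verify against the helper added in #7977.
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.utils_funcs import load_state_dict_helper

# Builds the model from the NeMo config and trainer, loads the converted weights,
# and replaces the older load_model(...) call.
model = load_state_dict_helper(MegatronGPTModel, nemo_config, trainer, converted_state_dict)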

Collaborator

And as for strict=False: there are checks on the missing_keys and unexpected_keys lists in load_state_dict_helper that will complain if any expected weights are not loaded, or if any are superfluous.
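For illustration only, the kind of validation being described (not the helper's actual code) amounts to:

# Illustrative only: with strict=False, load_state_dict returns the two lists
# that the helper is said to inspect.
missing_keys, unexpected_keys = model.load_state_dict(converted_state_dict, strict=False)
if missing_keys:
    raise RuntimeError(f'Expected weights were not loaded: {missing_keys}')
if unexpected_keys:
    raise RuntimeError(f'Superfluous weights in the checkpoint: {unexpected_keys}')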

@janekl
Collaborator

janekl commented Dec 27, 2023

As for the architecture, Mistral seems to borrow heavily from Llama2.

Would it be possible to just extend the Llama2 conversion script to cover the HF Mistral checkpoint?

Contributor

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the stale label Jan 11, 2024
@akoumpa
Member Author

akoumpa commented Jan 11, 2024

Hi @janekl, thanks for your comments! I'm leaning towards merging this as is and doing the actual refactor once we settle on how we want to refactor the import scripts. I think you are right that it shares a lot with the Llama script; once the PRs you mention have been approved & merged, we can follow the appropriate guidelines here as well.

@akoumpa akoumpa removed the stale label Jan 11, 2024
@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch 2 times, most recently from 4dc48ae to fab45fc on January 17, 2024 22:57
@akoumpa akoumpa marked this pull request as ready for review January 18, 2024 17:57
Import script for mistral-7b.

From mistral checkpoint not hf.
Pending: support for block-diagonal attention mask.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
akoumpa and others added 4 commits January 18, 2024 10:00
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
for more information, see https://pre-commit.ci

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch from fab45fc to 1a0e698 on January 18, 2024 18:01
@akoumpa akoumpa changed the title Mistral 7b support Mistral 7b conversion script Jan 18, 2024
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the mistral_7b_support/akoumparouli branch from 1a0e698 to 763fbb2 on January 18, 2024 23:30
@ericharper
Collaborator

jenkins

@ericharper
Collaborator

jenkins

@janekl
Collaborator

janekl commented Jan 25, 2024

I think that adding a Jenkins conversion test for a small Mistral model would be really desirable.

Examples are here

NeMo/Jenkinsfile

Lines 391 to 419 in f10d694

stage('Llama') {
  steps {
    sh 'CUDA_VISIBLE_DEVICES=0 python scripts/nlp_language_modeling/convert_hf_llama_to_nemo.py \
        --in-file=/home/TestData/nlp/megatron_llama/llama-ci-hf \
        --out-file=/home/TestData/nlp/megatron_llama/ci.nemo \
        --precision=16'
    sh 'rm -f /home/TestData/nlp/megatron_llama/ci.nemo'
  }
}
stage('StarCoder') {
  steps {
    sh 'python scripts/nlp_language_modeling/convert_starcoder_hf_to_nemo.py \
        --config examples/nlp/language_modeling/conf/megatron_gpt_config.yaml \
        --input /home/TestData/nlp/megatron_gpt/starcoder-ci-hf \
        --output /home/TestData/nlp/megatron_gpt/starcoder-ci-hf'
    sh 'rm -f /home/TestData/nlp/megatron_gpt/starcoder-ci-hf/megatron_starcoder_tp1_pp1.nemo'
  }
}
stage('Falcon') {
  steps {
    sh 'python scripts/nlp_language_modeling/convert_hf_falcon_to_nemo.py \
        --config examples/nlp/language_modeling/conf/megatron_falcon_config.yaml \
        --input /home/TestData/nlp/megatron_gpt/falcon-ci-hf \
        --output /home/TestData/nlp/megatron_gpt/falcon-ci-hf/falcon_ci.nemo'
    sh 'rm -f /home/TestData/nlp/megatron_gpt/falcon-ci-hf/falcon_ci.nemo'
  }
}
}
}

cc @ericharper
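A hypothetical Mistral stage following the same pattern might look like the sketch below; the test-data path, script name, and flags are assumptions, not taken from the PR or the Jenkinsfile.

stage('Mistral') {
  steps {
    // Hypothetical: converts a small HF Mistral checkpoint and removes the output.
    sh 'python scripts/nlp_language_modeling/convert_hf_mistral_7b_to_nemo.py \
        --in-file=/home/TestData/nlp/megatron_mistral/mistral-ci-hf \
        --out-file=/home/TestData/nlp/megatron_mistral/ci.nemo \
        --precision=bf16'
    sh 'rm -f /home/TestData/nlp/megatron_mistral/ci.nemo'
  }
}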

@janekl
Collaborator

janekl commented Jan 25, 2024

Hi @janekl, thanks for your comments! I'm leaning towards merging this as is and doing the actual refactor once we settle on how we want to refactor the import scripts. I think you are right that it shares a lot with the Llama script; once the PRs you mention have been approved & merged, we can follow the appropriate guidelines here as well.

Sorry for the late reply. That's fine, we can polish it later; there is a consistency PR open here: #8192.

In any case, please see my comments in the two other threads. We really need to test the conversion analogously to the other scripts; otherwise we won't have confidence in later refactoring work. I think @ericharper should support this extra effort.

Collaborator

@ericharper ericharper left a comment

LGTM. Thanks!

@ericharper ericharper merged commit 9944304 into NVIDIA:main Jan 25, 2024
11 checks passed
yaoyu-33 pushed a commit that referenced this pull request Jan 31, 2024
* Import script for mistral-7b.

From mistral checkpoint not hf.
Pending: support for block-diagonal attention mask.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add window_size to nemo_config.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Switch from Mistral checkpoint to HF-Mistral.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Force lowercase when checking for normalization type.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* NeMo-Mistral-7B to HF-Mistral-7B.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
stevehuang52 pushed a commit that referenced this pull request Jan 31, 2024
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024
layalir pushed a commit to layalir/NeMo that referenced this pull request Feb 29, 2024
pablo-garay pushed a commit that referenced this pull request Mar 19, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024