First draft of mcore bert model in NeMo #7814

Merged: 73 commits merged into main from mcore_bert on Dec 15, 2023
Changes from 14 commits
Commits
73 commits
23cd37c
First draft of mcore bert model in NeMo
shanmugamr1992 Oct 26, 2023
6d578ed
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 26, 2023
e2b7141
Merge branch 'main' into mcore_bert
ericharper Oct 27, 2023
16e2d4c
First draft of mcore bert model in Nemo
shanmugamr1992 Nov 3, 2023
505cefe
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 3, 2023
48ca125
Addressed eric's comments
shanmugamr1992 Nov 7, 2023
9b19dce
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 7, 2023
ebfa15a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 7, 2023
11cadbe
Addressed eric's comments
shanmugamr1992 Nov 7, 2023
c695864
Merge branch 'mcore_bert' of github.com:NVIDIA/NeMo into mcore_bert
shanmugamr1992 Nov 7, 2023
7ea7c5e
Added a ci test
shanmugamr1992 Nov 7, 2023
ec9fb02
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 7, 2023
28f85fb
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 8, 2023
6a10f32
Removing unused imports
Nov 8, 2023
51e296e
Addressing eric's comments
Nov 9, 2023
6883557
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 9, 2023
046d73a
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 9, 2023
e1b3643
Addressing eric's comments
Nov 9, 2023
de459be
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 14, 2023
086cd43
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 14, 2023
146c800
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 15, 2023
1f5401c
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 16, 2023
e9a4219
Resolving merge conflicts
Nov 20, 2023
f035777
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 20, 2023
33e57c8
Update Jenkinsfile
shanmugamr1992 Nov 20, 2023
f663c19
Changing mcore version
Nov 20, 2023
3ade257
Fixing tests
Nov 22, 2023
e5f5057
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 22, 2023
ca42927
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 27, 2023
c6912c4
Checking mcore and bert installation
Nov 27, 2023
46273c9
Adding latest megatron lm commit
Nov 27, 2023
876aa2f
Adding latest megatron lm commit
Nov 27, 2023
5215048
Adding latest megatron lm commit
Nov 27, 2023
cc99736
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 27, 2023
affca9c
Update Jenkinsfile
shanmugamr1992 Nov 27, 2023
a89ed87
Merge branch 'main' into mcore_bert
shanmugamr1992 Nov 28, 2023
45c2b72
Seeing failure for the import
Nov 28, 2023
2bb0257
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 28, 2023
e4ffc73
Fixing tests
Nov 30, 2023
4571c02
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 30, 2023
fcd17f4
Adding bert model imports to mcore model part
Dec 1, 2023
3ddb21b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 1, 2023
cf71579
Resolve issues to the bert model
Dec 2, 2023
6f2cade
Merge branch 'main' into mcore_bert
shanmugamr1992 Dec 4, 2023
2904e30
Dummy change
Dec 4, 2023
25a7443
Testing bert stuff alone
Dec 5, 2023
fee0a0e
Testing bert stuff alone
Dec 5, 2023
7fba193
Testing bert stuff alone
Dec 5, 2023
edefe03
Merge branch 'main' into mcore_bert
shanmugamr1992 Dec 5, 2023
1d6e085
Adding bert model imports to mcore model part
Dec 8, 2023
a59cc7f
Merge branch 'main' into mcore_bert
shanmugamr1992 Dec 8, 2023
3a18b11
Adding bert model imports to mcore model part
Dec 11, 2023
0b3ae33
Merge branch 'main' into mcore_bert
shanmugamr1992 Dec 11, 2023
00cc6c3
Fixing bert code
Dec 11, 2023
4667e93
Fixing bert code
Dec 12, 2023
065c543
Merge branch 'main' into mcore_bert
shanmugamr1992 Dec 12, 2023
9ddbb96
Fixing bert code
Dec 12, 2023
34767b0
Fixing bert code
Dec 12, 2023
88e074f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 12, 2023
d09629a
Merge branch 'main' into mcore_bert
shanmugamr1992 Dec 12, 2023
e67a4e3
fix bert
Dec 12, 2023
97ab138
fix bert
Dec 12, 2023
93f3dc3
fix bert
Dec 12, 2023
1f3a684
fix bert
Dec 12, 2023
db400e9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 12, 2023
1e18ae3
Merge branch 'main' into mcore_bert
shanmugamr1992 Dec 12, 2023
37fe90f
fix bert
Dec 12, 2023
eb29dcc
fix bert
Dec 12, 2023
40ceebf
fix bert
Dec 13, 2023
85be8e2
Merge branch 'main' into mcore_bert
shanmugamr1992 Dec 14, 2023
1a78df5
fix bert
Dec 14, 2023
b5ea021
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 14, 2023
2ab126e
Revert oriignal jenksins file
Dec 14, 2023
Jenkinsfile: 94 changes (83 additions, 11 deletions)
@@ -58,16 +58,16 @@ pipeline {
}

// megatron-core 0.3 has been pinned in the requirements, this should not be needed on r1.21.0
// stage('Megatron Core installation') {
// steps {
// // pinned MCore https://github.com/NVIDIA/Megatron-LM/commit/ab0336a5c8eab77aa74ae604ba1e73decbf6d560
// // ToT for 23.08 branch
// sh 'git clone https://github.com/NVIDIA/Megatron-LM.git && \
// cd Megatron-LM && \
// git checkout ab0336a5c8eab77aa74ae604ba1e73decbf6d560 && \
// pip install -e .'
// }
// }
stage('Megatron Core installation') {
steps {
// pinned MCore https://github.com/NVIDIA/Megatron-LM/commit/ab0336a5c8eab77aa74ae604ba1e73decbf6d560
// ToT for 23.08 branch
sh 'git clone https://github.com/NVIDIA/Megatron-LM.git && \
cd Megatron-LM && \
git checkout ad4c68568d5d6f5a723652db003897e3c2b62545 && \
pip install -e .'
}
}


stage('PyTorch Lightning version') {
@@ -2895,6 +2895,78 @@ pipeline {
sh "rm -rf examples/nlp/language_modeling/bert_index_mappings"
}
}
stage('L2: Megatron Core Bert Pretraining and Resume Training') {
when {
anyOf {
branch 'main'
changeRequest target: 'main'
}
}
failFast true
steps {
sh "python examples/nlp/language_modeling/megatron_bert_pretraining.py \
trainer.devices=2 \
trainer.accelerator=gpu \
trainer.log_every_n_steps=1 \
trainer.val_check_interval=10 \
trainer.limit_val_batches=2 \
trainer.accumulate_grad_batches=1 \
trainer.max_steps=10 \
trainer.precision=16 \
trainer.gradient_clip_val=1.0 \
exp_manager.exp_dir=examples/nlp/language_modeling/bert_pretrain_results \
model.tensor_model_parallel_size=2 \
model.optim.name=fused_adam \
model.optim.lr=2e-4 \
model.sequence_parallel=True \
model.optim.sched.warmup_steps=2 \
model.optim.sched.constant_steps=2 \
model.optim.sched.min_lr=8e-5 \
model.max_position_embeddings=128 \
model.encoder_seq_length=128 \
model.data.seq_length=128 \
model.tokenizer.vocab_file=/home/TestData/nlp/megatron_bert/data/bert/vocab.txt \
model.num_layers=8 \
model.hidden_size=256 \
model.num_attention_heads=8 \
model.activations_checkpoint_method='block' \
model.activations_checkpoint_num_layers=1 \
model.data.data_prefix=[.5,/home/TestData/nlp/megatron_bert/data/bert/simple_wiki_bert_preproc_text_sentence,.5,/home/TestData/nlp/megatron_bert/data/bert/simple_wiki_bert_preproc_text_sentence] \
model.data.index_mapping_dir=examples/nlp/language_modeling/bert_index_mappings"
sh "python examples/nlp/language_modeling/megatron_bert_pretraining.py \
trainer.devices=2 \
trainer.accelerator=gpu \
trainer.log_every_n_steps=1 \
trainer.val_check_interval=10 \
trainer.limit_val_batches=2 \
trainer.accumulate_grad_batches=1 \
trainer.max_steps=20 \
trainer.precision=16 \
trainer.gradient_clip_val=1.0 \
exp_manager.exp_dir=examples/nlp/language_modeling/bert_pretrain_results \
exp_manager.resume_if_exists=True \
model.mcore_bert=True \
model.tensor_model_parallel_size=2 \
model.optim.name=fused_adam \
model.optim.lr=2e-4 \
model.optim.sched.warmup_steps=2 \
model.optim.sched.constant_steps=2 \
model.optim.sched.min_lr=8e-5 \
model.max_position_embeddings=128 \
model.encoder_seq_length=128 \
model.data.seq_length=128 \
model.tokenizer.vocab_file=/home/TestData/nlp/megatron_bert/data/bert/vocab.txt \
model.num_layers=8 \
model.hidden_size=256 \
model.num_attention_heads=8 \
model.activations_checkpoint_method='block' \
model.activations_checkpoint_num_layers=1 \
model.data.data_prefix=[.5,/home/TestData/nlp/megatron_bert/data/bert/simple_wiki_bert_preproc_text_sentence,.5,/home/TestData/nlp/megatron_bert/data/bert/simple_wiki_bert_preproc_text_sentence] \
model.data.index_mapping_dir=examples/nlp/language_modeling/bert_index_mappings"
sh "rm -rf examples/nlp/language_modeling/bert_pretrain_results"
sh "rm -rf examples/nlp/language_modeling/bert_index_mappings"
}
}
stage('L2: Megatron RETRO Pretraining and Resume Training') {
when {
anyOf {
@@ -4891,4 +4963,4 @@ assert_frame_equal(training_curve, gt_curve, rtol=1e-3, atol=1e-3)"'''
cleanWs()
}
}
}
}
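
The new installation stage pins Megatron-LM to commit ad4c68568d5d6f5a723652db003897e3c2b62545 before the BERT tests run. As a quick local sanity check (a convenience sketch, not part of the PR), the megatron-core imports this change relies on can be exercised directly after installing that checkout:

# Sanity-check sketch: confirm the pinned Megatron-LM checkout exposes the
# modules this PR imports (see the base-model changes further down).
from megatron.core import ModelParallelConfig, parallel_state
from megatron.core.transformer.transformer_config import TransformerConfig
from megatron.core.utils import init_method_normal, scaled_init_method_normal

print("megatron-core imports OK:", TransformerConfig.__name__)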
@@ -41,6 +41,7 @@ exp_manager:

model:
# model parallelism
mcore_bert: False
micro_batch_size: 4
global_batch_size: 8
tensor_model_parallel_size: 1
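
The config adds an mcore_bert flag (default False) that opts a model into the megatron-core BERT path; the resume run in the CI stage above turns it on with the Hydra-style override model.mcore_bert=True. Below is a minimal sketch of toggling it programmatically; the config path is an assumption, so point it at the megatron_bert YAML in your NeMo checkout:

from omegaconf import OmegaConf

# Sketch only: load a BERT pretraining config and opt in to the mcore path.
# The path below is assumed; adjust it to your NeMo tree.
cfg = OmegaConf.load("examples/nlp/language_modeling/conf/megatron_bert_config.yaml")
cfg.model.mcore_bert = True  # default is False, i.e. the legacy NeMo BERT stack
print(OmegaConf.to_yaml(cfg.model))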
@@ -38,6 +38,7 @@
from nemo.collections.nlp.modules.common.tokenizer_utils import get_nmt_tokenizer
from nemo.collections.nlp.parts import utils_funcs
from nemo.collections.nlp.parts.nlp_overrides import NEMO_MEGATRON_MODEL_PARALLEL_APPSTATE_OVERRIDE, GradScaler
from nemo.collections.nlp.parts.utils_funcs import activation_to_func
from nemo.core.optim import MainParamsOptimizerWrapper, prepare_lr_scheduler
from nemo.utils import AppState, logging
from nemo.utils.get_rank import is_global_rank_zero
@@ -54,6 +55,8 @@

try:
from megatron.core import ModelParallelConfig, parallel_state
from megatron.core.transformer.transformer_config import TransformerConfig
from megatron.core.utils import init_method_normal, scaled_init_method_normal

HAVE_MEGATRON_CORE = True

@@ -293,6 +296,99 @@ def on_validation_end(self) -> None:
if self.gc_interval > 0 and self.gc_in_validation:
gc.collect()

def build_transformer_config(self, model_specific_config=None) -> TransformerConfig:
""" Builds the megatron core transformer config for the model.
For attributes in the nemo model config that are the same
as the megatron core TransformerConfig, we will use the value from the nemo model config.
For attributes in TransformerConfig that are not in the nemo model config, we add custom logic.
"""

# create a dictionary copy of the model config
cfg = OmegaConf.to_container(self.cfg, resolve=True)

# create a dict to store the transformer config arguments
transformer_config_dict = {}

# get model parallel configs from the base class
model_parallel_config = self.build_model_parallel_config()

add_bias_linear = self.cfg.get('bias', True)

activation = self.cfg.get('activation', 'gelu')
# TODO: need to check which activation functions are supported in mcore
activation_func = activation_to_func(activation)

normalization = self.cfg.get('normalization', 'layernorm')

init_method_std = self.cfg.get('init_method_std', 0.02)
# default used in mcore
init_method = init_method_normal(init_method_std)

output_layer_init_method = init_method
num_layers = self.cfg.get('num_layers', 1)
use_scaled_init_method = self.cfg.get('use_scaled_init_method', True)
if use_scaled_init_method:
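# scaled_init_method_normal follows the Megatron-LM convention of shrinking the
# output-projection init std to init_method_std / sqrt(2 * num_layers).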
output_layer_init_method = scaled_init_method_normal(init_method_std, num_layers=num_layers)

attention_softmax_in_fp32 = False # not currently used in NeMo unless apply_query_key_layer_scaling is True
apply_query_key_layer_scaling = self.cfg.get('apply_query_key_layer_scaling', False)
if apply_query_key_layer_scaling:
attention_softmax_in_fp32 = True

bias_activation_fusion = self.cfg.get('bias_activation_fusion', True)
bias_gelu_fusion = True if bias_activation_fusion else False

bias_dropout_fusion = self.cfg.get('bias_dropout_add_fusion', True)

# TODO: need to check if recompute APIs are matching up properly
recompute_granularity = self.cfg.get('activations_checkpoint_granularity', None)
recompute_method = self.cfg.get('activations_checkpoint_method', None)
recompute_num_layers = self.cfg.get('activations_checkpoint_num_layers', None)

# any configs that are not in the nemo model config will be added here
config_mapping = {
'apply_residual_connection_post_layernorm': False, # we don't use this in NeMo
'layernorm_zero_centered_gamma': False,
'add_bias_linear': add_bias_linear,
'gated_linear_unit': False,
'activation_func': activation_func,
'normalization': normalization,
'init_method': init_method,
'output_layer_init_method': output_layer_init_method,
'attention_softmax_in_fp32': attention_softmax_in_fp32,
'bias_gelu_fusion': bias_gelu_fusion,
'bias_dropout_fusion': bias_dropout_fusion,
'recompute_granularity': recompute_granularity,
'recompute_method': recompute_method,
'recompute_num_layers': recompute_num_layers,
'distribute_saved_activations': False, # not currently used in NeMo
'fp8': None,
}

# populate the transformer config dict
for field in fields(TransformerConfig):
# model specific mapping has the highest priority
if model_specific_config is not None and field.name in model_specific_config:
transformer_config_dict[field.name] = model_specific_config[field.name]
# config mapping has second highest priority
elif field.name in config_mapping:
transformer_config_dict[field.name] = config_mapping[field.name]
# then config
elif field.name in cfg:
transformer_config_dict[field.name] = cfg[field.name]
# then model parallel config
elif field in fields(model_parallel_config):
transformer_config_dict[field.name] = getattr(model_parallel_config, field.name)
else:
logging.warning(
f"The model: {self} does not have field.name: {field.name} in its cfg. "
f"Add this key to cfg or config_mapping to make to make it configurable."
)

transformer_config = TransformerConfig(**transformer_config_dict)

return transformer_config

def _build_vocab(self):
"""
Manipulate vocabulary (e.g., pad vocabulary for increased performance)/
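
To make the resolution order in build_transformer_config concrete, here is a small self-contained sketch (plain dicts and a toy dataclass, not NeMo code) of the same priority chain: a model-specific mapping wins, then config_mapping, then the NeMo cfg, and finally the model-parallel config.

from dataclasses import dataclass, fields

# Toy illustration of the priority chain used in build_transformer_config:
# model_specific_config > config_mapping > nemo cfg > model parallel config.
@dataclass
class TinyTransformerConfig:
    hidden_size: int = 0
    add_bias_linear: bool = True
    sequence_parallel: bool = False

model_specific_config = {'add_bias_linear': False}   # highest priority
config_mapping = {'add_bias_linear': True}           # loses to the mapping above
cfg = {'hidden_size': 256}                           # stand-in for the NeMo model cfg
model_parallel_config = {'sequence_parallel': True}  # lowest priority

resolved = {}
for field in fields(TinyTransformerConfig):
    if field.name in model_specific_config:
        resolved[field.name] = model_specific_config[field.name]
    elif field.name in config_mapping:
        resolved[field.name] = config_mapping[field.name]
    elif field.name in cfg:
        resolved[field.name] = cfg[field.name]
    elif field.name in model_parallel_config:
        resolved[field.name] = model_parallel_config[field.name]

print(TinyTransformerConfig(**resolved))
# -> TinyTransformerConfig(hidden_size=256, add_bias_linear=False, sequence_parallel=True)

Keys that appear only in the NeMo cfg fall through to the cfg branch, which is why most TransformerConfig fields need no entry in config_mapping at all.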