forked from deepspeedai/DeepSpeedExamples
Microsoft master fix merge conflicts #12
Merged
Conversation
* FlexGen reference
* Fix DS version and opt issue
* Fix script
This PR updates the Llama check in the DS-Chat Step 3 PPO trainer to access the configuration through the actor's module object instead of through model. This is necessary since not all model types work when going through model, particularly the BLOOM model family.
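A minimal sketch of the kind of check involved, assuming a DeepSpeed-wrapped actor whose underlying HuggingFace model exposes `config.model_type`; the helper name and attribute access pattern are illustrative, not the exact DS-Chat code:

```python
def is_llama_actor(actor_engine):
    # Hypothetical helper: read the model type from the wrapped actor module.
    # The DeepSpeed engine's `module` attribute holds the underlying HF model,
    # which resolves uniformly across model families (e.g. BLOOM), unlike
    # going through other handles.
    config = getattr(actor_engine.module, "config", None)
    return config is not None and getattr(config, "model_type", "") == "llama"
```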
Performance metrics in DeepSpeed-Chat were broken with transformers version 4.33.2, so that version is explicitly excluded in the DeepSpeed-Chat requirements.txt. This fixes the currently broken nv-ds-chat workflow. The issue was recently fixed in a HuggingFace transformers PR, so it will not be a problem in the next transformers release.
Currently, the chatbot assumes an OPTForCausalLM model. Modify it to load the required model class from the checkpoint.

Change-Id: I04cbc28f87c7be4fc89a3fac39a3e5634b151b32
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
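A hedged sketch of loading whatever architecture the checkpoint declares instead of hard-coding OPTForCausalLM, using the standard transformers auto classes; the function name and arguments are illustrative:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

def load_chatbot_model(checkpoint_path: str):
    # AutoModelForCausalLM reads the architecture from the checkpoint's config,
    # so OPT, BLOOM, Llama, etc. all resolve to the right class automatically.
    config = AutoConfig.from_pretrained(checkpoint_path)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_path, use_fast=True)
    model = AutoModelForCausalLM.from_pretrained(checkpoint_path, config=config)
    return model, tokenizer
```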
* FlexGen reference
* Fix DS version and opt issue
* Fix script
* Fix type and padding
* loop option
* PR feedback
* Bug fix
* Format fixes
DeepSpeed's bf16_optimizer does not have an overflow attribute. This is OK since the bf16 dtype has the same range as fp32 and is not expected to overflow. Therefore, for bf16, always return no overflow.

Change-Id: I66a2204f3af81e52e7fa8d024afafdbbc7494327
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
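A minimal sketch of the guard, assuming a DeepSpeed engine handle named `engine`; the attribute probe is the backward-compatible path and the helper name is illustrative:

```python
def get_overflow(engine):
    # bf16 has the same dynamic range as fp32, so DeepSpeed's bf16_optimizer
    # never tracks an `overflow` flag; report "no overflow" in that case.
    optimizer = getattr(engine, "optimizer", None)
    return bool(getattr(optimizer, "overflow", False))
```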
…i#746) Currently, only the disable_dropout configuration is supported. However, some models (e.g. Bloom) have a default of dropout=0 in the model config. Therefore, modify the code to support explicit dropout configuration, and update the existing training scripts accordingly.

Change-Id: I5ee96a77ca2b58d9787573a48009e2af36a270b0
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
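A hedged sketch of propagating an explicit dropout value into a HuggingFace model config before instantiation. The dropout attribute names differ per model family (e.g. BLOOM uses hidden_dropout/attention_dropout, OPT uses dropout), so the key list below is illustrative:

```python
def configure_dropout(model_config, dropout):
    # Only override when the user passed an explicit value; otherwise keep the
    # model's own default (which may already be 0, as in BLOOM).
    if dropout is None:
        return model_config
    for key in ("dropout", "hidden_dropout", "attention_dropout",
                "hidden_dropout_prob", "attention_probs_dropout_prob"):
        if hasattr(model_config, key):
            setattr(model_config, key, dropout)
    return model_config
```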
When using LoRA only, get_optimizer_grouped_parameters() returns a list of 3 parameter groups, of which only the second is non-empty. DeepSpeed then removes the empty parameter groups [ref: DeepSpeedEngine._configure_optimizer(), deepspeed v0.10.3], but the lr_scheduler still contains 3 groups. This causes the LR scheduler to update the LoRA params with the wrong learning rate. Fix it by removing all empty groups in get_optimizer_grouped_parameters().

Change-Id: I520841312bdedd6a572cf4c827e0bbf06f983575
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
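A sketch of the filtering step, assuming grouped parameters in the usual PyTorch optimizer format (a list of dicts with a "params" key); this shows the shape of the fix, not the exact DS-Chat helper:

```python
def drop_empty_param_groups(param_groups):
    # DeepSpeed silently drops empty optimizer groups, but an LR scheduler
    # built beforehand would still index the original three groups; filtering
    # here keeps optimizer and scheduler group counts in sync.
    return [group for group in param_groups if len(group["params"]) > 0]
```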
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
* support trust_remote_code
* make trust_remote_code an argument

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
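A hedged sketch of threading the flag through, assuming an argparse-based CLI and the standard transformers trust_remote_code keyword; the argument wiring is illustrative:

```python
import argparse

from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--path", required=True)
parser.add_argument("--trust_remote_code", action="store_true",
                    help="Allow loading model/tokenizer code shipped with the checkpoint.")
args = parser.parse_args()

# The flag is forwarded to both tokenizer and model loading.
tokenizer = AutoTokenizer.from_pretrained(args.path, trust_remote_code=args.trust_remote_code)
model = AutoModelForCausalLM.from_pretrained(args.path, trust_remote_code=args.trust_remote_code)
```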
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: xiaoxiawu-microsoft <xiaoxiawu@microsoft.com>
Using loss in fp32 can improve training accuracy for all 3 stages. This was tested with the Bloom model using the bf16 dtype. While at it, fix stage 2 reward model creation: pass zero_stage to create_critic_model. Also, in stage 3, when using bf16 with tensorboard enabled, we record the actor and critic loss. Tensorboard accepts a scalar bf16 loss tensor and converts it to numpy, which fails since numpy does not support the bf16 dtype. Fix it by logging loss.item() to tensorboard.

Change-Id: I9c8e95d4886cdb44aaa6c14c4aee738e133ae405
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
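A minimal sketch of the two ideas, assuming a bf16 model output and a standard torch.utils.tensorboard.SummaryWriter; the function and tag names are illustrative:

```python
import torch
from torch.utils.tensorboard import SummaryWriter

def lm_loss_fp32(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Upcast logits before the loss so the reduction runs in fp32 even when
    # the model itself computes in bf16.
    return torch.nn.functional.cross_entropy(
        logits.float().view(-1, logits.size(-1)), labels.view(-1))

def log_losses(writer: SummaryWriter, step: int, actor_loss, critic_loss):
    # .item() converts the (possibly bf16) scalar tensor to a Python float,
    # sidestepping numpy's lack of a bf16 dtype inside add_scalar.
    writer.add_scalar("train/actor_loss", actor_loss.item(), global_step=step)
    writer.add_scalar("train/critic_loss", critic_loss.item(), global_step=step)
```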
Add support for periodic evaluation during reward model (RM) training, configurable via the added arguments --eval_interval and --eval_iters. The default configuration is backward compatible. In addition, also display the score of the rejected predictions.

Change-Id: Ib377fd731fe676c01114c087581a30777a3f3f49
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
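A hedged sketch of the added knobs and the periodic hook; the argument names follow the commit description, while the evaluation callback and its return values are illustrative:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--eval_interval", type=int, default=0,
                    help="Run evaluation every N training steps (0 keeps the old behavior).")
parser.add_argument("--eval_iters", type=int, default=100,
                    help="Number of evaluation batches per periodic run.")
args, _ = parser.parse_known_args()

def maybe_evaluate(step, model, eval_fn):
    # Backward compatible: with eval_interval == 0 no periodic evaluation runs.
    # eval_fn is a hypothetical callback returning chosen/rejected scores.
    if args.eval_interval and step % args.eval_interval == 0:
        chosen_score, rejected_score = eval_fn(model, max_iters=args.eval_iters)
        print(f"step {step}: chosen={chosen_score:.4f} rejected={rejected_score:.4f}")
```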
* Fix typo
* Fix precommit check
* Format fix

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
…ai#766)

Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
When optimizing only the LoRA parameters, we still need to train the v_head parameter.

Change-Id: I252c3ee69819997bf336482c6779b070f2e76df8
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
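A minimal sketch, assuming the reward/critic model exposes its value head under a parameter name containing "v_head" (as the commit suggests); the helper name is hypothetical:

```python
def unfreeze_v_head(model):
    # Under LoRA-only optimization every non-LoRA weight is frozen, but the
    # freshly added value head has no pretrained weights and must stay trainable.
    for name, param in model.named_parameters():
        if "v_head" in name:
            param.requires_grad = True
```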
The current default name used to detect LN layers is "LayerNorm.weight". This does not work for the following models:
- opt: uses "layer_norm"
- llama: uses "norm" and "layernorm"
- bloom: uses "layernorm" and "ln_f"
Therefore, modify the default names to accommodate the above. Also, compare names in lowercase to capture models with different capitalization.

Change-Id: I5b805df2663c62daf3d9c8a31a973742e344e76b
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
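A hedged sketch of the weight-decay grouping with the broadened, lowercased name list; the exact default list and function shape in the repo may differ:

```python
def get_optimizer_grouped_parameters(model, weight_decay,
                                     no_decay_name_list=("bias", "layer_norm",
                                                         "layernorm", "norm", "ln_f")):
    # Compare lowercased parameter names so e.g. "LayerNorm" and "layer_norm"
    # both match across OPT, Llama and BLOOM.
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        bucket = no_decay if any(k in name.lower() for k in no_decay_name_list) else decay
        bucket.append(param)
    return [{"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0}]
```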
The Bloom-560m model has high variance in its last LN layer weight, which causes accuracy issues in bf16 stage 2 training. Therefore, reset the parameters of the last LN layer before training. This is good practice in any case where we replace the classifier that follows the LN. In addition, when optimizing only the LoRA parameters, we need to force training of the LN parameters that were reset. Note that the current fix uses plain initialization of the final LN; a separate commit will add support for ZeRO-3 initialization.

Change-Id: I323d8947907eb4a1cc0fa6354bdaf0cbbf33a68d
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
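A sketch of the reset under plain (non-ZeRO-3) initialization, assuming the final LN is a standard torch.nn.LayerNorm. Locating it by module order is a simplifying assumption here; the real code would address the model-specific attribute (e.g. ln_f for BLOOM):

```python
import torch.nn as nn

def reset_final_layernorm(model):
    # Find the last LayerNorm in module order and re-initialize it, since a
    # high-variance pretrained final LN can hurt bf16 stage-2 training once
    # the classifier that follows it is replaced.
    final_ln = None
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            final_ln = module
    if final_ln is not None:
        final_ln.reset_parameters()  # weight -> 1, bias -> 0
        for param in final_ln.parameters():
            param.requires_grad = True  # keep trainable even under LoRA-only
```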
Currently, ppl is calculated per local worker and then averaged over the data parallel workers. Fix it by first averaging the loss over the data parallel workers and then calculating the ppl of the averaged loss. While at it, print the loss during evaluation.

Change-Id: Ic4108ca48a18b326677d80c1eee81c535b3a27a9
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
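A minimal sketch of the corrected order of operations, assuming torch.distributed is initialized and the default process group spans the data-parallel workers:

```python
import torch
import torch.distributed as dist

def perplexity_from_losses(local_loss_sum: torch.Tensor, local_count: torch.Tensor):
    # Average the loss across data-parallel ranks first, then exponentiate;
    # exponentiating per-rank averages and then averaging the ppl values is
    # not the same quantity.
    dist.all_reduce(local_loss_sum, op=dist.ReduceOp.SUM)
    dist.all_reduce(local_count, op=dist.ReduceOp.SUM)
    mean_loss = local_loss_sum / local_count
    return torch.exp(mean_loss), mean_loss
```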
Stages 1 & 2 append the '<|endoftext|>' text marker to all samples. However, some tokenizers (e.g. OPT, Bloom) encode this marker as a sequence of subword tokens rather than as a single special token. This commit adds optional support for adding the EOT marker as a special token, forcing the tokenizer to encode it as a single token. Note that using the EOT special token may change the dynamics of stage 3 training; therefore, to remain backward compatible, this commit makes it optional.

Change-Id: If98d348fcaa7d6685e755aabe305e23e7649c367
Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
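A hedged sketch of the opt-in path, using the standard transformers special-token API; the flag name mirrors the commit description and the function wrapper is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

END_OF_TEXT = "<|endoftext|>"

def load_with_optional_eot(model_path: str, add_eot_as_special: bool):
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    if add_eot_as_special:
        # Register the marker as a single special token and grow the embedding
        # table accordingly; without this, OPT/BLOOM tokenizers split it into
        # several subword tokens.
        tokenizer.add_special_tokens({"additional_special_tokens": [END_OF_TEXT]})
        model.resize_token_embeddings(len(tokenizer))
    return model, tokenizer
```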