Conversation

@santacml
No description provided.

tjruwase and others added 27 commits September 13, 2023 15:33
This PR updates the Llama check in the DS-Chat Step 3 PPO trainer to access the configuration through the actor's module object rather than through model. This is necessary because not all model types expose the configuration consistently through model; the BLOOM model family in particular does not.
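A minimal sketch of the corrected access pattern, using stand-in objects in place of a real DeepSpeed engine (the helper name `actor_is_llama` is hypothetical; the `module.config.model_type` path follows the description above):

```python
from types import SimpleNamespace

def actor_is_llama(actor_engine):
    # Read the config from the wrapped module object rather than from the
    # top-level model, which some families (e.g. BLOOM) do not expose
    # consistently.
    return "llama" in actor_engine.module.config.model_type.lower()

# Stand-in for a DeepSpeed engine wrapping a Llama actor.
actor = SimpleNamespace(
    module=SimpleNamespace(config=SimpleNamespace(model_type="llama")))
```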
Performance metrics in DeepSpeed-Chat were broken with transformers version 4.33.2, so that version is now explicitly excluded in the DeepSpeed-Chat requirements.txt. This fixes the currently broken nv-ds-chat workflow.

The issue was recently fixed in a HuggingFace transformers PR, so it will not be a problem in the next transformers release.
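A version exclusion of this kind can be expressed in requirements.txt with pip's `!=` specifier (the exact line used by the PR may differ; this is illustrative):

```
transformers!=4.33.2
```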
Currently, the chatbot assumes an OPTForCausalLM model.
Modify it to use the model class required by the checkpoint.

Change-Id: I04cbc28f87c7be4fc89a3fac39a3e5634b151b32

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
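One way to discover the required class is to read the architecture that transformers' save_pretrained() records in the checkpoint's config.json; the class can then be fetched via AutoModelForCausalLM. The helper name `resolve_architecture` below is hypothetical, and only the name lookup is shown:

```python
import json
import os
import tempfile

def resolve_architecture(checkpoint_dir):
    """Return the model class name recorded in the checkpoint's
    config.json, e.g. "BloomForCausalLM" or "OPTForCausalLM"."""
    with open(os.path.join(checkpoint_dir, "config.json")) as f:
        return json.load(f)["architectures"][0]

# Toy checkpoint directory for illustration.
ckpt = tempfile.mkdtemp()
with open(os.path.join(ckpt, "config.json"), "w") as f:
    json.dump({"architectures": ["BloomForCausalLM"]}, f)
```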
* FlexGen reference

* Fix DS version and opt issue

* Fix script

* Fix type and padding

* loop option

* PR feedback

* Bug fix

* Format fixes
DeepSpeed's bf16_optimizer does not have an overflow attribute.
This is fine, since the bf16 dtype has the same dynamic range as fp32 and is
not expected to overflow.
Therefore, for bf16, always report no overflow.

Change-Id: I66a2204f3af81e52e7fa8d024afafdbbc7494327

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
…i#746)

Currently, only the disable_dropout configuration is supported.
However, some models (e.g. Bloom) default to dropout=0 in their model config.
Therefore, modify the code to support an explicit dropout configuration,
and update the existing training scripts accordingly.

Change-Id: I5ee96a77ca2b58d9787573a48009e2af36a270b0

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
When using LoRA only, get_optimizer_grouped_parameters() returns a list of
three parameter groups, of which only the second is non-empty.
DeepSpeed then removes the empty parameter groups
[ref: DeepSpeedEngine._configure_optimizer(), deepspeed v0.10.3].
However, the lr_scheduler still holds all three groups, which causes it to
update the LoRA params with the wrong lr.

Fix it by removing all empty groups in get_optimizer_grouped_parameters().

Change-Id: I520841312bdedd6a572cf4c827e0bbf06f983575

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
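A simplified sketch of the fix (the grouping criteria here are illustrative, not DeepSpeed-Chat's exact rules; the key line is the final filter):

```python
def get_optimizer_grouped_parameters(named_params, weight_decay, lora_lr):
    # Simplified grouping: decayed params, LoRA params (own lr),
    # no-decay params.
    groups = [
        {"params": [p for n, p in named_params
                    if "lora" not in n and not n.endswith("bias")],
         "weight_decay": weight_decay},
        {"params": [p for n, p in named_params if "lora" in n],
         "weight_decay": weight_decay, "lr": lora_lr},
        {"params": [p for n, p in named_params
                    if "lora" not in n and n.endswith("bias")],
         "weight_decay": 0.0},
    ]
    # Drop empty groups here: DeepSpeed removes them when building the
    # optimizer, so a scheduler built from the original list would step
    # mismatched groups with the wrong lr.
    return [g for g in groups if g["params"]]
```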
* support trust_remote_code

* make trust_remote_code an argument

---------

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: xiaoxiawu-microsoft <xiaoxiawu@microsoft.com>
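A minimal sketch of the flag, off by default since trusting remote code executes arbitrary model code (the exact help text and wiring are assumptions):

```python
import argparse

parser = argparse.ArgumentParser()
# Off by default: loading remote modeling code executes arbitrary code,
# so it must be opted into explicitly.
parser.add_argument("--trust_remote_code", action="store_true")
args = parser.parse_args(["--trust_remote_code"])
# The flag would then be forwarded to transformers, e.g.:
# AutoModelForCausalLM.from_pretrained(path,
#                                      trust_remote_code=args.trust_remote_code)
```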
Computing the loss in fp32 can improve training accuracy in all 3 stages.
This was tested with the Bloom model using the bf16 dtype.

While at it, fix stage2 reward model creation: pass zero_stage to
create_critic_model.

Also, in stage3, when using bf16 with tensorboard enabled, we record the actor
and critic loss. Tensorboard accepts a scalar bf16 loss tensor and converts it
to numpy; this fails, since numpy does not support bf16. Fix it by logging
loss.item() to tensorboard instead.

Change-Id: I9c8e95d4886cdb44aaa6c14c4aee738e133ae405

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Add support for periodic evaluation during reward model (rm) training,
configurable via the added --eval_interval and --eval_iters arguments.
The default configuration is backward compatible.

In addition, also display the score of the rejected predictions.

Change-Id: Ib377fd731fe676c01114c087581a30777a3f3f49

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
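The two arguments might look as follows; the defaults and their semantics ("0 disables periodic evaluation") are assumptions chosen to match the backward-compatible behavior described above:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--eval_interval", type=int, default=0,
                    help="Evaluate every N training steps; 0 keeps the "
                         "old behaviour of no periodic evaluation.")
parser.add_argument("--eval_iters", type=int, default=100,
                    help="Number of batches per evaluation pass.")
args = parser.parse_args(["--eval_interval", "50", "--eval_iters", "20"])
```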
* Fix typo

* Fix precommit check

* Format fix

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
When using only_optimize_lora, we still need to train the v_head parameter.

Change-Id: I252c3ee69819997bf336482c6779b070f2e76df8

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Lev Kurilenko <113481193+lekurile@users.noreply.github.com>
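A sketch of the freezing rule, with a stand-in for torch parameters so it stays self-contained (the name-matching substrings are an assumption; real code would iterate model.named_parameters()):

```python
def freeze_for_lora(named_parameters):
    """Freeze everything except LoRA weights and the reward head v_head."""
    for name, param in named_parameters:
        param.requires_grad = "lora" in name or "v_head" in name

class _P:                       # stand-in for torch.nn.Parameter
    def __init__(self):
        self.requires_grad = True

params = {"transformer.h.0.attn.lora_A": _P(),
          "transformer.h.0.attn.weight": _P(),
          "v_head.weight": _P()}
freeze_for_lora(params.items())
```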
Current default name used to detect LN layers is "LayerNorm.weight".
This does not work for the following models:
- opt: uses "layer_norm"
- llama: uses "norm" and "layernorm"
- bloom: uses "layernorm" and "ln_f"

Therefore, modify the default names to accommodate the above.
Also, compare names in lowercase to capture models with different
capitalization.

Change-Id: I5b805df2663c62daf3d9c8a31a973742e344e76b

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
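A sketch of lowercase substring matching over the expanded name list (the exact default list and helper name are assumptions mirroring the names quoted above; "bias" is included since LN detection typically feeds the no-weight-decay grouping):

```python
# Default substrings are an assumption covering the names listed above.
NO_DECAY_NAME_LIST = ["bias", "layer_norm", "layernorm", "ln_f", "norm"]

def skip_weight_decay(param_name, name_list=NO_DECAY_NAME_LIST):
    # Compare in lowercase so "LayerNorm", "layernorm", and "layer_norm"
    # are all captured regardless of a model's capitalization.
    lowered = param_name.lower()
    return any(n in lowered for n in name_list)
```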

The Bloom-560m model has high variance in its last LN layer weight.
This causes accuracy issues in bf16 stage2 training.
Therefore, reset the parameters of the last LN layer before training.
This is good practice in any case where we replace the classifier that
follows the LN.

In addition, when using only_optimize_lora, we need to force training of
the LN parameters that were reset.

Note that the current fix uses plain initialization of the final LN.
A separate commit will add support for zero3 initialization.

Change-Id: I323d8947907eb4a1cc0fa6354bdaf0cbbf33a68d

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
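The plain initialization amounts to resetting the LN to the identity transform (weight 1, bias 0). A torch-free sketch, written to work in place on any mutable sequence (the helper name is illustrative; real code would use torch.nn.init on the layer's tensors):

```python
def reset_layer_norm(weight, bias):
    """Plain re-initialization of a LayerNorm to the identity transform:
    weight -> 1.0, bias -> 0.0, mutated in place."""
    for i in range(len(weight)):
        weight[i] = 1.0
    for i in range(len(bias)):
        bias[i] = 0.0

w, b = [3.7, -2.1, 0.9], [0.5, 0.5, 0.5]   # high-variance toy weights
reset_layer_norm(w, b)
```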
Currently, ppl is calculated per local worker and then averaged over data
parallel workers. Fix it by first averaging the loss over the data parallel
workers and then calculating ppl from the averaged loss.

While at it, print the loss in evaluate.

Change-Id: Ic4108ca48a18b326677d80c1eee81c535b3a27a9

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
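The two orders of operations differ because exp is convex: averaging per-worker perplexities overstates the true value (Jensen's inequality), while exponentiating the averaged loss gives the correct corpus-level ppl. A toy two-worker illustration:

```python
import math

def perplexity(mean_loss):
    return math.exp(mean_loss)

# Per-worker mean losses on a 2-way data parallel run.
losses = [1.0, 3.0]

wrong = sum(perplexity(l) for l in losses) / len(losses)  # mean of ppls
right = perplexity(sum(losses) / len(losses))             # ppl of mean loss
```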
Stages 1 & 2 append the '<|endoftext|>' text marker to all samples.
However, some tokenizers (e.g. OPT, Bloom) encode this marker as a sequence
of subword tokens rather than as a single special token.

This commit adds optional support for registering the EOT marker as a special
token, forcing the tokenizer to encode it as a single token.

Note that using the EOT special token may change the dynamics of stage3
training. Therefore, to remain backward compatible, this commit makes it
optional.

Change-Id: If98d348fcaa7d6685e755aabe305e23e7649c367

Signed-off-by: Moshe Island <misland@habana.ai>
Co-authored-by: Moshe Island <misland@habana.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
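A toy illustration of the encoding difference (real code would call the tokenizer's add_special_tokens and resize the model embeddings; this stand-in only shows the token-count effect):

```python
END_OF_TEXT = "<|endoftext|>"

class ToyTokenizer:
    """Toy tokenizer with an optional special-token table; text not in
    the table falls back to per-character ids, standing in for a
    subword split of the marker."""
    def __init__(self):
        self.special = {}
    def add_special_token(self, token):
        self.special[token] = 50000 + len(self.special)
    def encode(self, text):
        if text in self.special:
            return [self.special[text]]      # one id for the whole marker
        return [ord(c) for c in text]        # many ids, like subword pieces

tok = ToyTokenizer()
before = len(tok.encode(END_OF_TEXT))        # marker splits into pieces
tok.add_special_token(END_OF_TEXT)
after = len(tok.encode(END_OF_TEXT))         # now a single token
```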
@santacml santacml merged commit f14540e into python_package Oct 18, 2023