
Conversation

@mosheisland
Contributor

Currently, ppl is calculated per local worker and then averaged over the data parallel workers. Fix it by first averaging the loss over the data parallel workers and then calculating the ppl of the averaged loss.

While at it, print loss in evaluate.

Change-Id: Ic4108ca48a18b326677d80c1eee81c535b3a27a9

Signed-off-by: Moshe Island <misland@habana.ai>
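The ordering matters because exponentiation and averaging do not commute: `exp` is convex, so by Jensen's inequality the mean of per-worker perplexities overestimates the perplexity of the mean loss. A minimal sketch of the difference (the per-worker losses below are made-up values, not taken from the PR):

```python
import math

# Hypothetical mean cross-entropy losses from 4 data-parallel workers.
worker_losses = [2.0, 2.5, 3.0, 4.0]

# Old order: each worker computes its own ppl, then the ppls are averaged.
ppl_avg_of_exp = sum(math.exp(l) for l in worker_losses) / len(worker_losses)

# Fixed order: losses are averaged across workers first (in the real code
# this would be an all-reduce), then a single ppl is computed.
avg_loss = sum(worker_losses) / len(worker_losses)
ppl_exp_of_avg = math.exp(avg_loss)

# Jensen's inequality: averaging the ppls overestimates the true ppl.
assert ppl_avg_of_exp > ppl_exp_of_avg
```

With these sample losses the averaged-ppl value is roughly 23.6 while the ppl of the averaged loss is roughly 17.7, so the two orderings can diverge substantially whenever per-worker losses are uneven.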
@tjruwase tjruwase merged commit f7ff9dd into deepspeedai:master Oct 17, 2023
@mosheisland mosheisland deleted the 10_fix_stage1_ppl branch November 22, 2023 07:52
hwchen2017 pushed a commit that referenced this pull request Jun 8, 2025