
Conversation

@mosheisland
Contributor

Currently, ppl is calculated per local worker and then averaged over the data parallel workers. Fix it by first averaging the loss over the data parallel workers and then calculating the ppl of the averaged loss.

While at it, print loss in evaluate.

Change-Id: Ic4108ca48a18b326677d80c1eee81c535b3a27a9

Signed-off-by: Moshe Island <misland@habana.ai>
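The ordering matters because exponentiation and averaging do not commute: `exp` is convex, so by Jensen's inequality the mean of per-worker perplexities overestimates the perplexity of the mean loss. A minimal sketch of the difference (the per-worker losses below are made-up values, not taken from the PR):

```python
import math

# Hypothetical mean cross-entropy losses from 4 data-parallel workers.
worker_losses = [2.0, 2.5, 3.0, 4.0]

# Old order: each worker computes its own ppl, then the ppls are averaged.
ppl_avg_of_exp = sum(math.exp(l) for l in worker_losses) / len(worker_losses)

# Fixed order: losses are averaged across workers first (in the real code
# this would be an all-reduce), then a single ppl is computed.
avg_loss = sum(worker_losses) / len(worker_losses)
ppl_exp_of_avg = math.exp(avg_loss)

# Jensen's inequality: averaging the ppls overestimates the true ppl.
assert ppl_avg_of_exp > ppl_exp_of_avg
```

With these sample losses the averaged-ppl value is roughly 23.6 while the ppl of the averaged loss is roughly 17.7, so the two orderings can diverge substantially whenever per-worker losses are uneven.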
@tjruwase tjruwase merged commit f7ff9dd into deepspeedai:master Oct 17, 2023
@mosheisland mosheisland deleted the 10_fix_stage1_ppl branch November 22, 2023 07:52
hwchen2017 pushed a commit that referenced this pull request Jun 8, 2025