
#fix qwen2 abnormal loss caused by SoftmaxCrossEntropyWithLogits on 910A/B #2034


Open
wants to merge 4 commits into base: 0.4

Conversation

xuhangscut
Contributor

@xuhangscut xuhangscut commented May 6, 2025

Using the SoftmaxCrossEntropyWithLogits function, the training loss curve looks like this:
[image: training loss curve with SoftmaxCrossEntropyWithLogits]
It should instead follow the trend of PyTorch's CrossEntropyLoss, like this:
[image: reference loss curve with CrossEntropyLoss in PyTorch]
As the charts show, the loss decreases less significantly with SoftmaxCrossEntropyWithLogits, and the model's ability measured in testing is weaker.
This PR keeps the SoftmaxCrossEntropyWithLogits code path for use on Orange Pi.

Experiment environment:
python 3.9.10
mindspore 2.5.0
Ascend 910A/B
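To illustrate the difference between the two loss APIs discussed above, here is a minimal NumPy sketch (not the PR's actual MindSpore code; function names and the padding scenario are hypothetical for illustration). SoftmaxCrossEntropyWithLogits-style losses take one-hot targets and score every position, while a CrossEntropyLoss-style API takes class indices and can skip padding via an `ignore_index`, which changes the reported loss curve on padded sequences:

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    m = logits.max(axis=-1, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))

def ce_onehot(logits, onehot):
    # One-hot formulation (SoftmaxCrossEntropyWithLogits style):
    # every position contributes, padding included.
    return -(onehot * log_softmax(logits)).sum(axis=-1)

def ce_index(logits, labels, ignore_index=-100):
    # Class-index formulation (CrossEntropyLoss style):
    # positions labeled ignore_index contribute zero loss.
    logp = log_softmax(logits)
    mask = labels != ignore_index
    safe = np.where(mask, labels, 0)
    per_token = -np.take_along_axis(logp, safe[:, None], axis=-1).squeeze(-1)
    return per_token * mask

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 6)).astype(np.float32)
labels = np.array([1, 3, -100, 5])           # third token is padding
onehot = np.eye(6)[np.where(labels == -100, 0, labels)]

valid = labels != -100
# On valid tokens the two formulations agree; they differ only in
# how padded positions are treated.
assert np.allclose(ce_onehot(logits, onehot)[valid],
                   ce_index(logits, labels)[valid])
```

On the valid tokens the two losses are mathematically identical, so any divergence in the training curve comes from how masked/padded positions (and kernel-level numerics on the target hardware) are handled, not from the cross-entropy definition itself.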
