Slow training speed, low GPU utilization #1793
Labels: bug (Something isn't working)

Comments
Could you try a newer version of MindSpore?
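As a first check before upgrading, a minimal sketch for confirming the installed version and pinning the execution mode (`set_context`, `GRAPH_MODE`, and `PYNATIVE_MODE` are standard MindSpore APIs; choosing graph mode below is only an example):

```python
import mindspore

# Confirm the installed version (2.2.14 in this report).
print(mindspore.__version__)

# Graph mode is usually faster at steady state; PyNative mode is easier
# to debug. Worth confirming which mode the training script actually uses.
mindspore.set_context(mode=mindspore.GRAPH_MODE, device_target="GPU")
```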
Please also upload the complete code as an attachment, and I'll see whether I can reproduce it.
I don't have an Ascend environment for now, so I probably can't use version 2.3. The code has been uploaded, thank you.
Join the QQ group (721548151) and I'll apply for some vouchers for you.
Just joined last night 😀
@EdwinWang37 No, I switched to Ascend later.
@dayunyan Thank you very much! Then I'll try switching to Ascend too, but why it slows down is still a real mystery.
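One way to narrow down where the time goes is to log per-step wall time and profile a window of steps. A minimal sketch, assuming a step-based training loop (the `train_step` function here is hypothetical, standing in for one optimizer step of the LoRA run):

```python
import time

from mindspore import Profiler

# The profiler must be created before the network runs and analysed
# afterwards; the output can be inspected with MindSpore Insight.
profiler = Profiler(output_path="./profiler_data")

for step in range(1, 201):
    start = time.perf_counter()
    loss = train_step()  # hypothetical: one optimizer step of the LoRA run
    print(f"step {step}: {time.perf_counter() - start:.2f}s, loss={loss}")

profiler.analyse()
```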
Describe the bug (Mandatory)
When LoRA fine-tuning the Qwen2.5-3B model, the first ~10 training steps are fairly fast, reaching 1–2 s/step, but the speed then gradually degrades to more than 10 s/step. GPU utilization reaches 100% early on, but after about 100 steps it sits at 2% for long stretches.
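To correlate the slowdown with device activity, a small watcher can log utilization alongside training. A sketch assuming an NVIDIA GPU with `nvidia-smi` on PATH:

```python
import subprocess
import time

# Log GPU utilization and memory every 5 s so the reported drop to ~2%
# can be lined up against training step numbers.
while True:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(time.strftime("%H:%M:%S"), out)
    time.sleep(5)
```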
Hardware Environment (Ascend/GPU/CPU) (Mandatory): GPU
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) : 2.2.14
-- Python version (e.g., Python 3.7.5) : 3.9
-- OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04
-- GCC/Compiler version (if compiled from source):
Execution Mode (PyNative/Graph) (Mandatory):

To Reproduce (Mandatory)
Steps to reproduce the behavior:
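The full script is in the attachment above. Purely for orientation, a minimal sketch of this kind of setup, assuming mindnlp mirrors the Hugging Face transformers/peft APIs it ports; the LoRA hyperparameters, target modules, and training-loop details below are placeholder assumptions, not the attached code:

```python
# Hypothetical minimal repro sketch, NOT the attached script.
from mindnlp.transformers import AutoModelForCausalLM, AutoTokenizer
from mindnlp.peft import LoraConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # placeholder rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # typical Qwen attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# ... build the dataset and run the step-based training loop; the slowdown
# described above appears after roughly the first 10 steps.
```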
Expected behavior (Mandatory)
Training speed should stay stable and fast, and GPU utilization should remain steady rather than dropping to very low levels.
Screenshots / Logs (Mandatory)
Additional context (Optional)
Add any other context about the problem here.