With multiple GPUs, each card doesn't seem to run at full load; how can I get the compute load on every card to 100%? #66
Comments
Which training method are you using?
I'm using LoRA to train baichuan-13B.
That shouldn't happen; when I train, the cards are basically all fully utilized.
Could it be because I'm not using deepspeed? Could you please share the shell script you use to run baichuan-13b?
Perhaps model parallelism is enabled at this spot; try commenting out those two lines.
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=1 train_lora.py |
I commented out the two lines you mentioned, but only one card still shows high utilization when running.
I think I found the problem: you need to set the launch parameter --nproc_per_node=2.
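For reference, a corrected launch line based on the command quoted above would look roughly like this (any other script arguments are omitted here):

```shell
# One worker process per GPU: --nproc_per_node should match the number of
# visible GPUs, otherwise only a single process is spawned and the other
# card sits mostly idle.
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 train_lora.py
```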
Were you able to train to completion? With the same training code as you, mine crashed after 200 steps.
In the end I didn't use deepspeed, and the speed turned out to be especially slow.
I set CUDA_VISIBLE_DEVICES and device_map; running on 2 A100s, both cards do show memory usage, but the GPU load is always high on one card and very low on the others.
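That symptom is consistent with device_map splitting the model across GPUs (model parallelism) instead of data parallelism. Below is a minimal sketch, not the repo's actual train_lora.py, contrasting the two placements; it assumes the Hugging Face transformers/accelerate APIs, and the model id and flag name are illustrative:

```python
# Sketch only: contrasts model-parallel vs. data-parallel model placement.
import os
import torch
from transformers import AutoModelForCausalLM

MODEL_PATH = "baichuan-inc/Baichuan-13B-Base"  # illustrative model id
USE_MODEL_PARALLEL = False  # illustrative switch

if USE_MODEL_PARALLEL:
    # device_map="auto" shards the layers across all visible GPUs.
    # Layers execute sequentially, so every card allocates memory but only
    # one card computes at a time -- "one card high, the others low".
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        device_map="auto",
        torch_dtype=torch.float16,
        trust_remote_code=True,
    )
else:
    # Data parallelism: each torchrun process loads the full model onto its
    # own GPU (LOCAL_RANK is set by torchrun), and DDP/Trainer keeps all
    # GPUs computing in parallel. Launch with --nproc_per_node equal to the
    # number of visible GPUs.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.float16,
        trust_remote_code=True,
    ).to(f"cuda:{local_rank}")
```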