I would also like to ask whether others have encountered memory leaks. When fine-tuning the RL-based models, GPU memory usage grows over time; for example, two hours after training starts, the memory allocated on the GPU has noticeably increased.
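For anyone trying to confirm this, here is a minimal logging sketch (assuming PyTorch; where you call it inside the training loop is up to you):

```python
import torch

def log_gpu_memory(step: int) -> None:
    # allocated = live tensors; reserved = the allocator's cache.
    # An allocated figure that climbs steadily across steps suggests
    # tensors (e.g., rollout buffers kept on GPU) are never freed.
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"step {step}: allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB")
```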
System Info
How much GPU memory is required to train a 7B reasoning model together with a 7B PRM? In my tests, an 80 GB card runs out of memory. Is multi-GPU training supported?
Who can help?
@ziyuwan
Information
Tasks
Reproduction
python -u train_math.py \
    --dataset_path "./math_500.jsonl" \
    --model_name_or_path "./deepseek-math-7b-instruct" \
    --prm_model_name_or_path "./math-shepherd-mistral-7b-prm" \
    --algorithm_name "APPO" \
    --num_mini_batch 4 \
    --ppo_epoch 1
The base model is deepseek-math-7b-instruct, and the PRM is math-shepherd-mistral-7b-prm.
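A back-of-envelope estimate of why 80 GB overflows (my assumptions: bf16 weights, Adam with fp32 master weights and moments, PRM frozen for inference; the actual trainer may differ):

```python
PARAMS = 7e9  # 7B parameters

# Trainable policy: bf16 weights + bf16 grads + fp32 Adam master/m/v
policy_bytes = PARAMS * 2 + PARAMS * 2 + PARAMS * 4 * 3
# Frozen PRM held in bf16 for scoring only
prm_bytes = PARAMS * 2

print(f"policy ≈ {policy_bytes / 1024**3:.0f} GiB")      # ~104 GiB
print(f"frozen PRM ≈ {prm_bytes / 1024**3:.0f} GiB")     # ~13 GiB
```

Even before activations and the KV cache used during rollouts, this already exceeds 80 GB, which would be consistent with the OOM.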
Expected behavior
Run RL training of the 7B base model on a single 80 GB GPU or across multiple GPUs.
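I don't know whether train_math.py exposes a multi-GPU option, but as a workaround sketch (assuming a standard transformers setup, with the model paths from above), the frozen PRM can at least be placed on a second card so the policy has a full 80 GB to itself:

```python
import torch
from transformers import AutoModelForCausalLM

# Trainable policy on GPU 0
policy = AutoModelForCausalLM.from_pretrained(
    "./deepseek-math-7b-instruct", torch_dtype=torch.bfloat16
).to("cuda:0")

# Frozen PRM on GPU 1, used for scoring only
prm = AutoModelForCausalLM.from_pretrained(
    "./math-shepherd-mistral-7b-prm", torch_dtype=torch.bfloat16
).to("cuda:1")
prm.eval()
for p in prm.parameters():
    p.requires_grad_(False)
```

Reward scoring would then run under torch.no_grad() on cuda:1, with inputs moved to that device before the forward pass.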