Describe the bug
Fine-tuning the Qwen2.5-Omni-7B model with GRPO in full mode fails during training with the error below. I suspect the video samples are long, so the number of input tokens exceeds the model's maximum model length.
 
Which parameters should be adjusted in this situation? Does GRPO training have a padding mode that concatenates truncated data back together?
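For reference, the kind of adjustment I have in mind looks like the sketch below. This assumes Qwen2.5-Omni honors the Qwen2-VL-style video environment variables that ms-swift forwards to the processor; I have not verified this for Omni, and the values are guesses:

```bash
# Sketch (unverified for Qwen2.5-Omni): shrink the per-video token budget so
# long videos fit under --max_length instead of overflowing the model length.
export VIDEO_MAX_PIXELS=50176   # per-frame pixel cap; fewer pixels => fewer visual tokens
export FPS_MAX_FRAMES=16        # cap on the number of frames sampled from each video
```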
Your hardware and system info
GPU: 8 × H20 (96 GB each)
CUDA: 12.4
torch: 2.6.0+cu124
transformers: 4.52.3
ms-swift commit: 6dc42ab
Additional context
Launch script:
```bash
export DEBUG_MODE="true"
export LOG_PATH="./debug_log_omini-swift_debug_0527.txt"

MAX_PIXELS=1003520 \
NPROC_PER_NODE=8 \
WANDB_API_KEY=31b42ed749c63f21cc34b408e4b4e83f41b21a59 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift rlhf \
    --rlhf_type grpo \
    --model /apdcephfs_gy2/share_303215196/francishni/pretrain_models/Qwen2.5-Omni-7B \
    --reward_funcs external_mc_acc format \
    --reward_weights 1 1 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset omini-swift-debug.json \
    --external_plugins examples/train/grpo/plugin/plugin.py \
    --max_completion_length 1024 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 10000000 \
    --save_steps 300 \
    --save_total_limit 8 \
    --logging_steps 5 \
    --max_length 49152 \
    --output_dir output/omini-swift_debug \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 8 \
    --dataset_num_proc 8 \
    --num_generations 8 \
    --temperature 1. \
    --top_p 0.99 \
    --top_k 50 \
    --system 'examples/train/grpo/prompt.txt' \
    --deepspeed zero2 \
    --log_completions true \
    --report_to wandb
```
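If the overflow really does come from long videos, the caps sketched above could be slotted into the launch prefix, together with a lower MAX_PIXELS. The condensed variant below is purely illustrative; none of the values are tested:

```bash
# Illustrative only: lower pixel/frame budgets so prompt tokens stay within
# --max_length 49152. VIDEO_MAX_PIXELS / FPS_MAX_FRAMES are assumed to apply to Omni.
MAX_PIXELS=501760 \
VIDEO_MAX_PIXELS=50176 \
FPS_MAX_FRAMES=16 \
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift rlhf \
    --rlhf_type grpo \
    --model /apdcephfs_gy2/share_303215196/francishni/pretrain_models/Qwen2.5-Omni-7B \
    --dataset omini-swift-debug.json \
    --reward_funcs format \
    --train_type full \
    --max_length 49152 \
    --max_completion_length 1024 \
    --num_generations 8
```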