
Qwen-Omni full-parameter GRPO fine-tuning fails with ValueError: max_new_tokens must be greater than 0, but is -16384 #4392

@TUDelftHao

Description


Describe the bug
When fine-tuning the Omni-7B model with GRPO in full mode, training fails with the error below. I suspect the videos are long, so the number of input tokens exceeds the model's maximum model length.

[Screenshot: traceback ending in `ValueError: max_new_tokens must be greater than 0, but is -16384`]

Which parameters should be adjusted in this case? Does GRPO training have a padding mode that concatenates the truncated data back together?
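
For reference, the reported value is consistent with the generation budget being computed as the context limit minus the prompt length: with `--max_length 49152`, a prompt of 65536 tokens leaves exactly -16384. A minimal sketch of that arithmetic (variable names are illustrative, not ms-swift internals):

```python
# Illustrative arithmetic only; names are hypothetical, not ms-swift internals.
max_length = 49152         # --max_length from the launch script
prompt_tokens = 65536      # assumed token count of a long video prompt
max_completion_length = 1024

# Budget left for generation once the prompt fills part of the context:
max_new_tokens = max_length - prompt_tokens
print(max_new_tokens)      # -16384, matching the error message

# A valid run needs the prompt plus the completion to fit in the context:
if prompt_tokens + max_completion_length > max_length:
    print("Prompt too long: reduce video frames / MAX_PIXELS or raise --max_length")
```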

Your hardware and system info
GPU: 8× H20 (96 GB each)
CUDA: 12.4
torch: 2.6.0+cu124
transformers: 4.52.3
ms-swift commit: 6dc42ab

Additional context
Launch script:
```shell
export DEBUG_MODE="true"
export LOG_PATH="./debug_log_omini-swift_debug_0527.txt"

MAX_PIXELS=1003520 \
NPROC_PER_NODE=8 \
WANDB_API_KEY=31b42ed749c63f21cc34b408e4b4e83f41b21a59 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift rlhf \
    --rlhf_type grpo \
    --model /apdcephfs_gy2/share_303215196/francishni/pretrain_models/Qwen2.5-Omni-7B \
    --reward_funcs external_mc_acc format \
    --reward_weights 1 1 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset omini-swift-debug.json \
    --external_plugins examples/train/grpo/plugin/plugin.py \
    --max_completion_length 1024 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 10000000 \
    --save_steps 300 \
    --save_total_limit 8 \
    --logging_steps 5 \
    --max_length 49152 \
    --output_dir output/omini-swift_debug \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 8 \
    --dataset_num_proc 8 \
    --num_generations 8 \
    --temperature 1. \
    --top_p 0.99 \
    --top_k 50 \
    --system 'examples/train/grpo/prompt.txt' \
    --deepspeed zero2 \
    --log_completions true \
    --report_to wandb
```
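
One way to confirm the long-video hypothesis before launching GRPO is to tokenize each sample's text offline and flag records that already crowd out the completion budget. A rough sketch, assuming each record in `omini-swift-debug.json` carries a `messages` list of `{role, content}` dicts (the field names and model path are assumptions, and video/audio tokens are not counted, so this only gives a lower bound):

```python
# Rough pre-flight check: flag samples whose text prompts alone approach the
# context budget. Field names below are assumptions about the dataset layout;
# multimodal (video) tokens are NOT counted, so this is only a lower bound.
import json
from transformers import AutoTokenizer

MODEL_PATH = "Qwen/Qwen2.5-Omni-7B"   # or the local pretrain_models path
MAX_LENGTH = 49152                    # matches --max_length
MAX_COMPLETION = 1024                 # matches --max_completion_length

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

with open("omini-swift-debug.json") as f:
    records = json.load(f)

for i, rec in enumerate(records):
    text = " ".join(str(m["content"]) for m in rec.get("messages", []))
    n_text = len(tokenizer(text)["input_ids"])
    if n_text + MAX_COMPLETION > MAX_LENGTH:
        print(f"sample {i}: {n_text} text tokens already exceed the budget")
```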
