
Qwen-Omni full-parameter GRPO fine-tuning fails with ValueError: max_new_tokens must be greater than 0, but is -16384 #4392

@TUDelftHao

Description


Describe the bug
When fine-tuning the Omni-7B model with GRPO in full mode, training fails with the error below. I suspect the videos are long, so the number of input tokens exceeds the model's maximum model length.

[Screenshot: traceback ending in `ValueError: max_new_tokens must be greater than 0, but is -16384`]

Which parameters should be adjusted in this case? Does GRPO training have a padding mode that concatenates the truncated data back together?
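
For reference, the reported value is consistent with the generation budget being computed as the context limit minus the prompt length: with `--max_length 49152`, a prompt of 65536 tokens leaves exactly -16384. A minimal sketch of that arithmetic (variable names are illustrative, not ms-swift internals):

```python
# Illustrative arithmetic only; names are hypothetical, not ms-swift internals.
max_length = 49152         # --max_length from the launch script
prompt_tokens = 65536      # assumed token count of a long video prompt
max_completion_length = 1024

# Budget left for generation once the prompt fills part of the context:
max_new_tokens = max_length - prompt_tokens
print(max_new_tokens)      # -16384, matching the error message

# A valid run needs the prompt plus the completion to fit in the context:
if prompt_tokens + max_completion_length > max_length:
    print("Prompt too long: reduce video frames / MAX_PIXELS or raise --max_length")
```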

Your hardware and system info
GPU: 8× H20 (96 GB each)
CUDA: 12.4
torch: 2.6.0+cu124
transformers: 4.52.3
ms-swift commit: 6dc42ab

Additional context
Launch script:
```shell
export DEBUG_MODE="true"
export LOG_PATH="./debug_log_omini-swift_debug_0527.txt"

MAX_PIXELS=1003520 \
NPROC_PER_NODE=8 \
WANDB_API_KEY=31b42ed749c63f21cc34b408e4b4e83f41b21a59 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift rlhf \
    --rlhf_type grpo \
    --model /apdcephfs_gy2/share_303215196/francishni/pretrain_models/Qwen2.5-Omni-7B \
    --reward_funcs external_mc_acc format \
    --reward_weights 1 1 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset omini-swift-debug.json \
    --external_plugins examples/train/grpo/plugin/plugin.py \
    --max_completion_length 1024 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 10000000 \
    --save_steps 300 \
    --save_total_limit 8 \
    --logging_steps 5 \
    --max_length 49152 \
    --output_dir output/omini-swift_debug \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 8 \
    --dataset_num_proc 8 \
    --num_generations 8 \
    --temperature 1. \
    --top_p 0.99 \
    --top_k 50 \
    --system 'examples/train/grpo/prompt.txt' \
    --deepspeed zero2 \
    --log_completions true \
    --report_to wandb
```
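
One way to confirm the long-video hypothesis before launching GRPO is to tokenize each sample's text offline and flag records that already crowd out the completion budget. A rough sketch, assuming each record in `omini-swift-debug.json` carries a `messages` list of `{role, content}` dicts (the field names and model path are assumptions, and video/audio tokens are not counted, so this only gives a lower bound):

```python
# Rough pre-flight check: flag samples whose text prompts alone approach the
# context budget. Field names below are assumptions about the dataset layout;
# multimodal (video) tokens are NOT counted, so this is only a lower bound.
import json
from transformers import AutoTokenizer

MODEL_PATH = "Qwen/Qwen2.5-Omni-7B"   # or the local pretrain_models path
MAX_LENGTH = 49152                    # matches --max_length
MAX_COMPLETION = 1024                 # matches --max_completion_length

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

with open("omini-swift-debug.json") as f:
    records = json.load(f)

for i, rec in enumerate(records):
    text = " ".join(str(m["content"]) for m in rec.get("messages", []))
    n_text = len(tokenizer(text)["input_ids"])
    if n_text + MAX_COMPLETION > MAX_LENGTH:
        print(f"sample {i}: {n_text} text tokens already exceed the budget")
```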
