After fine-tuning, the model outputs repetitive phrases #89

Open
Jackyzjz opened this issue Sep 11, 2024 · 4 comments

Comments

@Jackyzjz

Thanks for your great work.

I am trying to fine-tune the VideoLLaMA2 model on my own data. However, after fine-tuning, the model keeps repeating the same content in its output. Could you help me solve this issue?

@thisurawz1

Can you share the script you used to run inference with the fine-tuned LoRA weights?

@Jackyzjz
Author

I am running LoRA fine-tuning on top of VideoLLaMA2-7B, and the script is as follows:

#!/bin/bash
export NCCL_P2P_DISABLE="1"
export NCCL_IB_DISABLE="1"

# Environment Variables
ARG_WORLD_SIZE=${1:-1}
ARG_NPROC_PER_NODE=${2:-8}
ARG_MASTER_ADDR="127.0.0.1"
ARG_MASTER_PORT=16666
ARG_RANK=0

# Multiple conditions: fall back to the ARG_* defaults when the variables are unset
if [ -z "$WORLD_SIZE" ] || [ -z "$NPROC_PER_NODE" ]; then
    WORLD_SIZE=$ARG_WORLD_SIZE
    NPROC_PER_NODE=$ARG_NPROC_PER_NODE
fi
if [ -z "$MASTER_ADDR" ] || [ -z "$MASTER_PORT" ] || [ -z "$RANK" ]; then
    MASTER_ADDR=$ARG_MASTER_ADDR
    MASTER_PORT=$ARG_MASTER_PORT
    RANK=$ARG_RANK
fi

echo "WORLD_SIZE: $WORLD_SIZE"
echo "NPROC_PER_NODE: $NPROC_PER_NODE"

# Training Arguments
GLOBAL_BATCH_SIZE=8
LOCAL_BATCH_SIZE=1
# Integer division: global batch / (nodes * processes per node * per-device batch)
GRADIENT_ACCUMULATION_STEPS=$((GLOBAL_BATCH_SIZE / (WORLD_SIZE * NPROC_PER_NODE * LOCAL_BATCH_SIZE)))

# Log Arguments
export TRANSFORMERS_OFFLINE=1
export WANDB_PROJECT=videollama2
RUN_NAME=new_dataset_lora
DATA_DIR=datasets
OUTP_DIR=/ssd/jacky
torchrun --nnodes $WORLD_SIZE \
    --nproc_per_node $NPROC_PER_NODE \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    --node_rank $RANK \
    videollama2/train_flash_attn.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed scripts/zero2.json \
    --model_type videollama2 \
    --model_path /ssd/jacky/VideoLLaMA2-7B \
    --vision_tower /ssd/jacky/clip-vit-large-patch14-336 \
    --mm_projector_type stc_connector \
    --data_path ${DATA_DIR}/videollava_sft/image_train.json \
    --data_folder ${DATA_DIR}/videollava_sft/ \
    --mm_vision_select_layer -2 \
    --num_frames 8 \
    --bf16 True \
    --tf32 True \
    --fp16 False \
    --output_dir ${OUTP_DIR}/finetune_${RUN_NAME} \
    --num_train_epochs 5 \
    --per_device_train_batch_size $LOCAL_BATCH_SIZE \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 375 \
    --save_total_limit 99 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --report_to tensorboard \
    --run_name $RUN_NAME
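
One detail in the script above that may be worth checking before a run: GRADIENT_ACCUMULATION_STEPS is computed with integer division, so a global batch size that is not a multiple of (nodes * processes per node * per-device batch) is silently rounded down, possibly to 0. The following is a minimal sketch of that arithmetic with a guard added. The helper name `compute_grad_accum` is hypothetical and is not part of the VideoLLaMA2 repository.

```shell
#!/bin/bash
# Hypothetical helper (not part of VideoLLaMA2): reproduces the script's
# GRADIENT_ACCUMULATION_STEPS arithmetic and rejects settings that do not
# divide evenly instead of silently rounding down.
compute_grad_accum() {
    local global_bs=$1 world_size=$2 nproc=$3 local_bs=$4
    # Samples consumed by one forward/backward pass across all processes
    local per_step=$(( world_size * nproc * local_bs ))
    if (( per_step <= 0 || global_bs % per_step != 0 )); then
        echo "global batch $global_bs is not a multiple of $per_step samples per step" >&2
        return 1
    fi
    echo $(( global_bs / per_step ))
}

# Values from the script above: 8 / (1 * 8 * 1)
compute_grad_accum 8 1 8 1
# With 16 processes the plain division would yield 0, so the helper fails instead
compute_grad_accum 8 1 16 1 || echo "invalid batch configuration"
```

With the values posted in this thread (GLOBAL_BATCH_SIZE=8, one node, 8 processes, LOCAL_BATCH_SIZE=1) the result is 1, meaning there is no gradient accumulation.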

@LiangMeng89

> Thanks for your great work.
>
> I am trying to fine-tune the VideoLLaMA2 model on my own data. However, after fine-tuning, the model keeps repeating the same content in its output. Could you help me solve this issue?

I have the same problem. Did you manage to solve it?

@LiangMeng89

> Thanks for your great work.
>
> I am trying to fine-tune the VideoLLaMA2 model on my own data. However, after fine-tuning, the model keeps repeating the same content in its output. Could you help me solve this issue?

Hello, I'm a PhD student at ZJU, and I also use VideoLLaMA2 for my own research. We have created a WeChat group to discuss VideoLLaMA2 issues and help each other. Would you like to join us? Please contact me: WeChat LiangMeng19357260600, phone +86 19357260600, e-mail liangmeng89@zju.edu.cn.
