Difference between Qwen2-VL-2B and colqwen2-base #167

Open
Huangsz2021 opened this issue Jan 10, 2025 · 0 comments
Hi there,
I ran into an issue while trying to reproduce colqwen2-v1.0.
First, is there any difference between Qwen2-VL-2B and colqwen2-base? In train_colqwen2_model.yaml, `pretrained_model_name_or_path` is set to colqwen2-base. How can I obtain colqwen2-base from Qwen2-VL-2B?
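My working guess (unverified) is that colqwen2-base is just Qwen2-VL-2B-Instruct wrapped with the `ColQwen2` class, so the randomly initialized `custom_text_proj` head gets saved once and reused across training runs. If that is right, something like this sketch should produce an equivalent base checkpoint (the output path here is made up):

```python
# My guess at how colqwen2-base could be produced (unverified): loading the
# plain VLM weights through ColQwen2 adds a randomly initialized
# custom_text_proj head; saving the result fixes that head for later LoRA runs.
import torch
from colpali_engine.models import ColQwen2

model = ColQwen2.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",  # plain VLM weights from the Hub
    torch_dtype=torch.bfloat16,
)
model.save_pretrained("./models/colqwen2-base")  # hypothetical output path
```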
Second, when I train Qwen2-VL-2B via `accelerate launch scripts/train/train_colbert.py scripts/configs/qwen2/train_colqwen2_model.yaml` with the configuration below, the training loss stops decreasing at around 0.69. How can I fix this?
This is my config:
```yaml
config:
  (): colpali_engine.trainer.colmodel_training.ColModelTrainingConfig
  output_dir: !path colqwen2_model
  processor:
    (): colpali_engine.utils.transformers_wrappers.AllPurposeWrapper
    class_to_instanciate: !ext colpali_engine.models.ColQwen2Processor
    pretrained_model_name_or_path: "Qwen2-VL-2B-Instruct" # "./models/paligemma-3b-mix-448"
    # num_image_tokens: 2048
    # max_length: 50

  model:
    (): colpali_engine.utils.transformers_wrappers.AllPurposeWrapper
    class_to_instanciate: !ext colpali_engine.models.ColQwen2
    pretrained_model_name_or_path: "Qwen2-VL-2B-Instruct"
    torch_dtype: !ext torch.bfloat16
    use_cache: false
    # attn_implementation: "flash_attention_2"
    # device_map: "auto"
    # quantization_config:
    #   (): transformers.BitsAndBytesConfig
    #   load_in_4bit: true
    #   bnb_4bit_quant_type: "nf4"
    #   bnb_4bit_compute_dtype: "bfloat16"
    #   bnb_4bit_use_double_quant: true

  dataset_loading_func: !ext colpali_engine.utils.dataset_transformation.load_train_set
  eval_dataset_loader: !import ../data/test_data.yaml

  max_length: 50

  run_eval: true
  loss_func:
    (): colpali_engine.loss.late_interaction_losses.ColbertPairwiseCELoss
  tr_args:
    (): transformers.training_args.TrainingArguments
    output_dir: null
    overwrite_output_dir: true
    num_train_epochs: 1
    per_device_train_batch_size: 3
    gradient_checkpointing: true
    gradient_checkpointing_kwargs: { "use_reentrant": false }
    # gradient_checkpointing: true
    # 6 x 8 gpus = 48 batch size
    # gradient_accumulation_steps: 4
    per_device_eval_batch_size: 2
    eval_strategy: "steps"
    dataloader_num_workers: 1
    # fp16: true
    # save_steps: 500
    save_strategy: "epoch"
    logging_steps: 10
    eval_steps: 100
    warmup_steps: 100
    learning_rate: 5e-4
    save_total_limit: 1
    # resume_from_checkpoint: true
    # optim: "paged_adamw_8bit"
    # wandb logging
    # wandb_project: "colqwen2"
    # run_name: "colqwen2-ba32-nolora"
    # report_to: "wandb"

  peft_config:
    (): peft.LoraConfig
    r: 32
    lora_alpha: 32
    lora_dropout: 0.1
    init_lora_weights: "gaussian"
    bias: "none"
    task_type: "FEATURE_EXTRACTION"
    target_modules: '(.*(model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)'
    # target_modules: '(.*(language_model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)'
```
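As a sanity check on the LoRA setup, I also listed which module names the `target_modules` regex selects. This snippet is my own diagnostic, not part of colpali's training scripts; to my knowledge PEFT matches a string `target_modules` with `re.fullmatch` against each module name:

```python
# Diagnostic sketch (my own, mirroring PEFT's re.fullmatch behaviour for
# string patterns): list the modules the LoRA target_modules regex selects.
import re
import torch
from colpali_engine.models import ColQwen2

# Same checkpoint name as in the config above.
model = ColQwen2.from_pretrained("Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16)
pattern = re.compile(
    r"(.*(model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$"
    r"|.*(custom_text_proj).*$)"
)
hits = [name for name, _ in model.named_modules() if pattern.fullmatch(name)]
print(f"{len(hits)} modules matched; first few: {hits[:5]}")
```

As far as I know PEFT raises an error when a pattern matches nothing, so a non-empty list here at least rules out a regex typo.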
And this is the train log:
[screenshot: training loss hovering around 0.69 throughout the run]
