Hi there,

I ran into an issue while trying to reproduce colqwen2-v1.0.

First, is there any difference between Qwen2-VL-2B and colqwen2-base? In `train_colqwen2_model.yaml`, `pretrained_model_name_or_path` is set to colqwen2-base. How can I train colqwen2-base from Qwen2-VL-2B?
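My current guess (which may well be wrong) is that the base checkpoint is just Qwen2-VL-2B loaded through the `ColQwen2` class, so that the `custom_text_proj` head starts untrained, and then saved back out. Roughly this sketch, where the Hub ID and the local output path are only my assumptions:

```python
# Sketch of my assumption only -- I don't know whether this is how
# colqwen2-base was actually produced. It loads the plain Qwen2-VL-2B
# backbone through the ColQwen2 wrapper (custom_text_proj is randomly
# initialized) and saves it as a local "base" checkpoint that the training
# config could point to.
import torch
from colpali_engine.models import ColQwen2, ColQwen2Processor

base_id = "Qwen/Qwen2-VL-2B-Instruct"  # assumed Hub ID for Qwen2-VL-2B

model = ColQwen2.from_pretrained(base_id, torch_dtype=torch.bfloat16)
processor = ColQwen2Processor.from_pretrained(base_id)

model.save_pretrained("./models/colqwen2-base")      # hypothetical local path
processor.save_pretrained("./models/colqwen2-base")
```

Is that the intended procedure, or does the base checkpoint involve some additional training?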
Second, when I train Qwen2-VL-2B via `accelerate launch scripts/train/train_colbert.py scripts/configs/qwen2/train_colqwen2_model.yaml` with the configuration below, the training loss stays at around 0.69 and does not decrease. How can I fix this?

This is my config:
```yaml
config:
  (): colpali_engine.trainer.colmodel_training.ColModelTrainingConfig
  output_dir: !path colqwen2_model
  processor:
    (): colpali_engine.utils.transformers_wrappers.AllPurposeWrapper
    class_to_instanciate: !ext colpali_engine.models.ColQwen2Processor
    pretrained_model_name_or_path: "Qwen2-VL-2B-Instruct" # "./models/paligemma-3b-mix-448"
    # num_image_tokens: 2048
    # max_length: 50
  model:
    (): colpali_engine.utils.transformers_wrappers.AllPurposeWrapper
    class_to_instanciate: !ext colpali_engine.models.ColQwen2
    pretrained_model_name_or_path: "Qwen2-VL-2B-Instruct"
    torch_dtype: !ext torch.bfloat16
    use_cache: false
    # attn_implementation: "flash_attention_2"
    # device_map: "auto"
    # quantization_config:
    #   (): transformers.BitsAndBytesConfig
    #   load_in_4bit: true
    #   bnb_4bit_quant_type: "nf4"
    #   bnb_4bit_compute_dtype: "bfloat16"
    #   bnb_4bit_use_double_quant: true

  dataset_loading_func: !ext colpali_engine.utils.dataset_transformation.load_train_set
  eval_dataset_loader: !import ../data/test_data.yaml

  max_length: 50
  run_eval: true
  loss_func:
    (): colpali_engine.loss.late_interaction_losses.ColbertPairwiseCELoss
  tr_args:
    (): transformers.training_args.TrainingArguments
    output_dir: null
    overwrite_output_dir: true
    num_train_epochs: 1
    per_device_train_batch_size: 3
    gradient_checkpointing: true
    gradient_checkpointing_kwargs: { "use_reentrant": false }
    # gradient_checkpointing: true
    # 6 x 8 gpus = 48 batch size
    # gradient_accumulation_steps: 4
    per_device_eval_batch_size: 2
    eval_strategy: "steps"
    dataloader_num_workers: 1
    # fp16: true
    # save_steps: 500
    save_strategy: "epoch"
    logging_steps: 10
    eval_steps: 100
    warmup_steps: 100
    learning_rate: 5e-4
    save_total_limit: 1
    # resume_from_checkpoint: true
    # optim: "paged_adamw_8bit"
    # wandb logging
    # wandb_project: "colqwen2"
    # run_name: "colqwen2-ba32-nolora"
    # report_to: "wandb"
  peft_config:
    (): peft.LoraConfig
    r: 32
    lora_alpha: 32
    lora_dropout: 0.1
    init_lora_weights: "gaussian"
    bias: "none"
    task_type: "FEATURE_EXTRACTION"
    target_modules: '(.*(model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)'
    # target_modules: '(.*(language_model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)'
```
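As a sanity check (a sketch only, using the same `target_modules` regex and model path as in the config above), this is roughly how I would verify that LoRA actually attaches to the intended layers, since a regex that matches nothing (or misses `custom_text_proj`) seems like one possible cause of a flat loss:

```python
# Ad-hoc check (my own script, not from colpali_engine): wrap the model with
# the same LoraConfig as in the YAML and list which modules became trainable.
import torch
from peft import LoraConfig, get_peft_model
from colpali_engine.models import ColQwen2

# Local path as in the config above.
model = ColQwen2.from_pretrained("Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.1,
    init_lora_weights="gaussian",
    bias="none",
    task_type="FEATURE_EXTRACTION",
    target_modules=r"(.*(model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

# Print the names of modules that actually received LoRA adapters.
for name, _ in peft_model.named_modules():
    if "lora_A" in name and name.endswith("default"):
        print(name)
```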
And this is the train log: