Difference between Qwen2-VL-2B and colqwen2-base #167

Open
Huangsz2021 opened this issue Jan 10, 2025 · 0 comments
Hi there,
I ran into an issue while trying to reproduce colqwen2-v1.0.
First, is there any difference between Qwen2-VL-2B and colqwen2-base? In train_colqwen2_model.yaml, `pretrained_model_name_or_path` is set to colqwen2-base. How can I obtain colqwen2-base from Qwen2-VL-2B?
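My working guess (unverified) is that colqwen2-base is just Qwen2-VL-2B-Instruct wrapped with the `ColQwen2` class, so the randomly initialized `custom_text_proj` head gets saved once and reused across training runs. If that is right, something like this sketch should produce an equivalent base checkpoint (the output path here is made up):

```python
# My guess at how colqwen2-base could be produced (unverified): loading the
# plain VLM weights through ColQwen2 adds a randomly initialized
# custom_text_proj head; saving the result fixes that head for later LoRA runs.
import torch
from colpali_engine.models import ColQwen2

model = ColQwen2.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",  # plain VLM weights from the Hub
    torch_dtype=torch.bfloat16,
)
model.save_pretrained("./models/colqwen2-base")  # hypothetical output path
```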
Second, when I train Qwen2-VL-2B via `accelerate launch scripts/train/train_colbert.py scripts/configs/qwen2/train_colqwen2_model.yaml` with the configuration below, the training loss stops decreasing at around 0.69. How can I fix this?
This is my config:
```yaml
config:
  (): colpali_engine.trainer.colmodel_training.ColModelTrainingConfig
  output_dir: !path colqwen2_model
  processor:
    (): colpali_engine.utils.transformers_wrappers.AllPurposeWrapper
    class_to_instanciate: !ext colpali_engine.models.ColQwen2Processor
    pretrained_model_name_or_path: "Qwen2-VL-2B-Instruct" # "./models/paligemma-3b-mix-448"
    # num_image_tokens: 2048
    # max_length: 50

  model:
    (): colpali_engine.utils.transformers_wrappers.AllPurposeWrapper
    class_to_instanciate: !ext colpali_engine.models.ColQwen2
    pretrained_model_name_or_path: "Qwen2-VL-2B-Instruct"
    torch_dtype: !ext torch.bfloat16
    use_cache: false
    # attn_implementation: "flash_attention_2"
    # device_map: "auto"
    # quantization_config:
    #   (): transformers.BitsAndBytesConfig
    #   load_in_4bit: true
    #   bnb_4bit_quant_type: "nf4"
    #   bnb_4bit_compute_dtype: "bfloat16"
    #   bnb_4bit_use_double_quant: true

  dataset_loading_func: !ext colpali_engine.utils.dataset_transformation.load_train_set
  eval_dataset_loader: !import ../data/test_data.yaml

  max_length: 50

  run_eval: true
  loss_func:
    (): colpali_engine.loss.late_interaction_losses.ColbertPairwiseCELoss
  tr_args:
    (): transformers.training_args.TrainingArguments
    output_dir: null
    overwrite_output_dir: true
    num_train_epochs: 1
    per_device_train_batch_size: 3
    gradient_checkpointing: true
    gradient_checkpointing_kwargs: { "use_reentrant": false }
    # gradient_checkpointing: true
    # 6 x 8 gpus = 48 batch size
    # gradient_accumulation_steps: 4
    per_device_eval_batch_size: 2
    eval_strategy: "steps"
    dataloader_num_workers: 1
    # fp16: true
    # save_steps: 500
    save_strategy: "epoch"
    logging_steps: 10
    eval_steps: 100
    warmup_steps: 100
    learning_rate: 5e-4
    save_total_limit: 1
    # resume_from_checkpoint: true
    # optim: "paged_adamw_8bit"
    # wandb logging
    # wandb_project: "colqwen2"
    # run_name: "colqwen2-ba32-nolora"
    # report_to: "wandb"

  peft_config:
    (): peft.LoraConfig
    r: 32
    lora_alpha: 32
    lora_dropout: 0.1
    init_lora_weights: "gaussian"
    bias: "none"
    task_type: "FEATURE_EXTRACTION"
    target_modules: '(.*(model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)'
    # target_modules: '(.*(language_model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)'
```
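As a sanity check on the LoRA setup, I also listed which module names the `target_modules` regex selects. This snippet is my own diagnostic, not part of colpali's training scripts; to my knowledge PEFT matches a string `target_modules` with `re.fullmatch` against each module name:

```python
# Diagnostic sketch (my own, mirroring PEFT's re.fullmatch behaviour for
# string patterns): list the modules the LoRA target_modules regex selects.
import re
import torch
from colpali_engine.models import ColQwen2

# Same checkpoint name as in the config above.
model = ColQwen2.from_pretrained("Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16)
pattern = re.compile(
    r"(.*(model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$"
    r"|.*(custom_text_proj).*$)"
)
hits = [name for name, _ in model.named_modules() if pattern.fullmatch(name)]
print(f"{len(hits)} modules matched; first few: {hits[:5]}")
```

As far as I know PEFT raises an error when a pattern matches nothing, so a non-empty list here at least rules out a regex typo.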
And this is the train log:
[screenshot: training loss hovering around 0.69 throughout the run]
