
How do I evaluate the results after stage-1 of training BLIP2? #774

Open
hawkiyc opened this issue Dec 12, 2024 · 0 comments

Comments


hawkiyc commented Dec 12, 2024

Hi developers,

I am modifying your code to build a BLIP2 variant for time-series input, and I am currently trying to understand the architecture of this framework. I have run the `bash run_scripts/blip2/train/pretrain_stage1.sh` command with the COCO dataset (by the way, there are mismatches between images and annotations in the VG dataset, so I removed it), and it seems to work fine. However, I cannot find any script or `.yaml` file for evaluating the result of stage 1. I have checked the `lavis/configs/datasets/coco/defaults_cap.yaml` file, and it does define train, val, and test subsets.

defaults_cap.yaml

datasets:
  coco_caption: # name of the dataset builder
    dataset_card: dataset_card/coco_caption.md
    # data_dir: ${env.data_dir}/datasets
    data_type: images # [images|videos|features]

    build_info:
      # Be careful not to append minus sign (-) before split to avoid itemizing
      annotations:
        train:
          url: https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json
          md5: aa31ac474cf6250ebb81d18348a07ed8
          storage: coco/annotations/coco_karpathy_train.json
        val:
          url: https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_val.json
          md5: b273847456ef5580e33713b1f7de52a0
          storage: coco/annotations/coco_karpathy_val.json
        test:
          url: https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_test.json
          md5: 3ff34b0ef2db02d01c37399f6a2a6cd1
          storage: coco/annotations/coco_karpathy_test.json
      images:
        storage: coco/images/

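Since the dataset config already defines `val` and `test` annotation splits, one possible way to trigger evaluation is to list those splits in the `run` section of the stage-1 config. This is a sketch, assuming the runner follows LAVIS's usual `valid_splits`/`test_splits` convention; the "No validation splits found" message in the log below suggests those keys are currently empty. Note that the stage-1 pretraining task may not implement a validation step, in which case retrieval or captioning metrics would need a separate evaluation config.

```yaml
run:
  # Hypothetical addition to pretrain_stage1.yaml: ask the runner to
  # evaluate on these splits after each epoch. Split names must match
  # the annotation keys in defaults_cap.yaml.
  valid_splits: ["val"]
  test_splits: ["test"]
```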
Here is the printed result in the terminal:

Train: data epoch: [4]  [5550/5667]  eta: 0:03:26  lr: 0.000019  loss: 4.0731  loss_itc: 0.9712 (0.9633)  loss_itm: 0.1881 (0.1714)  loss_lm: 2.8563 (2.8436)  time: 1.7917  data: 0.0000  max mem: 27191
Train: data epoch: [4]  [5600/5667]  eta: 0:01:58  lr: 0.000019  loss: 4.1341  loss_itc: 0.9485 (0.9633)  loss_itm: 0.1703 (0.1713)  loss_lm: 2.8336 (2.8436)  time: 1.7898  data: 0.0000  max mem: 27191
Train: data epoch: [4]  [5650/5667]  eta: 0:00:30  lr: 0.000019  loss: 3.8998  loss_itc: 0.9417 (0.9632)  loss_itm: 0.1509 (0.1713)  loss_lm: 2.8545 (2.8438)  time: 1.7882  data: 0.0000  max mem: 27191
Train: data epoch: [4]  [5666/5667]  eta: 0:00:01  lr: 0.000019  loss: 3.9018  loss_itc: 0.9507 (0.9632)  loss_itm: 0.1535 (0.1713)  loss_lm: 2.8405 (2.8438)  time: 1.8221  data: 0.0000  max mem: 27191
Train: data epoch: [4] Total time: 2:47:07 (1.7694 s / it)
INFO - 2024-12-12 03:24:12,536 - base_task - Averaged stats: lr: 0.0000  loss: 3.9783  loss_itc: 0.9632  loss_itm: 0.1713  loss_lm: 2.8438
INFO - 2024-12-12 03:24:12,543 - runner_base - No validation splits found.
INFO - 2024-12-12 03:24:12,598 - runner_base - Saving checkpoint at epoch 4 to /home/revlis_ai/Documents/training_models_temp/LAVIS_with_JoLT/lavis/output/BLIP2/Pretrain_stage1/20241211132/checkpoint_4.pth.
INFO - 2024-12-12 03:24:15,828 - runner_base - Saving checkpoint at epoch 4 to /home/revlis_ai/Documents/training_models_temp/LAVIS_with_JoLT/lavis/output/BLIP2/Pretrain_stage1/20241211132/checkpoint_4.pth.
INFO - 2024-12-12 03:24:23,201 - runner_base - No validation splits found.
INFO - 2024-12-12 03:24:23,203 - runner_base - Training time 13:55:33
[rank0]:[W1212 03:24:24.182641511 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())

Output log file

{
    "run": {
        "task": "image_text_pretrain",
        "lr_sched": "linear_warmup_cosine_lr",
        "init_lr": 0.0001,
        "min_lr": 1e-05,
        "warmup_lr": 1e-06,
        "weight_decay": 0.05,
        "max_epoch": 5,
        "batch_size_train": 100,
        "batch_size_eval": 64,
        "num_workers": 4,
        "warmup_steps": 5000,
        "seed": 42,
        "output_dir": "output/BLIP2/Pretrain_stage1",
        "amp": true,
        "resume_ckpt_path": null,
        "evaluate": false,
        "train_splits": [
            "train"
        ],
        "device": "cuda",
        "world_size": 1,
        "dist_url": "env://",
        "distributed": true,
        "rank": 0,
        "gpu": 0,
        "dist_backend": "nccl"
    },
    "model": {
        "arch": "blip2",
        "load_finetuned": false,
        "pretrained": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained.pth",
        "finetuned": "",
        "image_size": 224,
        "drop_path_rate": 0,
        "use_grad_checkpoint": false,
        "vit_precision": "fp16",
        "freeze_vit": true,
        "num_query_token": 32,
        "model_type": "pretrain",
        "load_pretrained": false
    },
    "preprocess": {
        "vis_processor": {
            "train": {
                "name": "blip_image_train",
                "image_size": 224
            },
            "eval": {
                "name": "blip_image_eval",
                "image_size": 224
            }
        },
        "text_processor": {
            "train": {
                "name": "blip_caption"
            },
            "eval": {
                "name": "blip_caption"
            }
        }
    },
    "datasets": {
        "coco_caption": {
            "dataset_card": "dataset_card/coco_caption.md",
            "data_type": "images",
            "build_info": {
                "annotations": {
                    "train": {
                        "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json",
                        "md5": "aa31ac474cf6250ebb81d18348a07ed8",
                        "storage": "coco/annotations/coco_karpathy_train.json"
                    },
                    "val": {
                        "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_val.json",
                        "md5": "b273847456ef5580e33713b1f7de52a0",
                        "storage": "coco/annotations/coco_karpathy_val.json"
                    },
                    "test": {
                        "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_test.json",
                        "md5": "3ff34b0ef2db02d01c37399f6a2a6cd1",
                        "storage": "coco/annotations/coco_karpathy_test.json"
                    }
                },
                "images": {
                    "storage": "coco/images/"
                }
            },
            "vis_processor": {
                "train": {
                    "name": "blip2_image_train",
                    "image_size": 224
                }
            },
            "text_processor": {
                "train": {
                    "name": "blip_caption"
                }
            }
        }
    }
}
{"train_lr": "0.000", "train_loss": "5.582", "train_loss_itc": "1.492", "train_loss_itm": "0.402", "train_loss_lm": "3.688"}
{"train_lr": "0.000", "train_loss": "4.538", "train_loss_itc": "1.097", "train_loss_itm": "0.266", "train_loss_lm": "3.174"}
{"train_lr": "0.000", "train_loss": "4.288", "train_loss_itc": "1.035", "train_loss_itm": "0.222", "train_loss_lm": "3.031"}
{"train_lr": "0.000", "train_loss": "4.110", "train_loss_itc": "0.993", "train_loss_itm": "0.192", "train_loss_lm": "2.925"}
{"train_lr": "0.000", "train_loss": "3.978", "train_loss_itc": "0.963", "train_loss_itm": "0.171", "train_loss_lm": "2.844"}
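In the absence of a validation split, one sanity check on stage-1 training is to parse these per-epoch JSON lines from the output log and confirm the total loss tracks the sum of its ITC, ITM, and LM components. A minimal sketch (the sample lines are copied from the log above; `parse_stats` is a hypothetical helper, not part of LAVIS):

```python
import json

# Each line of LAVIS's output log is a JSON object of per-epoch averaged
# stats, with values stored as strings. Sample lines from the issue:
log_lines = [
    '{"train_lr": "0.000", "train_loss": "5.582", "train_loss_itc": "1.492", "train_loss_itm": "0.402", "train_loss_lm": "3.688"}',
    '{"train_lr": "0.000", "train_loss": "4.538", "train_loss_itc": "1.097", "train_loss_itm": "0.266", "train_loss_lm": "3.174"}',
    '{"train_lr": "0.000", "train_loss": "3.978", "train_loss_itc": "0.963", "train_loss_itm": "0.171", "train_loss_lm": "2.844"}',
]

def parse_stats(lines):
    """Parse JSON-lines stats into dicts of floats keyed by metric name."""
    return [{k: float(v) for k, v in json.loads(line).items()} for line in lines]

stats = parse_stats(log_lines)

# The total loss should be (up to rounding) the sum of the three components.
for epoch in stats:
    components = (epoch["train_loss_itc"]
                  + epoch["train_loss_itm"]
                  + epoch["train_loss_lm"])
    assert abs(epoch["train_loss"] - components) < 0.01
```

This only confirms the losses are decreasing and internally consistent; it is not a substitute for a proper validation split.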