
How do I evaluate the results after stage-1 of training BLIP2? #774

Open
hawkiyc opened this issue Dec 12, 2024 · 0 comments

Comments


hawkiyc commented Dec 12, 2024

Hi developers,

I am modifying your code to build a BLIP2 variant for time-series input, and I am currently trying to understand the architecture of this framework. I have run the `bash run_scripts/blip2/train/pretrain_stage1.sh` command with the COCO dataset (by the way, there are mismatches between images and annotations in the VG dataset, so I removed it), and it seems to work fine. However, I cannot find any script or `.yaml` file for evaluating the result of stage 1. I have checked the `lavis/configs/datasets/coco/defaults_cap.yaml` file, and it does define train, val, and test subsets.

defaults_cap.yaml

datasets:
  coco_caption: # name of the dataset builder
    dataset_card: dataset_card/coco_caption.md
    # data_dir: ${env.data_dir}/datasets
    data_type: images # [images|videos|features]

    build_info:
      # Be careful not to append minus sign (-) before split to avoid itemizing
      annotations:
        train:
          url: https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json
          md5: aa31ac474cf6250ebb81d18348a07ed8
          storage: coco/annotations/coco_karpathy_train.json
        val:
          url: https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_val.json
          md5: b273847456ef5580e33713b1f7de52a0
          storage: coco/annotations/coco_karpathy_val.json
        test:
          url: https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_test.json
          md5: 3ff34b0ef2db02d01c37399f6a2a6cd1
          storage: coco/annotations/coco_karpathy_test.json
      images:
        storage: coco/images/

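Since the dataset config already defines `val` and `test` annotation splits, one possible way to trigger evaluation is to list those splits in the `run` section of the stage-1 config. This is a sketch, assuming the runner follows LAVIS's usual `valid_splits`/`test_splits` convention; the "No validation splits found" message in the log below suggests those keys are currently empty. Note that the stage-1 pretraining task may not implement a validation step, in which case retrieval or captioning metrics would need a separate evaluation config.

```yaml
run:
  # Hypothetical addition to pretrain_stage1.yaml: ask the runner to
  # evaluate on these splits after each epoch. Split names must match
  # the annotation keys in defaults_cap.yaml.
  valid_splits: ["val"]
  test_splits: ["test"]
```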
Here is the printed result in the terminal:

Train: data epoch: [4]  [5550/5667]  eta: 0:03:26  lr: 0.000019  loss: 4.0731  loss_itc: 0.9712 (0.9633)  loss_itm: 0.1881 (0.1714)  loss_lm: 2.8563 (2.8436)  time: 1.7917  data: 0.0000  max mem: 27191
Train: data epoch: [4]  [5600/5667]  eta: 0:01:58  lr: 0.000019  loss: 4.1341  loss_itc: 0.9485 (0.9633)  loss_itm: 0.1703 (0.1713)  loss_lm: 2.8336 (2.8436)  time: 1.7898  data: 0.0000  max mem: 27191
Train: data epoch: [4]  [5650/5667]  eta: 0:00:30  lr: 0.000019  loss: 3.8998  loss_itc: 0.9417 (0.9632)  loss_itm: 0.1509 (0.1713)  loss_lm: 2.8545 (2.8438)  time: 1.7882  data: 0.0000  max mem: 27191
Train: data epoch: [4]  [5666/5667]  eta: 0:00:01  lr: 0.000019  loss: 3.9018  loss_itc: 0.9507 (0.9632)  loss_itm: 0.1535 (0.1713)  loss_lm: 2.8405 (2.8438)  time: 1.8221  data: 0.0000  max mem: 27191
Train: data epoch: [4] Total time: 2:47:07 (1.7694 s / it)
INFO - 2024-12-12 03:24:12,536 - base_task - Averaged stats: lr: 0.0000  loss: 3.9783  loss_itc: 0.9632  loss_itm: 0.1713  loss_lm: 2.8438
INFO - 2024-12-12 03:24:12,543 - runner_base - No validation splits found.
INFO - 2024-12-12 03:24:12,598 - runner_base - Saving checkpoint at epoch 4 to /home/revlis_ai/Documents/training_models_temp/LAVIS_with_JoLT/lavis/output/BLIP2/Pretrain_stage1/20241211132/checkpoint_4.pth.
INFO - 2024-12-12 03:24:15,828 - runner_base - Saving checkpoint at epoch 4 to /home/revlis_ai/Documents/training_models_temp/LAVIS_with_JoLT/lavis/output/BLIP2/Pretrain_stage1/20241211132/checkpoint_4.pth.
INFO - 2024-12-12 03:24:23,201 - runner_base - No validation splits found.
INFO - 2024-12-12 03:24:23,203 - runner_base - Training time 13:55:33
[rank0]:[W1212 03:24:24.182641511 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())

Output log file

{
    "run": {
        "task": "image_text_pretrain",
        "lr_sched": "linear_warmup_cosine_lr",
        "init_lr": 0.0001,
        "min_lr": 1e-05,
        "warmup_lr": 1e-06,
        "weight_decay": 0.05,
        "max_epoch": 5,
        "batch_size_train": 100,
        "batch_size_eval": 64,
        "num_workers": 4,
        "warmup_steps": 5000,
        "seed": 42,
        "output_dir": "output/BLIP2/Pretrain_stage1",
        "amp": true,
        "resume_ckpt_path": null,
        "evaluate": false,
        "train_splits": [
            "train"
        ],
        "device": "cuda",
        "world_size": 1,
        "dist_url": "env://",
        "distributed": true,
        "rank": 0,
        "gpu": 0,
        "dist_backend": "nccl"
    },
    "model": {
        "arch": "blip2",
        "load_finetuned": false,
        "pretrained": "https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained.pth",
        "finetuned": "",
        "image_size": 224,
        "drop_path_rate": 0,
        "use_grad_checkpoint": false,
        "vit_precision": "fp16",
        "freeze_vit": true,
        "num_query_token": 32,
        "model_type": "pretrain",
        "load_pretrained": false
    },
    "preprocess": {
        "vis_processor": {
            "train": {
                "name": "blip_image_train",
                "image_size": 224
            },
            "eval": {
                "name": "blip_image_eval",
                "image_size": 224
            }
        },
        "text_processor": {
            "train": {
                "name": "blip_caption"
            },
            "eval": {
                "name": "blip_caption"
            }
        }
    },
    "datasets": {
        "coco_caption": {
            "dataset_card": "dataset_card/coco_caption.md",
            "data_type": "images",
            "build_info": {
                "annotations": {
                    "train": {
                        "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_train.json",
                        "md5": "aa31ac474cf6250ebb81d18348a07ed8",
                        "storage": "coco/annotations/coco_karpathy_train.json"
                    },
                    "val": {
                        "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_val.json",
                        "md5": "b273847456ef5580e33713b1f7de52a0",
                        "storage": "coco/annotations/coco_karpathy_val.json"
                    },
                    "test": {
                        "url": "https://storage.googleapis.com/sfr-vision-language-research/datasets/coco_karpathy_test.json",
                        "md5": "3ff34b0ef2db02d01c37399f6a2a6cd1",
                        "storage": "coco/annotations/coco_karpathy_test.json"
                    }
                },
                "images": {
                    "storage": "coco/images/"
                }
            },
            "vis_processor": {
                "train": {
                    "name": "blip2_image_train",
                    "image_size": 224
                }
            },
            "text_processor": {
                "train": {
                    "name": "blip_caption"
                }
            }
        }
    }
}
{"train_lr": "0.000", "train_loss": "5.582", "train_loss_itc": "1.492", "train_loss_itm": "0.402", "train_loss_lm": "3.688"}
{"train_lr": "0.000", "train_loss": "4.538", "train_loss_itc": "1.097", "train_loss_itm": "0.266", "train_loss_lm": "3.174"}
{"train_lr": "0.000", "train_loss": "4.288", "train_loss_itc": "1.035", "train_loss_itm": "0.222", "train_loss_lm": "3.031"}
{"train_lr": "0.000", "train_loss": "4.110", "train_loss_itc": "0.993", "train_loss_itm": "0.192", "train_loss_lm": "2.925"}
{"train_lr": "0.000", "train_loss": "3.978", "train_loss_itc": "0.963", "train_loss_itm": "0.171", "train_loss_lm": "2.844"}
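In the absence of a validation split, one sanity check on stage-1 training is to parse these per-epoch JSON lines from the output log and confirm the total loss tracks the sum of its ITC, ITM, and LM components. A minimal sketch (the sample lines are copied from the log above; `parse_stats` is a hypothetical helper, not part of LAVIS):

```python
import json

# Each line of LAVIS's output log is a JSON object of per-epoch averaged
# stats, with values stored as strings. Sample lines from the issue:
log_lines = [
    '{"train_lr": "0.000", "train_loss": "5.582", "train_loss_itc": "1.492", "train_loss_itm": "0.402", "train_loss_lm": "3.688"}',
    '{"train_lr": "0.000", "train_loss": "4.538", "train_loss_itc": "1.097", "train_loss_itm": "0.266", "train_loss_lm": "3.174"}',
    '{"train_lr": "0.000", "train_loss": "3.978", "train_loss_itc": "0.963", "train_loss_itm": "0.171", "train_loss_lm": "2.844"}',
]

def parse_stats(lines):
    """Parse JSON-lines stats into dicts of floats keyed by metric name."""
    return [{k: float(v) for k, v in json.loads(line).items()} for line in lines]

stats = parse_stats(log_lines)

# The total loss should be (up to rounding) the sum of the three components.
for epoch in stats:
    components = (epoch["train_loss_itc"]
                  + epoch["train_loss_itm"]
                  + epoch["train_loss_lm"])
    assert abs(epoch["train_loss"] - components) < 0.01
```

This only confirms the losses are decreasing and internally consistent; it is not a substitute for a proper validation split.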