This document provides instructions for evaluating Osprey on four representative tasks: open-vocabulary segmentation, referring object classification, detailed region description, and region-level captioning.
We provide two types of models: Osprey and Osprey-Chat (denoted Osprey* in our paper). Osprey-Chat is trained with additional LLaVA data (llava_v1_5_mix665k.json) and exhibits stronger conversation and image-level understanding and reasoning capabilities.
- Download the SentenceBERT model, which is used to calculate semantic similarity.
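The semantic-similarity metric compares a predicted category name with the ground-truth name in embedding space via cosine similarity. A minimal sketch of the idea — the `embed` table below is a hypothetical stand-in for real all-MiniLM-L6-v2 sentence embeddings, which in the actual evaluation come from SentenceBERT:

```python
import math

# Hypothetical stand-in for SentenceBERT embeddings: in the real evaluation,
# all-MiniLM-L6-v2 maps each category name to a 384-d sentence embedding.
embed = {
    "sports car": [0.9, 0.4, 0.1],
    "car":        [0.8, 0.5, 0.2],
    "banana":     [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# A prediction that is semantically close to the ground truth scores high,
# even when the strings do not match exactly.
print(cosine_similarity(embed["sports car"], embed["car"]))
print(cosine_similarity(embed["sports car"], embed["banana"]))
```

This is why a free-form answer such as "sports car" can still be credited against the ground-truth label "car".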
- The evaluation is based on detectron2; please install the following dependencies:
```shell
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git
```
- Prepare the datasets; please refer to Data preparation.
Evaluate on Cityscapes:

```shell
cd osprey/eval
python eval_open_vocab_seg_detectron2.py --dataset cityscapes --model path/to/osprey-7b --bert path/to/all-MiniLM-L6-v2
```
Evaluate on ADE20K:

```shell
cd osprey/eval
python eval_open_vocab_seg_detectron2.py --dataset ade --model path/to/osprey-7b --bert path/to/all-MiniLM-L6-v2
```
- Download our generated lvis_val_1k_category.json (we randomly sampled 1K images with 4,004 objects from the LVIS dataset).
```shell
cd osprey/eval
python lvis_paco_eval.py --model path/to/osprey-7b --bert path/to/all-MiniLM-L6-v2 --img path/to/coco-all-imgs --json lvis_val_1k_category.json
```
- Download our generated paco_val_1k_category.json (we randomly sampled 1K images with 4,263 objects from the PACO dataset).
```shell
cd osprey/eval
python lvis_paco_eval.py --model path/to/osprey-7b --bert path/to/all-MiniLM-L6-v2 --img path/to/coco-all-imgs --json paco_val_1k_category.json
```
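Besides the SentenceBERT-based semantic similarity, this classification evaluation also reports a semantic IoU, which measures word-level overlap between the predicted and ground-truth category names. A minimal sketch of that word-overlap idea (our simplified illustration, not the exact code in lvis_paco_eval.py):

```python
def semantic_iou(pred: str, gt: str) -> float:
    """Word-level intersection-over-union between two category names."""
    pred_words = set(pred.lower().split())
    gt_words = set(gt.lower().split())
    if not pred_words and not gt_words:
        return 0.0
    return len(pred_words & gt_words) / len(pred_words | gt_words)

print(semantic_iou("little girl", "girl"))  # one shared word out of two total
print(semantic_iou("dog", "dog"))           # exact match
```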
- Fill in the GPT interface in eval_gpt.py.
- Change the paths in gpt_eval.sh.
```shell
cd osprey/eval
sh gpt_eval.sh
```
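The GPT interface you fill in sends the model output and the reference to a GPT judge for scoring. A minimal sketch of building an OpenAI-style chat-completion payload for such a judge — the helper name and prompt wording below are hypothetical placeholders, not the actual prompt used in eval_gpt.py:

```python
def build_judge_request(question, reference, answer, model="gpt-4"):
    """Assemble an OpenAI-style chat-completion payload asking a GPT judge
    to score a model answer against a reference answer. The prompt text is
    a hypothetical placeholder, not the one used in eval_gpt.py."""
    system = "You are a helpful assistant that rates answer quality from 1 to 10."
    user = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {answer}\n"
        "Rate the model answer and explain briefly."
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": 0.0,  # deterministic judging
    }

payload = build_judge_request("What is in the region?", "a red bicycle", "a bike")
print(payload["messages"][1]["content"])
```

The payload can then be posted to whichever GPT endpoint your interface wraps.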
Note that we have converted the boxes in box_refer_caption.json and box_refer_reason.json to the polygon format denoted by segmentation.
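The box-to-polygon conversion mentioned above can be sketched as follows, assuming COCO-style `[x, y, w, h]` boxes (the helper name is ours, not the repository's):

```python
def box_to_segmentation(box):
    """Convert a COCO-style [x, y, w, h] box into a COCO 'segmentation'
    polygon: a single ring listing the four corners as [x1, y1, x2, y2, ...]."""
    x, y, w, h = box
    return [[x, y, x + w, y, x + w, y + h, x, y + h]]

print(box_to_segmentation([10.0, 20.0, 30.0, 40.0]))
```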
```shell
cd osprey/eval
python ferret_bench_eval.py --model_name path/to/osprey-chat-7b --root_path path/to/coco_imgs --json_path ./ferret_bench/box_refer_caption.json
```
```shell
cd osprey/eval
python ferret_bench_eval.py --model_name path/to/osprey-chat-7b --root_path path/to/coco_imgs --json_path ./ferret_bench/box_refer_reason.json
```
Then use GPT-4 to evaluate the results, as in Ferret.
- Download the coco files from POPE and put them under osprey/eval/pope.
- Change the paths in pope_eval.sh.
```shell
cd osprey/eval
sh pope_eval.sh
```
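POPE poses yes/no questions about object presence and scores the answers with accuracy, precision, recall, and F1. A minimal sketch of that scoring, independent of pope_eval.sh, with "yes" treated as the positive class:

```python
def pope_metrics(preds, labels):
    """Accuracy / precision / recall / F1 for yes-no answers,
    with 'yes' treated as the positive class."""
    tp = sum(p == "yes" and l == "yes" for p, l in zip(preds, labels))
    fp = sum(p == "yes" and l == "no" for p, l in zip(preds, labels))
    fn = sum(p == "no" and l == "yes" for p, l in zip(preds, labels))
    tn = sum(p == "no" and l == "no" for p, l in zip(preds, labels))
    acc = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, precision, recall, f1

# One false "yes" (a hallucinated object) lowers accuracy and precision.
print(pope_metrics(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"]))
```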
- We fine-tune Osprey-7B on the training set of RefCOCOg. The fine-tuned model can be found at Osprey-7B-refcocog-fintune.
- Download finetune_refcocog_val_with_mask.json.
- Generate the output JSON files:
```shell
cd osprey/eval
python refcocog_eval.py --model path/to/Osprey-7B-refcocog-fintune --img path/to/coco-all-imgs --json finetune_refcocog_val_with_mask.json
```
- Finally, evaluate the output JSON file using CaptionMetrics.
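Caption scorers in the COCO style generally take the generated and reference captions grouped per region id. A minimal sketch of that input layout, under the assumption that CaptionMetrics follows the standard gts/res dictionary format used by COCO caption evaluators (the region ids here are hypothetical; in practice they come from the generated JSON file):

```python
# Ground-truth (reference) captions, one list per region id.
gts = {
    "region_0": [{"caption": "a man riding a horse"}],
}
# Model-generated captions, exactly one per region id.
res = {
    "region_0": [{"caption": "a person on a horse"}],
}

# Sanity-check the structure before handing it to the scorer:
# the two dicts must cover the same ids, with one candidate caption each.
assert gts.keys() == res.keys()
assert all(len(v) == 1 for v in res.values())
print("regions to score:", len(gts))
```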