- Extract contents of ViP-Bench to `./playground/data/eval/ViP-Bench`.
- Single-GPU inference and evaluation for bbox and human-drawn visual prompts, respectively:
```shell
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vipbench.sh bbox
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vipbench.sh human
```
Optionally, change the model name from `vip-llava-7b` to another LLaVA or ViP-LLaVA model.
- Submit the results to the evaluation server: `./playground/data/eval/ViP-Bench/results/vip-llava-7b-human.json`.
Optionally, see here for an evaluation script that uses your own OpenAI key.
In `source_image`, we provide the source plain images along with the bounding box/mask annotations. Researchers can use this grounding information to match special tokens such as `<obj>` in the `"question"` entry of `vip-bench-meta-data.json`. For example, `<obj>` can be replaced by textual coordinates to evaluate region-level multimodal models.
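As an illustration, the token substitution described above might look like the sketch below. The field names (`question`, `bboxes`) and the coordinate format are assumptions for illustration, not the exact schema of `vip-bench-meta-data.json`:

```python
import re


def substitute_obj_tokens(question, bboxes):
    """Replace each <obj> token in the question, in order, with
    textual coordinates [x1, y1, x2, y2] drawn from the bbox list."""
    boxes = iter(bboxes)

    def repl(_match):
        x1, y1, x2, y2 = next(boxes)
        return f"[{x1}, {y1}, {x2}, {y2}]"

    return re.sub(r"<obj>", repl, question)


# Hypothetical example entry; the real metadata schema may differ.
entry = {
    "question": "What is the relationship between <obj> and <obj>?",
    "bboxes": [[10, 20, 110, 220], [300, 40, 420, 260]],
}
print(substitute_obj_tokens(entry["question"], entry["bboxes"]))
# → What is the relationship between [10, 20, 110, 220] and [300, 40, 420, 260]?
```

The same pattern works for mask annotations if they are first serialized to a textual form the model under evaluation accepts.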
Please download the evaluation JSON dataset here.
```shell
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/v7w.sh
```

```shell
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/pointQA.sh
```
For Q -> A:

```shell
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vcr-qa.sh
```

For QA -> R:

```shell
CUDA_VISIBLE_DEVICES=0 bash scripts/eval/vcr-qar.sh
```
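Each VCR stage produces per-example predictions that are scored as multiple-choice accuracy. A minimal sketch of such scoring, assuming a hypothetical record format with `pred` and `label` fields (the actual eval scripts may compute this themselves):

```python
def choice_accuracy(records):
    """Fraction of records whose predicted choice index matches the
    ground-truth label. Record format is an assumption, not the
    actual output schema of vcr-qa.sh / vcr-qar.sh."""
    if not records:
        return 0.0
    correct = sum(r["pred"] == r["label"] for r in records)
    return correct / len(records)


# Hypothetical predictions: first is correct, second is not.
records = [{"pred": 1, "label": 1}, {"pred": 0, "label": 2}]
print(choice_accuracy(records))  # → 0.5
```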