update AI2D accuracy

open-compass · kennymckormick · Jan 22, 2024 · Jan 20, 2024 · Jan 21, 2024
commit 412c67c7c6ea2aae3b3c6111f678409e487c8ea4
diff --git a/results/AI2D.md b/results/AI2D.md
@@ -0,0 +1,39 @@
+# AI2D Evaluation Results
+
+> During evaluation, we use `GPT-3.5-Turbo-0613` as the choice extractor for all VLMs when the choice cannot be extracted via heuristic matching. **Zero-shot** inference is adopted.
+
+## AI2D Accuracy
+
+| Model                       |   overall |
+|:----------------------------|----------:|
+| Monkey-Chat                 |      72.6 |
+| GPT-4v (detail: low)        |      71.3 |
+| Qwen-VL-Chat                |      68.5 |
+| Monkey                      |      67.6 |
+| GeminiProVision             |      66.7 |
+| QwenVLPlus                  |      63.7 |
+| Qwen-VL                     |      63.4 |
+| LLaVA-InternLM2-20B (QLoRA) |      61.4 |
+| CogVLM-17B-Chat             |      60.3 |
+| ShareGPT4V-13B              |      59.3 |
+| TransCore-M                 |      59.2 |
+| LLaVA-v1.5-13B (QLoRA)      |      59.0 |
+| LLaVA-v1.5-13B              |      57.9 |
+| ShareGPT4V-7B               |      56.7 |
+| InternLM-XComposer-VL       |      56.1 |
+| LLaVA-InternLM-7B (QLoRA)   |      56.0 |
+| LLaVA-v1.5-7B (QLoRA)       |      55.2 |
+| mPLUG-Owl2                  |      55.2 |
+| SharedCaptioner             |      55.1 |
+| IDEFICS-80B-Instruct        |      54.4 |
+| LLaVA-v1.5-7B               |      54.1 |
+| PandaGPT-13B                |      49.2 |
+| LLaVA-v1-7B                 |      47.8 |
+| IDEFICS-9B-Instruct         |      42.7 |
+| InstructBLIP-7B             |      40.2 |
+| VisualGLM                   |      40.2 |
+| InstructBLIP-13B            |      38.6 |
+| MiniGPT-4-v1-13B            |      33.4 |
+| OpenFlamingo v2             |      30.7 |
+| MiniGPT-4-v2                |      29.4 |
+| MiniGPT-4-v1-7B             |      28.7 |
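
The evaluation note in `results/AI2D.md` describes a two-stage choice extraction: heuristic matching first, then an LLM extractor as a fallback. The following is a minimal sketch of that flow, not the repository's actual implementation. The function names (`heuristic_choice`, `extract_choice`, `accuracy`), the matching rules, and the `llm_extractor` callable are all hypothetical; in the real pipeline the fallback would be a call to `GPT-3.5-Turbo-0613`, which is stubbed out here.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical type for the fallback: takes the raw response and the
# options, returns an option letter (or None if it also fails).
Extractor = Callable[[str, Dict[str, str]], Optional[str]]


def heuristic_choice(response: str, options: Dict[str, str]) -> Optional[str]:
    """Return an option letter only if exactly one option can be matched."""
    # Rule 1 (assumed): a standalone capital letter such as "B", "(B)" or "B.".
    # Note: a sentence-initial article "A" would also match this rule.
    letters = {m for m in re.findall(r"\b([A-Z])\b", response) if m in options}
    if len(letters) == 1:
        return letters.pop()
    # Rule 2 (assumed): the option text itself appears in the response.
    lowered = response.lower()
    hits = [key for key, text in options.items() if text.lower() in lowered]
    if len(hits) == 1:
        return hits[0]
    return None  # ambiguous or no match


def extract_choice(
    response: str,
    options: Dict[str, str],
    llm_extractor: Optional[Extractor] = None,
) -> Optional[str]:
    """Try heuristic matching first; call the fallback only if it fails."""
    choice = heuristic_choice(response, options)
    if choice is None and llm_extractor is not None:
        choice = llm_extractor(response, options)
    return choice


def accuracy(predictions: List[Optional[str]], answers: List[str]) -> float:
    """Percentage of predictions that equal the ground-truth letter."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)
```

With options such as `{"A": "root", "B": "stem", "C": "leaf", "D": "flower"}`, a response like `"The answer is (B)."` is resolved by the first rule, while a response that names no option falls through to `llm_extractor`. Keeping the fallback as an injected callable means the heuristic path can be tested without any API access.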