# [Result] Update XTuner Performance (open-compass#31)
* update report_missing

* update results

* update
kennymckormick authored Dec 28, 2023 · 1 parent 3601f1b · commit f1d0ce4
Showing 5 changed files with 110 additions and 100 deletions.
## results/MME.md (27 additions, 25 deletions)
@@ -8,38 +8,40 @@

In each cell, we list `vanilla score / ChatGPT Answer Extraction Score` if the two scores differ; otherwise a single score is listed.

VLMs are sorted in descending order of Total score.

**Before:**

| Model                | Total       | perception  | reasoning |
| :------------------- | ----------: | ----------: | --------: |
| Full | 2800 | 2000 | 800 |
| GeminiProVision | 2131 / 2149 | 1601 / 1609 | 530 / 540 |
| XComposer | 1874 | 1497 | 377 |
| qwen_chat | 1849 / 1860 | 1457 / 1468 | 392 |
| sharegpt4v_7b | 1799 / 1808 | 1491 | 308 / 318 |
| llava_v1.5_13b | 1800 / 1805 | 1485 / 1490 | 315 |
| mPLUG-Owl2 | 1781 / 1786 | 1435 / 1436 | 346 / 350 |
| llava_v1.5_7b | 1775 | 1490 | 285 |
| GPT-4v (detail: low) | 1737 / 1771 | 1300 / 1334 | 437 |
| TransCore_M | 1682 / 1701 | 1427 / 1429 | 254 / 272 |
| instructblip_13b | 1624 / 1646 | 1381 / 1383 | 243 / 263 |
| idefics_80b_instruct | 1507 / 1519 | 1276 / 1285 | 231 / 234 |
| instructblip_7b | 1313 / 1391 | 1084 / 1137 | 229 / 254 |
| idefics_9b_instruct | 1177 | 942 | 235 |
| PandaGPT_13B | 1072 | 826 | 246 |
| MiniGPT-4-v1-13B | 648 / 1067 | 533 / 794 | 115 / 273 |
| MiniGPT-4-v1-7B | 806 / 1048 | 622 / 771 | 184 / 277 |
| llava_v1_7b | 1027 / 1044 | 793 / 807 | 234 / 238 |
| MiniGPT-4-v2 | 968 | 708 | 260 |
| VisualGLM_6b | 738 | 628 | 110 |
| flamingov2 | 607 | 535 | 72 |
| qwen_base | 6 / 483 | 0 / 334 | 6 / 149 |
**After:**

| Model                         | Total       | Perception   | Reasoning   |
|:------------------------------|:------------|:-------------|:------------|
| GeminiProVision | 2131 / 2149 | 1601 / 1609 | 530 / 540 |
| InternLM-XComposer-VL | 1874 | 1497 | 377 |
| Qwen-VL-Chat | 1849 / 1860 | 1457 / 1468 | 392 |
| ShareGPT4V-7B | 1799 / 1808 | 1491 | 308 / 317 |
| LLaVA-v1.5-13B | 1800 / 1805 | 1485 / 1490 | 315 |
| mPLUG-Owl2 | 1781 / 1786 | 1435 / 1436 | 346 / 350 |
| LLaVA-v1.5-7B | 1775 | 1490 | 285 |
| GPT-4v (detail: low) | 1737 / 1771 | 1300 / 1334 | 437 |
| LLaVA-v1.5-13B (LoRA, XTuner) | 1766 | 1475 | 291 |
| LLaVA-v1.5-7B (LoRA, XTuner) | 1716 | 1434 | 282 |
| TransCore-M | 1681 / 1701 | 1427 / 1429 | 254 / 272 |
| InstructBLIP-13B              | 1624 / 1646 | 1381 / 1383  | 243 / 263   |
| LLaVA-InternLM-7B (LoRA) | 1637 | 1393 | 244 |
| IDEFICS-80B-Instruct | 1507 / 1519 | 1276 / 1285 | 231 / 234 |
| InstructBLIP-7B | 1313 / 1391 | 1084 / 1137 | 229 / 254 |
| IDEFICS-9B-Instruct | 1177 | 942 | 235 |
| PandaGPT-13B | 1072 | 826 | 246 |
| MiniGPT-4-v1-13B | 648 / 1067 | 533 / 794 | 115 / 273 |
| MiniGPT-4-v1-7B | 806 / 1048 | 622 / 771 | 184 / 277 |
| LLaVA-v1-7B | 1027 / 1044 | 793 / 807 | 234 / 237 |
| MiniGPT-4-v2 | 968 | 708 | 260 |
| VisualGLM | 738 | 628 | 110 |
| OpenFlamingo v2 | 607 | 535 | 72 |
| Qwen-VL | 6 / 483 | 0 / 334 | 6 / 149 |
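
For context, each MME subtask is scored as accuracy plus accuracy+ (the share of images whose two yes/no questions are both answered correctly), each in percent, so one subtask caps at 200; Perception sums ten such subtasks (max 2000), Reasoning sums four (max 800), and Total is their sum, matching the Full row above. Below is a minimal sketch of that aggregation, using an illustrative `(subtask, image_id, correct)` record format rather than any evaluator's actual schema:

```python
from collections import defaultdict

# Illustrative record format (not VLMEvalKit's actual schema):
# each record is (subtask, image_id, correct); every MME image
# carries two yes/no questions.
def mme_score(records, perception_tasks, reasoning_tasks):
    by_task = defaultdict(list)
    for subtask, image_id, correct in records:
        by_task[subtask].append((image_id, correct))

    def subtask_score(items):
        # accuracy: per-question, in percent
        acc = 100.0 * sum(ok for _, ok in items) / len(items)
        # accuracy+: per-image, both questions must be correct
        per_image = defaultdict(list)
        for image_id, ok in items:
            per_image[image_id].append(ok)
        acc_plus = 100.0 * sum(all(v) for v in per_image.values()) / len(per_image)
        return acc + acc_plus  # each subtask caps at 200

    perception = sum(subtask_score(by_task[t]) for t in perception_tasks)
    reasoning = sum(subtask_score(by_task[t]) for t in reasoning_tasks)
    return perception + reasoning, perception, reasoning
```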

### Comments

For most VLMs, using ChatGPT as the answer extractor does not significantly change the final score. For some, however, including InstructBLIP-7B, the MiniGPT-4-v1 models, and Qwen-VL, the improvement from ChatGPT answer extraction is substantial. The table below groups models by the score gap between the two extraction strategies; a sketch of the extraction logic follows it:

| MME Score Improvement with ChatGPT Answer Extractor | Models |
| ---------------------------------------------------- | ------------------------------------------------------------ |
| **No (0)**            | InternLM-XComposer-VL, LLaVA-v1.5-7B, IDEFICS-9B-Instruct, PandaGPT-13B, MiniGPT-4-v2, <br>VisualGLM, OpenFlamingo v2, LLaVA-XTuner Series |
| **Minor (1~20)**      | Qwen-VL-Chat (11), LLaVA-v1.5-13B (5), mPLUG-Owl2 (5), IDEFICS-80B-Instruct (12), LLaVA-v1-7B (17), <br>ShareGPT4V-7B (9), TransCore-M (20), GeminiProVision (18) |
| **Moderate (21~100)** | InstructBLIP-13B (22), InstructBLIP-7B (78), GPT-4v (34) |
| **Huge (> 100)**      | MiniGPT-4-v1-7B (242), MiniGPT-4-v1-13B (419), Qwen-VL (477) |
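
The gap comes from predictions that simple rules cannot parse: the vanilla score counts only rule-matched answers, while the extraction score first lets ChatGPT map free-form replies onto yes/no. Below is a minimal sketch of such a two-stage extractor, assuming an OpenAI-style client; the prompt wording and the `client.chat.completions.create` call are illustrative, not the evaluator's exact code:

```python
import re

def rule_based_extract(prediction: str):
    """First pass: accept the answer only if exactly one of yes/no appears."""
    text = prediction.strip().lower()
    has_yes = re.search(r"\byes\b", text) is not None
    has_no = re.search(r"\bno\b", text) is not None
    if has_yes != has_no:
        return "yes" if has_yes else "no"
    return None  # ambiguous or unparseable: defer to ChatGPT

def chatgpt_extract(client, question: str, prediction: str) -> str:
    """Fallback pass: ask ChatGPT to reduce a free-form reply to yes/no."""
    prompt = (
        "You are given a yes/no question and a model's free-form answer. "
        "Reply with exactly one word, 'yes' or 'no'.\n"
        f"Question: {question}\nAnswer: {prediction}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().lower()

def extract_answer(client, question: str, prediction: str) -> str:
    return rule_based_extract(prediction) or chatgpt_extract(client, question, prediction)
```

Models that already reply with a bare "yes"/"no" (e.g. the LLaVA-XTuner series) never reach the fallback, which is why their gap is 0.
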
## results/MMMU.md (26 additions, 25 deletions)
@@ -11,28 +11,29 @@
### MMMU Scores

**Before:**

| Model                | Overall<br>(Val) | Art & Design<br>(Val) | Business<br>(Val) | Science<br>(Val) | Health & Medicine<br>(Val) | Humanities & Social Science<br>(Val) | Tech & Engineering<br>(Val) | Overall<br>(Dev) |
|:---------------------|-------------------:|------------------------:|--------------------:|-------------------:|-----------------------------:|---------------------------------------:|------------------------------:|-------------------:|
| GPT-4v | 53.8 | 66.7 | 60 | 46 | 54.7 | 71.7 | 36.7 | 52.7 |
| GeminiProVision | 48.4 | 59.2 | 36 | 42 | 52 | 66.7 | 42.9 | 54 |
| qwen_chat | 37.6 | 49.2 | 36 | 28 | 32.7 | 55.8 | 31.9 | 30 |
| llava_v1.5_13b | 36.8 | 49.2 | 23.3 | 36 | 34 | 51.7 | 33.3 | 42 |
| sharegpt4v_7b | 36.7 | 50 | 27.3 | 26.7 | 37.3 | 50 | 34.8 | 30 |
| TransCore_M | 36.6 | 54.2 | 32 | 27.3 | 32 | 49.2 | 32.4 | 38.7 |
| llava_v1.5_7b | 36.1 | 45.8 | 25.3 | 34 | 32 | 48.3 | 35.7 | 38.7 |
| XComposer | 35.7 | 45.8 | 28.7 | 22.7 | 30.7 | 53.3 | 37.6 | 36.7 |
| mPLUG-Owl2 | 34.6 | 47.5 | 26 | 21.3 | 37.3 | 50 | 31.9 | 40.7 |
| instructblip_13b | 32.9 | 37.5 | 29.3 | 32 | 28.7 | 37.5 | 33.8 | 30 |
| PandaGPT_13B | 32.7 | 42.5 | 35.3 | 30 | 29.3 | 45.8 | 21.9 | 26.7 |
| llava_v1_7b | 32.1 | 31.7 | 24.7 | 31.3 | 32 | 37.5 | 35.2 | 33.3 |
| instructblip_7b | 30.4 | 38.3 | 28 | 22 | 30.7 | 39.2 | 28.6 | 24 |
| VisualGLM_6b | 28.9 | 30 | 24 | 28 | 28 | 40.8 | 26.2 | 28.7 |
| qwen_base | 28.8 | 43.3 | 18.7 | 25.3 | 32.7 | 42.5 | 19.5 | 29.3 |
| flamingov2 | 28.2 | 27.5 | 30 | 28.7 | 28 | 33.3 | 24.3 | 21.3 |
| Frequent Choice | 26.8 | | | | | | | |
| MiniGPT-4-v1-13B | 26.2 | 33.3 | 19.3 | 28.7 | 26 | 34.2 | 21 | 23.3 |
| idefics_80b_instruct | 25.1 | 39.2 | 17.3 | 23.3 | 24 | 48.3 | 11.4 | 23.3 |
| MiniGPT-4-v2 | 24.6 | 27.5 | 22.7 | 21.3 | 28 | 33.3 | 19 | 32 |
| MiniGPT-4-v1-7B | 23 | 32.5 | 27.3 | 18.7 | 17.3 | 15 | 26.2 | 19.3 |
| Random Choice | 22.1 | | | | | | | |
| idefics_9b_instruct | 19.6 | 22.5 | 11.3 | 20.7 | 23.3 | 31.7 | 13.3 | 20 |
**After:**

| Model                         | Overall<br>(Val) | Art & Design<br>(Val) | Business<br>(Val) | Science<br>(Val) | Health & Medicine<br>(Val) | Humanities & Social Science<br>(Val) | Tech & Engineering<br>(Val) | Overall<br>(Dev) |
|:------------------------------|-------------------:|------------------------:|--------------------:|-------------------:|-----------------------------:|---------------------------------------:|------------------------------:|-------------------:|
| GPT-4v (detail: low) | 53.8 | 66.7 | 60 | 46 | 54.7 | 71.7 | 36.7 | 52.7 |
| GeminiProVision | 48.4 | 59.2 | 36 | 42 | 52 | 66.7 | 42.9 | 54 |
| Qwen-VL-Chat | 37.6 | 49.2 | 36 | 28 | 32.7 | 55.8 | 31.9 | 30 |
| LLaVA-InternLM-7B (LoRA) | 37 | 44.2 | 32 | 29.3 | 38.7 | 47.5 | 34.8 | 43.3 |
| LLaVA-v1.5-13B | 36.8 | 49.2 | 23.3 | 36 | 34 | 51.7 | 33.3 | 42 |
| ShareGPT4V-7B | 36.7 | 50 | 27.3 | 26.7 | 37.3 | 50 | 34.8 | 30 |
| TransCore-M | 36.6 | 54.2 | 32 | 27.3 | 32 | 49.2 | 32.4 | 38.7 |
| LLaVA-v1.5-7B | 36.1 | 45.8 | 25.3 | 34 | 32 | 48.3 | 35.7 | 38.7 |
| InternLM-XComposer-VL | 35.7 | 45.8 | 28.7 | 22.7 | 30.7 | 53.3 | 37.6 | 36.7 |
| LLaVA-v1.5-13B (LoRA, XTuner) | 35.1 | 40.8 | 30.7 | 26.7 | 35.3 | 45 | 35.2 | 43.3 |
| mPLUG-Owl2 | 34.6 | 47.5 | 26 | 21.3 | 37.3 | 50 | 31.9 | 40.7 |
| LLaVA-v1.5-7B (LoRA, XTuner) | 33.7 | 48.3 | 23.3 | 30 | 32.7 | 46.7 | 28.6 | 37.3 |
| InstructBLIP-13B              | 32.9 | 37.5 | 29.3 | 32 | 28.7 | 37.5 | 33.8 | 30 |
| PandaGPT-13B | 32.7 | 42.5 | 35.3 | 30 | 29.3 | 45.8 | 21.9 | 26.7 |
| LLaVA-v1-7B | 32.1 | 31.7 | 24.7 | 31.3 | 32 | 37.5 | 35.2 | 33.3 |
| InstructBLIP-7B | 30.4 | 38.3 | 28 | 22 | 30.7 | 39.2 | 28.6 | 24 |
| VisualGLM | 28.9 | 30 | 24 | 28 | 28 | 40.8 | 26.2 | 28.7 |
| Qwen-VL | 28.8 | 43.3 | 18.7 | 25.3 | 32.7 | 42.5 | 19.5 | 29.3 |
| OpenFlamingo v2 | 28.2 | 27.5 | 30 | 28.7 | 28 | 33.3 | 24.3 | 21.3 |
| MiniGPT-4-v1-13B | 26.2 | 33.3 | 19.3 | 28.7 | 26 | 34.2 | 21 | 23.3 |
| IDEFICS-80B-Instruct | 25.1 | 39.2 | 17.3 | 23.3 | 24 | 48.3 | 11.4 | 23.3 |
| MiniGPT-4-v2 | 24.6 | 27.5 | 22.7 | 21.3 | 28 | 33.3 | 19 | 32 |
| MiniGPT-4-v1-7B | 23 | 32.5 | 27.3 | 18.7 | 17.3 | 15 | 26.2 | 19.3 |
| IDEFICS-9B-Instruct | 19.6 | 22.5 | 11.3 | 20.7 | 23.3 | 31.7 | 13.3 | 20 |
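
The Overall and per-discipline columns are plain accuracies over the multiple-choice questions of each split. A minimal sketch of the aggregation, assuming illustrative `(split, category, correct)` records rather than the benchmark's actual file format:

```python
from collections import defaultdict

def mmmu_accuracy(records):
    """records: iterable of (split, category, correct),
    e.g. ("Val", "Art & Design", True)."""
    hits = defaultdict(int)   # (split, column) -> correct answers
    seen = defaultdict(int)   # (split, column) -> questions seen
    for split, category, correct in records:
        # each question counts toward its discipline and the split's Overall
        for column in ("Overall", category):
            hits[(split, column)] += int(correct)
            seen[(split, column)] += 1
    return {key: 100.0 * hits[key] / seen[key] for key in seen}

# usage: scores = mmmu_accuracy(recs); scores[("Val", "Overall")]
```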
(Diffs for the remaining 3 changed files are not shown.)
