
Commit

…into internal_main_dev
Luodian committed Jun 12, 2024
2 parents e43bd84 + d99a24a commit 465bd42
Showing 71 changed files with 3,517 additions and 29 deletions.
56 changes: 56 additions & 0 deletions LICENSE
@@ -0,0 +1,56 @@
# For the main pipeline structure-related code, we maintain the original license provided with lm-evaluation-harness, which is the MIT License.

MIT License

Copyright (c) 2024 LMMs-Lab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

# For the multimodal models and datasets that we have added (defined as code in the lmms_eval/tasks and lmms_eval/models folders), we apply the Apache License.

Apache 2.0 License

Copyright (c) 2024 LMMs-Lab

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

When modifying the code, please include the following information about the original lmms-eval source:
# Adopted from lmms-eval from https://github.com/EvolvingLMMs-Lab/lmms-eval. Below is the original copyright:
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
29 changes: 28 additions & 1 deletion README.md
@@ -9,7 +9,7 @@
🏠 [LMMs-Lab Homepage](https://lmms-lab.github.io/) | 🎉 [Blog](https://lmms-lab.github.io/lmms-eval-blog/lmms-eval-0.1/) | 📚 [Documentation](docs/README.md) | 🤗 [Huggingface Datasets](https://huggingface.co/lmms-lab) | <a href="https://emoji.gg/emoji/1684-discord-thread"><img src="https://cdn3.emoji.gg/emojis/1684-discord-thread.png" width="14px" height="14px" alt="Discord_Thread"></a> [discord/lmms-eval](https://discord.gg/zdkwKUqrPy)


In today's world, we're on an exciting journey toward creating Artificial General Intelligence (AGI), much like the enthusiasm of the 1960s moon landing. This journey is powered by advanced large language models (LLMs) and large multimodal models (LMMs), which are complex systems capable of understanding, learning, and performing a wide variety of human tasks. These advancements bring us closer to achieving AGI.
In today's world, we're on an exciting journey toward creating Artificial General Intelligence (AGI), much like the enthusiasm of the 1960s moon landing. This journey is powered by advanced large language models (LLMs) and large multimodal models (LMMs), which are complex systems capable of understanding, learning, and performing a wide variety of human tasks.

To gauge how advanced these models are, we use a variety of evaluation benchmarks. These benchmarks are tools that help us understand the capabilities of these models, showing us how close we are to achieving AGI. However, finding and using these benchmarks is a big challenge. The necessary benchmarks and datasets are spread out and hidden in various places like Google Drive, Dropbox, and different school and research lab websites. It feels like we're on a treasure hunt, but the maps are scattered everywhere.

@@ -163,6 +163,7 @@ We also provide the raw data exported from Weights & Biases for the detailed res
- COCO 2017 Caption (coco2017_cap)
- COCO 2017 Caption MiniVal (coco2017_cap_val)
- COCO 2017 Caption MiniTest (coco2017_cap_test)
- [ConBench](https://github.com/foundation-multimodal-models/ConBench) (conbench)
- DOCVQA (docvqa)
- DOCVQA Validation (docvqa_val)
- DOCVQA Test (docvqa_test)
@@ -176,6 +177,13 @@ We also provide the raw data exported from Weights & Biases for the detailed res
- Infographic VQA Test (info_vqa_test)
- LLaVA-Bench (llava_in_the_wild)
- LLaVA-Bench-COCO (llava_bench_coco)
- MathVerse (mathverse)
- MathVerse Text Dominant (mathverse_testmini_text_dominant)
- MathVerse Text Only (mathverse_testmini_text_only)
- MathVerse Text Lite (mathverse_testmini_text_lite)
- MathVerse Vision Dominant (mathverse_testmini_vision_dominant)
- MathVerse Vision Intensive (mathverse_testmini_vision_intensive)
- MathVerse Vision Only (mathverse_testmini_vision_only)
- MathVista (mathvista)
- MathVista Validation (mathvista_testmini)
- MathVista Test (mathvista_test)
@@ -190,6 +198,19 @@ We also provide the raw data exported from Weights & Biases for the detailed res
- MMMU (mmmu)
- MMMU Validation (mmmu_val)
- MMMU Test (mmmu_test)
- MMUPD (mmupd)
- MMUPD Base (mmupd_base)
- MMAAD Base (mmaad_base)
- MMIASD Base (mmiasd_base)
- MMIVQD Base (mmivqd_base)
- MMUPD Option (mmupd_option)
- MMAAD Option (mmaad_option)
- MMIASD Option (mmiasd_option)
- MMIVQD Option (mmivqd_option)
- MMUPD Instruction (mmupd_instruction)
- MMAAD Instruction (mmaad_instruction)
- MMIASD Instruction (mmiasd_instruction)
- MMIVQD Instruction (mmivqd_instruction)
- MMVet (mmvet)
- Multi-DocVQA (multidocvqa)
- Multi-DocVQA Validation (multidocvqa_val)
@@ -226,6 +247,9 @@ We also provide the raw data exported from Weights & Biases for the detailed res
- ScienceQA (scienceqa_full)
- ScienceQA Full (scienceqa)
- ScienceQA IMG (scienceqa_img)
- ScreenSpot (screenspot)
- ScreenSpot REC / Grounding (screenspot_rec)
- ScreenSpot REG / Instruction Generation (screenspot_reg)
- SeedBench (seedbench)
- SeedBench 2 (seedbench_2)
- ST-VQA (stvqa)
@@ -241,6 +265,9 @@ We also provide the raw data exported from Weights & Biases for the detailed res
- VQAv2 (vqav2)
- VQAv2 Validation (vqav2_val)
- VQAv2 Test (vqav2_test)
- WebSRC (websrc)
- WebSRC Validation (websrc_val)
- WebSRC Test (websrc_test)

## Datasets to be added and tested
- TallyQA (tallyqa)
1 change: 1 addition & 0 deletions lmms_eval/models/__init__.py
@@ -27,6 +27,7 @@
"llava_onevision": "Llava_OneVision",
"from_log": "FromLog",
"mplug_owl_video": "mplug_Owl",
"phi3v": "Phi3v",
}

for model_name, model_class in AVAILABLE_MODELS.items():
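The dictionary above maps a module file in lmms_eval/models/ to the class name it exports, so registering the new Phi-3-Vision backend is a one-line change. The loop that follows then imports each entry; a minimal sketch of how such a registry is typically consumed is shown below (the loop body is an assumption for illustration — only the dictionary entry appears in this diff).

import importlib
import logging

eval_logger = logging.getLogger("lmms-eval")

AVAILABLE_MODELS = {
    "llava_hf": "LlavaHf",
    "phi3v": "Phi3v",
}

for model_name, model_class in AVAILABLE_MODELS.items():
    try:
        # e.g. resolves lmms_eval.models.phi3v and pulls out the Phi3v class
        module = importlib.import_module(f"lmms_eval.models.{model_name}")
        globals()[model_class] = getattr(module, model_class)
    except ImportError as e:
        eval_logger.debug(f"Skipping {model_name}: {e}")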
1 change: 1 addition & 0 deletions lmms_eval/models/idefics2.py
@@ -203,6 +203,7 @@ def _collate(x):
gen_kwargs["max_new_tokens"] = 1024
if "temperature" not in gen_kwargs:
gen_kwargs["temperature"] = 0

prompts = []
for context, visual in zip(contexts, visuals):
content = []
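The context lines above show how the Idefics2 wrapper fills in generation defaults when a task config omits them. The same defaulting can be written more compactly with dict.setdefault; the snippet below is illustrative only, not how the file is actually written.

gen_kwargs = {"until": ["\n\n"]}           # e.g. what a task config might pass in
gen_kwargs.setdefault("max_new_tokens", 1024)
gen_kwargs.setdefault("temperature", 0)
print(gen_kwargs)                          # defaults are applied only for missing keys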
15 changes: 2 additions & 13 deletions lmms_eval/models/llava.py
@@ -26,19 +26,11 @@
try:
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN, DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN, IGNORE_INDEX
from llava.conversation import conv_templates, SeparatorStyle
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
except Exception as e:
eval_logger.debug("LLaVA is not installed. Please install LLaVA to use this model.\nError: %s" % e)

from transformers.integrations.deepspeed import (
is_deepspeed_zero3_enabled,
set_hf_deepspeed_config,
unset_hf_deepspeed_config,
)

from transformers.utils import is_flash_attn_2_available

# inference implementation for attention, can be "sdpa", "eager", "flash_attention_2". Seems FA2 is not effective during inference: https://discuss.huggingface.co/t/flash-attention-has-no-effect-on-inference/73453/5
# if is_flash_attn_2_available:
# best_fit_attn_implementation = "flash_attention_2" # flash_attn has a bug that says: ERROR Error query and key must have the same dtype in generating
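The commented-out lines above explain why flash_attention_2 is not used as the default at inference time. A hedged sketch of how best_fit_attn_implementation might be chosen instead (the exact fallback logic is an assumption; only the variable name comes from this file):

import torch
from packaging import version

# FA2 is deliberately skipped: it reportedly brings no inference speedup and can
# raise a query/key dtype error inside generate().
if version.parse(torch.__version__) >= version.parse("2.1.2"):
    best_fit_attn_implementation = "sdpa"    # PyTorch scaled-dot-product attention
else:
    best_fit_attn_implementation = "eager"   # plain attention as a safe fallback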
@@ -60,10 +52,7 @@ def __init__(
pretrained: str = "liuhaotian/llava-v1.5-7b",
truncation: Optional[bool] = True,
device: Optional[str] = "cuda:0",
dtype: Optional[Union[str, torch.dtype]] = "auto",
batch_size: Optional[Union[int, str]] = 1,
trust_remote_code: Optional[bool] = False,
revision=None,
model_name=None,
attn_implementation=best_fit_attn_implementation,
device_map="cuda:0",
38 changes: 28 additions & 10 deletions lmms_eval/models/llava_hf.py
@@ -8,7 +8,7 @@
from accelerate import Accelerator, DistributedType
from accelerate.state import AcceleratorState
from typing import List, Optional, Union, Tuple
from transformers import LlavaForConditionalGeneration, AutoProcessor
from transformers import LlavaForConditionalGeneration, LlavaNextForConditionalGeneration, AutoProcessor

import warnings

@@ -31,10 +31,10 @@ class LlavaHf(lmms):
Example usage:
accelerate launch --num_processes=8 -m lmms_eval \
accelerate launch --num_processes=8 --main_process_port 12345 -m lmms_eval \
--model llava_hf \
--model_args pretrained=llava-hf/llava-1.5-7b-hf \
--tasks mme \
--tasks seedbench \
--batch_size 1 \
--output_path ./logs/ \
--log_samples
@@ -67,7 +67,16 @@ def __init__(
self.device_map = device_map
if isinstance(dtype, str) and dtype != "auto":
dtype = getattr(torch, dtype)
self._model = LlavaForConditionalGeneration.from_pretrained(pretrained, revision=revision, torch_dtype=dtype, device_map=self.device_map, trust_remote_code=trust_remote_code, attn_implementation=attn_implementation)

if "1.5" in pretrained:
self._model = LlavaForConditionalGeneration.from_pretrained(pretrained, revision=revision, torch_dtype=dtype, device_map=self.device_map, trust_remote_code=trust_remote_code, attn_implementation=attn_implementation)
elif "1.6" in pretrained:
self._model = LlavaNextForConditionalGeneration.from_pretrained(pretrained, revision=revision, torch_dtype=dtype, device_map=self.device_map, trust_remote_code=trust_remote_code, attn_implementation=attn_implementation)
else:
eval_logger.info("Not sure whether you use 1.5 or 1.6. Use 1.5 by default. This might cause bugs if you are actually using 1.6")
self._model = LlavaForConditionalGeneration.from_pretrained(pretrained, revision=revision, torch_dtype=dtype, device_map=self.device_map, trust_remote_code=trust_remote_code, attn_implementation=attn_implementation)

self.pretrained = pretrained
self._image_processor = AutoProcessor.from_pretrained(pretrained, revision=revision, trust_remote_code=trust_remote_code)
# Pad from left for batched generation: https://huggingface.co/docs/transformers/v4.39.3/en/model_doc/llava#usage-tips
self._image_processor.tokenizer.padding_side = "left"
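In short, the transformers class is now inferred from the checkpoint name: names containing "1.5" use LlavaForConditionalGeneration, names containing "1.6" use LlavaNextForConditionalGeneration, and anything else falls back to the 1.5 class with the logged warning. A stand-alone sketch of the 1.6 path (the checkpoint id below is illustrative, not part of this commit):

import torch
from transformers import AutoProcessor, LlavaNextForConditionalGeneration

pretrained = "llava-hf/llava-v1.6-mistral-7b-hf"   # illustrative 1.6-series checkpoint id
model = LlavaNextForConditionalGeneration.from_pretrained(
    pretrained, torch_dtype=torch.float16, device_map="cuda:0"
)
processor = AutoProcessor.from_pretrained(pretrained)
processor.tokenizer.padding_side = "left"          # pad from the left for batched generation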
@@ -106,6 +115,7 @@ def __init__(
self.model.to(self._device)
self._rank = 0
self._word_size = 1
self.accelerator = accelerator

@property
def config(self):
@@ -199,8 +209,8 @@ def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
labels[: len(contxt_id)] = -100

if self.accelerator.is_main_process and doc_id % 100 == 0:
eval_logger.info(f"Prompt for doc ID {doc_id}:\n\n{formatted_contexts[0]}\n")
eval_logger.info(f"Prompt and continuation for doc ID {doc_id}:\n\n{formatted_continuation[0]}\n")
eval_logger.debug(f"Prompt for doc ID {doc_id}:\n\n{formatted_contexts[0]}\n")
eval_logger.debug(f"Prompt and continuation for doc ID {doc_id}:\n\n{formatted_continuation[0]}\n")

with torch.inference_mode():
outputs = self.model(**model_inputs, labels=labels)
@@ -268,7 +278,9 @@ def _collate(x):

# Some benchmarks like MME do not contain image tokens, so we prepend them to the prompt.
if DEFAULT_IMAGE_TOKEN not in context:
context = f"{DEFAULT_IMAGE_TOKEN}\n{context}"
image_tokens = [DEFAULT_IMAGE_TOKEN] * len(visuals)
image_tokens = " ".join(image_tokens)
context = f"{image_tokens}\n{context}"
# Apply chat template
messages = [{"role": "user", "content": context}]
if self.chat_template is not None:
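Multi-image documents previously received a single image token; the new code prepends one token per visual. An illustrative before/after, assuming DEFAULT_IMAGE_TOKEN is "<image>" as in the llava-hf processors:

DEFAULT_IMAGE_TOKEN = "<image>"      # assumption: matches the constant used above
visuals = ["img_a", "img_b"]         # stand-ins for two PIL images in one document
context = "Which of the two images shows a cat?"

image_tokens = " ".join([DEFAULT_IMAGE_TOKEN] * len(visuals))
context = f"{image_tokens}\n{context}"
print(context)
# <image> <image>
# Which of the two images shows a cat?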
@@ -281,7 +293,7 @@ def _collate(x):
text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

if self.accelerator.is_main_process and doc_id[0] % 100 == 0:
eval_logger.info(f"Prompt for doc ID {doc_id[0]}:\n\n{text}\n")
eval_logger.debug(f"Prompt for doc ID {doc_id[0]}:\n\n{text}\n")

inputs = self._image_processor(images=visuals, text=text, return_tensors="pt").to(self._device, self.model.dtype)

@@ -303,15 +315,21 @@ def _collate(x):
num_beams=gen_kwargs["num_beams"],
max_new_tokens=gen_kwargs["max_new_tokens"],
use_cache=self.use_cache,
pad_token_id=self.tokenizer.eos_token_id,
)
except Exception as e:
eval_logger.error(f"Error {e} in generating")
cont = ""
text_outputs = self.tokenizer.batch_decode(cont, skip_special_tokens=True)[0]
text_outputs = text_outputs.split("ASSISTANT:")[-1].strip()
if "1.5" in self.pretrained:
text_outputs = text_outputs.split("ASSISTANT:")[-1].strip()
elif "mistral" in self.pretrained:
text_outputs = text_outputs.split("[/INST]")[-1].strip()
else:
text_outputs = text_outputs.split("ASSISTANT:")[-1].strip()

if self.accelerator.is_main_process and doc_id[0] % 100 == 0:
eval_logger.info(f"Generated text for doc ID {doc_id[0]}:\n\n{text_outputs}\n")
eval_logger.debug(f"Generated text for doc ID {doc_id[0]}:\n\n{text_outputs}\n")

res.append(text_outputs)
self.cache_hook.add_partial("generate_until", (context, gen_kwargs), text_outputs)
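The answer-extraction marker now depends on the checkpoint's chat template: vicuna-style 1.5 models emit the reply after "ASSISTANT:", while the mistral-based 1.6 model closes the user turn with "[/INST]". A small illustration of why both branches are needed (the raw strings are made up for the example):

raw_vicuna = "USER: <image>\nWhat is shown? ASSISTANT: A cat on a mat."
print(raw_vicuna.split("ASSISTANT:")[-1].strip())    # -> A cat on a mat.

raw_mistral = "[INST] <image>\nWhat is shown? [/INST] A cat on a mat."
print(raw_mistral.split("[/INST]")[-1].strip())      # -> A cat on a mat.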