
[Model] Adding support for MiniCPM-V #4087

Merged: 70 commits, Jul 25, 2024

Conversation

HwwwwwwwH
Contributor

@HwwwwwwwH HwwwwwwwH commented Apr 15, 2024

Adding support for MiniCPM-V-2; please review.
HuggingFace Page: https://huggingface.co/openbmb/MiniCPM-V-2

FIX #4943
FIX #5808

NOTE: This model was added after the release of 0.5.3.post1, so it'll only be included in the next release (e.g. 0.5.4). If you want to use it now, please install vLLM from source (i.e. main branch).

@HwwwwwwwH
Contributor Author

There's an incompatible pip dependency error; my questions are as follows:

  • MiniCPM-V needs the timm package. Where should I add this dependency requirement? I can see many different requirements files in the root directory of vllm.
  • The timm package needs torch==2.1.2, nvidia-nccl-cu12==2.18.1, and triton==2.1.0, but vllm pins torch==2.2.1, nvidia-nccl-cu12==2.19.3, and triton==2.2.0. How can I solve this problem?

@youkaichao
Member

This seems to be related to @ywang96's RFC #4194 on multi-modality models.

@youkaichao
Member

The timm package needs torch==2.1.2, nvidia-nccl-cu12==2.18.1, and triton==2.1.0, but vllm pins different versions. How can I solve this problem?

We can't do anything until timm has the same dependencies as vllm. Or you can try to remove the timm dependency.

@HwwwwwwwH
Contributor Author

Sorry, we were confused by this situation.
Actually, timm only requires torch >= 1.7, and we've added this dependency to requirements-common.txt.
Please review. @youkaichao @ywang96

@HwwwwwwwH HwwwwwwwH mentioned this pull request Apr 26, 2024
Collaborator

@esmeetu esmeetu left a comment


Sorry for the delayed review, and thanks for your contribution! Looks good to me; I left some minor comments. However, there is a lot of custom code here that is hard to review carefully. IMO, it would be better to encapsulate it into your own package and import it into vllm for easier maintenance.

Review comments were left on requirements-common.txt and vllm/model_executor/models/minicpmv.py (resolved).
@jeejeelee
Contributor

@HwwwwwwwH Thanks for your excellent work. May I ask what is blocking the progress of this PR?

@HwwwwwwwH
Contributor Author

Very sorry for the late reply!! We've been working on the new VLM, MiniCPM-V-2.5, over the last few days.

I've pushed a new commit addressing the reviews. I also see some new VLM features; are there any requirements for adapting to them?

Really sorry~

@jeejeelee
Contributor

ping @ywang96

@Howe-Young

I need to apologize to users waiting for MiniCPM-V-2 and MiniCPM-Llama3-V-2_5. There are a few things I have to state.

  • For now, I've got it ready for vLLM with MiniCPM-Llama3-V-2_5.
  • As for MiniCPM-V-2, my PR to the official code on HF is still under review. There are some necessary tests before it is done. I'll complete this soon and hope it can be merged in the next few weekdays.
  • Before that, I've created an HF repo that works with vLLM. If you want to try it temporarily, see https://huggingface.co/HwwwH/MiniCPM-V-2. This code won't differ much from the final version.

Does it support MiniCPM-Llama3-V-2_5 now? I get the following error when using MiniCPM-Llama3-V-2_5:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-platcv/yanghao/code/13_optim/minicpm/vllm_infer.py", line 57, in <module>
[rank0]:     main()
[rank0]:   File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-platcv/yanghao/code/13_optim/minicpm/vllm_infer.py", line 33, in main
[rank0]:     llm, prompt = run_minicpmv(question)
[rank0]:   File "/mnt/dolphinfs/ssd_pool/docker/user/hadoop-platcv/yanghao/code/13_optim/minicpm/vllm_infer.py", line 16, in run_minicpmv
[rank0]:     llm = LLM(
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 155, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 441, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 251, in __init__
[rank0]:     self.model_executor = executor_class(
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 47, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 36, in _init_executor
[rank0]:     self.driver_worker.load_model()
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker.py", line 139, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 682, in load_model
[rank0]:     self.model = get_model(model_config=self.model_config,
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 21, in get_model
[rank0]:     return loader.load_model(model_config=model_config,
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 280, in load_model
[rank0]:     model = _initialize_model(model_config, self.load_config,
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 111, in _initialize_model
[rank0]:     return model_class(config=model_config.hf_config,
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/minicpmv.py", line 395, in __init__
[rank0]:     self.version = float(self.config.version)
[rank0]:   File "/home/hadoop-platcv/miniconda3/envs/vllm/lib/python3.10/site-packages/transformers/configuration_utils.py", line 264, in __getattribute__
[rank0]:     return super().__getattribute__(key)
[rank0]: AttributeError: 'MiniCPMVConfig' object has no attribute 'version'

@DarkLight1337
Member

It should have been fixed last night. Please update to the latest main branch.

@ZHANG-SH97

ZHANG-SH97 commented Aug 1, 2024

@HwwwwwwwH I see Qwen2Model in init_llm. Are there any plans to release something like minicpmv3-Qwen2 in the future? * v *

@Howe-Young

It should have been fixed last night. Please update to the latest main branch.

Thanks for your reply! The latest code runs normally, but Chinese outputs contain some '<|eot_id|><|eot_id|>' tokens, which doesn't happen in English. What is the reason for this?
English output:
(screenshot)
Chinese output:
(screenshot)

@DarkLight1337
Member

Could you show the input prompt for each case?

@whyiug
Contributor

whyiug commented Aug 1, 2024

(quoting the exchange above about '<|eot_id|>' tokens appearing in Chinese outputs)
@Howe-Young
Perhaps you need to add stop tokens:

stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids,
)

@Howe-Young

Could you show the input prompt for each case?

English prompt:

question = "please describe the image in detail"
messages = [{
    'role': 'user',
    'content': f'(<image>./</image>)\n{question}'
}]
prompt = tokenizer.apply_chat_template(messages,
                                           tokenize=False,
                                           add_generation_prompt=True)

Chinese prompt:

question = "详细描述图片内容"
messages = [{
    'role': 'user',
    'content': f'(<image>./</image>)\n{question}'
}]
prompt = tokenizer.apply_chat_template(messages,
                                           tokenize=False,
                                           add_generation_prompt=True)

@Howe-Young

stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids,
)

Thanks, adding stop_token_ids=stop_token_ids works!

@1223243

1223243 commented Aug 2, 2024

Hi! MiniCPM-Llama3-V-2_5 works well with the OpenAI-compatible API, but some API parameters don't seem to be supported yet. For example, when I want to get logprobs for the prompt:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)

completion = client.chat.completions.create(
    model="openbmb/MiniCPM-Llama3-V-2_5",
    messages=[
        {"role": "user", "content": "Do you think 2 is larger than 1? Answer yes or no."}
    ],
    extra_body={
        "stop": ['<|eot_id|>'],
        "echo": True,
        "max_tokens": 1,
        "logprobs": True,
    },
)

The output only contains logprobs for the generated tokens (e.g., "yes", logprob = "-0.0065") and does not include the input prompt. How can I solve this?

Also, the vLLM version I get from pip install vllm is 0.5.3.post1. Why can't I run python -m vllm.entrypoints.openai.api_server --model /home/nlp/xc/NLP/LLM/openLLM/MiniCPM-Llama3-V-2_5? It tells me this model is not supported.

@DarkLight1337
Member

DarkLight1337 commented Aug 2, 2024

(quoting @1223243's question above)

This model was added after the release of 0.5.3.post1, so it'll only be included in the next release (e.g. 0.5.4). If you want to use it now, please install vLLM from source (i.e. main branch).

@ywang96
Member

ywang96 commented Aug 2, 2024

(quoting @1223243's question and @DarkLight1337's answer above)

@DarkLight1337 I'm updating the PR description to link to this comment of yours, given how many times we've had to answer the same question :P

@PancakeAwesome

PancakeAwesome commented Aug 6, 2024

Offline vLLM inference with MiniCPM-V-2_6 keeps repeating the same piece of text in its output.
vllm == 0.5.4
Inference code:

messages = [{
    'role': 'user',
    'content': f'(<image>./</image>)\n{question}'
}]
prompt = tokenizer.apply_chat_template(messages,
                                       tokenize=False,
                                       add_generation_prompt=True)

stop_token_ids = ['<|eot_id|>']
sampling_params = SamplingParams(temperature=0.7, max_tokens=8192, stop_token_ids=stop_token_ids)

inputs = {
    "prompt": prompt,
    "multi_modal_data": {
        "image": image
    },
}


outputs = llm.generate(inputs, sampling_params=sampling_params)

for o in outputs:
    generated_text = o.outputs[0].text
    print(generated_text)

Could you please help take a look? Thanks~ @ywang96

@ywang96
Member

ywang96 commented Aug 6, 2024

(quoting @PancakeAwesome's report and inference code above)

Could you share a sample input/output with the repetitive generation?

@whyiug
Contributor

whyiug commented Aug 7, 2024

(quoting @PancakeAwesome's report and inference code above)
Try this:

stop_tokens = ['<|im_end|>', '<|endoftext|>']
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids,
)

@PancakeAwesome

(quoting @ywang96's request above for a sample input/output with the repetitive generation)

The prompt is in Chinese; roughly, it asks the model to produce some classic advertising copy.

@PancakeAwesome

By the way, how can I use MiniCPM-V-2_6's few-shot feature with the vLLM structure?

@PancakeAwesome

Here is a MiniCPM-V-2_6 inference best practice with vLLM:

from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# List of image file paths
IMAGES = [
    "/root/ld/ld_project/MiniCPM-V/assets/airplane.jpeg",  # local image path
]

# Model name or path
MODEL_NAME = "/root/ld/ld_model_pretrained/Minicpmv2_6"  # local model path or Hugging Face model name

# Open and convert the image
image = Image.open(IMAGES[0]).convert("RGB")

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# Initialize the language model
llm = LLM(model=MODEL_NAME,
          gpu_memory_utilization=1,  # use all GPU memory
          trust_remote_code=True,
          max_model_len=2048)  # adjust this value according to available memory

# Build the conversation messages
messages = [{'role': 'user', 'content': '(<image>./</image>)\n' + '请描述这张图片'}]

# Apply the chat template to the messages
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Set the stop token IDs
# 2.0
# stop_token_ids = [tokenizer.eos_id]
# 2.5
# stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]
# 2.6
stop_tokens = ['<|im_end|>', '<|endoftext|>']
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]

# Set the sampling parameters
sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids,
    # temperature=0.7,
    # top_p=0.8,
    # top_k=100,
    # seed=3472,
    max_tokens=1024,
    # min_tokens=150,
    temperature=0,
    use_beam_search=True,
    # length_penalty=1.2,
    best_of=3)

# Get the model output
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {
        "image": image
    }
}, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)

@PancakeAwesome

(quoting @whyiug's suggestion above to use '<|im_end|>' and '<|endoftext|>' as stop tokens)

Thank you very much. I think the problem is that each version has different stop token IDs. This code should work.

@PancakeAwesome

By the way, how can I use MiniCPM-V-2_6's few-shot feature with the vLLM structure?

Here is the official few-shot usage with transformers:

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

question = "production date" 
image1 = Image.open('example1.jpg').convert('RGB')
answer1 = "2023.08.04"
image2 = Image.open('example2.jpg').convert('RGB')
answer2 = "2007.04.24"
image_test = Image.open('test.jpg').convert('RGB')

msgs = [
    {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]},
    {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]},
    {'role': 'user', 'content': [image_test, question]}
]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)

@PancakeAwesome

By the way, how can I use MiniCPM-V-2_6's few-shot feature with the vLLM structure?

Looking forward to your reply~ Thank you. @ywang96 @whyiug

@xyfZzz

xyfZzz commented Aug 7, 2024

When deploying MiniCPM-V-2.6 as an OpenAI-compatible API with vLLM 0.5.4, I hit the following error; please help take a look:

Process Process-1:
Traceback (most recent call last):
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/vllm/entrypoints/openai/rpc/server.py", line 217, in run_rpc_serve
r
    server = AsyncEngineRPCServer(async_engine_args, usage_context, port)
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/vllm/entrypoints/openai/rpc/server.py", line 25, in __init__
    self.engine = AsyncLLMEngine.from_engine_args(async_engine_args,
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 471, in from_engine_args
    engine = cls(
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 381, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 552, in _init_engine
    return engine_class(*args, **kwargs)
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 249, in __init__
    self.model_executor = executor_class(
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/vllm/executor/executor_base.py", line 47, in __init__
    self._init_executor()
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/vllm/executor/gpu_executor.py", line 35, in _init_executor
    self.driver_worker.init_device()
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/vllm/worker/worker.py", line 123, in init_device
    torch.cuda.set_device(self.device)
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/torch/cuda/__init__.py", line 420, in set_device
    torch._C._cuda_setDevice(device)
  File "/app/apps/anaconda3/envs/vllm_054_cu118/lib/python3.9/site-packages/torch/cuda/__init__.py", line 300, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
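The last line of the traceback names the generic remedy: CUDA must not already be initialized in a process that later forks workers. Below is a minimal, hedged sketch of that generic 'spawn' approach; whether this is the appropriate fix for vLLM's OpenAI server entrypoint (as opposed to an upstream fix or an environment setting) is an assumption, not something confirmed in this thread.

import multiprocessing as mp

if __name__ == "__main__":
    # Use 'spawn' so child processes initialize CUDA from scratch instead of
    # inheriting a forked copy of an already-initialized CUDA context.
    mp.set_start_method("spawn", force=True)
    # ...launch the engine / server from here...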

@AlphaINF

AlphaINF commented Aug 7, 2024

Running MiniCPM-V inference on 0.5.4 with the OpenAI-style deployment hits an out-of-memory error:
OpenBMB/MiniCPM-V#392

@AlphaINF

AlphaINF commented Aug 7, 2024

I tested on a single A100-80G GPU and found that when loading with vLLM, memory first climbs to 16 GB (reading the model); at some moment after loading finishes it peaks at 29 GB and then drops back to 19 GB. The reason is unclear.

@sfyumi

sfyumi commented Aug 7, 2024

How can I load the vision model on a separate GPU to avoid OOM?

@AlphaINF

AlphaINF commented Aug 7, 2024

@sfyumi I have a solution. By default, vLLM's max-num-seqs is 256, which is too large for a 3090; just lower max-num-seqs to 32 and raise gpu-memory-utilization to 1.
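A minimal sketch of that suggestion using the offline LLM API (the model path is a placeholder; the two values are simply the ones suggested above, not tuned recommendations):

from vllm import LLM

# Placeholder model path; the point is the two memory-related knobs below.
llm = LLM(
    model="openbmb/MiniCPM-V-2_6",
    trust_remote_code=True,
    max_num_seqs=32,             # default is 256, too large for a 24 GB card
    gpu_memory_utilization=1.0,  # let vLLM use (almost) all GPU memory
)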

@HwwwwwwwH
Contributor Author

(quoting the few-shot question and the official transformers example above)

I think this could work:

msgs = [
    {'role': 'user', 'content': "(<image>./</image>)" + question}, {'role': 'assistant', 'content': answer1},
    {'role': 'user', 'content': "(<image>./</image>)" + question}, {'role': 'assistant', 'content': answer2},
    {'role': 'user', 'content': "(<image>./</image>)" + question}
]
prompt = tokenizer.apply_chat_template(
    msgs,
    tokenize=False,
    add_generation_prompt=True
)
inputs = {
    "prompt": prompt,
    "multi_modal_data": {
        "image": [image1, image2, image_test]
    },
}
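For completeness, a hedged sketch of running the prompt above, following the same generate pattern used elsewhere in this thread; it assumes a sampling_params built with the 2.6 stop tokens shown earlier.

# Run the few-shot prompt; sampling_params is assumed to be the 2.6 setup
# ('<|im_end|>' / '<|endoftext|>' stop tokens) from the earlier example.
outputs = llm.generate(inputs, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)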

@HwwwwwwwH
Contributor Author

(quoting @AlphaINF's memory-peak observation above)

vLLM will send dummy data (with multiple dummy images) to the model for memory profiling. Since MiniCPM-V has only a few image tokens per image, there may be a large number of dummy images, which could cause OOM. You can add max_model_len=2048 when initializing LLM.
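A minimal sketch of that workaround, mirroring the initialization pattern already used earlier in this thread (the model path is a placeholder):

from vllm import LLM

# Capping max_model_len limits the dummy multi-modal profiling data,
# which lowers the transient memory peak seen right after weight loading.
llm = LLM(
    model="openbmb/MiniCPM-V-2_6",  # placeholder; use your local path
    trust_remote_code=True,
    max_model_len=2048,
)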

kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Signed-off-by: Alvant <alvasian@yandex.ru>