Error when deploying Qwen2-VL 72B Int4 with vLLM #260

bank010 · 2024-09-24T06:46:16Z

Command:
python -m vllm.entrypoints.openai.api_server --served-model-name Qwen2-VL-7B-Instruct --model /data1/MLLM/qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 --tensor-parallel-size 8 --gpu-memory-utilization 0.8 --cpu-offload-gb 10 --port 5001 --host 0.0.0.0 --quantization gptq

ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.

fyabc · 2024-09-24T07:00:41Z

@bank010 Quantized models do not currently support tensor-parallel-size=8. Support will be added soon; please follow #231 for progress.

bank010 · 2024-09-24T07:25:36Z

@fyabc I resolved this by modifying intermediate_size, and the model is now deployed successfully with vLLM.

curl http://127.0.0.1:5001/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen2-VL-7B-Instruct",
"stream":1,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": [
{"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
{"type": "text", "text": "What is the text in the illustrate?"}
]}
]
}'
During inference I noticed that the image is passed with type image_url, as a URL link. How can I run inference on a local image instead?

fyabc · 2024-09-24T07:54:20Z

@bank010 Hello, you can upload a local image by base64-encoding it:

import base64
from openai import OpenAI

# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

image_path = "/path/to/local/image.png"
with open(image_path, "rb") as f:
    encoded_image = base64.b64encode(f.read())
encoded_image_text = encoded_image.decode("utf-8")
base64_qwen = f"data:image;base64,{encoded_image_text}"


chat_response = client.chat.completions.create(
    model="Qwen2-7B-Instruct",  # must match the name passed to --served-model-name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": base64_qwen
                    },
                },
                {"type": "text", "text": "What is the text in the illustrate?"},
            ],
        },
    ],
)
print("Chat response:", chat_response)

bank010 · 2024-09-24T08:02:57Z

Thank you for the reply, but what I want is to do this with curl. Can I use a local image directly that way?

kq-chen · 2024-09-24T10:03:02Z

Based on the suggestion from aabbccddwasd in #231, we have adjusted the intermediate size to 29696 and re-quantized the model. The updated 72B AWQ/GPTQ-Int4/GPTQ-Int8 checkpoints have been uploaded to Hugging Face. To use the new checkpoints, please download them again from Hugging Face.
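A rough way to see why the new intermediate size helps is to check whether each tensor-parallel shard is still a whole number of quantization groups. The sketch below is only an approximation of the condition behind the error message, not vLLM's actual code. It assumes a quantization group size of 128 and an original intermediate size of 29568; neither value is stated in this thread, and `partition_is_aligned` is a hypothetical helper name.

```python
def partition_is_aligned(intermediate_size, tensor_parallel_size, group_size=128):
    """Return True if each tensor-parallel shard holds a whole number of groups.

    group_size=128 is an assumption about the quantized checkpoints.
    """
    if intermediate_size % tensor_parallel_size != 0:
        return False
    per_rank = intermediate_size // tensor_parallel_size
    return per_rank % group_size == 0


# 29568 is assumed to be the original size; 29696 is the adjusted size from this thread.
for size in (29568, 29696):
    usable = [tp for tp in (1, 2, 4, 8) if partition_is_aligned(size, tp)]
    print(size, usable)
# 29568 [1]
# 29696 [1, 2, 4, 8]
```

Under these assumptions, 29696 = 232 * 128, so it splits evenly into aligned shards for tensor-parallel sizes 2, 4, and 8, while 29568 = 231 * 128 does not.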

You can use the following commands to run inference on the quantized 72B model with vLLM tensor parallelism:

Server:

VLLM_WORKER_MULTIPROC_METHOD=spawn python -m vllm.entrypoints.openai.api_server \
  --served-model-name qwen2vl \
  --model Qwen/Qwen2-VL-72B-Instruct-AWQ \
  --tensor-parallel-size 4 \
  --max_num_seqs 16

Client:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "qwen2vl",
    "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
        {"type": "text", "text": "What is the text in the illustration?"}
    ]}
    ]
    }'

ZHUHF123 · 2024-09-26T02:32:11Z

Hello, I have the same question: can curl use a local image directly, or are base64 and a URL the only options? Thanks for any reply.
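One possible way to combine the base64 approach above with curl is to write the request body to a file first and let curl read it. This is a minimal sketch, not an officially confirmed method: `build_payload`, `write_payload`, the file name `payload.json`, and the default model name `qwen2vl` are all illustrative assumptions, and the model name has to match whatever was passed to --served-model-name.

```python
import base64
import json


def build_payload(image_bytes, prompt, model="qwen2vl"):
    """Build a chat-completions request body with the image inlined as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,  # assumption: must match the server's --served-model-name
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": f"data:image;base64,{encoded}"}},
                    {"type": "text", "text": prompt},
                ],
            },
        ],
    }


def write_payload(image_path, out_path, prompt, model="qwen2vl"):
    """Read a local image and save the JSON request body to out_path."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt, model)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


# Example with hypothetical paths:
# write_payload("/path/to/local/image.png", "payload.json", "What is the text in the illustration?")
```

The saved file can then be sent with curl's `-d @file` form, for example: curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d @payload.json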

plancktree · 2024-09-28T09:07:13Z

How do I modify intermediate_size?
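The thread does not say exactly how intermediate_size was changed. One way such a change can be made without altering the model's outputs is to zero-pad the MLP weights up to the larger size before quantizing. The toy sketch below illustrates that idea on a tiny SwiGLU-style MLP in pure Python; it is an illustration under that assumption, not necessarily the procedure used in #231 or by bank010, and all names and sizes in it are hypothetical.

```python
import math


def silu(v):
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    # w is a list of rows; returns the matrix-vector product w @ x.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp(x, gate, up, down):
    # SwiGLU-style MLP: down @ (silu(gate @ x) * (up @ x))
    g = matvec(gate, x)
    u = matvec(up, x)
    h = [silu(a) * b for a, b in zip(g, u)]
    return matvec(down, h)


def pad_mlp(gate, up, down, new_inter):
    # Append zero rows to gate/up and zero columns to down.
    hidden = len(gate[0])
    extra = new_inter - len(gate)
    gate_p = gate + [[0.0] * hidden for _ in range(extra)]
    up_p = up + [[0.0] * hidden for _ in range(extra)]
    down_p = [row + [0.0] * extra for row in down]
    return gate_p, up_p, down_p


# Toy sizes: hidden=2, intermediate=3, padded to 4.
gate = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
up = [[0.7, 0.8], [-0.9, 1.0], [1.1, -1.2]]
down = [[0.2, -0.1, 0.3], [0.4, 0.5, -0.6]]
x = [1.0, -2.0]

before = mlp(x, gate, up, down)
after = mlp(x, *pad_mlp(gate, up, down, 4))
print(before, after)  # the two outputs should match
```

The padded units contribute silu(0) * 0 = 0, and the matching columns of the down projection are zero as well, so the output is unchanged. For a real checkpoint the same padding would have to be applied to every layer's weights, and intermediate_size in the model config updated to the new value.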

bank010 mentioned this issue Sep 24, 2024

Environment conflict when running the quantized model with the official image #254

Open

fyabc assigned kq-chen Sep 24, 2024

bank010 closed this as completed Sep 24, 2024

bank010 reopened this Sep 24, 2024

kq-chen closed this as completed Sep 24, 2024

kq-chen mentioned this issue Oct 1, 2024

qwen2vl-72b multi-GPU inference #295

Open
