[Usage]: how to request a qwen2.5-VL-7B classify model served by vllm using openai SDK? #27413

@muziyongshixin

Description

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I launch a server with the following command to serve a Qwen2.5-VL-7B model fine-tuned for sequence classification (this model replaces the lm_head with a two-class score_head).

The launch command is :

vllm serve --model=//video_classification/qwenvl_7b_video_cls/v5-20251011-121851/2340_vllm_format --served_model_name Qwen2.5-7B-shenhe --task=classify --port=8080 --tensor-parallel-size=2

I don't know how to send requests to this server with the OpenAI SDK.
The code snippet shown below works well with pure text, but I get a 400 Bad Request when I put a video URL into the prompt.

This works well:

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""Example Python client for classification API using vLLM API server
NOTE:
    start a supported classification model server with `vllm serve`, e.g.
    vllm serve jason9693/Qwen2.5-1.5B-apeach
"""

import argparse
import pprint

import requests


def post_http_request(payload: dict, api_url: str) -> requests.Response:
    headers = {"User-Agent": "Test Client"}
    response = requests.post(api_url, headers=headers, json=payload)
    return response


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--model", type=str, default="jason9693/Qwen2.5-1.5B-apeach")
    return parser.parse_args()


def main(args):
    host = args.host
    port = args.port
    model_name = args.model

    api_url = f"http://{host}:{port}/classify"
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]

    payload = {
        "model": model_name,
        "input": prompts,
    }

    classify_response = post_http_request(payload=payload, api_url=api_url)
    pprint.pprint(classify_response.json())


if __name__ == "__main__":
    args = parse_args()
    main(args)

But if I replace the prompts with multimodal data, the server returns the 400 error:

video_url = "https://js-ad.a.yximgs.com/bs2/ad_nieuwland-material/t2i2v/videos/3525031242883943515-140276939618048_24597237897733_v0_1759927515165406_3.mp4"

prompts = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "You are a professional video quality analyst. Carefully judge whether the video provided below has quality issues.\nQuality issues include but are not limited to:\n1. Poor picture quality, blurry frames, flickering brightness\n2. Blurry text in the frame\n3. Content that violates real-world physical logic, e.g. limbs or heads appearing out of nowhere, a wrong number of fingers or arms, unnatural legs\n4. Motion that violates physical laws, e.g. objects appearing out of nowhere, stuttering, shaking, jittering, or jumping frames\n\nReturn 0 if the video has issues, return 1 if it does not.\n## The video is as follows\n",
            },
            {"type": "video", "video": video_url},
        ],
    }
]
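For reference, here is a minimal sketch of how such a request body might be shaped. It assumes the /classify endpoint accepts OpenAI-chat-style "messages" with multimodal content parts, which is exactly the open question here (the 400 above suggests it may not for this route); the helper name `build_multimodal_classify_payload` and the example URL are hypothetical.

```python
# Sketch: build a chat-style payload for a classification request.
# ASSUMPTION: whether vLLM's /classify endpoint accepts "messages" (rather
# than the plain "input" list of strings) is not confirmed here -- the
# content-part layout mirrors the OpenAI chat completions multimodal format.


def build_multimodal_classify_payload(model: str, text: str, video_url: str) -> dict:
    """Return an OpenAI-chat-style payload mixing text and video parts."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    # The exact video key ("video_url" with a nested
                    # {"url": ...} vs a bare "video" string) varies
                    # between servers; both variants are seen in the wild.
                    {"type": "video_url", "video_url": {"url": video_url}},
                ],
            }
        ],
    }


payload = build_multimodal_classify_payload(
    "Qwen2.5-7B-shenhe",
    "Return 0 if the video has quality issues, 1 otherwise.",
    "https://example.com/video.mp4",  # hypothetical URL for illustration
)
```

If /classify rejects this shape, comparing it against the request body the chat completions endpoint accepts for the same model is a quick way to isolate whether the problem is the endpoint or the content-part format.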

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
