[Installation]: Kimi-VL-A3B fails to deploy with the vLLM Docker image #16715

@nigthDust

Description

Your current environment

model: Kimi-VL-A3B-Thinking
image: vllm-openai:latest
vllm version: 0.8.4
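A quick way to confirm the version inside the container, assuming a standard vLLM install, is:

python3 -c "import vllm; print(vllm.__version__)"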

1. docker pull vllm/vllm-openai
   The vLLM version inside the image is 0.8.4.
2. docker run --gpus all -v /mnt/data1/LargeLanguageModels/qwen:/model --ipc=host --network=host --name kimi-vl -it --entrypoint bash vllm/vllm-openai:latest
3. Start the model inside the container with the following command:

CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
    --port 3000 \
    --served-model-name kimi-vl \
    --trust-remote-code \
    --model /models/Kimi-VL-A3B-Thinking/Kimi-VL-A3B-Thinking \
    --tensor-parallel-size 1 \
    --max-num-batched-tokens 131072 \
    --max-model-len 131072 \
    --max-num-seqs 512 \
    --limit-mm-per-prompt image=256 \
    --disable-mm-preprocessor-cache

This produced the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/gwm-tmp/kimi_vl/vllm/vllm/entrypoints/openai/api_server.py", line 1121, in <module>
    uvloop.run(run_server(args))
  File "/gwm-tmp/kimi_vl/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/gwm-tmp/kimi_vl/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/gwm-tmp/kimi_vl/vllm/vllm/entrypoints/openai/api_server.py", line 1069, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/gwm-tmp/kimi_vl/vllm/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/gwm-tmp/kimi_vl/vllm/vllm/entrypoints/openai/api_server.py", line 166, in build_async_engine_client_from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context=usage_context)
  File "/gwm-tmp/kimi_vl/vllm/vllm/engine/arg_utils.py", line 1169, in create_engine_config
    model_config = self.create_model_config()
  File "/gwm-tmp/kimi_vl/vllm/vllm/engine/arg_utils.py", line 1057, in create_model_config
    return ModelConfig(
  File "/gwm-tmp/kimi_vl/vllm/vllm/config.py", line 413, in __init__
    self.multimodal_config = self._init_multimodal_config(
  File "/gwm-tmp/kimi_vl/vllm/vllm/config.py", line 486, in _init_multimodal_config
    raise ValueError("limit_mm_per_prompt is only supported for "
ValueError: limit_mm_per_prompt is only supported for multimodal models.
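As far as I can tell, this check fails whenever vLLM cannot map the architecture declared in the model's config.json (KimiVLForConditionalGeneration) to a registered multimodal implementation. A rough way to check whether a given vLLM build knows the architecture at all (assuming the ModelRegistry helper exported by the vllm package is available in this version) is:

python3 -c "from vllm import ModelRegistry; print('KimiVLForConditionalGeneration' in ModelRegistry.get_supported_archs())"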

4. After removing the limit_mm_per_prompt argument, I retried with the following commands; every one of them failed with the same error below.
First, the vLLM image was missing blobfile, which I fixed by installing it manually.
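For anyone hitting the same gap, installing the missing dependency inside the container was enough:

pip install blobfile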

CUDA_VISIBLE_DEVICES=3 python3 -m vllm.entrypoints.openai.api_server --port 8888 --served-model-name kimi-vl --trust-remote-code --model moonshotai/Kimi-VL-A3B-Instruct --tensor-parallel-size 1 --max-num-batched-tokens 131072 --max-model-len 131072 --max-num-seqs 512

VLLM_USE_V1=0 CUDA_VISIBLE_DEVICES=3 python3 -m vllm.entrypoints.openai.api_server --port 8888 --served-model-name kimi-vl --trust-remote-code --model moonshotai/Kimi-VL-A3B-Instruct --tensor-parallel-size 1 --max-num-batched-tokens 131072 --max-model-len 131072 --max-num-seqs 512

VLLM_USE_V1=0 CUDA_VISIBLE_DEVICES=3 python3 -m vllm.entrypoints.openai.api_server --port 8888 --served-model-name kimi-vl --trust-remote-code --model moonshotai/Kimi-VL-A3B-Instruct --tensor-parallel-size 1

CUDA_VISIBLE_DEVICES=3 python3 -m vllm.entrypoints.openai.api_server --port 8888 --served-model-name kimi-vl --trust-remote-code --model moonshotai/Kimi-VL-A3B-Instruct --tensor-parallel-size 1

The error was as follows:

ERROR 04-16 02:31:40 [engine.py:448] Traceback (most recent call last):
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
ERROR 04-16 02:31:40 [engine.py:448]     engine = MQLLMEngine.from_vllm_config(
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
ERROR 04-16 02:31:40 [engine.py:448]     return cls(
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
ERROR 04-16 02:31:40 [engine.py:448]     self.engine = LLMEngine(*args, **kwargs)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 282, in __init__
ERROR 04-16 02:31:40 [engine.py:448]     self.model_executor = executor_class(vllm_config=vllm_config)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 04-16 02:31:40 [engine.py:448]     self._init_executor()
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 04-16 02:31:40 [engine.py:448]     self.collective_rpc("load_model")
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-16 02:31:40 [engine.py:448]     return run_method(self.driver_worker, method, args, kwargs)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2378, in run_method
ERROR 04-16 02:31:40 [engine.py:448]     return func(*args, **kwargs)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
ERROR 04-16 02:31:40 [engine.py:448]     self.model_runner.load_model()
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1113, in load_model
ERROR 04-16 02:31:40 [engine.py:448]     self.model = get_model(vllm_config=self.vllm_config)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 04-16 02:31:40 [engine.py:448]     return loader.load_model(vllm_config=vllm_config)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 452, in load_model
ERROR 04-16 02:31:40 [engine.py:448]     model = initialize_model(vllm_config=vllm_config)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 123, in initialize_model
ERROR 04-16 02:31:40 [engine.py:448]     model_class, _ = get_model_architecture(model_config)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 104, in get_model_architecture
ERROR 04-16 02:31:40 [engine.py:448]     architectures = resolve_transformers_arch(model_config, architectures)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 72, in resolve_transformers_arch
ERROR 04-16 02:31:40 [engine.py:448]     raise ValueError(
ERROR 04-16 02:31:40 [engine.py:448] ValueError: KimiVLForConditionalGeneration has no vLLM implementation and the Transformers implementation is not compatible with vLLM. Try setting VLLM_USE_V1=0.
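Since the error is the same with and without VLLM_USE_V1=0 (the traceback above comes from the V0 multiprocessing engine), my guess, which I have not verified, is that the 0.8.4 build in this image simply predates Kimi-VL support. One option worth trying before re-running the same serve command is upgrading vLLM inside the container:

pip install --upgrade vllm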

How you are installing vllm

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
