Description
Your current environment
model: Kimi-VL-A3B-Thinking
image: vllm-openai:latest
vllm version: 0.8.4
1. docker pull vllm/vllm-openai
The vllm version inside the image is 0.8.4.
2. docker run --gpus all -v /mnt/data1/LargeLanguageModels/qwen:/model --ipc=host --network=host --name kimi-vl -it --entrypoint bash vllm/vllm-openai:latest
3. Start the model inside the container with the following command:
CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
--port 3000 \
--served-model-name kimi-vl \
--trust-remote-code \
--model /models/Kimi-VL-A3B-Thinking/Kimi-VL-A3B-Thinking \
--tensor-parallel-size 1 \
--max-num-batched-tokens 131072 \
--max-model-len 131072 \
--max-num-seqs 512 \
--limit-mm-per-prompt image=256 \
--disable-mm-preprocessor-cache
This produced the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/gwm-tmp/kimi_vl/vllm/vllm/entrypoints/openai/api_server.py", line 1121, in <module>
uvloop.run(run_server(args))
File "/gwm-tmp/kimi_vl/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
return loop.run_until_complete(wrapper())
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/gwm-tmp/kimi_vl/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
File "/gwm-tmp/kimi_vl/vllm/vllm/entrypoints/openai/api_server.py", line 1069, in run_server
async with build_async_engine_client(args) as engine_client:
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/gwm-tmp/kimi_vl/vllm/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/gwm-tmp/kimi_vl/vllm/vllm/entrypoints/openai/api_server.py", line 166, in build_async_engine_client_from_engine_args
vllm_config = engine_args.create_engine_config(usage_context=usage_context)
File "/gwm-tmp/kimi_vl/vllm/vllm/engine/arg_utils.py", line 1169, in create_engine_config
model_config = self.create_model_config()
File "/gwm-tmp/kimi_vl/vllm/vllm/engine/arg_utils.py", line 1057, in create_model_config
return ModelConfig(
File "/gwm-tmp/kimi_vl/vllm/vllm/config.py", line 413, in __init__
self.multimodal_config = self._init_multimodal_config(
File "/gwm-tmp/kimi_vl/vllm/vllm/config.py", line 486, in _init_multimodal_config
raise ValueError("limit_mm_per_prompt is only supported for "
ValueError: limit_mm_per_prompt is only supported for multimodal models.
4. After I removed the limit_mm_per_prompt parameter, I retried with the following commands, and every attempt failed with the error below.
First, the vllm image was missing blobfile; I resolved this by importing and installing it manually.
CUDA_VISIBLE_DEVICES=3 python3 -m vllm.entrypoints.openai.api_server --port 8888 --served-model-name kimi-vl --trust-remote-code --model moonshotai/Kimi-VL-A3B-Instruct --tensor-parallel-size 1 --max-num-batched-tokens 131072 --max-model-len 131072 --max-num-seqs 512
VLLM_USE_V1=0 CUDA_VISIBLE_DEVICES=3 python3 -m vllm.entrypoints.openai.api_server --port 8888 --served-model-name kimi-vl --trust-remote-code --model moonshotai/Kimi-VL-A3B-Instruct --tensor-parallel-size 1 --max-num-batched-tokens 131072 --max-model-len 131072 --max-num-seqs 512
VLLM_USE_V1=0 CUDA_VISIBLE_DEVICES=3 python3 -m vllm.entrypoints.openai.api_server --port 8888 --served-model-name kimi-vl --trust-remote-code --model moonshotai/Kimi-VL-A3B-Instruct --tensor-parallel-size 1
CUDA_VISIBLE_DEVICES=3 python3 -m vllm.entrypoints.openai.api_server --port 8888 --served-model-name kimi-vl --trust-remote-code --model moonshotai/Kimi-VL-A3B-Instruct --tensor-parallel-size 1
The error is as follows:
ERROR 04-16 02:31:40 [engine.py:448] Traceback (most recent call last):
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
ERROR 04-16 02:31:40 [engine.py:448]     engine = MQLLMEngine.from_vllm_config(
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
ERROR 04-16 02:31:40 [engine.py:448]     return cls(
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 82, in __init__
ERROR 04-16 02:31:40 [engine.py:448]     self.engine = LLMEngine(*args, **kwargs)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 282, in __init__
ERROR 04-16 02:31:40 [engine.py:448]     self.model_executor = executor_class(vllm_config=vllm_config)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 04-16 02:31:40 [engine.py:448]     self._init_executor()
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 04-16 02:31:40 [engine.py:448]     self.collective_rpc("load_model")
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-16 02:31:40 [engine.py:448]     return run_method(self.driver_worker, method, args, kwargs)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2378, in run_method
ERROR 04-16 02:31:40 [engine.py:448]     return func(*args, **kwargs)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
ERROR 04-16 02:31:40 [engine.py:448]     self.model_runner.load_model()
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1113, in load_model
ERROR 04-16 02:31:40 [engine.py:448]     self.model = get_model(vllm_config=self.vllm_config)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 04-16 02:31:40 [engine.py:448]     return loader.load_model(vllm_config=vllm_config)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 452, in load_model
ERROR 04-16 02:31:40 [engine.py:448]     model = initialize_model(vllm_config=vllm_config)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 123, in initialize_model
ERROR 04-16 02:31:40 [engine.py:448]     model_class, _ = get_model_architecture(model_config)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 104, in get_model_architecture
ERROR 04-16 02:31:40 [engine.py:448]     architectures = resolve_transformers_arch(model_config, architectures)
ERROR 04-16 02:31:40 [engine.py:448]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 72, in resolve_transformers_arch
ERROR 04-16 02:31:40 [engine.py:448]     raise ValueError(
ERROR 04-16 02:31:40 [engine.py:448] ValueError: KimiVLForConditionalGeneration has no vLLM implementation and the Transformers implementation is not compatible with vLLM. Try setting VLLM_USE_V1=0.
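For reference, the workaround the error message suggests would look like this; the key point is that the variable must be an environment assignment, VLLM_USE_V1=0, visible to the server process:

```shell
# Force the vLLM V0 engine, as the error message suggests.
# Note the syntax: VLLM_USE_V1=0 is an environment variable assignment.
export VLLM_USE_V1=0

# Confirm the setting is visible to child processes:
python3 -c 'import os; print(os.environ.get("VLLM_USE_V1"))'

# Then launch the server as before, e.g.:
# CUDA_VISIBLE_DEVICES=3 python3 -m vllm.entrypoints.openai.api_server \
#   --port 8888 --served-model-name kimi-vl --trust-remote-code \
#   --model moonshotai/Kimi-VL-A3B-Instruct --tensor-parallel-size 1
```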
How you are installing vllm
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.