Closed
Labels: bug (Something isn't working)
Description
Your current environment
The output of `python collect_env.py` was not provided.
🐛 Describe the bug
INFO 04-27 14:39:59 [config.py:3574] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312, 320, 328, 336, 344, 352, 360, 368, 376, 384, 392, 400, 408, 416, 424, 432, 440, 448, 456, 464, 472, 480, 488, 496, 504, 512] is overridden by config [512, 384, 256, 128, 4, 2, 1, 392, 264, 136, 8, 400, 272, 144, 16, 408, 280, 152, 24, 416, 288, 160, 32, 424, 296, 168, 40, 432, 304, 176, 48, 440, 312, 184, 56, 448, 320, 192, 64, 456, 328, 200, 72, 464, 336, 208, 80, 472, 344, 216, 88, 120, 480, 352, 248, 224, 96, 488, 504, 360, 232, 104, 496, 368, 240, 112, 376]
INFO 04-27 14:39:59 [weight_utils.py:265] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:01<00:01, 1.55s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00, 1.69s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:03<00:00, 1.67s/it]
INFO 04-27 14:40:03 [loader.py:458] Loading weights took 3.43 seconds
INFO 04-27 14:40:03 [gpu_model_runner.py:1339] Model loading took 7.1557 GiB and 4.703990 seconds
INFO 04-27 14:43:48 [gpu_model_runner.py:1612] Encoder cache will be initialized with a budget of 98304 tokens, and profiled with 1 video items of the maximum feature size.
ERROR 04-27 14:43:54 [core.py:396] EngineCore failed to start.
ERROR 04-27 14:43:54 [core.py:396] Traceback (most recent call last):
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_dynamo/symbolic_convert.py", line 1275, in IMPORT_NAME
ERROR 04-27 14:43:54 [core.py:396] value = __import__(
ERROR 04-27 14:43:54 [core.py:396] ^^^^^^^^^^^
ERROR 04-27 14:43:54 [core.py:396] ModuleNotFoundError: No module named 'vllm.vllm_flash_attn.layers'
ERROR 04-27 14:43:54 [core.py:396]
ERROR 04-27 14:43:54 [core.py:396] During handling of the above exception, another exception occurred:
ERROR 04-27 14:43:54 [core.py:396]
ERROR 04-27 14:43:54 [core.py:396] Traceback (most recent call last):
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 387, in run_engine_core
ERROR 04-27 14:43:54 [core.py:396] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 04-27 14:43:54 [core.py:396] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 329, in __init__
ERROR 04-27 14:43:54 [core.py:396] super().__init__(vllm_config, executor_class, log_stats,
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 71, in __init__
ERROR 04-27 14:43:54 [core.py:396] self._initialize_kv_caches(vllm_config)
ERROR 04-27 14:43:54 [core.py:396] File "/home/zhiyuan/vllm/vllm/v1/engine/core.py", line 129, in _initialize_kv_caches
ERROR 04-27 14:43:54 [core.py:396] available_gpu_memory = self.model_executor.determine_available_memory()
This only happens with Qwen/Qwen2.5-VL-7B-Instruct; deepseek-ai/DeepSeek-R1-Distill-Qwen-7B works fine with the same setup.
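For anyone hitting the same `ModuleNotFoundError`, a quick way to check whether the `vllm.vllm_flash_attn` subpackage actually resolves on your install (it can be absent in a source checkout that was built without the vendored flash-attn files) is to probe the module specs before starting the engine. This is a minimal diagnostic sketch, not vLLM's own tooling; the module names are taken from the traceback above:

```python
# Diagnostic sketch: check which of the modules named in the traceback
# can be resolved on this installation, without fully importing them.
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` resolves to an importable module spec."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises if a *parent* package is missing; treat that
        # the same as the module itself being unavailable.
        return False


for name in ("vllm", "vllm.vllm_flash_attn", "vllm.vllm_flash_attn.layers"):
    print(f"{name}: {'found' if module_available(name) else 'MISSING'}")
```

If `vllm` is found but `vllm.vllm_flash_attn.layers` is MISSING, the flash-attn files were likely not copied into the source tree during the build, which would explain why the dynamo `IMPORT_NAME` step fails only for models (like Qwen2.5-VL) whose attention path imports that subpackage.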
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.