Your current environment
Docker image v0.9.2 on an NVIDIA L40S.
(The Docker image has to be modified because the librosa dependency is missing; the compose file and Dockerfile are below.)
services:
  vllm-whisper-large-v3:
    # Must modify image for <= v0.9.2
    # ImportError: Please install vllm[audio] for audio support
    # image: vllm/vllm-openai:v0.9.2
    image: vllm/vllm-openai-audio:v0.9.2
    build:
      context: .
    container_name: vllm-whisper-large-v3
    environment:
      - HF_TOKEN=$HF_TOKEN
      - VLLM_NO_USAGE_STATS=1
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['2']
              capabilities: [gpu]
    network_mode: host
    volumes:
      - /mnt/sda/huggingface:/root/.cache/huggingface
      - .:/opt/vllm
    command:
      - --port=8006
      - --disable-log-requests
      - --model=openai/whisper-large-v3
      - --gpu-memory-utilization=0.40
      - --swap-space=5
    restart: unless-stopped
# Use the base vLLM image
FROM vllm/vllm-openai:v0.9.2

RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        libsndfile1 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade --no-cache-dir \
    # "git+https://github.com/huggingface/transformers.git" \
    "librosa>=0.10,<0.11"
🐛 Describe the bug
- Cannot start the image; see the log below.
- Whisper worked on 0.9.1, so this is a regression; a transcription request like the one sketched below ran fine against that version.
- It would also be nice to add librosa to the standard image - it doesn't make the image noticeably larger (relative to the existing >20 GB).
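For context, a hypothetical example of how the deployment is exercised once the server is up, using the OpenAI-compatible transcription endpoint on the port from the compose file above (sample.wav is a placeholder, not a file from this report):

# Placeholder request; adjust the audio file path to a real recording
curl -s http://localhost:8006/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=openai/whisper-large-v3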
Log:
$ docker logs -f vllm-whisper-large-v3
INFO 07-09 02:08:41 [__init__.py:244] Automatically detected platform cuda.
INFO 07-09 02:08:44 [api_server.py:1395] vLLM API server version 0.9.2
INFO 07-09 02:08:44 [cli_args.py:325] non-default args: {'port': 8006, 'model': 'openai/whisper-large-v3', 'gpu_memory_utilization': 0.4, 'swap_space': 5.0, 'disable_log_requests': True}
INFO 07-09 02:08:49 [config.py:841] This model supports multiple tasks: {'generate', 'classify', 'reward', 'embed', 'transcription'}. Defaulting to 'transcription'.
INFO 07-09 02:08:49 [config.py:1472] Using max model len 448
WARNING 07-09 02:08:49 [arg_utils.py:1735] ['WhisperForConditionalGeneration'] is not supported by the V1 Engine. Falling back to V0.
INFO 07-09 02:08:50 [api_server.py:268] Started engine process with PID 266
INFO 07-09 02:08:53 [__init__.py:244] Automatically detected platform cuda.
INFO 07-09 02:08:54 [llm_engine.py:230] Initializing a V0 LLM engine (v0.9.2) with config: model='openai/whisper-large-v3', speculative_config=None, tokenizer='openai/whisper-large-v3', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=448, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=openai/whisper-large-v3, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":256,"local_cache_dir":null}, use_cached_outputs=True,
INFO 07-09 02:08:55 [cuda.py:363] Using Flash Attention backend.
INFO 07-09 02:08:55 [parallel_state.py:1076] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 07-09 02:08:55 [model_runner.py:1171] Starting to load model openai/whisper-large-v3...
INFO 07-09 02:08:56 [weight_utils.py:292] Using model weights format ['*.safetensors']
INFO 07-09 02:08:57 [weight_utils.py:345] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:03<00:07, 3.90s/it]
Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:10<00:05, 5.75s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:12<00:00, 3.98s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:12<00:00, 4.27s/it]
INFO 07-09 02:09:09 [default_loader.py:272] Loading weights took 12.84 seconds
INFO 07-09 02:09:10 [model_runner.py:1203] Model loading took 2.8764 GiB and 13.830811 seconds
Process SpawnProcess-1:
ERROR 07-09 02:09:12 [engine.py:458] Received a CachedWhisperTokenizerFast for argument tokenizer, but a WhisperTokenizer was expected.
ERROR 07-09 02:09:12 [engine.py:458] Traceback (most recent call last):
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 446, in run_mp_engine
ERROR 07-09 02:09:12 [engine.py:458] engine = MQLLMEngine.from_vllm_config(
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in from_vllm_config
ERROR 07-09 02:09:12 [engine.py:458] return cls(
ERROR 07-09 02:09:12 [engine.py:458] ^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 87, in __init__
ERROR 07-09 02:09:12 [engine.py:458] self.engine = LLMEngine(*args, **kwargs)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 268, in __init__
ERROR 07-09 02:09:12 [engine.py:458] self._initialize_kv_caches()
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 413, in _initialize_kv_caches
ERROR 07-09 02:09:12 [engine.py:458] self.model_executor.determine_num_available_blocks())
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 104, in determine_num_available_blocks
ERROR 07-09 02:09:12 [engine.py:458] results = self.collective_rpc("determine_num_available_blocks")
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 07-09 02:09:12 [engine.py:458] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2736, in run_method
ERROR 07-09 02:09:12 [engine.py:458] return func(*args, **kwargs)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-09 02:09:12 [engine.py:458] return func(*args, **kwargs)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 256, in determine_num_available_blocks
ERROR 07-09 02:09:12 [engine.py:458] self.model_runner.profile_run()
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-09 02:09:12 [engine.py:458] return func(*args, **kwargs)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/enc_dec_model_runner.py", line 312, in profile_run
ERROR 07-09 02:09:12 [engine.py:458] max_mm_tokens = self.mm_registry.get_max_multimodal_tokens(
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 183, in get_max_multimodal_tokens
ERROR 07-09 02:09:12 [engine.py:458] return sum(self.get_max_tokens_by_modality(model_config).values())
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 170, in get_max_tokens_by_modality
ERROR 07-09 02:09:12 [engine.py:458] mm_limits = self.get_mm_limits_per_prompt(model_config)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 206, in get_mm_limits_per_prompt
ERROR 07-09 02:09:12 [engine.py:458] processor = self.create_processor(model_config, disable_cache=False)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 281, in create_processor
ERROR 07-09 02:09:12 [engine.py:458] return factories.build_processor(ctx, cache=cache)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 88, in build_processor
ERROR 07-09 02:09:12 [engine.py:458] return self.processor(info, dummy_inputs_builder, cache=cache)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing.py", line 1152, in __init__
ERROR 07-09 02:09:12 [engine.py:458] self.data_parser = self._get_data_parser()
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 680, in _get_data_parser
ERROR 07-09 02:09:12 [engine.py:458] feature_extractor = self.info.get_feature_extractor()
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 643, in get_feature_extractor
ERROR 07-09 02:09:12 [engine.py:458] hf_processor = self.get_hf_processor()
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 637, in get_hf_processor
ERROR 07-09 02:09:12 [engine.py:458] return self.ctx.get_hf_processor(WhisperProcessor)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/inputs/registry.py", line 138, in get_hf_processor
ERROR 07-09 02:09:12 [engine.py:458] return super().get_hf_processor(
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/inputs/registry.py", line 96, in get_hf_processor
ERROR 07-09 02:09:12 [engine.py:458] return cached_processor_from_config(
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 110, in cached_processor_from_config
ERROR 07-09 02:09:12 [engine.py:458] return cached_get_processor(
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 72, in get_processor
ERROR 07-09 02:09:12 [engine.py:458] processor = processor_factory.from_pretrained(
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1308, in from_pretrained
ERROR 07-09 02:09:12 [engine.py:458] return cls.from_args_and_dict(args, processor_dict, **kwargs)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1109, in from_args_and_dict
ERROR 07-09 02:09:12 [engine.py:458] processor = cls(*args, **valid_kwargs)
ERROR 07-09 02:09:12 [engine.py:458] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/transformers/models/whisper/processing_whisper.py", line 41, in __init__
ERROR 07-09 02:09:12 [engine.py:458] super().__init__(feature_extractor, tokenizer)
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 551, in __init__
ERROR 07-09 02:09:12 [engine.py:458] self.check_argument_for_proper_class(attribute_name, arg)
ERROR 07-09 02:09:12 [engine.py:458] File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 569, in check_argument_for_proper_class
ERROR 07-09 02:09:12 [engine.py:458] raise TypeError(
ERROR 07-09 02:09:12 [engine.py:458] TypeError: Received a CachedWhisperTokenizerFast for argument tokenizer, but a WhisperTokenizer was expected.
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 460, in run_mp_engine
raise e from None
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 446, in run_mp_engine
engine = MQLLMEngine.from_vllm_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in from_vllm_config
return cls(
^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 87, in __init__
self.engine = LLMEngine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 268, in __init__
self._initialize_kv_caches()
File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 413, in _initialize_kv_caches
self.model_executor.determine_num_available_blocks())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 104, in determine_num_available_blocks
results = self.collective_rpc("determine_num_available_blocks")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2736, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 256, in determine_num_available_blocks
self.model_runner.profile_run()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/worker/enc_dec_model_runner.py", line 312, in profile_run
max_mm_tokens = self.mm_registry.get_max_multimodal_tokens(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 183, in get_max_multimodal_tokens
return sum(self.get_max_tokens_by_modality(model_config).values())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 170, in get_max_tokens_by_modality
mm_limits = self.get_mm_limits_per_prompt(model_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 206, in get_mm_limits_per_prompt
processor = self.create_processor(model_config, disable_cache=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 281, in create_processor
return factories.build_processor(ctx, cache=cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 88, in build_processor
return self.processor(info, dummy_inputs_builder, cache=cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing.py", line 1152, in __init__
self.data_parser = self._get_data_parser()
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 680, in _get_data_parser
feature_extractor = self.info.get_feature_extractor()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 643, in get_feature_extractor
hf_processor = self.get_hf_processor()
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 637, in get_hf_processor
return self.ctx.get_hf_processor(WhisperProcessor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/inputs/registry.py", line 138, in get_hf_processor
return super().get_hf_processor(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/inputs/registry.py", line 96, in get_hf_processor
return cached_processor_from_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 110, in cached_processor_from_config
return cached_get_processor(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 72, in get_processor
processor = processor_factory.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1308, in from_pretrained
return cls.from_args_and_dict(args, processor_dict, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1109, in from_args_and_dict
processor = cls(*args, **valid_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/whisper/processing_whisper.py", line 41, in __init__
super().__init__(feature_extractor, tokenizer)
File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 551, in __init__
self.check_argument_for_proper_class(attribute_name, arg)
File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 569, in check_argument_for_proper_class
raise TypeError(
TypeError: Received a CachedWhisperTokenizerFast for argument tokenizer, but a WhisperTokenizer was expected.
[rank0]:[W709 02:09:13.530824767 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1495, in <module>
uvloop.run(run_server(args))
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1431, in run_server
await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1451, in run_server_worker
async with build_async_engine_client(args, client_config) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 291, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.