[Model] Initialize support for Deepseek-VL2 models #11578

Isotr0py · 2024-12-28T05:20:14Z

FIX #11236

Initialize support for deepseek-vl2 series models
Note that deepseek-ai/deepseek-vl2-tiny is not supported yet because it doesn't use MLA attention.

Signed-off-by: Isotr0py <2037008807@qq.com>

github-actions · 2024-12-28T05:20:25Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

mergify · 2024-12-28T16:56:54Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Isotr0py.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Isotr0py · 2025-01-10T12:49:57Z

@csdY123 I added a check that num_nextn_predict_layers exists before accessing self.config.num_nextn_predict_layers, so the model should be able to load now.

(I don't have a device to test the full Deepseek-VL2 model right now, so your feedback is very valuable!) :)
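A minimal sketch of the kind of guard described above, assuming the config behaves like a plain attribute container. The helper name `get_num_nextn_predict_layers` and the default of 0 are illustrative assumptions, not the code merged in this PR.

```python
from types import SimpleNamespace


def get_num_nextn_predict_layers(config, default=0):
    """Return config.num_nextn_predict_layers, or `default` if the field is absent.

    Hypothetical helper: configs that do not define the field
    no longer raise AttributeError when it is read.
    """
    return getattr(config, "num_nextn_predict_layers", default)


# Stand-ins for real config objects (assumed shapes, for illustration only).
config_with_field = SimpleNamespace(num_nextn_predict_layers=1)
config_without_field = SimpleNamespace()

print(get_num_nextn_predict_layers(config_with_field))
print(get_num_nextn_predict_layers(config_without_field))
```

Using `getattr` with a default keeps the call site to a single expression; an explicit `hasattr` check before the access would work equally well.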

Isotr0py · 2025-01-10T14:20:50Z

The DeepSeek-V3 based deepseek-vl2 model should also work now.

Outputs

$ python examples/offline_inference/offline_inference_vision_language.py -m deepseek_vl_v2
INFO 01-10 15:08:08 __init__.py:179] Automatically detected platform cuda.
INFO 01-10 15:08:10 config.py:285] Overriding HF config with {'architectures': ['DeepseekVLV2ForCausalLM']}
INFO 01-10 15:08:17 config.py:516] This model supports multiple tasks: {'generate', 'reward', 'score', 'classify', 'embed'}. Defaulting to 'generate'.
INFO 01-10 15:08:17 config.py:1022] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor
INFO 01-10 15:08:17 llm_engine.py:234] Initializing an LLM engine (v0.1.dev3959+g8d9b672) with config: model='deepseek-ai/deepseek-vl2', speculative_config=None, tokenizer='deepseek-ai/deepseek-vl2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=fp8, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=deepseek-ai/deepseek-vl2, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[2,1],"max_capture_size":2}, use_cached_outputs=False, 
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
INFO 01-10 15:08:19 cuda.py:176] Cannot use FlashAttention-2 backend for FP8 KV cache.
WARNING 01-10 15:08:19 cuda.py:178] Please use FlashInfer backend with FP8 KV Cache for better performance by setting environment variable  VLLM_ATTENTION_BACKEND=FLASHINFER
INFO 01-10 15:08:19 cuda.py:213] Using XFormers backend.
INFO 01-10 15:08:27 model_runner.py:1094] Starting to load model deepseek-ai/deepseek-vl2...
INFO 01-10 15:08:36 weight_utils.py:253] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/8 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  12% Completed | 1/8 [00:01<00:07,  1.13s/it]
Loading safetensors checkpoint shards:  25% Completed | 2/8 [00:02<00:07,  1.32s/it]
Loading safetensors checkpoint shards:  38% Completed | 3/8 [00:04<00:06,  1.37s/it]
Loading safetensors checkpoint shards:  50% Completed | 4/8 [00:04<00:04,  1.07s/it]
Loading safetensors checkpoint shards:  62% Completed | 5/8 [00:05<00:03,  1.13s/it]
Loading safetensors checkpoint shards:  75% Completed | 6/8 [00:07<00:02,  1.25s/it]
Loading safetensors checkpoint shards:  88% Completed | 7/8 [00:08<00:01,  1.32s/it]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:09<00:00,  1.10s/it]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [00:09<00:00,  1.18s/it]

INFO 01-10 15:08:46 model_runner.py:1099] Loading model weights took 51.2323 GB
WARNING 01-10 15:08:46 model_runner.py:1162] Using FP8 KV cache but no scaling factors provided. Defaulting to scaling factors of 1.0. This may lead to less accurate results!
Python version is above 3.10, patching the collections module.
Some kwargs in processor config are unused and will not have any effect: image_std, sft_format, downsample_ratio, normalize, candidate_resolutions, patch_size, image_token, add_special_token, ignore_id, image_mean, mask_prompt, pad_token. 
Add pad token = ['<｜▁pad▁｜>'] to the tokenizer
<｜▁pad▁｜>:2
Add image token = ['<image>'] to the tokenizer
<image>:128815
Add grounding-related tokens = ['<|ref|>', '<|/ref|>', '<|det|>', '<|/det|>', '<|grounding|>'] to the tokenizer with input_ids
<|ref|>:128816
<|/ref|>:128817
<|det|>:128818
<|/det|>:128819
<|grounding|>:128820
Add chat tokens = ['<|User|>', '<|Assistant|>'] to the tokenizer with input_ids
<|User|>:128821
<|Assistant|>:128822

Some kwargs in processor config are unused and will not have any effect: image_std, sft_format, downsample_ratio, normalize, candidate_resolutions, patch_size, image_token, add_special_token, ignore_id, image_mean, mask_prompt, pad_token. 
Add pad token = ['<｜▁pad▁｜>'] to the tokenizer
<｜▁pad▁｜>:2
Add image token = ['<image>'] to the tokenizer
<image>:128815
Add grounding-related tokens = ['<|ref|>', '<|/ref|>', '<|det|>', '<|/det|>', '<|grounding|>'] to the tokenizer with input_ids
<|ref|>:128816
<|/ref|>:128817
<|det|>:128818
<|/det|>:128819
<|grounding|>:128820
Add chat tokens = ['<|User|>', '<|Assistant|>'] to the tokenizer with input_ids
<|User|>:128821
<|Assistant|>:128822

You're using a CachedLlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.

INFO 01-10 15:08:58 worker.py:241] Memory profiling takes 12.19 seconds
INFO 01-10 15:08:58 worker.py:241] the current vLLM instance can use total_gpu_memory (79.15GiB) x gpu_memory_utilization (0.90) = 71.24GiB
INFO 01-10 15:08:58 worker.py:241] model weights take 51.23GiB; non_torch_memory takes 0.15GiB; PyTorch activation peak memory takes 1.18GiB; the rest of the memory reserved for KV Cache is 18.67GiB.
INFO 01-10 15:08:58 gpu_executor.py:76] # GPU blocks: 2549, # CPU blocks: 546
INFO 01-10 15:08:58 gpu_executor.py:80] Maximum concurrency for 4096 tokens per request: 9.96x
INFO 01-10 15:09:17 model_runner.py:1416] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.42s/it]
INFO 01-10 15:09:22 model_runner.py:1542] Graph capturing finished in 5 secs, took 0.24 GiB
INFO 01-10 15:09:22 llm_engine.py:431] init engine (profile, create kv cache, warmup model) took 36.41 seconds
Some kwargs in processor config are unused and will not have any effect: image_std, sft_format, downsample_ratio, normalize, candidate_resolutions, patch_size, image_token, add_special_token, ignore_id, image_mean, mask_prompt, pad_token. 
Add pad token = ['<｜▁pad▁｜>'] to the tokenizer
<｜▁pad▁｜>:2
Add image token = ['<image>'] to the tokenizer
<image>:128815
Add grounding-related tokens = ['<|ref|>', '<|/ref|>', '<|det|>', '<|/det|>', '<|grounding|>'] to the tokenizer with input_ids
<|ref|>:128816
<|/ref|>:128817
<|det|>:128818
<|/det|>:128819
<|grounding|>:128820
Add chat tokens = ['<|User|>', '<|Assistant|>'] to the tokenizer with input_ids
<|User|>:128821
<|Assistant|>:128822

You're using a CachedLlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00,  1.78s/it, est. speed input: 802.93 toks/s, output: 25.69 toks/s]
The image shows a view of a tall tower, likely a communications or observation tower, surrounded by cherry blossom trees in full bloom. The sky is clear and blue, providing a beautiful backdrop to the scene.
The image features a tall tower, likely a communications or observation tower, surrounded by blooming cherry blossoms. The blossoms are in the foreground, with the tower rising into the background. The sky is clear and blue, providing a vibrant backdrop.
The image shows a view of a tall tower, likely a skyscraper or observation tower, with cherry blossoms in the foreground. The tower is surrounded by a clear blue sky, and the cherry blossoms are in full bloom, creating a beautiful and vibrant scene.
The image shows a view of a tall tower with a blue sky in the background. The foreground is filled with pink cherry blossoms, creating a beautiful contrast between the natural and man-made elements.
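The memory figures printed by worker.py and gpu_executor.py in the log above can be cross-checked with a little arithmetic. The sketch below reuses only numbers that appear in the log, with one assumption: a KV cache block size of 16 tokens, which is not printed in this log. Small differences in the last digit come from the log rounding each term to two decimals.

```python
# Values copied from the log above (GiB unless noted).
total_gpu_memory = 79.15
gpu_memory_utilization = 0.90
model_weights = 51.23
non_torch_memory = 0.15
activation_peak = 1.18
num_gpu_blocks = 2549
max_model_len = 4096

# ASSUMPTION: block size in tokens; not shown in the log.
block_size = 16

# Memory the vLLM instance is allowed to use.
budget = total_gpu_memory * gpu_memory_utilization

# What remains for the KV cache after weights and overheads.
kv_cache = budget - model_weights - non_torch_memory - activation_peak

# How many full-length requests fit in the KV cache at once.
max_concurrency = num_gpu_blocks * block_size / max_model_len

print(f"budget          = {budget:.2f} GiB")
print(f"kv cache        = {kv_cache:.2f} GiB")
print(f"max concurrency = {max_concurrency:.2f}x")
```

Under the assumed block size, 2549 blocks hold 40784 tokens, which is consistent with the reported 9.96x concurrency for 4096-token requests.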

mergify · 2025-01-10T15:51:16Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Isotr0py.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

docs/source/models/supported_models.md

DarkLight1337

Otherwise LGTM. As per offline discussion, we can work on deepseek-ai/deepseek-vl2-tiny and the inner timm model in another PR.

tests/models/registry.py

Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Swipe4057 · 2025-01-14T07:23:22Z

CUDA_VISIBLE_DEVICES=1 python -m vllm.entrypoints.openai.api_server --model /data/models/deepseek-vl2 --served-model-name deepseek-vl2 --gpu_memory_utilization 0.9 --quantization fp8 --max-model-len 4096 --disable-log-requests

Result:

Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/data/venvs/lib/vllm/vllm/engine/multiprocessing/engine.py", line 389, in run_mp_engine
    raise e
  File "/data/venvs/lib/vllm/vllm/engine/multiprocessing/engine.py", line 378, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/venvs/lib/vllm/vllm/engine/multiprocessing/engine.py", line 116, in from_engine_args
    engine_config = engine_args.create_engine_config(usage_context)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/venvs/lib/vllm/vllm/engine/arg_utils.py", line 1043, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/venvs/lib/vllm/vllm/engine/arg_utils.py", line 969, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/data/venvs/lib/vllm/vllm/config.py", line 342, in __init__
    self.multimodal_config = self._init_multimodal_config(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/venvs/lib/vllm/vllm/config.py", line 398, in _init_multimodal_config
    if ModelRegistry.is_multimodal_model(architectures):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/venvs/lib/vllm/vllm/model_executor/models/registry.py", line 429, in is_multimodal_model
    model_cls, _ = self.inspect_model_cls(architectures)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/venvs/lib/vllm/vllm/model_executor/models/registry.py", line 384, in inspect_model_cls
    for arch in architectures:
TypeError: 'NoneType' object is not iterable

DarkLight1337 · 2025-01-14T07:28:09Z

Can you show the full logs?

Isotr0py · 2025-01-14T09:58:42Z

File "/data/venvs/lib/vllm/vllm/model_executor/models/registry.py", line 384, in inspect_model_cls
for arch in architectures:
TypeError: 'NoneType' object is not iterable

The config.json files in the Deepseek-VL2 model repos are all missing the architectures field, so you need to either pass --hf_overrides '{"architectures": ["DeepseekVLV2ForCausalLM"]}' or add "architectures": ["DeepseekVLV2ForCausalLM"] to the config file manually.
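For the manual route, a small script can add the missing field to a local copy of the model. This is a sketch under stated assumptions: the helper name `ensure_architectures` is hypothetical, and the path in the usage comment is taken from the command in the report above. It only touches the file when the field is missing or empty.

```python
import json
from pathlib import Path


def ensure_architectures(config_path, architectures=("DeepseekVLV2ForCausalLM",)):
    """Add an "architectures" entry to a config.json if it has none.

    Hypothetical helper. Returns True if the file was modified,
    False if the field was already present and non-empty.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    if config.get("architectures"):
        return False  # Already set; leave the file untouched.

    config["architectures"] = list(architectures)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


# Example usage (path assumed from the report above):
# ensure_architectures("/data/models/deepseek-vl2/config.json")
```

Patching the file once avoids having to repeat the override flag on every launch, at the cost of diverging from the upstream checkpoint files.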

iamweiliu · 2025-01-14T10:12:26Z

Save my life!

iamweiliu · 2025-01-14T10:15:58Z

ERROR 01-14 18:14:14 engine.py:387] AttributeError: 'DeepseekVLV2Config' object has no attribute 'hidden_size'

Isotr0py · 2025-01-14T10:32:04Z

@iamweiliu Can you provide the full logs? hidden_size should not be read from DeepseekVLV2Config, because that config doesn't have this field.
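To illustrate why reading hidden_size from the top-level config fails, here is a sketch of a lookup that falls back to a nested sub-config. The attribute names `language_config` and `text_config`, the helper `find_hidden_size`, and the value 2048 are assumptions for illustration; this is not how vLLM resolves the field internally.

```python
from types import SimpleNamespace


def find_hidden_size(config, sub_config_names=("language_config", "text_config")):
    """Look up hidden_size on a config, falling back to nested sub-configs.

    Hypothetical helper: composite multimodal configs often keep text-model
    fields on a nested object rather than on the top-level config.
    """
    if hasattr(config, "hidden_size"):
        return config.hidden_size
    for name in sub_config_names:
        sub_config = getattr(config, name, None)
        if sub_config is not None and hasattr(sub_config, "hidden_size"):
            return sub_config.hidden_size
    raise AttributeError(
        f"{type(config).__name__!r} object has no attribute 'hidden_size'"
    )


# Stand-in for a composite config (assumed shape, illustrative value).
vl_config = SimpleNamespace(language_config=SimpleNamespace(hidden_size=2048))
print(find_hidden_size(vl_config))
```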

iamweiliu · 2025-01-14T14:59:45Z

I already fixed it. Just install https://github.com/Isotr0py/DeepSeek-VL2.

Smith <tyler@neuralmagic.com> Co-authored-by: Maximilien de Bayser <mbayser@br.ibm.com> Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> Co-authored-by: Guspan Tanadi <36249910+guspan-tanadi@users.noreply.github.com> Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com> Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Charles Frye <cfrye59@gmail.com> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: cennn <61925104+cennn@users.noreply.github.com> Co-authored-by: Kuntai Du <kuntai@uchicago.edu> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: minmin <rmm0811@gmail.com> Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Fred Reiss <frreiss@us.ibm.com> Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com> Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: sixgod <evethwillbeok@outlook.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com> Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com> Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com> Co-authored-by: Concurrensee <yida.wu@amd.com> Co-authored-by: Chenguang Li <757486878@qq.com> Co-authored-by: Alex Brooks <alex.brooks@ibm.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com>

* [Misc] Move weights mapper (vllm-project#11443) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Bugfix] Fix issues in CPU build Dockerfile. Fixes vllm-project#9182 (vllm-project#11435) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * [Model] Automatic conversion of classification and reward models (vllm-project#11469) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor (vllm-project#11472) * [Misc] Update disaggregation benchmark scripts and test logs (vllm-project#11456) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com> * [Frontend] Enable decord to load video from base64 (vllm-project#11492) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Improve GitHub links (vllm-project#11491) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Misc] Move some multimodal utils to modality-specific modules (vllm-project#11494) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * Mypy checking for vllm/compilation (vllm-project#11496) Signed-off-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: lucast2021 <lucast2021@headroyce.org> * [Misc][LoRA] Fix LoRA weight mapper (vllm-project#11495) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Doc] Add `QVQ` and `QwQ` to the list of supported models (vllm-project#11509) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> * [V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (vllm-project#10681) Signed-off-by: Sourashis Roy <sroy@roblox.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Model] Modify MolmoForCausalLM MLP (vllm-project#11510) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Misc] Add placeholder module (vllm-project#11501) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Add video example to openai client for multimodal (vllm-project#11521) 
Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [1/N] API Server (Remove Proxy) (vllm-project#11529) * [Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (vllm-project#11523) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: HandH1998 <1335248067@qq.com> * [2/N] API Server: Avoid ulimit footgun (vllm-project#11530) * Deepseek v3 (vllm-project#11502) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com> * [Docs] Document Deepseek V3 support (vllm-project#11535) Signed-off-by: simon-mo <simon.mo@hey.com> * Update openai_compatible_server.md (vllm-project#11536) Co-authored-by: Simon Mo <simon.mo@hey.com> * [V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (vllm-project#11394) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [V1] Fix yapf (vllm-project#11538) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [CI] Fix broken CI (vllm-project#11543) * [misc] fix typing (vllm-project#11540) Signed-off-by: youkaichao <youkaichao@gmail.com> * [V1][3/N] API Server: Reduce Task Switching + Handle Abort Properly (vllm-project#11534) * [BugFix] Fix quantization for all other methods (vllm-project#11547) * [Platform] Move model arch check to platform (vllm-project#11503) Signed-off-by: Mengqing Cao <cmq0113@163.com> * Update deploying_with_k8s.md with AMD ROCm GPU example (vllm-project#11465) Signed-off-by: Alex He <alehe@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [Bugfix] Fix TeleChat2ForCausalLM weights mapper (vllm-project#11546) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Misc] Abstract the logic for reading and writing media content 
(vllm-project#11527) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Add xgrammar in doc (vllm-project#11549) Signed-off-by: ccjincong <chenjincong11@gmail.com> * [VLM] Support caching in merged multi-modal processor (vllm-project#11396) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [MODEL] LoRA support for Jamba model (vllm-project#11209) Signed-off-by: Erez Schwartz <erezs@ai21.com> * [Misc]Add BNB quantization for MolmoForCausalLM (vllm-project#11551) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (vllm-project#11566) Signed-off-by: Isotr0py <2037008807@qq.com> * [Bugfix] Fix for ROCM compressed tensor support (vllm-project#11561) * [Doc] Update mllama example based on official doc (vllm-project#11567) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * [V1] [4/N] API Server: ZMQ/MP Utilities (vllm-project#11541) * [Bugfix] Last token measurement fix (vllm-project#11376) Signed-off-by: rajveerb <46040700+rajveerb@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> * [Model] Support InternLM2 Reward models (vllm-project#11571) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [Model] Remove hardcoded image tokens ids from Pixtral (vllm-project#11582) Signed-off-by: Roger Wang <ywang@roblox.com> * [Hardware][AMD]: Replace HIPCC version with more precise ROCm version (vllm-project#11515) Signed-off-by: hjwei <hjwei_xd@163.com> * [V1][Minor] Set pin_memory=False for token_ids_cpu tensor (vllm-project#11581) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Doc] Minor documentation fixes (vllm-project#11580) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [bugfix] interleaving sliding window for cohere2 model (vllm-project#11583) Signed-off-by: youkaichao <youkaichao@gmail.com> * [V1] [5/N] API Server: unify `Detokenizer` and 
`EngineCore` input (vllm-project#11545) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> * [Doc] Convert list tables to MyST (vllm-project#11594) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [v1][bugfix] fix cudagraph with inplace buffer assignment (vllm-project#11596) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Misc] KV cache transfer connector registry (vllm-project#11481) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> * Remove print statement in DeepseekScalingRotaryEmbedding (vllm-project#11604) * [v1] fix compilation cache (vllm-project#11598) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Docker] bump up neuron sdk v2.21 (vllm-project#11593) Signed-off-by: Liangfu Chen <liangfc@amazon.com> * [Build][Kernel] Update CUTLASS to v3.6.0 (vllm-project#11607) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * [CI/Build][CPU] Fix CPU CI by lazy importing triton FP8 kernels (vllm-project#11618) Signed-off-by: jiang1.li <jiang1.li@intel.com> * [platforms] enable platform plugins (vllm-project#11602) Signed-off-by: youkaichao <youkaichao@gmail.com> * [VLM] Abstract out multi-modal data parsing in merged processor (vllm-project#11620) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [V1] [6/N] API Server: Better Shutdown (vllm-project#11586) * [Bugfix] Validate and concatenate image embeddings in MiniCPMVBaseModel (vllm-project#11631) * [benchmark] Remove dependency for H100 benchmark step (vllm-project#11572) * [Model][LoRA]LoRA support added for MolmoForCausalLM (vllm-project#11439) Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> * [Bugfix] Fix OpenAI parallel sampling when using xgrammar (vllm-project#11637) Signed-off-by: mgoin <michael@neuralmagic.com> * [Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (vllm-project#6909) 
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> * [Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (vllm-project#11565) * [V1] Simpify vision block hash for prefix caching by removing offset from hash (vllm-project#11646) * [V1][VLM] V1 support for selected single-image models. (vllm-project#11632) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Isotr0py <2037008807@qq.com> * [Benchmark] Add benchmark script for CPU offloading (vllm-project#11533) Signed-off-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: KuntaiDu <kuntai@uchicago.edu> * [Bugfix][Refactor] Unify model management in frontend (vllm-project#11660) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> * [VLM] Add max-count checking in data parser for single image models (vllm-project#11661) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com> * [Misc] Optimize Qwen2-VL LoRA test (vllm-project#11663) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Misc] Replace space with - in the file names (vllm-project#11667) Signed-off-by: Lu Fang <lufang@fb.com> * [Doc] Fix typo (vllm-project#11666) Signed-off-by: Kazuhiro Serizawa <nserihiro@gmail.com> * [V1] Implement Cascade Attention (vllm-project#11635) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [VLM] Move supported limits and max tokens to merged multi-modal processor (vllm-project#11669) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [VLM][Bugfix] Multi-modal processor compatible with V1 multi-input (vllm-project#11674) Signed-off-by: DarkLight1337 
<tlleungac@connect.ust.hk> * [mypy] Pass type checking in vllm/inputs (vllm-project#11680) Signed-off-by: Tobias Pitters <tobias.pitters@gmail.com> * [VLM] Merged multi-modal processor for LLaVA-NeXT (vllm-project#11682) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * According to vllm.EngineArgs, the name should be distributed_executor_backend (vllm-project#11689) * [Bugfix] Free cross attention block table for preempted-for-recompute sequence group. (vllm-project#10013) Signed-off-by: Kathy Yu <feiyangyu@google.com> * [V1][Minor] Optimize token_ids_cpu copy (vllm-project#11692) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Bugfix] Change kv scaling factor by param json on nvidia gpu (vllm-project#11688) Signed-off-by: bjmsong <bjmsong@126.com> Co-authored-by: bjmsong <bjmsong@126.com> * Resolve race conditions in Marlin kernel (vllm-project#11493) Signed-off-by: wchen61 <wchen61@foxmail.com> * [Misc] Minimum requirements for SageMaker compatibility (vllm-project#11576) * Update default max_num_batch_tokens for chunked prefill (vllm-project#11694) * [Bugfix] Check chain_speculative_sampling before calling it (vllm-project#11673) Signed-off-by: Lu Fang <lufang@fb.com> * [perf-benchmark] Fix dependency for steps in benchmark pipeline (vllm-project#11710) * [Model] Whisper model implementation (vllm-project#11280) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com> * [V1] Simplify Shutdown (vllm-project#11659) * [Bugfix] Fix ColumnParallelLinearWithLoRA slice (vllm-project#11708) Signed-off-by: ZincCat <zincchloride@outlook.com> * [V1] Improve TP>1 Error Handling + Stack Trace (vllm-project#11721) Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> * [Misc]Add BNB quantization for Qwen2VL (vllm-project#11719) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * Update requirements-tpu.txt to support python 3.9 and 3.11 (vllm-project#11695) 
Signed-off-by: mgoin <michael@neuralmagic.com> * [V1] Chore: cruft removal (vllm-project#11724) * [V1] log GPU blocks num for MultiprocExecutor (vllm-project#11656) * Update tool_calling.md (vllm-project#11701) * Update bnb.md with example for OpenAI (vllm-project#11718) * [V1] Add `RayExecutor` support for `AsyncLLM` (api server) (vllm-project#11712) * [V1] Add kv cache utils tests. (vllm-project#11513) Signed-off-by: xcnick <xcnick0412@gmail.com> * [Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (vllm-project#11233) Signed-off-by: Yan Burman <yanburman@users.noreply.github.com> Signed-off-by: Ido Asraff <idoa@atero.ai> * [VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision (vllm-project#11717) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Fix precision error in LLaVA-NeXT (vllm-project#11735) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Model] Remove unnecessary weight initialization logic (vllm-project#11736) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [Bugfix][V1] Fix test_kv_cache_utils.py (vllm-project#11738) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [MISC] Replace c10::optional with std::optional (vllm-project#11730) Signed-off-by: Lu Fang <lufang@fb.com> * [distributed] remove pynccl's redundant stream (vllm-project#11744) * fix: [doc] fix typo (vllm-project#11751) Co-authored-by: Lancer <maruixiang6688@gmail.com> * [Frontend] Improve `StreamingResponse` Exception Handling (vllm-project#11752) * [distributed] remove pynccl's redundant change_state (vllm-project#11749) * [Doc] [1/N] Reorganize Getting Started section (vllm-project#11645) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Remove block size constraint (vllm-project#11723) * [V1] Add BlockTable class (vllm-project#11693) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> 
* [Misc] Fix typo for valid_tool_parses (vllm-project#11753) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> * [V1] Refactor get_executor_cls (vllm-project#11754) * [mypy] Forward pass function type hints in lora (vllm-project#11740) Signed-off-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: lucast2021 <lucast2021@headroyce.org> * k8s-config: Update the secret to use stringData (vllm-project#11679) Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com> * [VLM] Separate out profiling-related logic (vllm-project#11746) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc][2/N] Reorganize Models and Usage sections (vllm-project#11755) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Fix max image size for LLaVA-Onevision (vllm-project#11769) Signed-off-by: Roger Wang <ywang@roblox.com> * [doc] explain how to add interleaving sliding window support (vllm-project#11771) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Bugfix][V1] Fix molmo text-only inputs (vllm-project#11676) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Kernel] Move attn_type to Attention.__init__() (vllm-project#11690) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * format * [V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (vllm-project#11685) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> * deepseek overflow fix (#349) * [Bugfix] Fix LLaVA-NeXT feature size precision error (for real) (vllm-project#11772) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Model] Future-proof Qwen2-Audio multi-modal processor (vllm-project#11776) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [XPU] Make pp group initilized for pipeline-parallelism (vllm-project#11648) Signed-off-by: yisheng <yi.sheng@intel.com> * [Doc][3/N] Reorganize Serving section (vllm-project#11766) Signed-off-by: 
DarkLight1337 <tlleungac@connect.ust.hk> * [Kernel][LoRA]Punica prefill kernels fusion (vllm-project#11234) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Abatom <abzhonghua@gmail.com> Co-authored-by: Zhonghua Deng <abatom@163.com> * [Bugfix] Update attention interface in `Whisper` (vllm-project#11784) Signed-off-by: Roger Wang <ywang@roblox.com> * [CI] Fix neuron CI and run offline tests (vllm-project#11779) Signed-off-by: Liangfu Chen <liangfc@amazon.com> * fix init error for MessageQueue when n_local_reader is zero (vllm-project#11768) * [Doc] Create a vulnerability management team (vllm-project#9925) Signed-off-by: Russell Bryant <rbryant@redhat.com> * [CI][CPU] adding build number to docker image name (vllm-project#11788) Signed-off-by: Yuan Zhou <yuan.zhou@intel.com> * [V1][Doc] Update V1 support for `LLaVa-NeXT-Video` (vllm-project#11798) Signed-off-by: Roger Wang <ywang@roblox.com> * [Bugfix] Comprehensively test and fix LLaVA-NeXT feature size calculation (vllm-project#11800) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [doc] add doc to explain how to use uv (vllm-project#11773) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [V1] Support audio language models on V1 (vllm-project#11733) Signed-off-by: Roger Wang <ywang@roblox.com> * [doc] update how pip can install nightly wheels (vllm-project#11806) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Doc] Add note to `gte-Qwen2` models (vllm-project#11808) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [optimization] remove python function call for custom op (vllm-project#11750) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Bugfix] update the prefix for qwen2 (vllm-project#11795) Co-authored-by: jiadi.jjd <jiadi.jjd@antgroup.com> * [Doc]Add documentation for using EAGLE in vLLM (vllm-project#11417) Signed-off-by: Sourashis Roy <sroy@roblox.com> * [Bugfix] Significant performance drop on CPUs with 
--num-scheduler-steps > 1 (vllm-project#11794) * [Doc] Group examples into categories (vllm-project#11782) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Bugfix] Fix image input for Pixtral-HF (vllm-project#11741) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Misc] sort torch profiler table by kernel timing (vllm-project#11813) * Remove the duplicate imports of MultiModalKwargs and PlaceholderRange… (vllm-project#11824) * Fixed docker build for ppc64le (vllm-project#11518) Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com> * [OpenVINO] Fixed Docker.openvino build (vllm-project#11732) Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com> * [Bugfix] Add checks for LoRA and CPU offload (vllm-project#11810) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Docs] reorganize sponsorship page (vllm-project#11639) Signed-off-by: simon-mo <simon.mo@hey.com> * [Bug] Fix pickling of `ModelConfig` when RunAI Model Streamer is used (vllm-project#11825) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [misc] improve memory profiling (vllm-project#11809) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [doc] update wheels url (vllm-project#11830) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Docs] Update sponsor name: 'Novita' to 'Novita AI' (vllm-project#11833) * [Hardware][Apple] Native support for macOS Apple Silicon (vllm-project#11696) Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> * [torch.compile] consider relevant code in compilation cache (vllm-project#11614) Signed-off-by: youkaichao <youkaichao@gmail.com> * [VLM] Reorganize profiling/processing-related code (vllm-project#11812) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Move examples into categories (vllm-project#11840) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Doc][4/N] 
Reorganize API Reference (vllm-project#11843) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [CI/Build][Bugfix] Fix CPU CI image clean up (vllm-project#11836) Signed-off-by: jiang1.li <jiang1.li@intel.com> * [Bugfix][XPU] fix silu_and_mul (vllm-project#11823) Signed-off-by: yan ma <yan.ma@intel.com> * [Misc] Move some model utils into vision file (vllm-project#11848) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Expand Multimodal API Reference (vllm-project#11852) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Misc]add some explanations for BlockHashType (vllm-project#11847) * [TPU][Quantization] TPU `W8A8` (vllm-project#11785) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Kernel][Triton][AMD] Use block size heuristic for avg 2.8x speedup for int8 models (vllm-project#11698) Signed-off-by: Randall Smith <Randall.Smith@amd.com> * [Docs] Add Google Cloud Meetup (vllm-project#11864) * Revert nccl changes (#351) * Revert "[distributed] remove pynccl's redundant change_state (vllm-project#11749)" This reverts commit 9e764e7. * Revert "[distributed] remove pynccl's redundant stream (vllm-project#11744)" This reverts commit 635b897. 
* [CI] Turn on basic correctness tests for V1 (vllm-project#10864) * treat do_lower_case in the same way as the sentence-transformers library (vllm-project#11815) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> * [Doc] Recommend uv and python 3.12 for quickstart guide (vllm-project#11849) Signed-off-by: mgoin <michael@neuralmagic.com> * [Misc] Move `print_*_once` from utils to logger (vllm-project#11298) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> * [Doc] Intended links Python multiprocessing library (vllm-project#11878) * [perf]fix current stream (vllm-project#11870) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Bugfix] Override dunder methods of placeholder modules (vllm-project#11882) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] fix beam search input errors and latency benchmark script (vllm-project#11875) Signed-off-by: Ye Qi <yeq@meta.com> Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com> * [Doc] Add model development API Reference (vllm-project#11884) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [platform] Allow platform specify attention backend (vllm-project#11609) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> * [ci]try to fix flaky multi-step tests (vllm-project#11894) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Misc] Provide correct Pixtral-HF chat template (vllm-project#11891) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * fp8 support (#352) Co-authored-by: Yida Wu <yidawu@amd.com> * [Docs] Add Modal to deployment frameworks (vllm-project#11907) * [Doc][5/N] Move Community and API Reference to the bottom (vllm-project#11896) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Simon Mo <simon.mo@hey.com> 
* [VLM] Enable tokenized inputs for merged multi-modal processor (vllm-project#11900) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Show default pooling method in a table (vllm-project#11904) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [torch.compile] Hide KV cache behind torch.compile boundary (vllm-project#11677) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * [Bugfix] Validate lora adapters to avoid crashing server (vllm-project#11727) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> * [BUGFIX] Fix `UnspecifiedPlatform` package name (vllm-project#11916) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> * [ci] fix gh200 tests (vllm-project#11919) Signed-off-by: youkaichao <youkaichao@gmail.com> * [misc] remove python function call for custom activation op (vllm-project#11885) Co-authored-by: youkaichao <youkaichao@gmail.com> * [platform] support pytorch custom op pluggable (vllm-project#11328) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> * Replace "online inference" with "online serving" (vllm-project#11923) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [ci] Fix sampler tests (vllm-project#11922) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Doc] [1/N] Initial guide for merged multi-modal processor (vllm-project#11925) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [platform] support custom torch.compile backend key (vllm-project#11318) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> * [Doc] Rename offline inference examples (vllm-project#11927) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Docs] Fix docstring in `get_ip` function (vllm-project#11932) Signed-off-by: Kuntai Du <kuntai@uchicago.edu> * Doc fix in `benchmark_long_document_qa_throughput.py` (vllm-project#11933) Signed-off-by: Kuntai 
Du <kuntai@uchicago.edu> * [Hardware][CPU] Support MOE models on x86 CPU (vllm-project#11831) Signed-off-by: jiang1.li <jiang1.li@intel.com> * [Misc] Clean up debug code in Deepseek-V3 (vllm-project#11930) Signed-off-by: Isotr0py <2037008807@qq.com> * [Misc] Update benchmark_prefix_caching.py fixed example usage (vllm-project#11920) Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn> Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn> * [Bugfix] Check that number of images matches number of <|image|> tokens with mllama (vllm-project#11939) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> * [mypy] Fix mypy warnings in api_server.py (vllm-project#11941) Signed-off-by: Fred Reiss <frreiss@us.ibm.com> * [ci] fix broken distributed-tests-4-gpus (vllm-project#11937) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Bugfix][SpecDecode] Adjust Eagle model architecture to align with intended design (vllm-project#11672) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> * [Bugfix] fused_experts_impl wrong compute type for float32 (vllm-project#11921) Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> * [CI/Build] Move model-specific multi-modal processing tests (vllm-project#11934) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Basic guide for writing unit tests for new models (vllm-project#11951) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Fix RobertaModel loading (vllm-project#11940) Signed-off-by: NickLucche <nlucches@redhat.com> * [Model] Add cogagent model support vLLM (vllm-project#11742) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [V1] Avoid sending text prompt to core engine (vllm-project#11963) Signed-off-by: Roger Wang <ywang@roblox.com> * [CI/Build] Add markdown linter (vllm-project#11857) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> * [Model] Initialize support for Deepseek-VL2 
models (vllm-project#11578) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [Hardware][CPU] Multi-LoRA implementation for the CPU backend (vllm-project#11100) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [Hardware][TPU] workaround fix for MoE on TPU (vllm-project#11764) * [V1][Core][1/n] Logging and Metrics (vllm-project#11962) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> * [Model] Support GGUF models newly added in `transformers` 4.46.0 (vllm-project#9685) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [V1] [2/n] Logging and Metrics - `OutputProcessor` Abstraction (vllm-project#11973) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> * [MISC] fix typo in kv transfer send recv test (vllm-project#11983) * [Bug] Fix usage of `.transpose()` and `.view()` consecutively. 
(vllm-project#11979) * [CI][Spec Decode] fix: broken test for EAGLE model (vllm-project#11972) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> * [Misc] Fix Deepseek V2 fp8 kv-scale remapping (vllm-project#11947) Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> * [Misc]Minor Changes about Worker (vllm-project#11555) Signed-off-by: Chenguang Li <757486878@qq.com> * [platform] add ray_device_key (vllm-project#11948) Signed-off-by: youkaichao <youkaichao@gmail.com> * Fix Max Token ID for Qwen-VL-Chat (vllm-project#11980) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> * [Kernel] unified_attention for Attention.forward (vllm-project#11967) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * [Doc][V1] Update model implementation guide for V1 support (vllm-project#11998) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> * [Doc] Organise installation documentation into categories and tabs (vllm-project#11935) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [platform] add device_control env var (vllm-project#12009) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Platform] Move get_punica_wrapper() function to Platform (vllm-project#11516) Signed-off-by: Shanshan Shen <467638484@qq.com> * bugfix: Fix signature mismatch in benchmark's `get_tokenizer` function (vllm-project#11982) Signed-off-by: elijah <f1renze.142857@gmail.com> * Using list * Revert "[misc] improve memory profiling (vllm-project#11809)" This reverts commit 889e662. * Multi-lingual P3L (#356) * Commiting the *multilingual* P3L test. * Created a *multi-lingual* P3L test. * Making ruff happy. * . * Added a reference to the language-scripture Confluence table. * Typo fixing. * Harmonizing naming. * Fixing comments in the header. --------- Co-authored-by: Alexei V. 
Ivanov <alivanov@banff-cyxtera-s65-4.amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> * Trying to make scales work with compileable attention * Docs lint * linter formatting bug fixes * inherit config file updates under fused_moe from main branch. * match tests for the MOE layers with main. --------- Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: lucast2021 <lucast2021@headroyce.org> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Sourashis Roy <sroy@roblox.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Alex He <alehe@amd.com> Signed-off-by: ccjincong <chenjincong11@gmail.com> Signed-off-by: Erez Schwartz <erezs@ai21.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: rajveerb <46040700+rajveerb@users.noreply.github.com> Signed-off-by: hjwei <hjwei_xd@163.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Liangfu Chen <liangfc@amazon.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org> Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Lu Fang <lufang@fb.com> Signed-off-by: Kazuhiro Serizawa <nserihiro@gmail.com> Signed-off-by: Tobias Pitters <tobias.pitters@gmail.com> Signed-off-by: Kathy Yu <feiyangyu@google.com> Signed-off-by: bjmsong <bjmsong@126.com> Signed-off-by: wchen61 
<wchen61@foxmail.com> Signed-off-by: ZincCat <zincchloride@outlook.com> Signed-off-by: xcnick <xcnick0412@gmail.com> Signed-off-by: Yan Burman <yanburman@users.noreply.github.com> Signed-off-by: Ido Asraff <idoa@atero.ai> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com> Signed-off-by: yisheng <yi.sheng@intel.com> Signed-off-by: Abatom <abzhonghua@gmail.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Yuan Zhou <yuan.zhou@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com> Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com> Signed-off-by: Wallas Santos <wallashss@ibm.com> Signed-off-by: yan ma <yan.ma@intel.com> Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> Signed-off-by: Ye Qi <yeq@meta.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kuntai Du <kuntai@uchicago.edu> Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Signed-off-by: Fred Reiss <frreiss@us.ibm.com> Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> Signed-off-by: Chenguang Li <757486878@qq.com> Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Signed-off-by: Shanshan Shen <467638484@qq.com> Signed-off-by: elijah <f1renze.142857@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> 
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com> Co-authored-by: Jiaxin Shan <seedjeffwan@gmail.com> Co-authored-by: Lucas Tucker <47258766+lucas-tucker@users.noreply.github.com> Co-authored-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: sroy745 <142070531+sroy745@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: AlexHe99 <alehe@amd.com> Co-authored-by: Chen1022 <112855051+ccjincong@users.noreply.github.com> Co-authored-by: ErezSC42 <erezs@ai21.com> Co-authored-by: Selali <selali.adobor@gmail.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Rajveer Bachkaniwala <46040700+rajveerb@users.noreply.github.com> Co-authored-by: hj-wei <hjwei_xd@163.com> Co-authored-by: Kuntai Du <kuntai@uchicago.edu> Co-authored-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> Co-authored-by: whyiug <whyiug@hotmail.com> Co-authored-by: Kevin H. 
Luu <kevin@anyscale.com> Co-authored-by: Matthias Vogler <60004995+ayylemao@users.noreply.github.com> Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org> Co-authored-by: John Giorgi <johnmgiorgi@gmail.com> Co-authored-by: sakunkun <zhou.qianjun@zte.com.cn> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Co-authored-by: Kazuhiro Serizawa <nserihiro@gmail.com> Co-authored-by: Tobias Pitters <31857876+CloseChoice@users.noreply.github.com> Co-authored-by: Chunyang Wen <chunyang.wen@gmail.com> Co-authored-by: Kathy Yu <143133934+kathyyu-google@users.noreply.github.com> Co-authored-by: bjmsong <wq.songbob@gmail.com> Co-authored-by: bjmsong <bjmsong@126.com> Co-authored-by: wchen61 <wchen61@foxmail.com> Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com> Co-authored-by: Sachin Varghese <sachin.mathew31@gmail.com> Co-authored-by: Aurick Qiao <aurickq@users.noreply.github.com> Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com> Co-authored-by: ZincCat <52513999+zinccat@users.noreply.github.com> Co-authored-by: WangErXiao <863579016@qq.com> Co-authored-by: Hust_YangXian <bryceyx@gmail.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: xcnick <xcnick0412@gmail.com> Co-authored-by: Yan Burman <yanburman@users.noreply.github.com> Co-authored-by: cennn <61925104+cennn@users.noreply.github.com> Co-authored-by: Lancer <402430575@qq.com> Co-authored-by: Lancer <maruixiang6688@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: Suraj Deshmukh <surajd.service@gmail.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Concurrensee <yida.wu@amd.com> Co-authored-by: YiSheng5 <yi.sheng@intel.com> 
Co-authored-by: Zhonghua Deng <abatom@163.com> Co-authored-by: XiaobingZhang <xiaobingzhangupc@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Yuan <yuan.zhou@intel.com> Co-authored-by: jiangjiadi <34134495+jiangjiadi@users.noreply.github.com> Co-authored-by: jiadi.jjd <jiadi.jjd@antgroup.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: Jie Fu (傅杰) <jiefu@tencent.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: Nishidha <nishidha.panpaliya@partner.ibm.com> Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com> Co-authored-by: Wallas Henrique <wallashss@users.noreply.github.com> Co-authored-by: Yan Ma <yan.ma@intel.com> Co-authored-by: rasmith <Randall.Smith@amd.com> Co-authored-by: Maximilien de Bayser <mbayser@br.ibm.com> Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> Co-authored-by: Guspan Tanadi <36249910+guspan-tanadi@users.noreply.github.com> Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com> Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Yida Wu <yidawu@amd.com> Co-authored-by: Charles Frye <cfrye59@gmail.com> Co-authored-by: minmin <rmm0811@gmail.com> Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Fred Reiss <frreiss@us.ibm.com> Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com> Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: sixgod <evethwillbeok@outlook.com> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com> 
Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com> Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com> Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com> Co-authored-by: Chenguang Li <757486878@qq.com> Co-authored-by: Alex Brooks <alex.brooks@ibm.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com> Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Alexei V. Ivanov <alivanov@banff-cyxtera-s65-4.amd.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>

Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Isotr0py added 7 commits December 28, 2024 02:01

init deepseekvl2

b7f3a3b

Signed-off-by: Isotr0py <2037008807@qq.com>

port config

9846268

Signed-off-by: Isotr0py <2037008807@qq.com>

code format

0fdc10b

Signed-off-by: Isotr0py <2037008807@qq.com>

process image

19cf5e7

Signed-off-by: Isotr0py <2037008807@qq.com>

init processor

550ed2e

Signed-off-by: Isotr0py <2037008807@qq.com>

clean up

f2159c4

Signed-off-by: Isotr0py <2037008807@qq.com>

handle image embedding inputs

e20aba5

Signed-off-by: Isotr0py <2037008807@qq.com>

DarkLight1337 mentioned this pull request Dec 28, 2024

[RFC]: Multi-modality Support on vLLM #4194

Open

73 tasks

Isotr0py added 4 commits December 28, 2024 16:31

add multimodal processor

54c92fc

Signed-off-by: Isotr0py <2037008807@qq.com>

add max tokens implement

dd19a5d

Signed-off-by: Isotr0py <2037008807@qq.com>

implement embeddings merge

391ba13

Signed-off-by: Isotr0py <2037008807@qq.com>

add deepseek-vl2 example

bb88307

Signed-off-by: Isotr0py <2037008807@qq.com>

mergify bot added the frontend label Dec 28, 2024

Isotr0py added 5 commits December 28, 2024 20:52

register model

0ec661c

Signed-off-by: Isotr0py <2037008807@qq.com>

override example arch

847cb03

Signed-off-by: Isotr0py <2037008807@qq.com>

fix processor

bec7a43

Signed-off-by: Isotr0py <2037008807@qq.com>

Merge branch 'vllm-project:main' into deepseek-vl2

e417b98

fix config name

acc89f6

Signed-off-by: Isotr0py <2037008807@qq.com>

mergify bot added the needs-rebase label Dec 28, 2024

Isotr0py added 8 commits December 29, 2024 01:21

fix processor dtype

b9f2d4b

Signed-off-by: Isotr0py <2037008807@qq.com>

fix a typo

632c77c

Signed-off-by: Isotr0py <2037008807@qq.com>

fix vit

d97849d

Signed-off-by: Isotr0py <2037008807@qq.com>

fix a typo

6fb3845

Signed-off-by: Isotr0py <2037008807@qq.com>

add normal rope rotary

01a5316

Signed-off-by: Isotr0py <2037008807@qq.com>

code format

d5ebfcb

Signed-off-by: Isotr0py <2037008807@qq.com>

fix image token

d787200

Signed-off-by: Isotr0py <2037008807@qq.com>

update docs

d491ff0

Signed-off-by: Isotr0py <2037008807@qq.com>

mergify bot added the documentation label Dec 30, 2024

fix deepseek-v3 based model

6519a66

Signed-off-by: Isotr0py <2037008807@qq.com>

mergify bot added the needs-rebase label Jan 10, 2025

Merge remote-tracking branch 'upstream/main' into deepseek-vl2

f5ae01f

Signed-off-by: Isotr0py <2037008807@qq.com>

mergify bot removed the needs-rebase label Jan 11, 2025

DarkLight1337 reviewed Jan 11, 2025

docs/source/models/supported_models.md (outdated)

DarkLight1337 approved these changes Jan 11, 2025

DarkLight1337 reviewed Jan 11, 2025

tests/models/registry.py (outdated)

Isotr0py and others added 3 commits January 12, 2025 00:04

update docs

92ab7fc

Signed-off-by: Isotr0py <2037008807@qq.com>

Update tests/models/registry.py

80a23ee

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

update todos

5a6d0d6

Signed-off-by: Isotr0py <2037008807@qq.com>

Isotr0py added the ready label Jan 11, 2025

add hf_overrides to initialize test

fc38412

Signed-off-by: Isotr0py <2037008807@qq.com>

simon-mo merged commit f967e51 into vllm-project:main Jan 12, 2025
74 of 77 checks passed

Isotr0py deleted the deepseek-vl2 branch January 12, 2025 17:58

Isotr0py mentioned this pull request Jan 15, 2025

[Model] Add support for deepseek-vl2-tiny model #12068

Merged

1 task

joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025

[Model] Initialize support for Deepseek-VL2 models (vllm-project#11578)

0df1050

Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025

[Model] Initialize support for Deepseek-VL2 models (vllm-project#11578)

b59fcaa

Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Isotr0py commented Dec 28, 2024 (edited by github-actions bot)

github-actions bot commented Dec 28, 2024

mergify bot commented Dec 28, 2024

Isotr0py commented Jan 10, 2025 (edited)

Isotr0py commented Jan 10, 2025

mergify bot commented Jan 10, 2025

DarkLight1337 left a comment (edited)

Swipe4057 commented Jan 14, 2025 (edited)

DarkLight1337 commented Jan 14, 2025 (edited)

Isotr0py commented Jan 14, 2025

iamweiliu commented Jan 14, 2025

iamweiliu commented Jan 14, 2025

Isotr0py commented Jan 14, 2025 (edited)

iamweiliu commented Jan 14, 2025
