[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) #6909

JohnGiorgi · 2024-07-29T19:10:33Z

vLLM will produce slightly incorrect outputs if a user provides a LoRA adapter trained with Rank Stabilized LoRA. I think the only change needed is to detect when a LoRA adapter has set use_rslora and then scale the lora_alpha accordingly, i.e.

lora_alpha = config["lora_alpha"] * math.sqrt(rank) if config["use_rslora"] else config["lora_alpha"]

This one-line PR does exactly that!

I confirmed for my own problem, where adapters are trained with RS-LoRA, this change leads to parity in model outputs between vanilla HuggingFace and vLLM, and without it, the outputs from vLLM are significantly different (and in my estimation, worse). Unfortunately I can't share this model but I could find or create a public example if need be. However, given that all RS LoRA does is scale alpha by sqrt(r) I think this change is de-risked.

I'd be happy to add a unit test with a little guidance!

FIX: #10798

github-actions · 2024-07-29T19:10:50Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which consists a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of default ones by unblocking the steps in your fast-check build on Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

JohnGiorgi · 2024-07-30T15:10:40Z

This fix works for my setup, but on second inspection it's a little awkward:

It will only work when LoRA checkpoints are loaded with LoRAModel.from_local_checkpoint (not LoRAModel.from_lora_tensors) --> Is this okay?
It is technically performing (alpha * sqrt(r)) / r, instead of simply alpha / sqrt(r), because it applies the * in LoRAModel.from_local_checkpoint here:

vllm/vllm/lora/models.py

Line 371 in c66c7f8

lora_alpha = config["lora_alpha"]

and the / in LoRALayerWeights here:

vllm/vllm/lora/lora.py

Lines 30 to 33 in c66c7f8

    
           if scaling is None: 
        
               self.scaling = self.lora_alpha / self.rank 
        
           else: 
        
               self.scaling = scaling

I think the solution to both problems is to apply the RS LoRA scaling once, in LoRALayerWeights.__init__. However this class currently has no way of knowing if use_rslora=True. It would be straightforward to provide that from LoRAModel.from_local_checkpoint, but I am not sure how to pass it from LoRAModel.from_lora_tensors

github-actions · 2024-11-01T02:05:22Z

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

JohnGiorgi · 2024-11-01T02:39:24Z

Ping

JohnGiorgi · 2024-11-19T15:36:48Z

Is there anyone who could take a look at this? AFAICT models fine-tuned with RSLoRA still aren't working correctly in vLLM due to this bug (cc @Yard1)

jeejeelee · 2024-12-05T02:41:12Z

Is there anyone who could take a look at this? AFAICT models fine-tuned with RSLoRA still aren't working correctly in vLLM due to this bug (cc @Yard1)

Thanks for your contribution.
Sorry for the delayed response. Would you still be interested in completing this feature? If so, please let me know.

JohnGiorgi · 2024-12-06T00:15:59Z

Is there anyone who could take a look at this? AFAICT models fine-tuned with RSLoRA still aren't working correctly in vLLM due to this bug (cc @Yard1)

Thanks for your contribution. Sorry for the delayed response. Would you still be interested in completing this feature? If so, please let me know.

Yes! Happy to help get this over the finish line, just had a couple of concerns I listed above, mainly:

Solution will only work when LoRA checkpoints are loaded with LoRAModel.from_local_checkpoint (not LoRAModel.from_lora_tensors) --> Is this okay?
It is technically performing (alpha * sqrt(r)) / r, instead of simply alpha / sqrt(r) (both give you the same answer)

jeejeelee · 2024-12-06T01:37:36Z

I really appreciate your response.

For RSLoRA specifically, your concerns are all okay . However, considering our future plans to support DoRA(see: #10849), I think we need to give this more thought. I suggest we could implement a class that stores some lora information(e.g. rank, use_rslora,lora_alpha and so on) and pass it to from_lora_tensors. This might be a more suitable approach

RonanKMcGovern · 2024-12-06T03:12:43Z

Is it a valid fix to change the alpha value on the saved Lora to

alpha_new = alpha * sqrt(r)

Should be mathematically equivalent when the Lora is loaded then right?

[Perhaps I am wrong but it seems an odd convention for the alpha or r to be applied at inference time. They are really learning rate hyperparameters, so the convention of rescaling delta W instead of just rescaling the adapter learning rate seems cumbersome. Not that this library has any say on this.]

jeejeelee · 2024-12-06T06:43:07Z

Is it a valid fix to change the alpha value on the saved Lora to

alpha_new = alpha * sqrt(r)

Should be mathematically equivalent when the Lora is loaded then right?

[Perhaps I am wrong but it seems an odd convention for the alpha or r to be applied at inference time. They are really learning rate hyperparameters, so the convention of rescaling delta W instead of just rescaling the adapter learning rate seems cumbersome. Not that this library has any say on this.]

THB, I haven't looked into this implementation yet. We don't have control over how the LoRA weights are saved. What we can do is to align with PEFT.

RonanKMcGovern · 2024-12-06T16:55:06Z

Confirming that a hacky fix is to update the alpha in the adapter config as I described above:
alpha_new = sqrt(rank) * alpha_old

This will result in the same generation as peft.

The better fix would indeed to read in the rank and do that scaling if use_rslora is true.

jeejeelee · 2024-12-11T01:15:23Z

@JohnGiorgi FYI: #11003 has been merged. I think we can modify some code based on this to complete this PR

JohnGiorgi · 2024-12-11T04:10:31Z

Thanks @jeejeelee, hoping to dig into this tomorrow!

nielsrolf · 2024-12-23T10:38:29Z

Would be really cool if this gets merged soon! :)

jeejeelee · 2024-12-23T10:55:51Z

@JohnGiorgi ,If you don't have the bandwidth, I can complete this over the weekend

JohnGiorgi · 2024-12-23T14:20:55Z

@jeejeelee Whoops, sorry got pulled away on something else. Will do this today!

JohnGiorgi · 2024-12-23T14:45:49Z

vllm/lora/peft_helper.py

+        if self.use_rslora:
+            self.vllm_scaling_factor = self.lora_alpha / math.sqrt(self.r)
        if self.context_length:
            if self.vllm_max_position_embeddings is None:
                self.vllm_max_position_embeddings = self.context_length


@jeejeelee I think this is the cleanest way to support the rsLoRA scaling logic using the new PEFTHelper! My main outstanding worry is I am not sure how the scaling applied for rsLoRA and the scaling applied when self.context_length is provided should interact.

The way this is written, if both self.use_rslora and self.context_length, the custom scaling factor logic for self.context_length will take precedence.

My main outstanding worry is I am not sure how the scaling applied for rsLoRA and the scaling applied when self.context_length is provided should interact.

IMHO, These two scaling are distinct and operate independently of each other.

We shouldn't use vllm_scaling_factor - instead, we can add a variable called scaling

Fine by me, but its gets a little confusing as LoRAModel already defines scaling_factor and uses it for long context support:

class LoRAModel(AdapterModel): """A LoRA fine-tuned model.""" def __init__( self, lora_model_id: int, rank: int, loras: Dict[str, LoRALayerWeights], scaling_factor: Optional[float] = None, ) -> None: """ Args: lora_model_id: The integer id for the lora model. rank: lora rank. loras: module name -> weights for lora-replaced layers. scaling_factor: Scaling factor to support long context lora model. None if the lora is not tuned for long context support. """

which is populated by PEFTHelper.vllm_scaling_factor,

return cls(lora_model_id, peft_helper.r, loras, scaling_factor=peft_helper.vllm_scaling_factor)

Is there an argument to sync LoRAModel and PEFTHelper so each has a scaling_factor and vllm_scaling_factor argument? (Also, if vllm_scaling_factor is strictly used for long context support, maybe a more explicit name is in order, e.g. long_context_scaling_factor or something)

long_context_scaling_factor looks reasonable. Let me verify this tomorrow and confirm

Sounds good! I'll hang tight until then

I have implemented the related logic, please check if it's reasonable

This looks good to me @jeejeelee! Thanks for adding tests

Could you please sync with the main branch?

Done. It is just the DCO check that is failing but I am unsure how to fix, it suggests a rebase but I don't want to do that as there's multiple commit authors on this branch

There is a known issue in the current lora test. Let's check if other lora tests can pass first. If they do, we can consider force merging

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

jeejeelee

@JohnGiorgi Thank you for your contribution and patience.

JohnGiorgi · 2024-12-30T16:12:17Z

@JohnGiorgi Thank you for your contribution and patience.

Woohoo! Thanks for all the guidance

* [Misc] Move weights mapper (vllm-project#11443) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Bugfix] Fix issues in CPU build Dockerfile. Fixes vllm-project#9182 (vllm-project#11435) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> * [Model] Automatic conversion of classification and reward models (vllm-project#11469) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor (vllm-project#11472) * [Misc] Update disaggregation benchmark scripts and test logs (vllm-project#11456) Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com> * [Frontend] Enable decord to load video from base64 (vllm-project#11492) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Improve GitHub links (vllm-project#11491) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Misc] Move some multimodal utils to modality-specific modules (vllm-project#11494) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * Mypy checking for vllm/compilation (vllm-project#11496) Signed-off-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: lucast2021 <lucast2021@headroyce.org> * [Misc][LoRA] Fix LoRA weight mapper (vllm-project#11495) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Doc] Add `QVQ` and `QwQ` to the list of supported models (vllm-project#11509) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> * [V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (vllm-project#10681) Signed-off-by: Sourashis Roy <sroy@roblox.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Model] Modify MolmoForCausalLM MLP (vllm-project#11510) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Misc] Add placeholder module (vllm-project#11501) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Add video example to openai client for multimodal (vllm-project#11521) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [1/N] API Server (Remove Proxy) (vllm-project#11529) * [Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (vllm-project#11523) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: HandH1998 <1335248067@qq.com> * [2/N] API Server: Avoid ulimit footgun (vllm-project#11530) * Deepseek v3 (vllm-project#11502) Signed-off-by: mgoin <michael@neuralmagic.com> Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com> * [Docs] Document Deepseek V3 support (vllm-project#11535) Signed-off-by: simon-mo <simon.mo@hey.com> * Update openai_compatible_server.md (vllm-project#11536) Co-authored-by: Simon Mo <simon.mo@hey.com> * [V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (vllm-project#11394) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [V1] Fix yapf (vllm-project#11538) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [CI] Fix broken CI (vllm-project#11543) * [misc] fix typing (vllm-project#11540) Signed-off-by: youkaichao <youkaichao@gmail.com> * [V1][3/N] API Server: Reduce Task Switching + Handle Abort Properly (vllm-project#11534) * [BugFix] Fix quantization for all other methods (vllm-project#11547) * [Platform] Move model arch check to platform (vllm-project#11503) Signed-off-by: Mengqing Cao <cmq0113@163.com> * Update deploying_with_k8s.md with AMD ROCm GPU example (vllm-project#11465) Signed-off-by: Alex He <alehe@amd.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [Bugfix] Fix TeleChat2ForCausalLM weights mapper (vllm-project#11546) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Misc] Abstract the logic for reading and writing media content (vllm-project#11527) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Add xgrammar in doc (vllm-project#11549) Signed-off-by: ccjincong <chenjincong11@gmail.com> * [VLM] Support caching in merged multi-modal processor (vllm-project#11396) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [MODEL] LoRA support for Jamba model (vllm-project#11209) Signed-off-by: Erez Schwartz <erezs@ai21.com> * [Misc]Add BNB quantization for MolmoForCausalLM (vllm-project#11551) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (vllm-project#11566) Signed-off-by: Isotr0py <2037008807@qq.com> * [Bugfix] Fix for ROCM compressed tensor support (vllm-project#11561) * [Doc] Update mllama example based on official doc (vllm-project#11567) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * [V1] [4/N] API Server: ZMQ/MP Utilities (vllm-project#11541) * [Bugfix] Last token measurement fix (vllm-project#11376) Signed-off-by: rajveerb <46040700+rajveerb@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> * [Model] Support InternLM2 Reward models (vllm-project#11571) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [Model] Remove hardcoded image tokens ids from Pixtral (vllm-project#11582) Signed-off-by: Roger Wang <ywang@roblox.com> * [Hardware][AMD]: Replace HIPCC version with more precise ROCm version (vllm-project#11515) Signed-off-by: hjwei <hjwei_xd@163.com> * [V1][Minor] Set pin_memory=False for token_ids_cpu tensor (vllm-project#11581) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Doc] Minor documentation fixes (vllm-project#11580) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [bugfix] interleaving sliding window for cohere2 model (vllm-project#11583) Signed-off-by: youkaichao <youkaichao@gmail.com> * [V1] [5/N] API Server: unify `Detokenizer` and `EngineCore` input (vllm-project#11545) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> * [Doc] Convert list tables to MyST (vllm-project#11594) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [v1][bugfix] fix cudagraph with inplace buffer assignment (vllm-project#11596) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Misc] KV cache transfer connector registry (vllm-project#11481) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> * Remove print statement in DeepseekScalingRotaryEmbedding (vllm-project#11604) * [v1] fix compilation cache (vllm-project#11598) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Docker] bump up neuron sdk v2.21 (vllm-project#11593) Signed-off-by: Liangfu Chen <liangfc@amazon.com> * [Build][Kernel] Update CUTLASS to v3.6.0 (vllm-project#11607) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> * [CI/Build][CPU] Fix CPU CI by lazy importing triton FP8 kernels (vllm-project#11618) Signed-off-by: jiang1.li <jiang1.li@intel.com> * [platforms] enable platform plugins (vllm-project#11602) Signed-off-by: youkaichao <youkaichao@gmail.com> * [VLM] Abstract out multi-modal data parsing in merged processor (vllm-project#11620) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [V1] [6/N] API Server: Better Shutdown (vllm-project#11586) * [Bugfix] Validate and concatenate image embeddings in MiniCPMVBaseModel (vllm-project#11631) * [benchmark] Remove dependency for H100 benchmark step (vllm-project#11572) * [Model][LoRA]LoRA support added for MolmoForCausalLM (vllm-project#11439) Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> * [Bugfix] Fix OpenAI parallel sampling when using xgrammar (vllm-project#11637) Signed-off-by: mgoin <michael@neuralmagic.com> * [Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (vllm-project#6909) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> * [Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (vllm-project#11565) * [V1] Simpify vision block hash for prefix caching by removing offset from hash (vllm-project#11646) * [V1][VLM] V1 support for selected single-image models. (vllm-project#11632) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Isotr0py <2037008807@qq.com> * [Benchmark] Add benchmark script for CPU offloading (vllm-project#11533) Signed-off-by: ApostaC <yihua98@uchicago.edu> Co-authored-by: KuntaiDu <kuntai@uchicago.edu> * [Bugfix][Refactor] Unify model management in frontend (vllm-project#11660) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> * [VLM] Add max-count checking in data parser for single image models (vllm-project#11661) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <ywang@roblox.com> * [Misc] Optimize Qwen2-VL LoRA test (vllm-project#11663) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Misc] Replace space with - in the file names (vllm-project#11667) Signed-off-by: Lu Fang <lufang@fb.com> * [Doc] Fix typo (vllm-project#11666) Signed-off-by: Kazuhiro Serizawa <nserihiro@gmail.com> * [V1] Implement Cascade Attention (vllm-project#11635) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [VLM] Move supported limits and max tokens to merged multi-modal processor (vllm-project#11669) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [VLM][Bugfix] Multi-modal processor compatible with V1 multi-input (vllm-project#11674) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [mypy] Pass type checking in vllm/inputs (vllm-project#11680) Signed-off-by: Tobias Pitters <tobias.pitters@gmail.com> * [VLM] Merged multi-modal processor for LLaVA-NeXT (vllm-project#11682) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * According to vllm.EngineArgs, the name should be distributed_executor_backend (vllm-project#11689) * [Bugfix] Free cross attention block table for preempted-for-recompute sequence group. (vllm-project#10013) Signed-off-by: Kathy Yu <feiyangyu@google.com> * [V1][Minor] Optimize token_ids_cpu copy (vllm-project#11692) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Bugfix] Change kv scaling factor by param json on nvidia gpu (vllm-project#11688) Signed-off-by: bjmsong <bjmsong@126.com> Co-authored-by: bjmsong <bjmsong@126.com> * Resolve race conditions in Marlin kernel (vllm-project#11493) Signed-off-by: wchen61 <wchen61@foxmail.com> * [Misc] Minimum requirements for SageMaker compatibility (vllm-project#11576) * Update default max_num_batch_tokens for chunked prefill (vllm-project#11694) * [Bugfix] Check chain_speculative_sampling before calling it (vllm-project#11673) Signed-off-by: Lu Fang <lufang@fb.com> * [perf-benchmark] Fix dependency for steps in benchmark pipeline (vllm-project#11710) * [Model] Whisper model implementation (vllm-project#11280) Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com> * [V1] Simplify Shutdown (vllm-project#11659) * [Bugfix] Fix ColumnParallelLinearWithLoRA slice (vllm-project#11708) Signed-off-by: ZincCat <zincchloride@outlook.com> * [V1] Improve TP>1 Error Handling + Stack Trace (vllm-project#11721) Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> * [Misc]Add BNB quantization for Qwen2VL (vllm-project#11719) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * Update requirements-tpu.txt to support python 3.9 and 3.11 (vllm-project#11695) Signed-off-by: mgoin <michael@neuralmagic.com> * [V1] Chore: cruft removal (vllm-project#11724) * [V1] log GPU blocks num for MultiprocExecutor (vllm-project#11656) * Update tool_calling.md (vllm-project#11701) * Update bnb.md with example for OpenAI (vllm-project#11718) * [V1] Add `RayExecutor` support for `AsyncLLM` (api server) (vllm-project#11712) * [V1] Add kv cache utils tests. (vllm-project#11513) Signed-off-by: xcnick <xcnick0412@gmail.com> * [Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (vllm-project#11233) Signed-off-by: Yan Burman <yanburman@users.noreply.github.com> Signed-off-by: Ido Asraff <idoa@atero.ai> * [VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision (vllm-project#11717) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Fix precision error in LLaVA-NeXT (vllm-project#11735) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Model] Remove unnecessary weight initialization logic (vllm-project#11736) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [Bugfix][V1] Fix test_kv_cache_utils.py (vllm-project#11738) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [MISC] Replace c10::optional with std::optional (vllm-project#11730) Signed-off-by: Lu Fang <lufang@fb.com> * [distributed] remove pynccl's redundant stream (vllm-project#11744) * fix: [doc] fix typo (vllm-project#11751) Co-authored-by: Lancer <maruixiang6688@gmail.com> * [Frontend] Improve `StreamingResponse` Exception Handling (vllm-project#11752) * [distributed] remove pynccl's redundant change_state (vllm-project#11749) * [Doc] [1/N] Reorganize Getting Started section (vllm-project#11645) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Remove block size constraint (vllm-project#11723) * [V1] Add BlockTable class (vllm-project#11693) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Misc] Fix typo for valid_tool_parses (vllm-project#11753) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> * [V1] Refactor get_executor_cls (vllm-project#11754) * [mypy] Forward pass function type hints in lora (vllm-project#11740) Signed-off-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: lucast2021 <lucast2021@headroyce.org> * k8s-config: Update the secret to use stringData (vllm-project#11679) Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com> * [VLM] Separate out profiling-related logic (vllm-project#11746) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc][2/N] Reorganize Models and Usage sections (vllm-project#11755) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Fix max image size for LLaVA-Onevision (vllm-project#11769) Signed-off-by: Roger Wang <ywang@roblox.com> * [doc] explain how to add interleaving sliding window support (vllm-project#11771) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Bugfix][V1] Fix molmo text-only inputs (vllm-project#11676) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Kernel] Move attn_type to Attention.__init__() (vllm-project#11690) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * format * [V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (vllm-project#11685) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> * deepseek overflow fix (#349) * [Bugfix] Fix LLaVA-NeXT feature size precision error (for real) (vllm-project#11772) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Model] Future-proof Qwen2-Audio multi-modal processor (vllm-project#11776) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [XPU] Make pp group initilized for pipeline-parallelism (vllm-project#11648) Signed-off-by: yisheng <yi.sheng@intel.com> * [Doc][3/N] Reorganize Serving section (vllm-project#11766) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Kernel][LoRA]Punica prefill kernels fusion (vllm-project#11234) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Abatom <abzhonghua@gmail.com> Co-authored-by: Zhonghua Deng <abatom@163.com> * [Bugfix] Update attention interface in `Whisper` (vllm-project#11784) Signed-off-by: Roger Wang <ywang@roblox.com> * [CI] Fix neuron CI and run offline tests (vllm-project#11779) Signed-off-by: Liangfu Chen <liangfc@amazon.com> * fix init error for MessageQueue when n_local_reader is zero (vllm-project#11768) * [Doc] Create a vulnerability management team (vllm-project#9925) Signed-off-by: Russell Bryant <rbryant@redhat.com> * [CI][CPU] adding build number to docker image name (vllm-project#11788) Signed-off-by: Yuan Zhou <yuan.zhou@intel.com> * [V1][Doc] Update V1 support for `LLaVa-NeXT-Video` (vllm-project#11798) Signed-off-by: Roger Wang <ywang@roblox.com> * [Bugfix] Comprehensively test and fix LLaVA-NeXT feature size calculation (vllm-project#11800) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [doc] add doc to explain how to use uv (vllm-project#11773) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [V1] Support audio language models on V1 (vllm-project#11733) Signed-off-by: Roger Wang <ywang@roblox.com> * [doc] update how pip can install nightly wheels (vllm-project#11806) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Doc] Add note to `gte-Qwen2` models (vllm-project#11808) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [optimization] remove python function call for custom op (vllm-project#11750) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Bugfix] update the prefix for qwen2 (vllm-project#11795) Co-authored-by: jiadi.jjd <jiadi.jjd@antgroup.com> * [Doc]Add documentation for using EAGLE in vLLM (vllm-project#11417) Signed-off-by: Sourashis Roy <sroy@roblox.com> * [Bugfix] Significant performance drop on CPUs with --num-scheduler-steps > 1 (vllm-project#11794) * [Doc] Group examples into categories (vllm-project#11782) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Bugfix] Fix image input for Pixtral-HF (vllm-project#11741) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Misc] sort torch profiler table by kernel timing (vllm-project#11813) * Remove the duplicate imports of MultiModalKwargs and PlaceholderRange… (vllm-project#11824) * Fixed docker build for ppc64le (vllm-project#11518) Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com> * [OpenVINO] Fixed Docker.openvino build (vllm-project#11732) Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com> * [Bugfix] Add checks for LoRA and CPU offload (vllm-project#11810) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> * [Docs] reorganize sponsorship page (vllm-project#11639) Signed-off-by: simon-mo <simon.mo@hey.com> * [Bug] Fix pickling of `ModelConfig` when RunAI Model Streamer is used (vllm-project#11825) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [misc] improve memory profiling (vllm-project#11809) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [doc] update wheels url (vllm-project#11830) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Docs] Update sponsor name: 'Novita' to 'Novita AI' (vllm-project#11833) * [Hardware][Apple] Native support for macOS Apple Silicon (vllm-project#11696) Signed-off-by: Wallas Santos <wallashss@ibm.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> * [torch.compile] consider relevant code in compilation cache (vllm-project#11614) Signed-off-by: youkaichao <youkaichao@gmail.com> * [VLM] Reorganize profiling/processing-related code (vllm-project#11812) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Move examples into categories (vllm-project#11840) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Doc][4/N] Reorganize API Reference (vllm-project#11843) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [CI/Build][Bugfix] Fix CPU CI image clean up (vllm-project#11836) Signed-off-by: jiang1.li <jiang1.li@intel.com> * [Bugfix][XPU] fix silu_and_mul (vllm-project#11823) Signed-off-by: yan ma <yan.ma@intel.com> * [Misc] Move some model utils into vision file (vllm-project#11848) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Expand Multimodal API Reference (vllm-project#11852) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Misc]add some explanations for BlockHashType (vllm-project#11847) * [TPU][Quantization] TPU `W8A8` (vllm-project#11785) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> * [Kernel][Triton][AMD] Use block size heuristic for avg 2.8x speedup for int8 models (vllm-project#11698) Signed-off-by: Randall Smith <Randall.Smith@amd.com> * [Docs] Add Google Cloud Meetup (vllm-project#11864) * Revert nccl changes (#351) * Revert "[distributed] remove pynccl's redundant change_state (vllm-project#11749)" This reverts commit 9e764e7. * Revert "[distributed] remove pynccl's redundant stream (vllm-project#11744)" This reverts commit 635b897. * [CI] Turn on basic correctness tests for V1 (vllm-project#10864) * treat do_lower_case in the same way as the sentence-transformers library (vllm-project#11815) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> * [Doc] Recommend uv and python 3.12 for quickstart guide (vllm-project#11849) Signed-off-by: mgoin <michael@neuralmagic.com> * [Misc] Move `print_*_once` from utils to logger (vllm-project#11298) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> * [Doc] Intended links Python multiprocessing library (vllm-project#11878) * [perf]fix current stream (vllm-project#11870) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Bugfix] Override dunder methods of placeholder modules (vllm-project#11882) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] fix beam search input errors and latency benchmark script (vllm-project#11875) Signed-off-by: Ye Qi <yeq@meta.com> Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com> * [Doc] Add model development API Reference (vllm-project#11884) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [platform] Allow platform specify attention backend (vllm-project#11609) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> * [ci]try to fix flaky multi-step tests (vllm-project#11894) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Misc] Provide correct Pixtral-HF chat template (vllm-project#11891) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * fp8 support (#352) Co-authored-by: Yida Wu <yidawu@amd.com> * [Docs] Add Modal to deployment frameworks (vllm-project#11907) * [Doc][5/N] Move Community and API Reference to the bottom (vllm-project#11896) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Simon Mo <simon.mo@hey.com> * [VLM] Enable tokenized inputs for merged multi-modal processor (vllm-project#11900) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Show default pooling method in a table (vllm-project#11904) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [torch.compile] Hide KV cache behind torch.compile boundary (vllm-project#11677) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * [Bugfix] Validate lora adapters to avoid crashing server (vllm-project#11727) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> * [BUGFIX] Fix `UnspecifiedPlatform` package name (vllm-project#11916) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> * [ci] fix gh200 tests (vllm-project#11919) Signed-off-by: youkaichao <youkaichao@gmail.com> * [misc] remove python function call for custom activation op (vllm-project#11885) Co-authored-by: youkaichao <youkaichao@gmail.com> * [platform] support pytorch custom op pluggable (vllm-project#11328) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> * Replace "online inference" with "online serving" (vllm-project#11923) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [ci] Fix sampler tests (vllm-project#11922) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Doc] [1/N] Initial guide for merged multi-modal processor (vllm-project#11925) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [platform] support custom torch.compile backend key (vllm-project#11318) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> * [Doc] Rename offline inference examples (vllm-project#11927) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [Docs] Fix docstring in `get_ip` function (vllm-project#11932) Signed-off-by: Kuntai Du <kuntai@uchicago.edu> * Doc fix in `benchmark_long_document_qa_throughput.py` (vllm-project#11933) Signed-off-by: Kuntai Du <kuntai@uchicago.edu> * [Hardware][CPU] Support MOE models on x86 CPU (vllm-project#11831) Signed-off-by: jiang1.li <jiang1.li@intel.com> * [Misc] Clean up debug code in Deepseek-V3 (vllm-project#11930) Signed-off-by: Isotr0py <2037008807@qq.com> * [Misc] Update benchmark_prefix_caching.py fixed example usage (vllm-project#11920) Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn> Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn> * [Bugfix] Check that number of images matches number of <|image|> tokens with mllama (vllm-project#11939) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> * [mypy] Fix mypy warnings in api_server.py (vllm-project#11941) Signed-off-by: Fred Reiss <frreiss@us.ibm.com> * [ci] fix broken distributed-tests-4-gpus (vllm-project#11937) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Bugfix][SpecDecode] Adjust Eagle model architecture to align with intended design (vllm-project#11672) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> * [Bugfix] fused_experts_impl wrong compute type for float32 (vllm-project#11921) Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> * [CI/Build] Move model-specific multi-modal processing tests (vllm-project#11934) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Doc] Basic guide for writing unit tests for new models (vllm-project#11951) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * [Bugfix] Fix RobertaModel loading (vllm-project#11940) Signed-off-by: NickLucche <nlucches@redhat.com> * [Model] Add cogagent model support vLLM (vllm-project#11742) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [V1] Avoid sending text prompt to core engine (vllm-project#11963) Signed-off-by: Roger Wang <ywang@roblox.com> * [CI/Build] Add markdown linter (vllm-project#11857) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> * [Model] Initialize support for Deepseek-VL2 models (vllm-project#11578) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [Hardware][CPU] Multi-LoRA implementation for the CPU backend (vllm-project#11100) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com> * [Hardware][TPU] workaround fix for MoE on TPU (vllm-project#11764) * [V1][Core][1/n] Logging and Metrics (vllm-project#11962) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> * [Model] Support GGUF models newly added in `transformers` 4.46.0 (vllm-project#9685) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> * [V1] [2/n] Logging and Metrics - `OutputProcessor` Abstraction (vllm-project#11973) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> * [MISC] fix typo in kv transfer send recv test (vllm-project#11983) * [Bug] Fix usage of `.transpose()` and `.view()` consecutively. (vllm-project#11979) * [CI][Spec Decode] fix: broken test for EAGLE model (vllm-project#11972) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> * [Misc] Fix Deepseek V2 fp8 kv-scale remapping (vllm-project#11947) Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> * [Misc]Minor Changes about Worker (vllm-project#11555) Signed-off-by: Chenguang Li <757486878@qq.com> * [platform] add ray_device_key (vllm-project#11948) Signed-off-by: youkaichao <youkaichao@gmail.com> * Fix Max Token ID for Qwen-VL-Chat (vllm-project#11980) Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> * [Kernel] unified_attention for Attention.forward (vllm-project#11967) Signed-off-by: Chen Zhang <zhangch99@outlook.com> * [Doc][V1] Update model implementation guide for V1 support (vllm-project#11998) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> * [Doc] Organise installation documentation into categories and tabs (vllm-project#11935) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> * [platform] add device_control env var (vllm-project#12009) Signed-off-by: youkaichao <youkaichao@gmail.com> * [Platform] Move get_punica_wrapper() function to Platform (vllm-project#11516) Signed-off-by: Shanshan Shen <467638484@qq.com> * bugfix: Fix signature mismatch in benchmark's `get_tokenizer` function (vllm-project#11982) Signed-off-by: elijah <f1renze.142857@gmail.com> * Using list * Revert "[misc] improve memory profiling (vllm-project#11809)" This reverts commit 889e662. * Multi-lingual P3L (#356) * Commiting the *multilingual* P3L test. * Created a *multi-lingual* P3L test. * Making ruff happy. * . * Added a reference to the language-scripture Confluence table. * Typo fixing. * Harmonizing naming. * Fixing comments in the header. --------- Co-authored-by: Alexei V. Ivanov <alivanov@banff-cyxtera-s65-4.amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> * Trying to make scales work with compileable attention * Docs lint * linter formatting bug fixes * inherit config file updates under fused_moe from main branch. * match tests for the MOE layers with main. --------- Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: lucast2021 <lucast2021@headroyce.org> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Sourashis Roy <sroy@roblox.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Alex He <alehe@amd.com> Signed-off-by: ccjincong <chenjincong11@gmail.com> Signed-off-by: Erez Schwartz <erezs@ai21.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: rajveerb <46040700+rajveerb@users.noreply.github.com> Signed-off-by: hjwei <hjwei_xd@163.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Signed-off-by: Liangfu Chen <liangfc@amazon.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Matthias Vogler <matthias.vogler@joesecurity.org> Signed-off-by: ApostaC <yihua98@uchicago.edu> Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Lu Fang <lufang@fb.com> Signed-off-by: Kazuhiro Serizawa <nserihiro@gmail.com> Signed-off-by: Tobias Pitters <tobias.pitters@gmail.com> Signed-off-by: Kathy Yu <feiyangyu@google.com> Signed-off-by: bjmsong <bjmsong@126.com> Signed-off-by: wchen61 <wchen61@foxmail.com> Signed-off-by: ZincCat <zincchloride@outlook.com> Signed-off-by: xcnick <xcnick0412@gmail.com> Signed-off-by: Yan Burman <yanburman@users.noreply.github.com> Signed-off-by: Ido Asraff <idoa@atero.ai> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com> Signed-off-by: yisheng <yi.sheng@intel.com> Signed-off-by: Abatom <abzhonghua@gmail.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Yuan Zhou <yuan.zhou@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com> Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com> Signed-off-by: Wallas Santos <wallashss@ibm.com> Signed-off-by: yan ma <yan.ma@intel.com> Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> Signed-off-by: Ye Qi <yeq@meta.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kuntai Du <kuntai@uchicago.edu> Signed-off-by: Ren MinMin <renmm6@chinaunicom.cn> Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Signed-off-by: Fred Reiss <frreiss@us.ibm.com> Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu> Signed-off-by: Chenguang Li <757486878@qq.com> Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Signed-off-by: Shanshan Shen <467638484@qq.com> Signed-off-by: elijah <f1renze.142857@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com> Co-authored-by: Jiaxin Shan <seedjeffwan@gmail.com> Co-authored-by: Lucas Tucker <47258766+lucas-tucker@users.noreply.github.com> Co-authored-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: sroy745 <142070531+sroy745@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: robertgshaw2-neuralmagic <rshaw@neuralmagic.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: AlexHe99 <alehe@amd.com> Co-authored-by: Chen1022 <112855051+ccjincong@users.noreply.github.com> Co-authored-by: ErezSC42 <erezs@ai21.com> Co-authored-by: Selali <selali.adobor@gmail.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Rajveer Bachkaniwala <46040700+rajveerb@users.noreply.github.com> Co-authored-by: hj-wei <hjwei_xd@163.com> Co-authored-by: Kuntai Du <kuntai@uchicago.edu> Co-authored-by: Liangfu Chen <liangfc@amazon.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> Co-authored-by: whyiug <whyiug@hotmail.com> Co-authored-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Matthias Vogler <60004995+ayylemao@users.noreply.github.com> Co-authored-by: Matthias Vogler <matthias.vogler@joesecurity.org> Co-authored-by: John Giorgi <johnmgiorgi@gmail.com> Co-authored-by: sakunkun <zhou.qianjun@zte.com.cn> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com> Co-authored-by: Kazuhiro Serizawa <nserihiro@gmail.com> Co-authored-by: Tobias Pitters <31857876+CloseChoice@users.noreply.github.com> Co-authored-by: Chunyang Wen <chunyang.wen@gmail.com> Co-authored-by: Kathy Yu <143133934+kathyyu-google@users.noreply.github.com> Co-authored-by: bjmsong <wq.songbob@gmail.com> Co-authored-by: bjmsong <bjmsong@126.com> Co-authored-by: wchen61 <wchen61@foxmail.com> Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com> Co-authored-by: Sachin Varghese <sachin.mathew31@gmail.com> Co-authored-by: Aurick Qiao <aurickq@users.noreply.github.com> Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com> Co-authored-by: ZincCat <52513999+zinccat@users.noreply.github.com> Co-authored-by: WangErXiao <863579016@qq.com> Co-authored-by: Hust_YangXian <bryceyx@gmail.com> Co-authored-by: Alberto Ferrer <albertof@barrahome.org> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: xcnick <xcnick0412@gmail.com> Co-authored-by: Yan Burman <yanburman@users.noreply.github.com> Co-authored-by: cennn <61925104+cennn@users.noreply.github.com> Co-authored-by: Lancer <402430575@qq.com> Co-authored-by: Lancer <maruixiang6688@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: Suraj Deshmukh <surajd.service@gmail.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Concurrensee <yida.wu@amd.com> Co-authored-by: YiSheng5 <yi.sheng@intel.com> Co-authored-by: Zhonghua Deng <abatom@163.com> Co-authored-by: XiaobingZhang <xiaobingzhangupc@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Yuan <yuan.zhou@intel.com> Co-authored-by: jiangjiadi <34134495+jiangjiadi@users.noreply.github.com> Co-authored-by: jiadi.jjd <jiadi.jjd@antgroup.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: Jie Fu (傅杰) <jiefu@tencent.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: Nishidha <nishidha.panpaliya@partner.ibm.com> Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com> Co-authored-by: Wallas Henrique <wallashss@users.noreply.github.com> Co-authored-by: Yan Ma <yan.ma@intel.com> Co-authored-by: rasmith <Randall.Smith@amd.com> Co-authored-by: Maximilien de Bayser <mbayser@br.ibm.com> Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> Co-authored-by: Guspan Tanadi <36249910+guspan-tanadi@users.noreply.github.com> Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com> Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: Yida Wu <yidawu@amd.com> Co-authored-by: Charles Frye <cfrye59@gmail.com> Co-authored-by: minmin <rmm0811@gmail.com> Co-authored-by: Ren MinMin <renmm6@chinaunicom.cn> Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Fred Reiss <frreiss@us.ibm.com> Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com> Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: sixgod <evethwillbeok@outlook.com> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com> Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com> Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com> Co-authored-by: Chenguang Li <757486878@qq.com> Co-authored-by: Alex Brooks <alex.brooks@ibm.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com> Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Alexei V. Ivanov <alivanov@banff-cyxtera-s65-4.amd.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Bowen Wang <abmfy@icloud.com>

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

feat: support rslora

e308f37

JohnGiorgi added 2 commits July 29, 2024 15:15

fix: lint the codebase

6690e8b

fix: default to not using rslora

e2cf420

DarkLight1337 requested a review from Yard1 July 30, 2024 03:45

github-actions bot added the stale label Nov 1, 2024

github-actions bot added unstale and removed stale labels Nov 4, 2024

jeejeelee mentioned this pull request Dec 6, 2024

[Bug]: LoRa adapter responses not matching peft/transformers response #10798

Closed

1 task

RonanKMcGovern mentioned this pull request Dec 6, 2024

LoRa Responses are Inconsistent with peft inference predibase/lorax#700

Closed

4 tasks

JohnGiorgi added 3 commits December 23, 2024 09:34

Merge branch 'main' into support-rslora

b782763

fix: move rslora scaling to peft_helper

a3206bb

fix: set scaling factor directly, instead of modifying alpha

42e04aa

JohnGiorgi commented Dec 23, 2024

View reviewed changes

fix: remove error message about RSLoRA not being supported

f73a3db

JohnGiorgi and others added 2 commits December 23, 2024 09:55

docs: add comments with arxiv links for rsLoRA and DoRA

419a6fb

Done

7ace76c

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

jeejeelee changed the title ~~[Bugfix] Support Rank Stabilized LoRA (RSLoRA)~~ [Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) Dec 24, 2024

Merge branch 'main' into support-rslora

20c9f14

jeejeelee approved these changes Dec 30, 2024

View reviewed changes

jeejeelee enabled auto-merge (squash) December 30, 2024 16:04

github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 30, 2024

simon-mo disabled auto-merge December 31, 2024 06:15

simon-mo merged commit 82c49d3 into vllm-project:main Dec 31, 2024
66 of 69 checks passed

joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025

[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (vllm-project#6909)

93b052b

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

joennlae pushed a commit to 44ai-labs/vllm that referenced this pull request Jan 19, 2025

[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (vllm-project#6909)

edd48b1

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

abmfy pushed a commit to abmfy/vllm-flashinfer that referenced this pull request Jan 24, 2025

[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (vllm-project#6909)

7ef9942

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) #6909

[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) #6909

JohnGiorgi commented Jul 29, 2024 •

edited by github-actions bot

Loading

github-actions bot commented Jul 29, 2024

JohnGiorgi commented Jul 30, 2024 •

edited

Loading

github-actions bot commented Nov 1, 2024

JohnGiorgi commented Nov 1, 2024

JohnGiorgi commented Nov 19, 2024

jeejeelee commented Dec 5, 2024

JohnGiorgi commented Dec 6, 2024

jeejeelee commented Dec 6, 2024

RonanKMcGovern commented Dec 6, 2024 •

edited

Loading

jeejeelee commented Dec 6, 2024

RonanKMcGovern commented Dec 6, 2024

jeejeelee commented Dec 11, 2024

JohnGiorgi commented Dec 11, 2024

nielsrolf commented Dec 23, 2024

jeejeelee commented Dec 23, 2024

JohnGiorgi commented Dec 23, 2024 •

edited

Loading

JohnGiorgi Dec 23, 2024 •

edited

Loading

jeejeelee Dec 23, 2024

JohnGiorgi Dec 23, 2024 •

edited

Loading

jeejeelee Dec 23, 2024

JohnGiorgi Dec 23, 2024

jeejeelee Dec 24, 2024

JohnGiorgi Dec 30, 2024

jeejeelee Dec 30, 2024

JohnGiorgi Dec 30, 2024

jeejeelee Dec 30, 2024

jeejeelee left a comment

JohnGiorgi commented Dec 30, 2024

[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) #6909

[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) #6909

Conversation

JohnGiorgi commented Jul 29, 2024 • edited by github-actions bot Loading

github-actions bot commented Jul 29, 2024

JohnGiorgi commented Jul 30, 2024 • edited Loading

github-actions bot commented Nov 1, 2024

JohnGiorgi commented Nov 1, 2024

JohnGiorgi commented Nov 19, 2024

jeejeelee commented Dec 5, 2024

JohnGiorgi commented Dec 6, 2024

jeejeelee commented Dec 6, 2024

RonanKMcGovern commented Dec 6, 2024 • edited Loading

jeejeelee commented Dec 6, 2024

RonanKMcGovern commented Dec 6, 2024

jeejeelee commented Dec 11, 2024

JohnGiorgi commented Dec 11, 2024

nielsrolf commented Dec 23, 2024

jeejeelee commented Dec 23, 2024

JohnGiorgi commented Dec 23, 2024 • edited Loading

JohnGiorgi Dec 23, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JohnGiorgi Dec 23, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jeejeelee left a comment

Choose a reason for hiding this comment

JohnGiorgi commented Dec 30, 2024

JohnGiorgi commented Jul 29, 2024 •

edited by github-actions bot

Loading

JohnGiorgi commented Jul 30, 2024 •

edited

Loading

RonanKMcGovern commented Dec 6, 2024 •

edited

Loading

JohnGiorgi commented Dec 23, 2024 •

edited

Loading

JohnGiorgi Dec 23, 2024 •

edited

Loading

JohnGiorgi Dec 23, 2024 •

edited

Loading