@WoosukKwon

This PR adds a custom CUDA kernel for RMS normalization (RMSNorm), the normalization layer used in LLaMA models. The kernel eliminates the inefficient data movement of the current PyTorch implementation.
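For reference, per token RMSNorm computes `y_i = x_i / sqrt(mean(x^2) + eps) * w_i`. Below is a minimal pure-Python sketch of that math, useful as a correctness reference when checking a kernel's output. It assumes the LLaMA-style form with `eps` inside the square root and a default `eps` of `1e-6`; neither detail is stated in this PR, and the helper names `rms_norm` / `rms_norm_batch` are hypothetical, not the PR's API.

```python
import math
from typing import List, Sequence

# Assumption: eps = 1e-6 (a common LLaMA setting); this PR does not state the value.
DEFAULT_EPS = 1e-6


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = DEFAULT_EPS) -> List[float]:
    """Normalize one token's hidden vector by its root mean square, then scale by weight."""
    if len(x) != len(weight):
        raise ValueError("x and weight must both have length hidden_size")
    # Reduction over the hidden dimension: mean of squares.
    mean_sq = sum(v * v for v in x) / len(x)
    # eps sits inside the square root, as in the LLaMA formulation (assumed).
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Elementwise pass: scale each value by 1/RMS and by its learned weight.
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rms_norm_batch(
    tokens: Sequence[Sequence[float]], weight: Sequence[float], eps: float = DEFAULT_EPS
) -> List[List[float]]:
    """Apply rms_norm independently to each row of a [num_tokens, hidden_size] input."""
    return [rms_norm(row, weight, eps) for row in tokens]


if __name__ == "__main__":
    hidden = [[3.0, 4.0], [1.0, -1.0]]
    print(rms_norm_batch(hidden, weight=[1.0, 1.0]))
```

Each row needs only one reduction followed by one elementwise pass, so a fused kernel can keep the row's data on-chip between the two steps instead of writing intermediate tensors back to memory.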

Performance (fp16):

| num_tokens | hidden_size | Kernel (us) | PyTorch (us) |
|-----------:|------------:|------------:|-------------:|
| 7          | 1024        | 9           | 93           |
| 128        | 1024        | 5           | 84           |
| 2048       | 5120        | 60          | 353          |
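The reported latencies can be turned into speedups and a rough effective-bandwidth figure for the kernel. This is a back-of-the-envelope sketch, assuming a fused kernel moves the minimum possible data (input read once, weight read once, output written once, all fp16); actual traffic may differ, and the PR does not state which GPU produced these timings (it lists A100 as the tested GPU).

```python
# Reported fp16 latencies from this PR: (num_tokens, hidden_size, kernel_us, pytorch_us)
RESULTS = [
    (7, 1024, 9, 93),
    (128, 1024, 5, 84),
    (2048, 5120, 60, 353),
]
FP16_BYTES = 2


def min_bytes_moved(num_tokens: int, hidden_size: int) -> int:
    # Assumption: input read once + output written once + weight read once.
    activations = num_tokens * hidden_size * FP16_BYTES
    weight = hidden_size * FP16_BYTES
    return 2 * activations + weight


def summarize():
    rows = []
    for n, h, kernel_us, pytorch_us in RESULTS:
        speedup = pytorch_us / kernel_us
        # Effective bandwidth in GB/s under the minimum-traffic assumption.
        gbps = min_bytes_moved(n, h) / (kernel_us * 1e-6) / 1e9
        rows.append((n, h, speedup, gbps))
    return rows


if __name__ == "__main__":
    for n, h, speedup, gbps in summarize():
        print(f"num_tokens={n:5d} hidden_size={h:5d} speedup={speedup:5.1f}x effective_bw={gbps:7.1f} GB/s")
```

Under this assumption the largest case works out to roughly 700 GB/s, while the two small cases move only tens to hundreds of kilobytes, so their timings likely reflect fixed overheads rather than memory bandwidth.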

Tested models:

  • LLaMA-7B
  • LLaMA-13B

Tested GPUs:

  • A100

@WoosukKwon WoosukKwon requested a review from zhuohan123 March 31, 2023 01:25

@zhuohan123 zhuohan123 left a comment

LGTM

@zhuohan123 zhuohan123 merged commit 09e9245 into main Mar 31, 2023
@WoosukKwon WoosukKwon deleted the rms-norm branch March 31, 2023 16:51
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
AdrianAbeyta added a commit to AdrianAbeyta/vllm that referenced this pull request Mar 8, 2024
Generalizing KV scales JSON to updated schema
luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this pull request Apr 1, 2024
z103cb added a commit to dtrifiro/vllm that referenced this pull request May 8, 2024
fxmarty pushed a commit to fxmarty/vllm-public that referenced this pull request May 31, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
chaunceyjiang added a commit to chaunceyjiang/vllm that referenced this pull request Apr 23, 2025
robertgshaw2-redhat referenced this pull request in robertgshaw2-redhat/vllm May 1, 2025
robertgshaw2-redhat referenced this pull request in robertgshaw2-redhat/vllm May 3, 2025
robertgshaw2-redhat referenced this pull request in robertgshaw2-redhat/vllm May 4, 2025
robertgshaw2-redhat referenced this pull request in robertgshaw2-redhat/vllm May 6, 2025

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Remove gpu_model_runner hacks

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Clean up Justfile

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* justfile edits

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Update

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Fixes - lm_eval gsm8k has correctness

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* "just delete the assert"

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* fixup precommit issues

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Fixes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* updated (#12)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Add Accuracy Test (#13)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Preemption Bugfixes (#15)

* stash fixed double free issue

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* fixed issue

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated (#16)

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Fix Bad Merge | Fix Memory Leak in Upstream (#18)

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* fix merge

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

---------

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* cleanup code

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* cleanup code

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* stash

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* revert

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* more spurious changes

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* updated

Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

* Support MLA in NIXL connector

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* WIP adding tests

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* wip

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* Fixes

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

---------

Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 5, 2025
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 6, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Sep 29, 2025
Added comprehensive documentation for the critical Eagle draft model quantization
inheritance fix to the MANTLE modifications registry:

REGISTRY ENTRY vllm-project#16: Eagle Draft Model Quantization Inheritance Fix
- Category: CRITICAL - Speculative Decoding Fix
- Issue: Eagle draft model inheriting target model's MxFP4 quantization
- Root Cause: Stricter upstream quantization validation caught config inheritance bug
- Solution: Clone + override pattern with robust attribute checking
- Impact: GPT-OSS (MxFP4) + Llama Eagle head (bf16) combinations

KEY DOCUMENTATION:
- Technical details of copy.deepcopy() clone + override approach
- Robust attribute checking with hasattr() for missing quant_config attributes
- Merge conflict resolution guidance for quantization config separation
- Testing requirements for mixed quantization speculative decoding scenarios

MERGE CONFLICT GUIDANCE:
- Preserve copy.deepcopy() pattern for config separation
- Maintain robust attribute checking for version compatibility
- Verify Eagle speculative decoding with mixed quantization setups
- Test GPT-OSS + Eagle head combinations after quantization merges

This ensures future merges preserve the critical fix that enables mixed
quantization speculative decoding scenarios.

Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
heheda12345 pushed a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
* add env and MQA path for both prefill and decode

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

* fix shapes

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

---------

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
dcmaddix pushed a commit to dcmaddix/vllm that referenced this pull request Oct 17, 2025
Update to default_act_function and pass as callable


3 participants