forked from opendatahub-io/vllm
nm vllm ent 0.8.5 sync #139
Merged
Conversation
…m-project#16801) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
…16796) Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
…ect#16809) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…llm-project#16829) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>
…nfig info (vllm-project#16857) Signed-off-by: jmho <jaylenho734@gmail.com>
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
…ect#15130) Signed-off-by: fyabc <suyang.fy@alibaba-inc.com> Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Xiong Wang <wangxiongts@163.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
…llm-project#16591) Signed-off-by: Jannis Schönleber <joennlae@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Jannis Schönleber <joennlae@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
…vllm-project#16460) Signed-off-by: vie-serendipity <2733147505@qq.com>
… V1 (vllm-project#15477) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: Staszek Pasko <staszek@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: rzou <zou3519@gmail.com>
Signed-off-by: qizixi <qizixi@meta.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
- remove build steps/dependencies - allow for installing pre-built flash-attention/vllm wheels - default ROCM_VERSION to 6.3.4, allowing override with env vars - cleanup rocm docker bake, defaults - amdsmi: use setup.py to build - add amdsmi bind mount - remove flashinfer from rocm target - bump vllm-tgis-adapter to 0.7.0 - Dockerfile*.ubi: bump ubi base
Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
- remove build steps/dependencies - allow for installing pre-built flash-attention/vllm wheels - default ROCM_VERSION to 6.3.4, allowing override with env vars - cleanup rocm docker bake, defaults - amdsmi: use setup.py to build - add amdsmi bind mount - remove flashinfer from rocm target - bump vllm-tgis-adapter to 0.7.0 - Dockerfile*.ubi: bump ubi base
…-project#17303) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…vllm-project#17255) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…rides are ordered (vllm-project#17256) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…17197) Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Russell Bryant <rbryant@redhat.com>
…t have shape (metadata_size) (vllm-project#17283) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
…_after_loading`. (vllm-project#16854) Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
…#17328) Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: andy-neuma <andy@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
…ct results (vllm-project#17574) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
…client' (vllm-project#17434) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>
…7315) Signed-off-by: Lucia Fang <fanglu@fb.com>
Syncing midstream NM fork to Upstream tag of [v0.8.5.post1](https://github.com/vllm-project/vllm/tree/v0.8.5.post1) + cherry pick of vllm-project@be633fb needed for benchmarks + [CP](neuralmagic/nm-vllm-ent@1fe447d) for compressed tensor bump + [CP](vllm-project#17677) for lora on AMD + [CP](vllm-project#17315) for llama4 w/ pure dense layers

```
commit 31c73ba (HEAD -> upstream-v0.8.5, nm-fork/upstream-v0.8.5)
Author: Chauncey <chaunceyjiang@gmail.com>
Date:   Wed Apr 30 15:11:04 2025 +0800

    [Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' (vllm-project#17434)

    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

commit f8db0bd
Author: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Date:   Fri May 2 14:01:38 2025 -0400

    [BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (vllm-project#17574)

    Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>

commit e335c34
Author: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Date:   Fri May 2 04:07:03 2025 -0400

    [BugFix] Fix Memory Leak (vllm-project#17567)

    Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>

commit cc463fe
Merge: 1e358ff ba41cc9
Author: Selbi Nuryyeva <selbi@redhat.com>
Date:   Tue Apr 29 12:34:57 2025 -0400

    Merge branch 'tag-upstream-v0.8.5' into upstream-v0.8.5

commit ba41cc9 (tag: v0.8.5, tag-upstream-v0.8.5)
Author: Michael Goin <mgoin64@gmail.com>
Date:   Mon Apr 28 16:20:24 2025 -0600

    [Model] Add tuned triton fused_moe configs for Qwen3Moe (vllm-project#17328)

    Signed-off-by: mgoin <mgoin64@gmail.com>

commit dcbac4c
Author: Simon Mo <simon.mo@hey.com>
Date:   Mon Apr 28 14:12:01 2025 -0700

    [Model] Qwen3 Dense FP8 Compat Fixes (vllm-project#17318)

    Signed-off-by: simon-mo <xmo@berkeley.edu>

[...]
```

Commands

```
git fetch upstream
git checkout -b upstream-v0.8.5
git merge upstream/v0.8.5
git cherry-pick be633fb
```

TEST PLAN
accept sync: https://github.com/neuralmagic/nm-cicd/actions/runs/14841223552
related PR in cicd: neuralmagic/nm-cicd#99
release workflow: https://github.com/neuralmagic/nm-cicd/actions/runs/14845693864
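For reviewers who want to confirm locally that the cherry-picks listed above landed on the sync branch, a minimal check could look like the following (remote and branch names are taken from the log above; adjust them to your local setup):

```
# Assumes a remote named nm-fork pointing at the midstream fork, as in the log above.
git fetch nm-fork upstream-v0.8.5
git log --oneline v0.8.5..nm-fork/upstream-v0.8.5                                     # commits carried on top of the upstream tag
git log --oneline nm-fork/upstream-v0.8.5 | grep -E "17434|17574|17567|17315|17677"   # spot-check the CPs by PR number
```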
This bumps the CUDA version in the base layer to 12-8 instead of 12-4. This could break something if, during dependency install, we have to build a dependency from source, since the wheels we bring in later in the prepare stage are now built against 12.8.
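The mismatch risk described above can be sanity-checked by comparing the toolkit in the base layer with the CUDA version the pre-built wheels target. These commands are illustrative only and assume nvcc and a torch wheel are present in the image; they are not taken from Dockerfile.ubi:

```
# Run inside the built image; tool availability is an assumption, not taken from the Dockerfile.
nvcc --version                                         # toolkit used if a dependency has to be built from source
python3 -c "import torch; print(torch.version.cuda)"   # CUDA version the pre-built wheels were compiled against
```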
Notable conflicts were in Dockerfile.rocm.ubi and Dockerfile.ubi. Up to date with the upstream v0.8.5.post1 tag and includes CPs for lora, llama4, and the compressed tensors bump.
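If the merge needs to be reproduced or audited, the conflicted Dockerfiles can be inspected with standard git tooling while the merge from the Commands section above is in progress (a sketch, not part of the original test plan):

```
# During `git merge upstream/v0.8.5`, before resolving:
git diff --name-only --diff-filter=U            # should list Dockerfile.rocm.ubi and Dockerfile.ubi
git log --merge --oneline -- Dockerfile.ubi     # commits on both sides that touched the conflicted file
```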
- Improve configs - `SchedulerConfig` (vllm-project/vllm#16533)
- [TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (vllm-project/vllm#16596)
- [BugFix]: Update minimum `pyzmq` version (vllm-project/vllm#16549)
- Add `vllm bench [latency, throughput]` CLI commands (vllm-project/vllm#16508)
- [Misc] Update `compressed-tensors` WNA16 to support zero-points (vllm-project/vllm#14211)
- [V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` (vllm-project/vllm#16578)
- [CI] Cleanup `additional_dependencies: [toml]` for pre-commit yapf hook (vllm-project/vllm#16405)
- Improve configs - `TokenizerPoolConfig` + `DeviceConfig` (vllm-project/vllm#16603)
- [TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (vllm-project/vllm#16726)
- [Doc] Improve help examples for `--compilation-config` (vllm-project/vllm#16729)
- [V1][Structured Output] Minor modification to `_validate_structured_output()` (vllm-project/vllm#16748)
- Improve configs - `MultiModalConfig` + `PoolerConfig` + `DecodingConfig` (vllm-project/vllm#16789)
- Fix `nullable_kvs` fallback (vllm-project/vllm#16837)
- [Frontend] Add sampling params to `v1/audio/transcriptions` endpoint (vllm-project/vllm#16591)
- Improve configs - `CacheConfig` (vllm-project/vllm#16835)
- [Perf] Optimize `_update_states` for GPU model runner (vllm-project/vllm#16910)
- Improve configs - `SpeculativeConfig` (vllm-project/vllm#16971)
- [BugFix] Remove default multiproc executor `collective_rpc` timeout (vllm-project/vllm#17000)
- Categorize `tests/kernels/` based on kernel type (vllm-project/vllm#16799)
- Ensure that `pid` passed to `kill_process_tree` is `int` for `mypy` (vllm-project/vllm#17051)
- `CacheConfig.block_size` should always be `int` when used (vllm-project/vllm#17052)
- Use `@property` and private field for `data_parallel_rank_local` (vllm-project/vllm#17053)
- Simplify `TokenizerGroup` (vllm-project/vllm#16790)
- Improve static type checking in `LoRAModelRunnerMixin` (vllm-project/vllm#17104)
- [CI] Add automation for the `tool-calling` github label (vllm-project/vllm#17118)
- Add `:markdownhelp:` to `EngineArgs` docs so markdown docstrings render properly (vllm-project/vllm#17124)
- Improve configs - `LoRAConfig` + `PromptAdapterConfig` (vllm-project/vllm#16980)
- Move missed `SchedulerConfig` args into scheduler config group in `EngineArgs` (vllm-project/vllm#17131)
- Use Transformers helper `get_text_config()` instead of checking for `text_config` (vllm-project/vllm#17105)
- [BugFix][Frontend] Fix `LLM.chat()` tokenization (vllm-project/vllm#16081)
- [Bugfix] Fix missing int type for `-n` in multi-image example (vllm-project/vllm#17223)
- [V1] Add `structural_tag` support using xgrammar (vllm-project/vllm#17085)
- [Chore] added stubs for `vllm_flash_attn` during development mode (vllm-project/vllm#17228)
- [Bugfix] fix error due to an uninitialized tokenizer when using `skip_tokenizer_init` with `num_scheduler_steps` (vllm-project/vllm#9276)
- [Misc] Validate `stop_token_ids` contents (vllm-project/vllm#17268)
- Add missing class docstring for `PromptAdapterConfig` (vllm-project/vllm#17302)
- [Bugfix] Add missing `get_language_model` to new MLLMs (vllm-project/vllm#17300)
- [Misc] Minor typo/grammar in `platforms/interface.py` (vllm-project/vllm#17307)
- Make name of `compressed-tensors` quant method consistent across vLLM (vllm-project/vllm#17255)
- [Bugfix] Fix moe weight losing all extra attrs after `process_weights_after_loading` (vllm-project/vllm#16854)