fastforward on main #2

SolitaryThinker · 2025-02-12T02:07:08Z

No description provided.

People using DeepSeek models have found that they need to resolve a cv2 version conflict, see https://zhuanlan.zhihu.com/p/21064432691 . I added a check for this and made all imports of `cv2` lazy.

Signed-off-by: youkaichao <youkaichao@gmail.com>
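As a rough illustration of what "lazy" means here, the sketch below defers an import until the function that needs it is actually called. It is not the code from the commit: the helper names are hypothetical, and the standard-library `json` module stands in for `cv2` so the example runs without OpenCV installed.

```python
import importlib
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Import `name` only when called, not when this file is imported."""
    return importlib.import_module(name)


def decode_payload(raw: str) -> dict:
    # Hypothetical helper: the import happens on first use, so a broken or
    # conflicting install of the dependency only affects callers of this
    # function. `json` is a stand-in for `cv2` in this sketch.
    json = lazy_import("json")
    return json.loads(raw)


result = decode_payload('{"frames": 3}')
```

With this pattern, code paths that never touch the optional dependency are not affected by its version conflicts.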

…12570)

Fix to AWQ quant loading of the new R1 model.

The new optimized MoE kernels for a large number of experts, `moe_wn16`, use AWQ quantization, which requires the attention layers to be in 16-bit. The current merge has broken this, and `get_quant_method` must return None for it to work correctly again.

Signed-off-by: Srikanth Srinivas <srikanth@astrum.ai>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Beim <beim2015@outlook.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Ryan N <ryan.nguyen@centml.ai>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Shawn Du <shawnd200@outlook.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Beim <805908499@qq.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Nishidha <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Ryan Nguyen <96593302+xpbowler@users.noreply.github.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Co-authored-by: fade_away <1028552010@qq.com>
Co-authored-by: weilong.yu <weilong.yu@shopee.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Eldar Kurtic <eldarkurtic314@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
Co-authored-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Shawn Du <shawnd200@outlook.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
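The idea that `get_quant_method` returns None for layers that must stay in 16-bit can be sketched as below. This is a toy model only: the class, the string layer kinds, and the returned labels are all hypothetical and do not reproduce vLLM's actual classes or signatures.

```python
from typing import Optional


class SketchMoeQuantConfig:
    """Hypothetical config: quantize MoE expert layers, leave the rest alone."""

    def get_quant_method(self, layer_kind: str) -> Optional[str]:
        # Only the fused MoE experts use the quantized kernel path.
        if layer_kind == "fused_moe":
            return "quantized_moe_kernel"
        # Returning None signals "no quantization for this layer", so
        # attention layers keep their 16-bit weights.
        return None


config = SketchMoeQuantConfig()
moe_method = config.get_quant_method("fused_moe")
attn_method = config.get_quant_method("attention")
```

The caller is then expected to treat None as "use the unquantized layer implementation", which is the behavior the commit message says was broken by the merge.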

# Adds support for `transformers` as a backend

Following huggingface/transformers#35235, a number of models should already be supported, and we are ramping up support for more models.

Thanks @Isotr0py for the TP support, and @hmellor for his help as well!

This includes:
- `trust_remote_code=True` support: any model on the hub can be natively supported, provided it implements attention the correct way
- tensor parallel support

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
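A minimal sketch of how the two listed features might be requested together. It only builds a dictionary of engine arguments and does not import vLLM. The `model_impl` key is an assumption about the option name, `my-org/my-model` is a placeholder model id, and the helper function is hypothetical.

```python
from typing import Any, Dict


def transformers_backend_kwargs(model: str,
                                tensor_parallel_size: int = 1) -> Dict[str, Any]:
    """Collect engine arguments for running a hub model via the transformers backend."""
    return {
        "model": model,
        # Assumed option name for selecting the backend implementation.
        "model_impl": "transformers",
        # Allows custom modeling code from the hub to be loaded.
        "trust_remote_code": True,
        # Number of devices to shard the model across.
        "tensor_parallel_size": tensor_parallel_size,
    }


kwargs = transformers_backend_kwargs("my-org/my-model", tensor_parallel_size=2)
# With vLLM installed, these could be passed along the lines of LLM(**kwargs).
```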

…12676)

Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>

zhuohan123 and others added 30 commits February 2, 2025 19:19

[Doc] Deprecate Discord (#12668)

326fcc8

[Kernel] port sgl moe_align_block_size kernels (#12574)

95460fc

sgl_moe_align_block_size is based on: sgl-project/sglang@ded9fcd
moe_align_block_size is based on: sgl-project/sglang@ba5112f

Signed-off-by: Yang Chen <yangche@fb.com>

Properly check if all fused layers are in the list of targets (#12666)

c5932e5

Thanks @kylesayrs for catching this!

[cuda] manually import the correct pynvml module (#12679)

ad4a9dc

Fixes problems like #12635, #12636, and #12565.

Signed-off-by: youkaichao <youkaichao@gmail.com>

[ci/build] fix gh200 test (#12681)

1298a40

Signed-off-by: youkaichao <youkaichao@gmail.com>

[Misc] Fix improper placement of SPDX header in scripts (#12694)

33e0602

Signed-off-by: Russell Bryant <rbryant@redhat.com>

[Bugfix][Kernel] Fix per-token/per-channel quantization for Hopper scaled mm (#12696)

c11de33

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

Squelch MLA warning for Compressed-Tensors Models (#12704)

6dd5e52

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

[Model] Add Deepseek V3 fp8_w8a8 configs for B200 (#12707)

4797dad

[MISC] Remove model input dumping when exception (#12582)

cf58b9c

Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>

[V1] Revert uncache_blocks and support recaching full blocks (#12415)

5095e96

Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>

[Core] Improve hash collision avoidance in prefix caching (#12621)

73b35cc

Signed-off-by: Russell Bryant <rbryant@redhat.com>

Support Pixtral-Large HF by using llava multimodal_projector_bias config (#12710)

5d98d56

Signed-off-by: mgoin <michael@neuralmagic.com>

[Doc] Replace ibm-fms with ibm-ai-platform (#12709)

bb392af

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

[Quant] Fix use_mla TypeError and support loading pure-sparsity Compressed Tensors configs (#12711)

4896d0c

[AMD][ROCm] Enable DeepSeek model on ROCm (#12662)

c36ac98

Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>

[Misc] Add BNB quantization for Whisper (#12381)

96b2362

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

[VLM] Merged multi-modal processor for InternVL-based models (#12553)

d1ca7df

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>

[V1] Remove scheduling constraint on partial requests (#12674)

18a88fc

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

[VLM] merged multimodal processor and V1 support for idefics3 (#12660)

815079d

Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

[Bugfix] Fix loading of fine-tuned models based on Phi-3-Small (#12689)

6469038

Signed-off-by: Michael Greenbaum <mgreenbaum@microsoft.com> Co-authored-by: Michael Greenbaum <mgreenbaum@microsoft.com>

Avoid unnecessary multi-modal input data copy when len(batch) == 1 (#12722)

62467a8

Signed-off-by: imkero <kerorek@outlook.com>

[Build] update requirements of no-device for plugin usage (#12630)

649550f

Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>

[Bugfix] Fix CI failures for InternVL and Mantis models (#12728)

18016a5

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

[V1][Metrics] Add request_success_total counter, labelled with finish reason (#12579)

233df6f

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

[Core] add and implement VLLM_LOGITS_PROCESSOR_THREADS (#12368)

b3a0d01

Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>

terrytangyuan and others added 29 commits February 10, 2025 06:09

[Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003)

2431371

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

[misc] Add retries with exponential backoff for HF file existence check (#13008)

fde7126

[Bugfix] Clean up and fix multi-modal processors (#13012)

51f0b5f

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Fix seed parameter behavior in vLLM (#13007)

2ae8890

Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>

[Model] Ultravox Model: Support v0.5 Release (#12912)

08b2d84

Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>

[misc] Fix setup.py condition to avoid AMD from being mistaken with CPU (#13022)

91e8767

Signed-off-by: kevin <kevin@anyscale.com>

[V1][Minor] Move scheduler outputs to a separate file (#13062)

2ff4857

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

[Docs] Annouce Meta Meetup (#13065)

2c0f582

Signed-off-by: simon-mo <simon.mo@hey.com>

[Bugfix] Support missing tool parameters in mistral tokenizer (#12884)

cb080f3

Signed-off-by: Florian Greinacher <florian.greinacher@siemens.com>

[Benchmark] Add BurstGPT to benchmark_serving (#13063)

58047c6

Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

[Core] Don't do platform detection at import time (#12933)

c320ca8

Signed-off-by: Russell Bryant <rbryant@redhat.com>

[Misc] LoRA - Refactor Punica ops tests (#12970)

78a141d

Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

[Bugfix]: Reasoning output bug according to the chat template change (#13025)

fc6485d

Signed-off-by: Ce Gao <cegao@tensorchord.ai>

[V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592)

41c5dd4

[executor] init local_rank as device index (#13027)

9cf4759

Signed-off-by: Mengqing Cao <cmq0113@163.com>

[ROCm] Using a more precise memory profiling (#12624)

7539bbc

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

[Build] Fix cuda link target of cumem_allocator in CPU env (#12863)

da31719

Signed-off-by: YuhongGuo <yuhong.gyh@antgroup.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

[Platform] add pre_register_and_update function (#12432)

2e3b969

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

[Bugfix] fix flaky test (#13089)

110f59a

Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>

[V1][Metrics] Add several request timing histograms (#12644)

75e6e14

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

Set torch_dtype in TransformersModel (#13088)

ad97763

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

[Misc] Fix typo at comments at metrics.py (#13024)

bf3e052

[Bugfix] Do not use resource module on Windows (#12858) (#13029)

21f5d50

[BugFix] Pop instead of del CUDA_VISIBLE_DEVICES (#12962)

6c4dbe2

Signed-off-by: Hollow Man <hollowman@opensuse.org>

Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 (#13023)

2b25b7d

[CI/Build][Bugfix] Fix CPU backend default threads num (#13077)

565c1ef

[Doc] Improve OpenVINO installation doc (#13102)

deb6c1c

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

[Bugfix] Guided decoding falls back to outlines when fails to import xgrammar (#12976)

14ecab5

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

[Misc] Move pre-commit suggestion back to the end (#13114)

72c2b68

Signed-off-by: Russell Bryant <rbryant@redhat.com>

SolitaryThinker closed this Feb 12, 2025
