 # each worker's `__init__` function.
 #
 # Then in each kind of patch, there are three folders:
-# - patch_0_8_4 : contains the patches applied when vllm version is 0.8.4 .
+# - patch_0_8_5 : contains the patches applied when vllm version is 0.8.5 .
 # - patch_main: contains the patches applied when vllm version is main branch.
-# - patch_common: contains the patches applied in both 0.8.4 and main branch.
+# - patch_common: contains the patches applied in both 0.8.5 and main branch.
 #
 # In the future, with the vllm version upgrade, the new patch folder such as
 # patch_0_8_5, patch_0_8_6, etc. will be added to manage the patch for different
 # --------------------------------
 # * Platform Patch:
 # =================
-# ** File: platform/patch_0_8_4/patch_config.py**
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# 1. `vllm.config.ModelConfig.__init__()`
-# Why:
-# It is hard coded for sleep mode to support cuda platform only
-# How:
-# Using a new method to check if sleep mode is available
-# Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
-# https://github.com/vllm-project/vllm/pull/16562
-# Future Plan:
-# This patch is only used for 084 and can't be revert. just keep as it is.
-#
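The removed entry above describes replacing a hard-coded "sleep mode works only on cuda" check with a platform-level query. A minimal sketch of that pattern, with stand-in classes; `is_sleep_mode_available` and `CurrentPlatform` are assumed names for illustration, not vllm's actual API:

```python
# Sketch of the patch_config.py idea: ask the platform whether sleep
# mode is supported instead of hard coding a cuda-only check.
# All names here are illustrative stand-ins, not vllm code.

class CurrentPlatform:
    """Stand-in for the active hardware platform (cuda, npu, ...)."""

    def __init__(self, device_type: str):
        self.device_type = device_type

    def is_sleep_mode_available(self) -> bool:
        # The patched check queries the platform rather than testing
        # `device_type == "cuda"` directly, so npu can opt in too.
        return self.device_type in ("cuda", "npu")


class ModelConfig:
    def __init__(self, enable_sleep_mode: bool, platform: CurrentPlatform):
        if enable_sleep_mode and not platform.is_sleep_mode_available():
            raise ValueError("Sleep mode is not supported on this platform.")
        self.enable_sleep_mode = enable_sleep_mode


npu = CurrentPlatform("npu")
cfg = ModelConfig(enable_sleep_mode=True, platform=npu)
```

With the hard-coded check, the same constructor call would raise on npu; routing the decision through the platform object is what makes the behavior patchable per backend.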
 # ** File: platform/patch_common/patch_distributed.py**
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 # 1. `vllm.distributed.parallel_state.destroy_model_parallel()`
 #
 # * Worker Patch:
 # ===============
-# ** File: worker/patch_0_8_4/patch_metrics.py **
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# 1. `vllm.spec_decode.metrics.AsyncMetricsCollector.init_tensors` and
-# `vllm.spec_decode.metrics.AsyncMetricsCollector._copy_rejsample_metrics_async`
-# Why:
-# There are cuda hard code (torch.cuda.Stream) in `AsyncMetricsCollector.init_tensors` and
-# `AsyncMetricsCollector._copy_rejsample_metrics_async`
-# How:
-# Replace it with the corresponding npu method
-# Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
-# https://github.com/vllm-project/vllm/pull/14411
-# Future Plan:
-# Revert it when the related pr is merged in vllm.
-#
-# ** File: worker/patch_0_8_4/patch_spec_decode_worker.py **
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# 1. `vllm.spec_decode.spec_decode_worker.SpecDecodeWorker._configure_model_sampler_for_spec_decode`
-# Why:
-# vLLM `Remove Sampler from Model Code` so vllm-ascend needs a patch to run in v0.8.4.
-# How:
-# Use vLLM 0.8.4 method tp patch it.
-# Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
-# - https://github.com/vllm-project/vllm/pull/17084
-# - https://github.com/vllm-project/vllm-ascend/pull/636
-# Future Plan:
-# Follow v0.8.4 version strategy.
-#
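The worker entries above all share one mechanism: when the patch module is imported, it rebinds a cuda-specific method on an upstream class with an npu-aware replacement. A minimal sketch of that monkey-patch shape, using stand-in classes rather than the real vllm/torch objects:

```python
# Generic shape of the worker patches described above: rebind a method
# on an upstream class at import time. `MetricsCollector` and `Stream`
# are illustrative stand-ins, not vllm's actual implementations.

class Stream:
    """Stand-in for a device stream (torch.cuda.Stream / torch.npu.Stream)."""

    def __init__(self, backend: str):
        self.backend = backend


class MetricsCollector:
    """Stand-in for the upstream class with a cuda hard code."""

    def init_tensors(self):
        # Upstream constructs the cuda stream unconditionally here.
        return Stream("cuda")


def patched_init_tensors(self):
    # The patch substitutes the npu equivalent.
    return Stream("npu")


# Applied once when the patch module is imported:
MetricsCollector.init_tensors = patched_init_tensors
```

After the rebinding, every `MetricsCollector` instance takes the npu path without any caller changing, which is why these patches can live in a side package and be reverted by simply not importing them.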
 # ** File: worker/patch_common/patch_metrics.py **
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 # 1. `vllm.spec_decode.metrics.AsyncMetricsCollector.maybe_collect_rejsample_metrics`
 # - https://github.com/vllm-project/vllm-ascend/pull/395
 # Future Plan:
 # Revert it when the related pr is merged in vllm and vllm-ascend.
-#
-# ** File: worker/patch_0_8_4/patch_tritonplaceholder.py **
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# 1. `triton` Module
-# Why:
-# Triton is not supported on npu currently, importing triton will break vllm-ascend
-# How:
-# ditto
-# Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
-# TritonPlaceholder is only available in vllm>0.8.4
-# Future Plan:
-# Revert it when branch main doesn't maintain v0.8.4.
+#
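The removed `patch_tritonplaceholder.py` entry relies on the placeholder-module technique: register a dummy `triton` module in `sys.modules` before anything imports it, so `import triton` succeeds on platforms without Triton. A rough sketch under that assumption; the class name and the set of stubbed attributes are illustrative, not the vllm implementation:

```python
# Sketch of the placeholder-module technique: pre-register a dummy
# module so `import triton` does not raise ImportError on npu.
# Names and stubbed attributes here are illustrative only.
import sys
import types


class TritonPlaceholder(types.ModuleType):
    """Dummy `triton` module for platforms where Triton is unavailable."""

    def __init__(self):
        super().__init__("triton")
        # Stub out the attributes that importing code touches.
        self.jit = self._dummy_decorator

    @staticmethod
    def _dummy_decorator(func=None, **kwargs):
        if func is None:
            # Called as `@triton.jit(...)` with arguments.
            return lambda f: f
        # Called as a bare `@triton.jit`.
        return func


if "triton" not in sys.modules:
    sys.modules["triton"] = TritonPlaceholder()

import triton  # resolves to the placeholder, no ImportError


@triton.jit
def noop_kernel():
    pass
```

Because Python caches modules in `sys.modules`, registering the placeholder early makes every later `import triton` anywhere in the process pick it up, which is why the patch only has to run once at startup.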