MengqingCao commented Apr 28, 2025

Fix the triton placeholder patch period, i.e. the vllm version range in which the placeholder patch is applied.
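For context, here is a minimal sketch of what such a placeholder patch can look like, based only on the log messages below ("Using dummy decorators", "Triton module has been replaced with a placeholder"): a stub `triton` module with no-op decorators is installed for the vllm versions that need it. The names `_install_triton_placeholder` and `_PATCHED_VLLM_VERSIONS` are hypothetical; this is not the actual vllm-ascend implementation.

import importlib.util
import sys
import types

from vllm import __version__ as VLLM_VERSION

# Hypothetical names for illustration; the real patch lives in vllm_ascend's
# patch_tritonplaceholder.py and may differ in detail.
_PATCHED_VLLM_VERSIONS = ("0.8.4",)  # illustrative "patch period"; the real set is what this PR adjusts

def _noop_decorator(fn=None, **kwargs):
    # Accept both @triton.jit and @triton.jit(...) call styles; compile nothing.
    if fn is None:
        return lambda f: f
    return fn

def _install_triton_placeholder() -> None:
    """Install a minimal dummy `triton` module so model imports succeed without a GPU toolchain."""
    if importlib.util.find_spec("triton") is not None:
        return  # real triton is available, nothing to patch
    triton = types.ModuleType("triton")
    triton.jit = _noop_decorator
    triton.autotune = _noop_decorator
    triton.language = types.ModuleType("triton.language")
    sys.modules["triton"] = triton
    sys.modules["triton.language"] = triton.language

if VLLM_VERSION in _PATCHED_VLLM_VERSIONS:
    _install_triton_placeholder()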

Test script on v0.8.4:

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
# Create an LLM.
llm = LLM(model="/home/xxx/cache/modelscope/models/OpenBMB/MiniCPM-2B-128k",
          trust_remote_code=True,
          max_model_len=1024)

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Result

INFO 04-28 11:10:48 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 04-28 11:10:48 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 04-28 11:10:48 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 04-28 11:10:48 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 11:10:48 [__init__.py:44] plugin ascend loaded.
INFO 04-28 11:10:48 [__init__.py:230] Platform plugin ascend is activated
INFO 04-28 11:10:51 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 04-28 11:10:51 [__init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 04-28 11:10:51 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 04-28 11:10:51 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-28 11:10:51 [__init__.py:44] plugin ascend_enhanced_model loaded.
INFO 04-28 11:10:51 [patch_tritonplaceholder.py:33] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 04-28 11:10:51 [patch_tritonplaceholder.py:46] Triton is not installed. Using dummy decorators. Install it via `pip install triton` to enable kernel compilation.
INFO 04-28 11:10:51 [patch_tritonplaceholder.py:71] Triton module has been replaced with a placeholder.
WARNING 04-28 11:10:51 [_custom_ops.py:21] Failed to import from vllm._C with ImportError('libnuma.so.1: cannot open shared object file: No such file or directory')
WARNING 04-28 11:10:53 [registry.py:380] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 04-28 11:10:53 [registry.py:380] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration.
WARNING 04-28 11:10:53 [registry.py:380] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 04-28 11:10:53 [registry.py:380] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
INFO 04-28 11:10:53 [config.py:209] Replacing legacy 'type' key with 'rope_type'
INFO 04-28 11:11:09 [config.py:689] This model supports multiple tasks: {'reward', 'classify', 'embed', 'generate', 'score'}. Defaulting to 'generate'.
INFO 04-28 11:11:09 [arg_utils.py:1742] npu is experimental on VLLM_USE_V1=1. Falling back to V0 Engine.
INFO 04-28 11:11:09 [config.py:1747] Disabled the custom all-reduce kernel because it is not supported on current platform.
WARNING 04-28 11:11:09 [platform.py:129] NPU compilation support pending. Will be available in future CANN and torch_npu releases. Using default: enforce_eager=True
INFO 04-28 11:11:09 [platform.py:134] Compilation disabled, using eager mode by default
INFO 04-28 11:11:09 [llm_engine.py:243] Initializing a V0 LLM engine (v0.8.4) with config: model='/home/cmq/cache/modelscope/models/OpenBMB/MiniCPM-2B-128k', speculative_config=None, tokenizer='/home/xxx/cache/modelscope/models/OpenBMB/MiniCPM-2B-128k', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/home/cmq/cache/modelscope/models/OpenBMB/MiniCPM-2B-128k, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":0,"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False, 
INFO 04-28 11:11:10 [config.py:209] Replacing legacy 'type' key with 'rope_type'
WARNING 04-28 11:11:10 [utils.py:2444] Methods add_prompt_adapter,cache_config,compilation_config,current_platform,list_prompt_adapters,load_config,pin_prompt_adapter,remove_prompt_adapter not implemented in <vllm_ascend.worker.worker.NPUWorker object at 0xfffcf75131f0>
INFO 04-28 11:11:11 [parallel_state.py:959] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 04-28 11:11:11 [model_runner.py:950] Starting to load model /home/cmq/cache/modelscope/models/OpenBMB/MiniCPM-2B-128k...
Loading pt checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:06<00:00,  6.16s/it]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:06<00:00,  6.16s/it]

INFO 04-28 11:11:18 [loader.py:458] Loading weights took 6.16 seconds
INFO 04-28 11:11:19 [model_runner.py:955] Loading model weights took 5.6661 GB
INFO 04-28 11:11:34 [executor_base.py:112] # npu blocks: 1066, # CPU blocks: 91
INFO 04-28 11:11:34 [executor_base.py:117] Maximum concurrency for 1024 tokens per request: 133.25x
INFO 04-28 11:11:34 [llm_engine.py:449] init engine (profile, create kv cache, warmup model) took 15.71 seconds
Processed prompts: 100%|█████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00,  1.29s/it, est. speed input: 5.04 toks/s, output: 77.48 toks/s]
Prompt: 'Hello, my name is', Generated text: " John and I am a 20 year old student at the University of California, Los Angeles. I am a very passionate and dedicated individual who is always willing to help others. I have been a tutor for over 5 years and have tutored students in a variety of subjects including math, science, and English. I have also been a teacher's assistant for 2 years and have taught students in grades 6-12. I have a passion for teaching and helping others learn."
Prompt: 'The president of the United States is', Generated text: " the head of state and head of government of the United States. The president is the commander-in-chief of the military and is responsible for the overall leadership of the country. The president is elected by the people and serves a four-year term. The president is also the head of the executive branch of the government and is responsible for making important decisions and implementing policies.\n\n- The president's role in the government\n  - The president is the head of the executive branch of the"
Prompt: 'The capital of France is', Generated text: ' Paris.\nParis is the capital of France.\nParis is the capital of France.\nThe capital of France is Paris. The capital of France is Paris.\nThe capital of France is Paris. The capital of France is Paris. The capital of France is Paris.\nThe capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital'
Prompt: 'The future of AI is', Generated text: " bright, but it's not without its challenges.\nArtificial intelligence (AI) is a rapidly evolving field that has the potential to transform the way we live, work, and interact with technology. From self-driving cars to virtual assistants, AI is already making a significant impact on our daily lives. However, as AI continues to advance, there are also concerns about its potential to disrupt the job market and create new challenges for society.\nOne of the biggest challenges of AI is its potential"

Signed-off-by: MengqingCao <cmq0113@163.com>
wangxiyuan merged commit be9e3e8 into vllm-project:main Apr 28, 2025
13 of 16 checks passed
Yikun added a commit that referenced this pull request May 5, 2025
### What this PR does / why we need it?
Re-patch TritonPlaceholder on main to make CI happy
- Add the triton patch back until vllm-project/vllm#17446 is resolved
- Move patch_main before patch_common to resolve the MiniCPM triton import issue
- Add `0.8.5` and `0.8.5.post1` so the patch works on all 0.8.5 versions

Related:
- #704
- #690

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
All CI passed, including main

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
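For illustration, here is a rough sketch of the version gating and patch ordering described in the commit above. `patch_main` and `patch_common` are names taken from the commit message; everything else (the dispatch function, the version set) is a hypothetical reconstruction, not the actual vllm-ascend code.

from vllm import __version__ as VLLM_VERSION

# Versions assumed to still need the TritonPlaceholder patch; the follow-up
# commit extends this set with "0.8.5" and "0.8.5.post1".
_PATCH_VERSIONS = {"0.8.4", "0.8.5", "0.8.5.post1"}

def patch_main() -> None:
    """Early patches, e.g. installing the triton placeholder (hypothetical stub)."""

def patch_common() -> None:
    """Later, model-level patches that may import triton indirectly (hypothetical stub)."""

def apply_patches() -> None:
    if VLLM_VERSION not in _PATCH_VERSIONS:
        return
    # patch_main runs before patch_common so that models such as MiniCPM,
    # which import triton at module load time, already see the placeholder.
    patch_main()
    patch_common()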
MengqingCao deleted the tritonpatch branch May 6, 2025 02:26
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Fix triton placeholder patch period

chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Re-patch TritonPlaceholder on main to make CI happy (full commit message above)

Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
Fix triton placeholder patch period

Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
Re-patch TritonPlaceholder on main to make CI happy (full commit message above)