Motivation.
Currently, vLLM supports a variety of custom ops via:
- Triton ops: https://github.com/vllm-project/vllm/tree/main/vllm/lora/ops/triton_ops
- Torch native ops: https://github.com/vllm-project/vllm/tree/main/vllm/lora/ops/torch_ops
- Custom ops via torch bindings: https://github.com/vllm-project/vllm/blob/cdc1fa12eb1ba4795d24e97dcffa2018668a9267/csrc/torch_bindings.cpp#L480
- 3rd party lib: https://github.com/vllm-project/vllm/blob/cdc1fa12eb1ba4795d24e97dcffa2018668a9267/vllm/attention/backends/flashinfer.py#L12
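To illustrate the torch-bindings path above, here is a minimal sketch of registering a custom op through `torch.library` so it becomes callable as a `torch.ops` function. The namespace `vllm_demo` and the op `scaled_add` are hypothetical examples, not real vLLM ops; the C++ `TORCH_LIBRARY` bindings in `csrc/torch_bindings.cpp` use the same registration mechanism from C++.

```python
import torch

# Create a hypothetical op namespace and declare the op schema.
lib = torch.library.Library("vllm_demo", "DEF")
lib.define("scaled_add(Tensor a, Tensor b, float alpha) -> Tensor")

def scaled_add_cpu(a: torch.Tensor, b: torch.Tensor, alpha: float) -> torch.Tensor:
    # Reference CPU implementation; an Ascend backend would register an
    # implementation of the same schema under its own dispatch key instead.
    return a + alpha * b

lib.impl("scaled_add", scaled_add_cpu, "CPU")

# Once registered, the op is reachable through the torch.ops namespace.
out = torch.ops.vllm_demo.scaled_add(torch.ones(2), torch.ones(2), 2.0)
print(out)  # tensor([3., 3.])
```

Registering per-backend implementations against a single schema is what lets callers stay device-agnostic.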
vLLM Ascend currently (v0.7.1rc1) supports torch native ops (via torch_npu). The whole workflow looks like: vllm --> torch --> torch_npu --> atb --> cann, but in this way:
- devs first have to implement the ops in ATB
- then expose them through torch_npu
- then upgrade torch_npu to the latest version as a dependency
- finally, users can use the ops
This lengthy version-matching and upgrade process discourages developers from implementing Ascend operators.
Proposed Change.
This RFC aims to smooth out the complicated ops-development process and make everything clear and simple. It can also help Ascend developers create ops with better collaboration.
This RFC will start by exploring custom ops support in two ways:
- AscendCL (aclnn)
- AscendC
We propose to support custom ops via torch bindings to achieve this goal.
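The custom ops framework proposed here could follow a backend-registry pattern: each op registers one or more implementations (e.g. AscendC, aclnn, or a torch-native fallback), and callers resolve the best available one without caring how it was built. The sketch below is purely illustrative; all names (`CustomOpRegistry`, backend keys, the `scale` op) are hypothetical and not the actual vLLM Ascend API.

```python
from typing import Callable, Dict

class CustomOpRegistry:
    """Hypothetical registry mapping op name -> backend name -> implementation."""

    def __init__(self) -> None:
        self._ops: Dict[str, Dict[str, Callable]] = {}

    def register(self, op: str, backend: str):
        def decorator(fn: Callable) -> Callable:
            self._ops.setdefault(op, {})[backend] = fn
            return fn
        return decorator

    def resolve(self, op: str, preference=("ascendc", "aclnn", "torch")) -> Callable:
        # Pick the most preferred backend that actually registered an impl.
        impls = self._ops.get(op, {})
        for backend in preference:
            if backend in impls:
                return impls[backend]
        raise KeyError(f"no implementation registered for op {op!r}")

registry = CustomOpRegistry()

@registry.register("scale", "torch")
def scale_reference(xs, k):
    # Pure-Python fallback; an AscendC kernel would register under "ascendc"
    # and automatically win the preference order above.
    return [k * x for x in xs]

print(registry.resolve("scale")([1, 2, 3], 2))  # [2, 4, 6]
```

A design like this keeps the torch-native path working while new AscendC/aclnn kernels are added incrementally behind the same op names.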
Work items:
- A custom ops framework for vLLM Ascend
- A real op implementation with CI passing
- A tutorial to help users understand how to develop custom ops
Feedback Period.
now - 2025.03.06
CC List.
cc @wangxiyuan
cc @ganyi1996ppo
Any Other Things.
Ready in 2025 Q1 (vLLM Ascend first release, v0.7.3)