Progress

- Integrate the Intel XPU backend for basic model inference.
- Support tensor parallelism with Ray for the XPU backend.
- Integrate IPEX-optimized kernels (e.g., paged attention) for better performance.
- Quantization support.
Target Intel GPU devices and models
For the Intel GPU device (named xpu in the PyTorch context), we are working to make vLLM natively support Intel Xe architecture graphics cards, including data center Max GPUs (such as the PVC 1550 and PVC 1100) and client GPUs (such as the Arc A770).
For models, we will ensure that vLLM + XPU works well with all existing vLLM-supported models.
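For orientation, here is a minimal sketch of checking XPU visibility from Python, assuming intel-extension-for-pytorch is installed (on newer PyTorch builds the torch.xpu namespace is available natively):

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  # registers the "xpu" device

# Confirm the runtime can see an Intel GPU before running vLLM on it.
if torch.xpu.is_available():
    print(f"XPU devices: {torch.xpu.device_count()}")
    print(f"Device 0:    {torch.xpu.get_device_name(0)}")
else:
    print("No XPU device visible")
```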
Design
Python API
Since the Intel GPU offers similar APIs (via IPEX) and behavior compared with CUDA devices, we introduce just two new classes (see the sketch after this list):

- XPUExecutor (extends ExecutorBase) behaves much like GPUExecutor; LLMEngine and AsyncLLMEngine dispatch to this executor class based on the device type.
- XPUWorker (extends the Worker class) initializes the environment; most of its code is shared with the parent class.
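A simplified sketch of the dispatch idea follows. The class bodies and the lookup function here are illustrative assumptions, not the exact vLLM code, which has many more responsibilities:

```python
class ExecutorBase:
    """Minimal stand-in for vLLM's executor base class."""
    def __init__(self, model_config):
        self.model_config = model_config

class GPUExecutor(ExecutorBase):
    def execute_model(self, requests):
        ...  # launch CUDA workers

class XPUExecutor(ExecutorBase):
    """Mirrors GPUExecutor, but creates XPUWorker instances on 'xpu' devices."""
    def execute_model(self, requests):
        ...  # launch XPU workers via IPEX

def get_executor_cls(device_type: str):
    # LLMEngine / AsyncLLMEngine pick the executor class from the device type.
    return {"cuda": GPUExecutor, "xpu": XPUExecutor}[device_type]
```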
Torch API
We also introduce a torch_sdpa backend (reusing torch's scaled_dot_product_attention from the CPU backend) to compute attention over prompt tokens, since xformers and flash_attn are not supported on Intel GPUs.
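For reference, a minimal example of prompt attention via torch.nn.functional.scaled_dot_product_attention; the tensor shapes are illustrative only:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: batch=1, heads=8, prompt_len=16, head_dim=64.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# Causal masking matches decoder-style prompt attention; the same call
# runs on device tensors (e.g., q.to("xpu")) once the xpu device is registered.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```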
Custom Op
vLLM implements many efficient CUDA kernels, packaged as the _C library via pybind11. These kernels have been ported to SYCL with the same function signatures, so they can replace the CUDA kernels directly. The SYCL custom-kernel build procedure is integrated into vLLM's CMake build system.
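To illustrate the drop-in idea from the Python side, here is a hypothetical sketch (the module layout and op name are assumptions, not vLLM's exact API): because the SYCL kernels keep the CUDA signatures, the importing code does not change, and only the CMake configuration decides which sources are compiled into the extension.

```python
# Hypothetical illustration: the caller imports one compiled _C extension;
# whether its symbols were built from CUDA or SYCL sources is decided at
# CMake configure time, not here.
try:
    from vllm import _C  # native kernels (CUDA or SYCL build)
    HAS_NATIVE_OPS = True
except ImportError:
    HAS_NATIVE_OPS = False

def rotary_embedding(*args, **kwargs):
    # Same Python entry point either way; only the compiled backend differs.
    if not HAS_NATIVE_OPS:
        raise RuntimeError("vLLM native ops extension is not built")
    return _C.ops.rotary_embedding(*args, **kwargs)
```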
Background & References
Intel Max Series GPU: https://www.intel.com/content/www/us/en/products/docs/processors/max-series/overview.html
You can get Intel GPU access via the Intel Developer Cloud.
Intel Extension for PyTorch: https://github.com/intel/intel-extension-for-pytorch/tree/xpu-main