
Conversation

@cqulilujia (Contributor) commented Sep 9, 2025

Support the Ernie4.5T-VL model on the XPU platform:

  1. Support int8-quantized 8-card deployment of the full 424B model
  2. Support int8-quantized single-card deployment of the lightweight 28B model
  3. Support int8-quantized 4-card deployment of the lightweight 28B model
  4. Support both the v0 and v1 versions of prefill insert
  5. (Feature deprecated) Unify XPU device-memory management through the paddle.device.xpu.** interfaces; Paddle must be updated past PR73189
  6. The stride tensor mechanism can be used (i.e. FLAGS_use_stride_kernel=0 no longer needs to be set); Paddle must be updated past PR74819
  7. The Paddle framework fixed an insert v1 bug; Paddle must be updated past PR75017

@paddle-bot

paddle-bot bot commented Sep 9, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the XPU label Sep 9, 2025
int64_t hidden_size,
bool is_scatter);
} // namespace plugin
} // namespace xpu2


xpu2 --> xpu3

used_memory = xpu_get_used_global_memory(int(self.device_ids[self.local_rank]))
available_kv_cache_memory = total_available_memory - used_memory
# 3. Collect memory statistics
paddle_reserved_mem_after_run = paddle.device.xpu.max_memory_reserved(local_rank)
Collaborator:

local_rank isn't the device_id, is it? Shouldn't it be obtained via self.device_ids[self.local_rank]?

Contributor (Author):

For the 4/5/6/7-card scenario this is indeed a problem; I'll fix it.

Collaborator:

Right, this issue has come up before.
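The 4/5/6/7-card scenario can be sketched as follows; the helper name `resolve_device_id` is illustrative, not FastDeploy's actual code:

```python
def resolve_device_id(device_ids, local_rank):
    """Map a worker's local_rank to the physical XPU card it drives.

    With e.g. XPU_VISIBLE_DEVICES="4,5,6,7", local_rank 0 drives physical
    card 4, so passing local_rank directly to per-device queries such as
    paddle.device.xpu.max_memory_reserved() would read the wrong card.
    """
    device_id = device_ids[local_rank]
    assert device_id is not None, f"device_id is none for rank {local_rank}"
    return int(device_id)

print(resolve_device_id(["4", "5", "6", "7"], 0))  # -> 4
```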

@cqulilujia cqulilujia force-pushed the xpu-vl branch 2 times, most recently from 2675a6d to d0e3f1f Compare September 11, 2025 12:47
# TODO: set condition to new _get_num_new_tokens
num_new_tokens = request.need_prefill_tokens - request.num_computed_tokens
num_new_tokens = min(num_new_tokens, token_budget)
request.with_image = False
Collaborator:

Line 166 already defaults this to False in the multimodal case, so this is redundant.

Contributor (Author):

If this is a text-only request, with_image is not a member of request, and accessing it inside the model crashes; this is a leftover bug.
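A minimal reproduction of the crash described above; the `Request` class here is a simplified stand-in, not FastDeploy's actual scheduler type:

```python
# Only the multimodal scheduling path sets with_image, so a text-only
# request has no such attribute and a plain read raises AttributeError.

class Request:
    def __init__(self, multimodal_inputs=None):
        self.multimodal_inputs = multimodal_inputs
        if multimodal_inputs is not None:
            self.with_image = False  # only set on the multimodal path

text_req = Request()  # pure-text request: no with_image attribute

try:
    _ = text_req.with_image
    crashed = False
except AttributeError:
    crashed = True

print(crashed)  # -> True

# One defensive option is a guarded read that defaults to False:
print(getattr(text_req, "with_image", False))  # -> False
```

Setting the attribute unconditionally on every request (as the scheduler snippet above does) is the other way to close the hole.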

from fastdeploy.model_executor.ops.xpu import (
text_image_gather_scatter,
text_image_index_out,
)
Collaborator:

It would be better to wrap these the way the other operators are wrapped: define an op class and dispatch to the different hardware implementations inside it.

Contributor (Author):

OK.
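A sketch of the wrapper suggested above; the class and method names are illustrative, not FastDeploy's actual API. The idea is to select the per-hardware kernel once at construction time:

```python
class TextImageGatherScatterOp:
    """Dispatch to a hardware-specific kernel chosen at construction."""

    def __init__(self, platform: str):
        impls = {
            "xpu": self._xpu_forward,    # would call ops.xpu.text_image_gather_scatter
            "cuda": self._cuda_forward,  # would call the CUDA kernel
        }
        if platform not in impls:
            raise NotImplementedError(f"unsupported platform: {platform}")
        self._forward = impls[platform]

    def _xpu_forward(self, *args):
        return "xpu kernel"

    def _cuda_forward(self, *args):
        return "cuda kernel"

    def __call__(self, *args):
        return self._forward(*args)

op = TextImageGatherScatterOp("xpu")
print(op())  # -> xpu kernel
```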

image_features: Optional[paddle.Tensor],
forward_meta: ForwardMeta,
):
vl_moe_meta = self.ernie.prepare_vl_moe_meta(ids_remove_padding=ids_remove_padding)
Collaborator:

I'd suggest having 何泓域 review this change; it may affect cuda_graph support.

)

assert self.device_ids[self.local_rank] is not None, f"device_id is none for rank {self.local_rank}"
assert (
Collaborator:

Shouldn't the assert also be added in this part?

Contributor (Author) commented Sep 15, 2025:

Fixed.

@cqulilujia cqulilujia force-pushed the xpu-vl branch 5 times, most recently from 4a5e580 to 0471e4f Compare September 16, 2025 06:22
hong19860320 (Collaborator) previously approved these changes Sep 17, 2025:

LGTM

ids_remove_padding: paddle.Tensor,
image_token_num: int,
image_features: Optional[paddle.Tensor] = None,
vl_moe_meta: Optional[VLMoEMeta] = None,
Contributor:

As far as I can see this isn't used and can be removed.

Contributor (Author):

done

@aquagull (Contributor):

LGTM for vl model cudagraph

@cqulilujia (Contributor, Author):

/re-run all-failed

@cqulilujia cqulilujia force-pushed the xpu-vl branch 4 times, most recently from eac75f3 to 4d71517 Compare September 22, 2025 11:53
hong19860320 (Collaborator) previously approved these changes Sep 23, 2025:

LGTM

@hong19860320 (Collaborator):

@cqulilujia Please make sure the related Paddle PRs are merged into 3.2.1; otherwise the subsequent FD release will be affected.

hong19860320 (Collaborator):

LGTM

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 87179cb into PaddlePaddle:develop Sep 25, 2025
13 of 17 checks passed