
Conversation

@cqulilujia (Contributor) commented Sep 9, 2025

Support the Ernie4.5T-VL model on the XPU platform:

  1. Support int8-quantized 8-card deployment of the full 424B model
  2. Support int8-quantized single-card deployment of the lightweight 28B model
  3. Support int8-quantized 4-card deployment of the lightweight 28B model
  4. Support both the v0 and v1 versions of prefill insert
  5. (Feature deprecated) Unify XPU device-memory management through the paddle.device.xpu.** interfaces; Paddle must be updated past PR73189
  6. The stride tensor mechanism can be used (i.e. FLAGS_use_stride_kernel=0 no longer needs to be set); Paddle must be updated past PR74819
  7. The Paddle framework fixed an insert v1 bug; Paddle must be updated past PR75017

@paddle-bot

paddle-bot bot commented Sep 9, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the XPU label Sep 9, 2025
int64_t hidden_size,
bool is_scatter);
} // namespace plugin
} // namespace xpu2


xpu2 --> xpu3

used_memory = xpu_get_used_global_memory(int(self.device_ids[self.local_rank]))
available_kv_cache_memory = total_available_memory - used_memory
# 3. Collect memory statistics
paddle_reserved_mem_after_run = paddle.device.xpu.max_memory_reserved(local_rank)
Collaborator:

local_rank isn't the device_id, is it? Shouldn't it be obtained via self.device_ids[self.local_rank]?

Contributor (Author):

For the 4/5/6/7-card scenario this is indeed a problem; I'll fix it.

Collaborator:

Right, this issue has come up before.
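The 4/5/6/7-card scenario can be sketched as follows; the helper name `resolve_device_id` is illustrative, not FastDeploy's actual code:

```python
def resolve_device_id(device_ids, local_rank):
    """Map a worker's local_rank to the physical XPU card it drives.

    With e.g. XPU_VISIBLE_DEVICES="4,5,6,7", local_rank 0 drives physical
    card 4, so passing local_rank directly to per-device queries such as
    paddle.device.xpu.max_memory_reserved() would read the wrong card.
    """
    device_id = device_ids[local_rank]
    assert device_id is not None, f"device_id is none for rank {local_rank}"
    return int(device_id)

print(resolve_device_id(["4", "5", "6", "7"], 0))  # -> 4
```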

@cqulilujia cqulilujia force-pushed the xpu-vl branch 2 times, most recently from 2675a6d to d0e3f1f Compare September 11, 2025 12:47
# TODO: set condition to new _get_num_new_tokens
num_new_tokens = request.need_prefill_tokens - request.num_computed_tokens
num_new_tokens = min(num_new_tokens, token_budget)
request.with_image = False
Collaborator:

Line 166 already defaults this to False in the multimodal case, so this is redundant.

Contributor (Author):

If this is a text-only request, with_image is not a member of request, and accessing it inside the model crashes; this is a leftover bug.
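A minimal reproduction of the crash described above; the `Request` class here is a simplified stand-in, not FastDeploy's actual scheduler type:

```python
# Only the multimodal scheduling path sets with_image, so a text-only
# request has no such attribute and a plain read raises AttributeError.

class Request:
    def __init__(self, multimodal_inputs=None):
        self.multimodal_inputs = multimodal_inputs
        if multimodal_inputs is not None:
            self.with_image = False  # only set on the multimodal path

text_req = Request()  # pure-text request: no with_image attribute

try:
    _ = text_req.with_image
    crashed = False
except AttributeError:
    crashed = True

print(crashed)  # -> True

# One defensive option is a guarded read that defaults to False:
print(getattr(text_req, "with_image", False))  # -> False
```

Setting the attribute unconditionally on every request (as the scheduler snippet above does) is the other way to close the hole.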

from fastdeploy.model_executor.ops.xpu import (
text_image_gather_scatter,
text_image_index_out,
)
Collaborator:

It would be better to wrap these the way the other operators are wrapped: define an op class and dispatch to the different hardware implementations inside it.

Contributor (Author):

OK.
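A sketch of the wrapper suggested above; the class and method names are illustrative, not FastDeploy's actual API. The idea is to select the per-hardware kernel once at construction time:

```python
class TextImageGatherScatterOp:
    """Dispatch to a hardware-specific kernel chosen at construction."""

    def __init__(self, platform: str):
        impls = {
            "xpu": self._xpu_forward,    # would call ops.xpu.text_image_gather_scatter
            "cuda": self._cuda_forward,  # would call the CUDA kernel
        }
        if platform not in impls:
            raise NotImplementedError(f"unsupported platform: {platform}")
        self._forward = impls[platform]

    def _xpu_forward(self, *args):
        return "xpu kernel"

    def _cuda_forward(self, *args):
        return "cuda kernel"

    def __call__(self, *args):
        return self._forward(*args)

op = TextImageGatherScatterOp("xpu")
print(op())  # -> xpu kernel
```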

image_features: Optional[paddle.Tensor],
forward_meta: ForwardMeta,
):
vl_moe_meta = self.ernie.prepare_vl_moe_meta(ids_remove_padding=ids_remove_padding)
Collaborator:

I'd suggest having 何泓域 review this change; it may affect cuda_graph support.

)

assert self.device_ids[self.local_rank] is not None, f"device_id is none for rank {self.local_rank}"
assert (
Collaborator:

Shouldn't the assert also be added in this part?

Contributor (Author) commented Sep 15, 2025:

Fixed.

@cqulilujia cqulilujia force-pushed the xpu-vl branch 5 times, most recently from 4a5e580 to 0471e4f Compare September 16, 2025 06:22
hong19860320 (Collaborator) previously approved these changes Sep 17, 2025:

LGTM

ids_remove_padding: paddle.Tensor,
image_token_num: int,
image_features: Optional[paddle.Tensor] = None,
vl_moe_meta: Optional[VLMoEMeta] = None,
Contributor:

As far as I can see this isn't used and can be removed.

Contributor (Author):

done

@aquagull (Contributor):

LGTM for vl model cudagraph

@cqulilujia (Contributor, Author):

/re-run all-failed

@cqulilujia cqulilujia force-pushed the xpu-vl branch 4 times, most recently from eac75f3 to 4d71517 Compare September 22, 2025 11:53
hong19860320 (Collaborator) previously approved these changes Sep 23, 2025:

LGTM

@hong19860320 (Collaborator):

@cqulilujia Please make sure the related Paddle PRs are merged into 3.2.1; otherwise the subsequent FD release will be affected.

hong19860320 (Collaborator):

LGTM

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 87179cb into PaddlePaddle:develop Sep 25, 2025
13 of 17 checks passed