[XPU] support XPU VL model inference #4030
Thanks for your contribution!
```cpp
    int64_t hidden_size,
    bool is_scatter);
}  // namespace plugin
}  // namespace xpu2
```
`xpu2` --> `xpu3`: the namespace should be `xpu3`.
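The plugin declaration above takes a `hidden_size` and an `is_scatter` flag. As a rough illustration only, here is a pure-Python sketch of a text/image gather-scatter pair, *assuming* that "gather" splits the token rows of a hidden-state matrix into text rows and image rows according to a per-token mask (presumably so each group can go to its own experts), and "scatter" writes them back in the original token order, with `is_scatter` selecting the direction. The function names `gather` and `scatter` are hypothetical and do not reflect the XPU kernel's real signature.

```python
from typing import List, Sequence, Tuple

Row = List[float]  # one token's hidden state (length == hidden_size)


def gather(hidden: Sequence[Row], is_image: Sequence[bool]) -> Tuple[List[Row], List[Row]]:
    """Split token rows into (text_rows, image_rows), keeping order within each group."""
    if len(hidden) != len(is_image):
        raise ValueError("expected one mask entry per token row")
    text_rows = [row for row, img in zip(hidden, is_image) if not img]
    image_rows = [row for row, img in zip(hidden, is_image) if img]
    return text_rows, image_rows


def scatter(text_rows: Sequence[Row], image_rows: Sequence[Row], is_image: Sequence[bool]) -> List[Row]:
    """Inverse of gather: put every row back at its original token position."""
    num_image = sum(1 for img in is_image if img)
    if num_image != len(image_rows) or len(is_image) - num_image != len(text_rows):
        raise ValueError("group sizes do not match the mask")
    text_it, image_it = iter(text_rows), iter(image_rows)
    return [next(image_it) if img else next(text_it) for img in is_image]


# Usage: tokens 1 and 2 are image tokens; a round trip restores the input.
hidden = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
is_image = [False, True, True, False]
text_rows, image_rows = gather(hidden, is_image)
assert scatter(text_rows, image_rows, is_image) == hidden
```

A real kernel would operate on a contiguous `[num_tokens, hidden_size]` buffer in place rather than on Python lists; the sketch only shows the index bookkeeping.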
fastdeploy/worker/xpu_worker.py
```python
used_memory = xpu_get_used_global_memory(int(self.device_ids[self.local_rank]))
available_kv_cache_memory = total_available_memory - used_memory
# 3. Statistical memory information
paddle_reserved_mem_after_run = paddle.device.xpu.max_memory_reserved(local_rank)
```
Isn't `local_rank` different from the device id? Shouldn't it be obtained via `self.device_ids[self.local_rank]`?
This does break when the job runs on cards 4, 5, 6 and 7. I'll fix it.
Right, we ran into this issue before.
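To make the bug in this thread concrete: `local_rank` is an index into `self.device_ids`, not a physical device id, so on a node where the job uses cards 4-7, rank 0 runs on card 4. The sketch below uses a hypothetical helper, `physical_device_id`, to show the mapping; it assumes `device_ids` may hold strings (the snippet wraps the lookup in `int(...)`). Both memory queries in the snippet would then be given this physical id rather than `local_rank`.

```python
from typing import Optional, Sequence, Union

DeviceId = Union[int, str]


def physical_device_id(device_ids: Sequence[Optional[DeviceId]], local_rank: int) -> int:
    """Map a process-local rank to the physical XPU id it runs on (hypothetical helper)."""
    device_id = device_ids[local_rank]
    if device_id is None:
        raise ValueError(f"device_id is none for rank {local_rank}")
    return int(device_id)


# A job launched on cards 4-7: rank 0 must query card 4, not card 0.
device_ids = ["4", "5", "6", "7"]
assert physical_device_id(device_ids, 0) == 4
assert physical_device_id(device_ids, 3) == 7
```

On a job that happens to start at card 0, `local_rank` and the physical id coincide, which is why the mistake only surfaces on layouts such as cards 4-7.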
Updated from 2675a6d to d0e3f1f.
```python
# TODO: set condition to new _get_num_new_tokens
num_new_tokens = request.need_prefill_tokens - request.num_computed_tokens
num_new_tokens = min(num_new_tokens, token_budget)
request.with_image = False
```
Line 166 already defaults this to False for multimodal requests, so this is redundant.
For a text-only request, `with_image` is not a member of `request`, so accessing it in the model crashes. This fixes a bug left over from earlier.
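A minimal sketch of the scheduling step in the snippet, assuming a simplified `Request` stand-in (hypothetical, not FastDeploy's class). The prefill chunk is the number of prompt tokens not yet computed, capped by the step's token budget. Declaring `with_image` as a field with a default is one way to guarantee the attribute exists for text-only requests; the patch instead assigns it explicitly, which the sketch mirrors.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical stand-in for the scheduler's request object.
    need_prefill_tokens: int
    num_computed_tokens: int = 0
    with_image: bool = False  # always defined, so model code can read it safely


def schedule_prefill_chunk(request: Request, token_budget: int) -> int:
    """Return how many prompt tokens to prefill in this step."""
    num_new_tokens = request.need_prefill_tokens - request.num_computed_tokens
    num_new_tokens = min(num_new_tokens, token_budget)
    request.with_image = False  # mirrors the patch: set explicitly on this path
    return num_new_tokens


# 400 tokens remain but only 256 fit in this step's budget.
assert schedule_prefill_chunk(Request(1000, 600), token_budget=256) == 256
```

Without the attribute, a plain object would raise `AttributeError` on `request.with_image`, which matches the crash described above.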
```python
from fastdeploy.model_executor.ops.xpu import (
    text_image_gather_scatter,
    text_image_index_out,
)
```
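It would be better to wrap this the way other operators are wrapped: define an op class and select the hardware-specific implementation inside it.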
ok
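The reviewer's suggestion can be sketched as a small op class that picks a backend once, at construction, so model code never branches on hardware. Everything here is hypothetical: the class name `TextImageIndexOp`, the `register` decorator, the platform strings, and the reference semantics (each token's position within its own modality group) stand in for the real kernels such as `text_image_index_out`, whose behavior is not shown in this excerpt.

```python
from typing import Callable, Dict, List

Impl = Callable[[List[bool]], List[int]]


class TextImageIndexOp:
    """Hypothetical op wrapper: one public call, per-platform implementations."""

    _impls: Dict[str, Impl] = {}  # registry shared by the class

    @classmethod
    def register(cls, platform: str) -> Callable[[Impl], Impl]:
        def deco(fn: Impl) -> Impl:
            cls._impls[platform] = fn
            return fn
        return deco

    def __init__(self, platform: str):
        if platform not in self._impls:
            raise NotImplementedError(f"no text/image index op for platform {platform!r}")
        self._impl = self._impls[platform]

    def __call__(self, is_image: List[bool]) -> List[int]:
        return self._impl(is_image)


@TextImageIndexOp.register("cpu")
def _index_out_reference(is_image: List[bool]) -> List[int]:
    # Assumed semantics: each token's position within its own modality group.
    counters = {False: 0, True: 0}
    out = []
    for img in is_image:
        out.append(counters[img])
        counters[img] += 1
    return out


op = TextImageIndexOp("cpu")
assert op([False, True, True, False]) == [0, 0, 1, 1]
```

An XPU build would register a thin wrapper around the compiled kernel under its own platform key; callers keep using `TextImageIndexOp(platform)(...)` unchanged.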
```python
        image_features: Optional[paddle.Tensor],
        forward_meta: ForwardMeta,
    ):
        vl_moe_meta = self.ernie.prepare_vl_moe_meta(ids_remove_padding=ids_remove_padding)
```
I'd suggest having 何泓域 review this change, since it may affect cuda_graph support.
```python
)

assert self.device_ids[self.local_rank] is not None, f"device_id is none for rank {self.local_rank}"
assert (
```
Shouldn't the assert in this section be added?
Done.
Updated from 4a5e580 to 0471e4f.
hong19860320 left a comment
LGTM
```python
ids_remove_padding: paddle.Tensor,
image_token_num: int,
image_features: Optional[paddle.Tensor] = None,
vl_moe_meta: Optional[VLMoEMeta] = None,
```
As far as I can tell this isn't used, so it can be removed.
done
LGTM for VL model cudagraph support.

/re-run all-failed
Updated from eac75f3 to 4d71517.
hong19860320 left a comment
LGTM
@cqulilujia Please make sure the related Paddle PRs are merged into 3.2.1; otherwise the upcoming FD release will be affected.
hong19860320 left a comment
LGTM
Support the Ernie4.5T-VL model on the XPU platform.