@LCStayingdullCircuit LCStayingdullCircuit commented Jul 31, 2025

PR Category

Execute Infrastructure

PR Types

Bug fixes

Description

This PR fixes the CUDA error 700 (illegal address) raised in the fused_rotary_position_embedding function.

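Root cause:
During the computation, the code uses the batch_size of query (q) by default as the batch_size_stride for the key (k) and value (v) tensors. When the batch_size of q is larger than that of k/v, this leads to an illegal address access in device memory, which triggers the CUDA error.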

This problem is not limited to large-Tensor scenarios; the following test case reproduces the error as well:

paddle.incubate.nn.functional.fused_rotary_position_embedding(Tensor([1682, 8, 2, 16],"float32"), Tensor([168, 8, 2, 16],"float32"), Tensor([168, 8, 2, 16],"float32"), Tensor([1, 8, 1, 16],"float32"), Tensor([1, 8, 1, 16],"float32"), position_ids=None, use_neox_rotary_style=True, time_major=False, )
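To make the failure mode concrete, here is a minimal sketch in plain Python of how iterating over k with q's batch count runs past the end of k's buffer. It is a simplified model, not the actual kernel indexing: the helper `max_flat_offset` is hypothetical, and the contiguous `[batch, seq_len, num_heads, head_dim]` layout is an assumption based on `time_major=False` in the test case above.

```python
from math import prod

def max_flat_offset(loop_batch, tensor_shape):
    """Largest flat element offset touched when a loop walks `loop_batch`
    batches over a contiguous [batch, seq_len, num_heads, head_dim] tensor.
    Hypothetical helper for illustration only."""
    _, seq_len, num_heads, head_dim = tensor_shape
    batch_stride = seq_len * num_heads * head_dim  # elements per batch entry
    return loop_batch * batch_stride - 1

# Shapes taken from the reproduction case in the PR description.
q_shape = (1682, 8, 2, 16)
k_shape = (168, 8, 2, 16)

k_numel = prod(k_shape)                         # elements actually allocated for k
last = max_flat_offset(q_shape[0], k_shape)     # last offset if q's batch count drives the loop
overrun = last >= k_numel                       # True means reads/writes leave k's buffer

print(k_numel, last, overrun)
```

Under these assumptions the loop would reach far beyond the elements allocated for k, which is consistent with the illegal-address error described above.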

Solution:
Require that the batch_size of q does not exceed the batch_size of k and v.
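The restriction can be sketched as a small pre-check in Python. This is only an illustration of the rule stated above: the function name `check_batch_sizes` is hypothetical, the real check lives in the operator's C++ code, and the message text mirrors the one shown in the diff under review.

```python
def check_batch_sizes(q_batch_size, k_batch_size):
    """Reject inputs where q has more batch entries than k.
    Hypothetical stand-in for the check added in this PR."""
    if q_batch_size > k_batch_size:
        raise ValueError(
            "The batch_size of q (%d) must be less than or equal to k's (%d) "
            "to prevent out-of-bounds memory access."
            % (q_batch_size, k_batch_size)
        )

# Equal or smaller q batch sizes are accepted.
check_batch_sizes(168, 168)
check_batch_sizes(100, 168)

# The reproduction case from the description is rejected up front.
try:
    check_batch_sizes(1682, 168)
    rejected = False
except ValueError:
    rejected = True
```

With a check of this kind the invalid call fails with a readable error on the host instead of an illegal-address error on the device.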

paddle-bot bot commented Jul 31, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Comment on lines 82 to 86
common::errors::InvalidArgument("The batch_size of q (%d) must be less "
"than or equal to k's (%d) to "
"prevent out-of-bounds memory access.",
batch_size,
k_batch_size));
Contributor

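What is the actual reason for this restriction: is it that our operator implementation is incomplete, so we have no choice but to restrict it, or is exceeding this limit disallowed by the algorithm itself?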

Contributor Author

@LCStayingdullCircuit LCStayingdullCircuit Jul 31, 2025

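Our documentation does not describe this in detail. After looking into it, my judgment is that a k/v batch_size smaller than q's batch_size is not reasonable; the standard case should be strict equality. I chose to keep the case where k/v is larger because test cases with q_batch_size < k/v_batch_size pass, so the current change should be the least invasive fix.
The case q_batch_size > k/v_batch_size would need extra handling, something like a broadcasting mechanism.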

Contributor

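Then this error message is not clear enough. It should state that the input violates the definition, so that users know they made a mistake, rather than say the check exists to prevent out-of-bounds access, which makes it look as if we are the ones at fault.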

Contributor Author

Understood.

lshpku previously approved these changes Aug 1, 2025
@LCStayingdullCircuit
Contributor Author

/re-run all-failed

@lshpku lshpku merged commit 8e5cba3 into PaddlePaddle:develop Aug 2, 2025
70 of 71 checks passed