slice large tensor for cudnn_softmax #43681

zhangting2020 · 2022-06-20T11:18:19Z

PR types

Bug fixes

PR changes

OPs

Describe

slice large tensor for cudnn_softmax

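Background

cuDNN returns a CUDNN_STATUS_NOT_SUPPORTED error when a tensor has more than 2G elements. When the input size exceeds 2G elements, this PR slices the input and launches the kernel several times, once for each slice.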
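The slicing idea can be illustrated with a minimal NumPy sketch. This is not the code in softmax_gpudnn.h: the names `softmax_rows`, `sliced_softmax`, and `max_elems` are hypothetical, and the sketch assumes softmax is taken over the last axis so that each row can be processed independently. The variables `remaining` and `batch_size` mirror the names visible in the PR diff.

```python
import numpy as np


def softmax_rows(x):
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)


def sliced_softmax(x, max_elems):
    """Softmax over the last axis, processed in slices of at most
    max_elems elements (hypothetical stand-in for the cuDNN limit)."""
    dim = x.shape[-1]
    rows = x.reshape(-1, dim)                # view the input as (N, dim)
    batch_size = max(1, max_elems // dim)    # rows that fit in one call
    out = np.empty_like(rows)
    offset = 0
    remaining = rows.shape[0]
    while remaining > 0:
        cur = min(batch_size, remaining)
        # One "kernel call" per slice; rows are independent, so the
        # concatenated result equals the unsliced softmax.
        out[offset:offset + cur] = softmax_rows(rows[offset:offset + cur])
        offset += cur
        remaining -= cur
    return out.reshape(x.shape)
```

Because softmax normalizes each row separately, slicing along the row dimension should not change the result; only the number of elements handed to each call changes.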

Test case

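import paddle
import paddle.nn.functional as F
import numpy as np

# Fixed seed so the input is reproducible
np.random.seed(1234)
# 2 * 16 * 8481 * 8481 elements, which is more than 2G
x_np = np.random.random((2, 16, 8481, 8481)).astype('float32')
x = paddle.to_tensor(x_np)
# Softmax over the last axis (the default, axis=-1)
out = F.softmax(x)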
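As a quick check of why this shape triggers the problem, the following sketch counts the elements and estimates how the input might be split. It assumes the softmax axis is the last one and that the per-call limit is the int32 maximum, following the `INT_MAX / dim` expression in the PR diff; the actual meaning of `N` and `dim` inside the kernel may differ.

```python
INT32_MAX = 2**31 - 1  # same value as std::numeric_limits<int32_t>::max()

shape = (2, 16, 8481, 8481)
numel = 1
for s in shape:
    numel *= s

dim = shape[-1]                 # assumed softmax axis length
n_rows = numel // dim           # independent softmax rows
batch_size = INT32_MAX // dim   # rows per slice, as in the diff

# Split the rows into slices of at most batch_size rows each.
slices = []
remaining = n_rows
while remaining > 0:
    cur = min(batch_size, remaining)
    slices.append(cur)
    remaining -= cur

print(numel, numel > INT32_MAX)  # 2301675552 True
print(batch_size, slices)        # 253211 [253211, 18181]
```

Under these assumptions the input is handled in two slices, each of which stays below the int32 element limit.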

Performance

paddle

torch

paddle-bot-old · 2022-06-20T11:19:00Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Xreki · 2022-06-20T11:37:11Z

paddle/phi/kernels/gpudnn/softmax_gpudnn.h

-                                               dim,
-                                               dim,
-                                               dim_log2);
+      int64_t remaining = N;


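If this only supports the cuDNN implementation, wouldn't it be better to put this logic directly inside the SoftmaxForward/BackwardCudnnKernel functions? That would keep any single function from getting too long.

I removed this part from the outermost softmax interface and encapsulated the slicing logic.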

Xreki · 2022-06-20T11:37:52Z

paddle/phi/kernels/gpudnn/softmax_gpudnn.h

-                                              dim_log2);
+      int64_t remaining = N;
+      auto* x_data = x.data<T>();
+      int64_t batch_size = INT_MAX / dim;


INT_MAX -> std::numeric_limits<int32_t>::max()

Xreki

LGTM

…rge tensor for cudnn_softmax (#43719) [cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax cherry-pick #43635 #43681 #43474

slice large tensor for cudnn_softmax

a1f15cb

Xreki reviewed Jun 20, 2022

polish code

a5f4a39

Xreki approved these changes Jun 21, 2022

Xreki merged commit bd5e97d into PaddlePaddle:develop Jun 21, 2022

zhangting2020 added a commit to zhangting2020/Paddle that referenced this pull request Jun 21, 2022

slice large tensor for cudnn_softmax (PaddlePaddle#43681)

8b90063

zhangting2020 mentioned this pull request Jun 21, 2022

[cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax #43719

Merged

sneaxiy pushed a commit to sneaxiy/Paddle that referenced this pull request Jun 27, 2022

slice large tensor for cudnn_softmax (PaddlePaddle#43681)

a09082a
