Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SparseConv3d kernel #39784

Merged
merged 16 commits into from
Mar 3, 2022
Merged

Implement SparseConv3d kernel #39784

merged 16 commits into from
Mar 3, 2022

Conversation

zkh2016
Copy link
Contributor

@zkh2016 zkh2016 commented Feb 21, 2022

PR types

New features

PR changes

OPs

Describe

实现SparseConv3d的kenrel,算法采用Second,参考串行代码PR.

性能情况:与spconv的native实现对比gpu kernel耗时:v100上,对比单测,其中 X(2, 400,400, 15, 17), kernel(3, 3, 3, 17, 19) : fp32 1.3x加速比,fp16 1.4x加速比。

@paddle-bot-old
Copy link

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

}
__syncthreads();
for (int i = threadIdx.x; i < kernel_size; i += blockDim.x) {
atomicAdd(&counter[i], counter_buf[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里为什么需要原子操作呢 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里分块进行计数,然后在atomicAdd到global中,后面考虑进一步优化。

0.7473, 0.5403, 0.5391, 0.0796, 0.4734, 0.9097, 0.1712, 0.6237, 0.8837};

std::vector<std::vector<int>> out_indices = {
// {0, 0, 0, 0},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议删除无用的注释

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


0.7473, 0.5403, 0.5391, 0.0796, 0.4734, 0.9097, 0.1712, 0.6237, 0.8837};

std::vector<std::vector<int>> out_indices = {// {0, 0, 0, 0},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议删除无用的注释,如下

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

namespace sparse {

// TODO(zhangkaihuo) replace this kernel with KP::InitWithDataIndex
__global__ void InitByIndexKernel(const int n, int* out1, int* out2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

等这个代码迁移到phi下面后再进行复用。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// this kernel with phi::GatherCUDAKernel;
template <typename T, typename IndexT = int>
__global__ void GatherKernel(const T* params,
const IndexT* indices,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

加点注释

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@zkh2016 zkh2016 merged commit 6bf85ea into PaddlePaddle:develop Mar 3, 2022
@zkh2016 zkh2016 deleted the conv3d branch August 19, 2022 04:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants