Optimize Pooling NCHW Kernel #7412

Closed
MARD1NO opened this issue Jan 30, 2022 · 3 comments · Fixed by #7426

Comments

@MARD1NO
Contributor

MARD1NO commented Jan 30, 2022

No description provided.

@MARD1NO
Contributor Author

MARD1NO commented Feb 7, 2022

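The problem was traced to computing indices with int64_t, which involves a large number of division and modulo operations. The fix is to dispatch on elem_cnt, sending the kernel to either an int32 or an int64 branch.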
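A minimal sketch of that dispatch, written as plain C++ host code standing in for the CUDA kernel. It assumes a simplified max pool (square `k x k` window, stride equal to `k`, no padding), and the names `OffsetToNchw`, `MaxPool2dNchw` and `DispatchMaxPool2d` are hypothetical, not OneFlow's API or the actual change in #7426. The point is that the NCHW offset decomposition is templated on the index type, so the `/` and `%` run in 32-bit arithmetic whenever the element count allows it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// A flat offset decomposed into NCHW coordinates.
template <typename IndexType>
struct NchwIndex {
  IndexType n, c, h, w;
};

// Every % and / here runs in IndexType; with int64_t these are the
// costly operations identified in the comment above.
template <typename IndexType>
NchwIndex<IndexType> OffsetToNchw(IndexType offset, IndexType C, IndexType H, IndexType W) {
  NchwIndex<IndexType> idx;
  idx.w = offset % W;
  offset /= W;
  idx.h = offset % H;
  offset /= H;
  idx.c = offset % C;
  idx.n = offset / C;
  return idx;
}

// Hypothetical NCHW max-pool forward (k x k window, stride == k, no padding).
// The outer loop stands in for a CUDA 1D grid-stride loop over output elements.
template <typename IndexType>
void MaxPool2dNchw(const float* in, float* out, IndexType out_elem_cnt, IndexType C,
                   IndexType in_H, IndexType in_W, IndexType out_H, IndexType out_W,
                   IndexType k) {
  for (IndexType i = 0; i < out_elem_cnt; ++i) {
    const NchwIndex<IndexType> o = OffsetToNchw<IndexType>(i, C, out_H, out_W);
    const IndexType plane = (o.n * C + o.c) * in_H * in_W;  // start of this (n, c) plane
    float m = -std::numeric_limits<float>::infinity();
    for (IndexType kh = 0; kh < k; ++kh) {
      for (IndexType kw = 0; kw < k; ++kw) {
        const float v = in[plane + (o.h * k + kh) * in_W + (o.w * k + kw)];
        if (v > m) m = v;
      }
    }
    out[i] = m;
  }
}

// Dispatch on element count: int32_t when every index fits, int64_t otherwise.
// Returns true if the int32_t path was taken.
bool DispatchMaxPool2d(const std::vector<float>& in, std::vector<float>* out, int64_t N,
                       int64_t C, int64_t in_H, int64_t in_W, int64_t k) {
  const int64_t in_elem_cnt = N * C * in_H * in_W;
  assert(static_cast<int64_t>(in.size()) == in_elem_cnt);
  const int64_t out_H = in_H / k;
  const int64_t out_W = in_W / k;
  const int64_t out_elem_cnt = N * C * out_H * out_W;
  out->assign(static_cast<std::size_t>(out_elem_cnt), 0.0f);
  // The output is never larger than the input here, so in_elem_cnt bounds every index.
  if (in_elem_cnt <= std::numeric_limits<int32_t>::max()) {
    MaxPool2dNchw<int32_t>(in.data(), out->data(), static_cast<int32_t>(out_elem_cnt),
                           static_cast<int32_t>(C), static_cast<int32_t>(in_H),
                           static_cast<int32_t>(in_W), static_cast<int32_t>(out_H),
                           static_cast<int32_t>(out_W), static_cast<int32_t>(k));
    return true;
  }
  MaxPool2dNchw<int64_t>(in.data(), out->data(), out_elem_cnt, C, in_H, in_W, out_H, out_W, k);
  return false;
}
```

For a 1x1x4x4 input holding 0..15 with `k = 2`, the int32_t path is taken and the output is {5, 7, 13, 15}. The branch is resolved once, before the loop, so the per-element index math never pays for the check; only tensors with more than `INT32_MAX` elements fall back to 64-bit division.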

@simonJJJ
Contributor

simonJJJ commented Feb 7, 2022

I suspect most kernels never need int64_t indices? Usually the whole chain of index derivations is done with the int32_t index from CUDA_1D_KERNEL_LOOP.

@MARD1NO
Contributor Author

MARD1NO commented Feb 9, 2022

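> I suspect most kernels never need int64_t indices? Usually the whole chain of index derivations is done with the int32_t index from CUDA_1D_KERNEL_LOOP.

Agreed. Unless there's a special case, just use int32.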

MARD1NO linked a pull request on Feb 9, 2022 that will close this issue

2 participants