-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TOPI][OP] Use Thrust sort for argsort and topk #5097
Conversation
The current GPU sort implementation (odd-even transposition sort) is too slow when the number of elements is large. This PR introduces Thrust implementation of sort which is much faster. Note that this change requires CMake 3.8 or later since we have to use nvcc to compile a thrust code.
Is this cuda 10.1 compatible? I tried to build it on my machine with cuda 10.1 and it gives me the following errors:
|
@Laurawly Thanks, I've update the code and confirmed that it can be compiled on Ubuntu 18.04 and CUDA 10.1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Could you also add |
* [TOPI][OP] Use Thrust sort for argsort and topk The current GPU sort implementation (odd-even transposition sort) is too slow when the number of elements is large. This PR introduces Thrust implementation of sort which is much faster. Note that this change requires CMake 3.8 or later since we have to use nvcc to compile a thrust code. * cmake: make CUDA optional * allow .cu file to be into the repository * pylint fix and cleanup * require cmake 3.8 only when thrust is enabled * fix nvcc compiler error when passing -pthread * add missing include * add USE_THRUST option in config.cmake * retrigger CI * retrigger CI
* [TOPI][OP] Use Thrust sort for argsort and topk The current GPU sort implementation (odd-even transposition sort) is too slow when the number of elements is large. This PR introduces Thrust implementation of sort which is much faster. Note that this change requires CMake 3.8 or later since we have to use nvcc to compile a thrust code. * cmake: make CUDA optional * allow .cu file to be into the repository * pylint fix and cleanup * require cmake 3.8 only when thrust is enabled * fix nvcc compiler error when passing -pthread * add missing include * add USE_THRUST option in config.cmake * retrigger CI * retrigger CI
The current GPU sort implementation (odd-even transposition sort) is too slow when the number of elements is large. This PR introduces Thrust implementation of sort which is much faster.
Note that this change requires CMake
3.83.13 or later since we have to use nvcc to compile a thrust code.benchmark script
result (NVIDIA P100, without thrust)
result (NVIDIA P100, with thrust)
@icemelon9 @vinx13 @masahi could you help to review?