[CUDA] Parallel Cuda Mergesort #7099
Conversation
I'm hitting some very odd segfaults, just in the debug runtime with nvptx. Trying to figure out what's going on; I'll keep this as WIP until I can get that fixed.
Looks great! I think the main implementation might benefit from a couple of comments describing what it is doing.
Nice approach. I didn't spot an obvious reason for the segfault. Also agree with @tkonolige on docs: some high-level info on the problem flattening and the index mapping / thread assignment for each slice (and as slices are merged) will help maintainability for others.
Many thanks to @masahi for helping me find an issue with heterogeneous lowering and some overflow issues in how I was handling the threads. I think it should be ready for review now, thanks everyone!
@@ -277,7 +277,7 @@ def _build_for_device(input_mod, target, target_host):
                 lambda f: "calling_conv" not in f.attrs
                 or f.attrs["calling_conv"].value != CallingConv.DEVICE_KERNEL_LAUNCH
             ),
-            tvm.tir.transform.Apply(lambda f: f.with_attr("target", target)),
+            tvm.tir.transform.Apply(lambda f: f.with_attr("target", target_host)),
For the record, the segfault with nvptx was happening because the generated host code was calling intrinsics registered for nvptx, like __nv_log2 or __nv_ceil. The reason it was working on CUDA was just a coincidence: there are no CUDA intrinsics registered for fp64 log2 and ceil, so TVM was using the default lowering, which happens to be the right one (llvm).
This change fixes that issue.
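To make the failure mode concrete, here is a small, purely illustrative Python sketch (the helper name and the exact formula are my assumptions, not code from this PR): an iterative mergesort needs roughly ceil(log2(n)) merge passes, and a pass count like this is computed in host code before the device kernels are launched. If that host function is tagged with the device target, the fp64 log2/ceil calls get lowered to the device-only __nv_log2 / __nv_ceil intrinsics and the host binary segfaults; tagging host functions with target_host keeps the default llvm lowering.

import math

def num_merge_passes(n: int) -> int:
    # Illustrative only: the sorted run length doubles each pass (1, 2, 4, ...)
    # until it covers all n elements, so about ceil(log2(n)) passes are needed.
    return math.ceil(math.log2(n)) if n > 1 else 0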
@mbrookhart I think we can revive some tests that are currently disabled due to flaky sort. See tvm/tests/python/relay/test_any.py (lines 253 to 256 in bad149e) and tvm/tests/python/topi/python/test_topi_argwhere.py (lines 63 to 71 in 76b4ad0).
We should also remove tvm/tests/python/topi/python/test_topi_argwhere.py (lines 63 to 65 in 76b4ad0).
Thanks for the work. Please fix the unit test.
Oh no! A copy-paste error! Will fix
LGTM
@Laurawly @zhiics @icemelon9 @csullivan @tkonolige
There have been many complaints recently about the stability and performance of the TIR-based CUDA sort kernel. I've spent a couple of days this week writing a CUDA version of parallel mergesort. It's a stable sort, so it fixes the flakiness we've seen with argsort and argwhere; it changes the threading to support dynamic shapes; and it increases performance significantly over the previous kernel.
This PR only addresses the core sort_ir function; extending this to the other sort versions in this file is future work.
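As a rough illustration of the approach (my own NumPy sketch under stated assumptions, not the TIR kernel from this PR), a bottom-up mergesort repeatedly merges adjacent sorted runs whose length doubles each pass, ping-ponging between two index buffers; on the GPU, the merges within a single pass are independent, so they can be spread across threads and the whole sort takes about ceil(log2(n)) kernel passes.

import numpy as np

def merge(keys, src, dst, start, mid, end):
    # Stable merge of the index runs src[start:mid] and src[mid:end] into dst[start:end].
    i, j = start, mid
    for k in range(start, end):
        if i < mid and (j >= end or keys[src[i]] <= keys[src[j]]):
            dst[k] = src[i]
            i += 1
        else:
            dst[k] = src[j]
            j += 1

def mergesort_argsort(keys):
    n = len(keys)
    src = np.arange(n)                         # current permutation, like argsort
    dst = np.empty(n, dtype=src.dtype)
    width = 1
    while width < n:                           # one GPU kernel pass per iteration
        for start in range(0, n, 2 * width):   # independent merges; one thread each on the GPU
            mid = min(start + width, n)
            end = min(start + 2 * width, n)
            merge(keys, src, dst, start, mid, end)
        src, dst = dst, src                    # ping-pong the buffers between passes
        width *= 2
    return src

Because ties are always taken from the left run, the merge is stable, which is what removes the argsort/argwhere flakiness mentioned above.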
I tested performance on a variety of shapes using this script and obtained these numbers on my 1070 Ti. It's not as fast as Thrust, as expected, but it's much closer for all shapes tested here, and it even manages to beat Thrust on a few (times are in milliseconds).
Thanks!