[TOPI][OP] Use Thrust sort for argsort and topk #5097

kazum · 2020-03-19T09:07:25Z

The current GPU sort implementation (odd-even transposition sort) is too slow when the number of elements is large. This PR introduces a Thrust-based sort implementation, which is much faster.
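For readers unfamiliar with odd-even transposition sort, here is a minimal CPU sketch in plain Python. It is only an illustration of the algorithm, not the TOPI kernel itself, and the function name `odd_even_transposition_sort` is made up for this example. The point it shows is that n elements need up to n phases that must run one after another, so even if every comparison inside a phase runs in parallel on the GPU, the number of dependent steps grows linearly with the input size.

```python
def odd_even_transposition_sort(values):
    """Sort a copy of `values` in ascending order.

    Returns (sorted_list, phases_run). Illustrative only; this is not
    the actual TVM/TOPI implementation.
    """
    data = list(values)
    n = len(data)
    # n phases are sufficient to guarantee a sorted result for n elements.
    for phase in range(n):
        # Even phases compare pairs (0,1), (2,3), ...
        # Odd phases compare pairs (1,2), (3,4), ...
        start = phase % 2
        # On a GPU, the comparisons inside one phase are independent and
        # can run in parallel; the phases themselves are sequential.
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, n


result, phases = odd_even_transposition_sort([5, 1, 4, 2, 3])
print(result, phases)
```

With n = 100000, as in the benchmark below, this scheme implies on the order of 100000 sequential phases, which is consistent with the slowdown described above.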

Note that this change requires CMake ~~3.8~~ 3.13 or later, since nvcc is needed to compile the Thrust code.

benchmark script

import tvm
from tvm import relay
from tvm.contrib import graph_runtime
import numpy as np

target = 'cuda'
ctx = tvm.gpu(0)
n = 100000

x = relay.var("x", shape=(n,))
out = relay.topk(x)
func = relay.Function([x], out[0])

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(func, target)

module = graph_runtime.create(graph, lib, ctx)

print("Evaluate inference time cost...")
ftimer = module.module.time_evaluator("run", ctx, number=1, repeat=3)
prof_res = np.array(ftimer().results) * 1000
print("Mean inference time (std dev): %.2f ms (%.2f ms)" %
      (np.mean(prof_res), np.std(prof_res)))

result (NVIDIA P100, without thrust)

Evaluate inference time cost...
Mean inference time (std dev): 2058.89 ms (0.07 ms)

result (NVIDIA P100, with thrust)

Evaluate inference time cost...
Mean inference time (std dev): 1.11 ms (0.03 ms)
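As a quick back-of-the-envelope check, the two mean times above can be turned into a speedup factor. The snippet below only does arithmetic on the reported numbers; the variable names are illustrative, and the figure applies to this one configuration (P100, n = 100000, default `relay.topk` arguments), not to sorting in general.

```python
# Mean inference times reported above, in milliseconds.
without_thrust_ms = 2058.89
with_thrust_ms = 1.11

# Ratio of the two means; this ignores the (small) standard deviations.
speedup = without_thrust_ms / with_thrust_ms
print("Speedup: roughly %.0fx" % speedup)
```

That puts the improvement at a little over three orders of magnitude for this input size.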

@icemelon9 @vinx13 @masahi could you help review this?

Laurawly · 2020-03-19T16:58:18Z

Is this cuda 10.1 compatible? I tried to build it on my machine with cuda 10.1 and it gives me the following errors:

/home/ubuntu/workplace/tvm-1/src/runtime/contrib/thrust/thrust.cu(42): error: namespace "thrust" has no member "device_ptr"

/home/ubuntu/workplace/tvm-1/src/runtime/contrib/thrust/thrust.cu(42): error: type name is not allowed

/home/ubuntu/workplace/tvm-1/src/runtime/contrib/thrust/thrust.cu(43): error: namespace "thrust" has no member "device_ptr"

/home/ubuntu/workplace/tvm-1/src/runtime/contrib/thrust/thrust.cu(43): error: type name is not allowed

/home/ubuntu/workplace/tvm-1/src/runtime/contrib/thrust/thrust.cu(44): error: namespace "thrust" has no member "device_ptr"

/home/ubuntu/workplace/tvm-1/src/runtime/contrib/thrust/thrust.cu(44): error: type name is not allowed

/home/ubuntu/workplace/tvm-1/src/runtime/contrib/thrust/thrust.cu(52): error: identifier "data_ptr" is undefined

/home/ubuntu/workplace/tvm-1/src/runtime/contrib/thrust/thrust.cu(52): error: identifier "values_ptr" is undefined

/home/ubuntu/workplace/tvm-1/src/runtime/contrib/thrust/thrust.cu(55): error: identifier "indices_ptr" is undefined

/home/ubuntu/workplace/tvm-1/src/runtime/contrib/thrust/thrust.cu(42): error: identifier "data_ptr" is undefined

kazum · 2020-03-19T20:29:09Z

@Laurawly Thanks, I've updated the code and confirmed that it compiles on Ubuntu 18.04 with CUDA 10.1.

masahi

Looks great!

icemelon

lgtm

Laurawly · 2020-03-20T05:35:57Z

Could you also add a USE_THRUST option to the config.cmake file?

Laurawly · 2020-03-20T17:01:06Z

Thanks @kazum @masahi @icemelon9

* [TOPI][OP] Use Thrust sort for argsort and topk

  The current GPU sort implementation (odd-even transposition sort) is too slow when the number of elements is large. This PR introduces Thrust implementation of sort which is much faster.

  Note that this change requires CMake 3.8 or later since we have to use nvcc to compile a thrust code.
* cmake: make CUDA optional
* allow .cu file to be into the repository
* pylint fix and cleanup
* require cmake 3.8 only when thrust is enabled
* fix nvcc compiler error when passing -pthread
* add missing include
* add USE_THRUST option in config.cmake
* retrigger CI
* retrigger CI

kazum added 5 commits March 19, 2020 17:56

cmake: make CUDA optional

4823e93

allow .cu file to be into the repository

98b2aa3

pylint fix and cleanup

13b3edc

require cmake 3.8 only when thrust is enabled

addba76

kazum added 2 commits March 19, 2020 20:19

fix nvcc compiler error when passing -pthread

041860e

add missing include

0d88876

masahi approved these changes Mar 20, 2020

icemelon approved these changes Mar 20, 2020

kazum added 3 commits March 20, 2020 14:58

add USE_THRUST option in config.cmake

b1e388d

retrigger CI

bc06ce4

retrigger CI

cc59cbf

tqchen assigned Laurawly Mar 20, 2020

Laurawly merged commit 2e8f3a9 into apache:master Mar 20, 2020

kazum deleted the thrust branch March 21, 2020 01:50

ZihengJiang mentioned this pull request Sep 25, 2020

TVM v0.7 Release Note Candidate #6486

Closed
