[Frontend][Tensorflow] Sparse_Dense Op CSR scheduling issue resolved for Cuda & X86 #7148
Conversation
…for both cuda & x86
What was the issue holding up CSR scheduling?
We already talked about performance for this, right?
CUDA scheduling for the sparse_dense op is internally changed to sparse_dense_padded. But that only works when the size is a multiple of warp_size; if it is lower than that, there is no fallback scheduling for CSR, so I have resolved that part here. Roughly, the intended dispatch is sketched below. Please let me know in case I am not clear. Thanks!
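To illustrate, a minimal sketch of that dispatch decision (the function and constant names here are hypothetical, not the actual TVM scheduling entry points; only the op names come from this PR's test):
WARP_SIZE = 32  # typical CUDA warp size

def choose_sparse_dense_schedule(num_rows):
    # Sketch only: the padded variant applies when the size is a
    # multiple of the warp size; otherwise fall back to the plain
    # CSR sparse_dense schedule, which is the fallback this PR adds.
    if num_rows % WARP_SIZE == 0:
        return "nn.internal.sparse_dense_padded"
    return "nn.sparse_dense"  # CSR fallback path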
Gentle ping @tkonolige!
Looks good!
So I just hit the bug that this fixes. Can we add a test to make sure we don't hit it again in the future? Here is the test I wrote:
import numpy as np

import tvm
import tvm.testing
from tvm import relay

# `random_bsr_matrix` is the sparse-matrix helper already defined in the
# same test file.

@tvm.testing.requires_cuda
def test_sparse_dense_padded_alter_op():
    with tvm.target.Target("cuda"):
        M = 128
        N = 16
        K = 128
        X_np = np.random.randn(M, K).astype("float32")
        W_sp_np = random_bsr_matrix(N, K, 2, 2, density=0.01, dtype="float32")
        x = relay.var("x", relay.TensorType(X_np.shape, "float32"))
        mult = relay.op.nn.sparse_dense(
            x,
            (
                relay.Constant(tvm.nd.array(W_sp_np.data)),
                relay.Constant(tvm.nd.array(W_sp_np.indices)),
                relay.Constant(tvm.nd.array(W_sp_np.indptr)),
            ),
        )
        f = relay.Function([x], mult)
        f_ = relay.transform.InferType()(tvm.IRModule.from_expr(f))
        f_ = relay.transform.AlterOpLayout()(f_)
        assert f_["main"].body.op.name == "nn.internal.sparse_dense_padded"

        # Build with cuda and AlterOpLayout to ensure that
        # sparse_dense_padded has an implementation.
        with tvm.transform.PassContext(opt_level=3, required_pass="AlterOpLayout"):
            x = relay.build(tvm.IRModule.from_expr(f), target=tvm.target.Target("cuda"))
in tests/python/topi/python/test_topi_sparse.py
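For a local run, TVM's topi test files commonly invoke tests under a main guard; an illustrative snippet:
if __name__ == "__main__":
    test_sparse_dense_padded_alter_op()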
Thanks @tkonolige! The test case is added now.
Looks good! @comaniac @junrushao1994 I think this is ready to merge. (Assuming it passes CI).
Thanks @ANSHUMAN87 @tkonolige
…for Cuda & X86 (apache#7148) * [Frontend][Tensorflow] Sparse_Dense Op CSR scheduling issue resolved for both cuda & x86 * [1] Review comments handled * [2] Review comments handled * [3] Review comments handled
This is a follow-up PR.
cc @tkonolige, @FrozenGene!