[TOPI] Fix mali conv2d performance regression #3131

Merged · 3 commits · May 5, 2019
python/tvm/autotvm/tophub.py: 1 addition & 1 deletion
@@ -42,7 +42,7 @@
     'cuda': "v0.04",
     'rocm': "v0.02",
     'opencl': "v0.02",
-    'mali': "v0.04",
+    'mali': "v0.05",

     'vta': "v0.04",
 }
topi/python/topi/mali/conv2d.py: 4 additions & 3 deletions
@@ -352,9 +352,10 @@ def _decl_winograd(cfg, data, kernel, strides, padding, dilation, layout, out_dtype)
     # unpack output
     output = tvm.compute((N, CO, H, W), lambda n, co, h, w:
                          Y[co][n * nH * nW + (h//m) * nW + w//m][h % m][w % m]
-                         # thw following term is used to make the padding effective,
-                         # otherwise the padding will be eliminated by bound inference
-                         + tvm.const(0, out_dtype) * M[alpha-1][alpha-1][CO-1][P_round-1],
+                         # thw following hack term is used to make the padding in batch gemm ("M")
+                         # effective, otherwise the padding will be eliminated by bound inference
+                         + tvm.expr.Mul(tvm.const(0, out_dtype),
Member:

I suggest leaving a comment pointing to issue #3088 so people understand why Mul is used instead of *.

Member:

I'm still confused: why do we need this multiplication?

Contributor:

thw -> the

Member Author (@merrymercy, May 4, 2019):

@icemelon9 During batch gemm, we introduce some padding to avoid partial tiles, so we can safely vectorize the innermost loop. However, we don't use all of the output of batch gemm (the padded part is ignored in the final results). The InferBound pass in tvm analyzes the computation region from output to input and keeps only the necessary part. If we don't add this term, the padding added in batch gemm will be eliminated, regardless of how we tweak the shape argument in tvm.compute.
This term accesses the last element of the padded buffer, so it makes all of the padding effective.
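
To make the trick concrete, here is a minimal, hypothetical sketch (toy shapes and names, not the code from this PR) of keeping a padded intermediate alive through InferBound, assuming the TVM 0.x Python API used in this PR:

    import tvm

    # Hypothetical sizes: P real rows, padded up to P_round so the inner loop vectorizes.
    P, P_round = 30, 32
    A = tvm.placeholder((P_round,), name='A')
    # Stand-in for the padded batch gemm output "M".
    M = tvm.compute((P_round,), lambda p: A[p] * 2.0, name='M')
    # The consumer only reads P rows; without the extra term, InferBound would
    # shrink M's computed region to [0, P) and drop the padding.
    out = tvm.compute((P,), lambda p:
                      M[p]
                      # zero-valued access to the last padded element keeps the
                      # whole padded region effective without changing the result
                      + tvm.expr.Mul(tvm.const(0, 'float32'), M[P_round - 1]),
                      name='out')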

Member Author (@merrymercy, May 4, 2019):

@yzliu tvm.expr.Mul won't do constant folding, while * is equivalent to tvm.expr.Mul plus constant folding.
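
For context, a small check of that difference (assuming the TVM 0.x Python API used in this PR; the printed forms are only what the discussion above suggests):

    import tvm

    x = tvm.var('x', dtype='float32')
    zero = tvm.const(0, 'float32')

    folded = zero * x                 # the overloaded * applies constant folding, so 0 * x collapses
    explicit = tvm.expr.Mul(zero, x)  # the raw Mul node keeps the reference to x alive

    print(folded)    # expected: a plain constant zero, the multiplication is gone
    print(explicit)  # expected: the multiplication survives, e.g. (0f*x)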

Member:

Could you elaborate in the code comment with what you replied to @icemelon9?

Member Author:

It's too long to put in the comment.

+                                        M[alpha-1][alpha-1][CO-1][P_round-1]),
                          name='output', tag='winograd_conv2d_output')

     # we have to manually assign effective GFLOP for winograd