
[Relay][Pass] Support combine multiple dense op just into dense #6062

Conversation

wrongtest-intellif (Contributor)

Hi there, this PR is a minor modification to the CombineParallelDense pass; see https://discuss.tvm.ai/t/yet-another-dense-op-combine-strategy/7126. The changes are:

  • Add an option "to_batch" (defaulting to True) to control whether dense ops are combined into batch_matmul or into dense.

  • Add an implementation that combines dense ops into one large dense instead of batch_matmul, following almost the same logic as the CombineParallelConv2D pass.

  • Add test cases for combining dense ops followed by element-wise ops of various shapes.

  • The new strategy can combine ops even when their output dims differ, and it may perform better in circumstances where a flat matmul is faster than the equivalent batch_matmul. (A usage sketch follows this list.)
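For illustration, here is a minimal sketch (not part of the PR) of running the pass on two parallel dense ops through the Python API mentioned later in this thread; the shapes and variable names are made up:

```python
import tvm
from tvm import relay

# Two dense ops sharing input x. With to_batch=False the pass concatenates
# the weights along the output dimension, emits one flat dense, and slices
# the combined result back into the original branches.
x = relay.var("x", shape=(1, 16))
w1 = relay.var("w1", shape=(32, 16))
w2 = relay.var("w2", shape=(48, 16))  # output dims may differ under the new strategy
out = relay.Tuple([relay.nn.dense(x, w1), relay.nn.dense(x, w2)])
mod = tvm.IRModule.from_expr(relay.Function(relay.analysis.free_vars(out), out))

mod = relay.transform.InferType()(mod)
# Positional args: min_num_branches=2, to_batch=False (the new flat strategy)
mod = relay.transform.CombineParallelDense(2, False)(mod)
print(mod)
```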

wrongtest-intellif force-pushed the feat/SupportAnotherDenseCombineStrategy branch from 022bd68 to 2170bed on July 15, 2020 08:28
wrongtest-intellif (Contributor, PR author)

cc @jroesch @icemelon9 @comaniac

comaniac (Contributor) left a comment


Thanks for the PR! I have two questions in addition to the inline comments:

  1. How does an end user trigger this change (i.e., to_batch=false)? It seems to me that it can only be configured by manually modifying the build_module or VM compiler and rebuilding TVM.

  2. IIUC, the reason ParallelDenseFlatCombiner derives from ParallelOpCombiner instead of ParallelOpBatchCombiner is that it requires special handling in almost every function, so there is no benefit to deriving from ParallelOpBatchCombiner. Now the class hierarchy becomes:

    • ParallelDenseBatchCombiner <- ParallelOpBatchCombiner <- ParallelOpCombiner
    • ParallelDenseFlatCombiner <----------------------------------|

    Since I didn't find any other class derived from ParallelOpBatchCombiner, should we simplify the ParallelOpBatchCombiner class if we cannot make both ParallelDense*Combiner classes derive from it?

Review comments (outdated, resolved) on:
  • src/relay/backend/build_module.cc
  • src/relay/backend/vm/compiler.cc
  • src/relay/transforms/combine_parallel_dense.cc
  • include/tvm/relay/transform.h
wrongtest-intellif (Contributor, PR author) commented Jul 16, 2020

Thanks for your comments!

  • In our practice we just manually call the Python API mod = relay.transform.CombineParallelDense(3, False)(mod).
    Because this pass changes tensor shapes, we currently have to call it manually (or call relay.optimize(mod) for the default optimization) before any auto-tuning step, so that tuning sees the combined kernel shape, and then build; see the sketch after this list.

  • How about exposing ParallelOpBatchCombiner as an optional pass (maybe in another PR)? It could be used like mod = CombineParallelOpToBatch("op_name", "batch_op_name", 3). This would serve the original idea of the class and let users flexibly combine various kinds of ops. Of course, the use case may be rare in common network structures.
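As a rough sketch of that tuning workflow (not from the PR; it assumes mod, params, and target are already defined):

```python
import tvm
from tvm import relay, autotvm

# Combine parallel dense ops first, so auto-tuning sees the combined shape.
mod = relay.transform.CombineParallelDense(3, False)(mod)
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
# ... tune the extracted tasks and log the best schedules ...
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
```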

comaniac (Contributor) left a comment


LGTM. Thanks.
Would like to get comments from @jroesch @icemelon9 @jonso4 as well.

MarisaKirisame (Contributor) left a comment


IMO there should be a CombineParallelOp pass that just calls CombineParallelDense/Conv2D/etc. in sequence. Can you add that?
Also, why CombineParallel instead of Batch? People use static batching/dynamic batching to describe this process.

Review comment (outdated, resolved) on: include/tvm/relay/transform.h
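Such a pass could presumably be a thin Sequential over the existing combine passes. A hypothetical sketch (the wrapper name is illustrative, not from the PR):

```python
import tvm
from tvm import relay

# Hypothetical wrapper: run the parallel-op combine passes one after another.
combine_parallel_ops = tvm.transform.Sequential([
    relay.transform.CombineParallelConv2D(min_num_branches=2),
    relay.transform.CombineParallelDense(min_num_branches=2),
    relay.transform.CombineParallelBatchMatmul(min_num_branches=2),
])
mod = combine_parallel_ops(mod)  # mod is assumed to be an existing IRModule
```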
tqchen (Member) commented Jul 25, 2020

@MarisaKirisame please manage the PR and merge after everyone approves.

MarisaKirisame (Contributor) commented Jul 25, 2020

@tqchen got it.

tqchen (Member) commented Aug 6, 2020

@MarisaKirisame please followup

MarisaKirisame (Contributor)

@wrongtest can you just make a pass that is nothing but a sequential composition of the 3 combine passes? That's all I think should be changed.

wrongtest-intellif (Contributor, PR author)

Sorry for the late reply; a wrapped function BatchingOps() has been added.
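Usage of the wrapper is then a one-liner (a sketch, assuming an existing module mod):

```python
from tvm import relay

mod = relay.transform.BatchingOps()(mod)  # runs the parallel-op combine passes in sequence
```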

MarisaKirisame merged commit b3c42f9 into apache:master on Aug 7, 2020
wjliu1998 pushed a commit to wjliu1998/incubator-tvm that referenced this pull request Aug 13, 2020
…he#6062)

* feat: Support combine multiple matmuls to flat matmul

* fix: Change to_batch -> to_batch_matmul and enrich docstring

* feat: Add wrapped batching ops pass for python
trevor-m pushed commits with the same message to trevor-m/tvm (Aug 26 and Sep 2, 2020) and to neo-ai/tvm (Sep 3, 2020) that referenced this pull request.
zhanghaohit deleted the feat/SupportAnotherDenseCombineStrategy branch on December 17, 2020.