[Relay] Add new IR pass CombineParallelDense #3862
Conversation
@vinx13 I will add my review, can you as well?
@MarisaKirisame can you add a review as well?
Overall LGTM, thanks for your contribution! My comments are mostly about documentation. We are making a push this release cycle to improve the docs, and I am trying to encourage everyone to incrementally grow them with each PR to passes, etc. I will take another pass after other reviews come in, and then merge.
As most operators take batched input, it is not hard to imagine different versions of CombineXXX... Furthermore, if we do dynamic batching à la TensorFlow Fold (which I think we will do), we need a CombineXXX for every XXX operator. So, is it possible to merge the two combine passes into a generic pass?
@MarisaKirisame that's a really good idea. The only tricky part is that each op is combined differently (dense becomes batch_matmul, for example). How about this: we create a new class, ParallelOpCombiner, that holds the shared analysis and traversal logic, with per-op hooks on top. I'm not sure it's possible to merge CombineParallelConv2D and CombineParallelDense into a single pass, but they could share most of their code. Should combining be enabled for arbitrary ops by default? What do you think? @jroesch any thoughts?
@soiferj sounds good to me. Maybe the default should be False, because ops need manual batching to be supported (for example, changing dense to BatchMatMul). For the easy cases, we can override DeclareUnaryOp and DeclareBinaryOp, which will do the job.
@MarisaKirisame, sounds good. For now, I'll add support for this functionality. I think we can decide how we actually want to enable it for all ops in another PR. Is that alright?
@soiferj exactly.
@MarisaKirisame do you think that the new base class should restrict the combined ops to a fixed number of input dimensions, or should it handle an arbitrary number? The reason I ask is that the broadcasting logic for element-wise ops that follow the main op can get tricky if we support an arbitrary number of dimensions.
@soiferj both are completely ok with me.
@soiferj sounds good to me in general; I am always in favor of slowly layering in changes while setting yourself up for future success. I think Marisa's point is that we should be able to hook just a few methods and leave the analysis/traversal code generic between the passes. For example, we could also overload how the ops are combined with another hook, i.e. concat vs. stack. If I understand the goal correctly, we have some operators, like dense, that we want to combine whenever they appear in parallel.
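To make the hook idea concrete, here is a rough Python sketch of the shape being discussed. The actual pass is implemented as a C++ class (ParallelOpCombiner); the method names below are hypothetical, chosen for illustration rather than taken from the real interface.

```python
# Hypothetical illustration only: the real pass in this PR is a C++ class
# (ParallelOpCombiner); these method names are made up to show which parts
# stay generic and which are per-op hooks.
from abc import ABC, abstractmethod


class ParallelOpCombinerSketch(ABC):
    """Generic traversal/analysis; subclasses decide what to combine and how."""

    def run(self, expr):
        # Generic part: find groups of parallel branches that start with the
        # same op on the same input, then rewrite each compatible group.
        for group in self.find_parallel_groups(expr):
            compatible = all(self.is_supported_op(c) for c in group) and all(
                self.can_ops_be_combined(group[0], c) for c in group[1:]
            )
            if compatible:
                expr = self.rewrite_group(expr, group)
        return expr

    @abstractmethod
    def is_supported_op(self, call):
        """Hook: does this combiner handle the op (e.g. nn.dense)?"""

    @abstractmethod
    def can_ops_be_combined(self, a, b):
        """Hook: may two branches be merged (compatible shapes, dtypes, ...)?"""

    @abstractmethod
    def make_combined_op(self, branches):
        """Hook: build the combined call, e.g. concatenate weights for conv2d
        vs. stack inputs into one batch_matmul for dense."""

    def find_parallel_groups(self, expr):
        return []  # placeholder for the shared dataflow-graph grouping

    def rewrite_group(self, expr, group):
        # Placeholder: would call make_combined_op(group) and substitute the
        # original branches (plus combinable element-wise follow-ups).
        return expr
```

Only the hooks vary per operator; the grouping and rewriting stay generic, which is what lets the Conv2D and Dense combiners reuse the same traversal.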
Just pushed a new set of changes that introduces the generic batch combiner (combine_parallel_op_batch).
Also, I have a design question: after we support combining arbitrary ops into batched ops, how do we want to enable that for every op?
@soiferj Likely some fused operations are not inlined. Can you check which sub-function after fusion raised this error during Relay compilation?
The error says it's coming from ...
We need to trace the Relay part of these functions (and see how their TOPI schedules are called).
Okay, I have the output. What exactly should I be looking for?
@soiferj After fusion, the Relay program is split into several parts (fused sub-functions).
Each part is passed to the compile engine separately, which is where its TOPI compute and schedule get called.
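For reference, one way to see those fused parts is to run the relevant passes manually and print the module. The sketch below uses current relay.transform API paths, which may differ slightly from the code at the time of this PR.

```python
# Sketch only: run fusion manually and print the module to see the fused parts.
import tvm
from tvm import relay


def dump_fused(mod):
    seq = tvm.transform.Sequential([
        relay.transform.InferType(),
        relay.transform.CombineParallelDense(min_num_branches=2),
        relay.transform.FuseOps(fuse_opt_level=2),
    ])
    with tvm.transform.PassContext(opt_level=4):
        mod = seq(mod)
    # Each function the printer marks as primitive is one fused part that is
    # lowered with its own TOPI compute/schedule.
    print(mod)
    return mod
```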
@vinx13, never mind, I made a mistake on my testing branch. I created a new branch, merged my changes with master, and can run the model now. Sorry about that.
The CI failure doesn't look related to me; it is occurring in an unrelated test.
Thanks everyone for your help and feedback :) This PR has been approved for the past few days - would someone be able to take a look at merging when they get a chance?
@jroesch Could you take one more look at this PR?
@jroesch would you mind taking a look when you have a chance?
Sorry for dropping the ball on this one, been busy with life and other work things :) will merge tonight.
Everything looks good to me. Thanks for the contribution, especially all the extra work generalizing the interface.
* Refactor to create abstract ParallelOpCombiner
* First draft of CombineParallelDense
* Begin to work on tests
* Test
* Refactor to move out more common code
* Clean up
* Fix
* Remove statics
* fix wording
* Start to add combine_parallel_op_batch
* Resolve PR comments
* Resolve PR comments
* dummy change to retrigger CI
* Change special case from bias_add to add
* Revert special case change
* Ignore units check
* dummy change to retrigger CI
* dummy change to re-trigger CI
* Improve docs
* Update docs
* Update docs
This IR pass is similar to CombineParallelConv2D, but with the Dense operator. The pass takes multiple parallel dense operators and combines them into one batch_matmul operator. Element-wise operations after dense are also stacked and fused.
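As an illustration, here is a minimal sketch of the transformation, assuming the pass is exposed to Python as relay.transform.CombineParallelDense (mirroring CombineParallelConv2D); exact module paths can vary across TVM versions.

```python
# Minimal sketch of the transformation described above.
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 16))
weights = [relay.var("w%d" % i, shape=(32, 16)) for i in range(3)]

# Three parallel dense branches on the same input, each followed by an
# element-wise op.
branches = [relay.nn.relu(relay.nn.dense(x, w)) for w in weights]
func = relay.Function([x] + weights, relay.Tuple(branches))
mod = tvm.IRModule.from_expr(func)

mod = relay.transform.InferType()(mod)
mod = relay.transform.CombineParallelDense(min_num_branches=3)(mod)
# The three dense calls should now appear as a single nn.batch_matmul, with
# the following relu applied once to the stacked result.
print(mod)
```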
The code is refactored so CombineParallelConv2D and CombineParallelDense share code, making it easier to write a new "combine" pass later.
This pass is enabled at opt level 4. In another PR, we can add tuning to decide whether or not to apply it.
This will help models like BERT, which have parallel Dense + BiasAdd at the start of each layer.
Discussed in this RFC: https://discuss.tvm.ai/t/discussion-new-ir-pass-proposal-combineparalleldense/3813
@vinx13 @icemelon9 would you be able to take a look?