[TVM] Zero elimination #2634
Conversation
@sgrechanik-h please request reviews
@ZihengJiang @were @xqdan @derisavi @grwlf Please review.
Here is a partial review containing a few questions.
Visit(op->buffer_var);
IRVisitor::Visit_(op);
}
};
Not an issue, but a generic thought: how can we be sure that we really visit all IR operations involving variables? What if somebody adds a new operation containing variables later? How do we let them know that they need to fix this place too? Maybe we should think about using a direct switch on IRNodeTypes rather than the visitor pattern in such cases, and instruct the C++ compiler to issue a warning if the case for some node is missing. @tqchen what do you think?
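As a rough sketch of the switch-based idea: with a scoped enum and no `default:` case, gcc and clang warn via `-Wswitch` (part of `-Wall`, promotable to an error with `-Werror=switch`) whenever a case is missing. The `IRNodeType` values and `NodeName` function below are hypothetical stand-ins for illustration, not TVM's actual dispatch mechanism.

```cpp
#include <string>

// Hypothetical subset of IR node kinds; real TVM dispatches on type keys,
// this enum exists only to illustrate the exhaustiveness check.
enum class IRNodeType { Variable, LetStmt, Load };

// No `default:` case on purpose: if somebody adds a new IRNodeType value,
// -Wswitch flags this switch as incomplete at compile time.
std::string NodeName(IRNodeType t) {
  switch (t) {
    case IRNodeType::Variable: return "Variable";
    case IRNodeType::LetStmt:  return "LetStmt";
    case IRNodeType::Load:     return "Load";
  }
  return "unreachable";  // silences -Wreturn-type
}
```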
If we want a compile-time error/warning, then the simplest solution is to create an alternative of IRVisitor with every Visit_ function declared as pure virtual:
class IRVisitorNoDefault : public IRVisitor {
public:
virtual void Visit_(const Variable* op) = 0;
virtual void Visit_(const LetStmt* op) = 0;
//...
};
(Using a switch and configuring the compiler to raise an error on an incomplete switch could be an alternative, but it may be a bit compiler-dependent.)
However, I'm not sure if it's worth doing since new operations are rarely added into IR.
Thanks for putting such a large effort into this work.
I haven't reviewed the whole code yet but these are my comments so far.
} else {
return false;
}
}
This method's implementation is very similar to that of tvm::is_const_int(). I wonder whether it's possible to reuse one in the other. One difference I see is in how UIntImm is handled. It seems to me that your handling is correct and is_const_int's handling is not, so maybe the latter needs to be fixed, and then it will be more similar to yours.
Yes, I think it makes sense to express is_const_int through is_const_value.
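A standalone sketch of what that refactoring could look like; the `Expr` struct here is a toy stand-in for TVM's immediate nodes (IntImm/UIntImm/FloatImm), not the real API, and the real code would pattern-match on the actual node types.

```cpp
#include <cstdint>

// Toy stand-in for TVM's constant nodes; `kind` says which immediate
// node the expression is, `value` holds the constant.
struct Expr {
  enum Kind { kIntImm, kUIntImm, kFloatImm, kOther } kind;
  double value;
};

// Generic check: is e a constant immediate equal to `value`? Works for
// floats as well as ints, and treats UIntImm uniformly with IntImm.
bool is_const_value(const Expr& e, double value) {
  switch (e.kind) {
    case Expr::kIntImm:
    case Expr::kUIntImm:
    case Expr::kFloatImm:
      return e.value == value;
    default:
      return false;
  }
}

// The integer-only check then becomes a thin wrapper that additionally
// requires an integer node kind.
bool is_const_int(const Expr& e, int64_t value) {
  return (e.kind == Expr::kIntImm || e.kind == Expr::kUIntImm) &&
         is_const_value(e, static_cast<double>(value));
}
```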
Looks good to me, thanks.
Thanks for putting great effort into this. Can we comment a bit on how some of the simplification routines might fit into the new infrastructure #2588? Ideally, we would like to have a consolidated effort on index simplifications, so that some of the effort introduced here can benefit general cases. I understand that such discussion and deliberation might slow down bringing this in, but it could benefit us in the long term.
@tqchen Honestly speaking, I don't quite understand the design and motivation behind the new Analyzer class. In zero-elimination there are several internal analyses/transformations which are basically functions of a certain signature. Concerning the usefulness of adding these transformations to the analyzer infrastructure, I should think a bit more about it. There are analyses like FreeVariables which are probably useful, there are transformations which are too zero-elimination-specific (like LiftNonzeronessConditions), and there are some transformations for which I'm not sure (like div/mod elimination).
It is not about the usefulness of the features, but want to discuss where to best put them, so that we have a clear collection of sub-analyzers, and limited amount of states. Let me give some of the example questions we can ask:
Like replacing
Div/mod elimination is not really a simplification; it is a transformation into a form which can be manipulated more easily by certain algorithms, so I don't think it should be part of any simplifier. This is actually quite common: many transformations don't really simplify expressions in the usual sense.
This is possible. Actually I have several doubts about the new Analyzer infrastructure that may be worth discussing, in particular:
const Array<Var>& outer_axis,
const Map<Var, Range>& vranges) {
// TODO(sgrechanik-h): We don't use divmod elimination here because of some performance problems
auto res = SimplifyDomain(cond, outer_axis, vranges, false);
SimplifyDomain iterates over outer_axis in reverse order at line 1297 (as described by equations 22-26 in the paper?). As such, res.axis and res.conditions are in opposite orders (conditions are added at line 1394 in the original order). This can result in an unnecessary transpose copy at line 1468 when trying to create a TensorFromExpr with res.axis, which contains axes in reverse order from that of new_expr. Even if a simple fix in SimplifyDomain ensured the original order of the simplified axes in res (i.e. no reversal), an unnecessary copy could still result from TensorFromExpr. Any thoughts? Tensor copies can be costly to performance.
@kaitingwang I think it makes sense to reverse the resulting res.axis to match the original order of variables; I'll fix it. However, although it helps in some cases, the real problem is in the ExtractAsTensorMaybe function, which doesn't check whether the expression is costly enough to make the extraction beneficial. Since "costly enough" is rather difficult to define precisely, I think the proper solution is to rely on schedules (compute_inline should help, I guess). However, this is quite naive (especially considering that we don't have an autoscheduler yet), so I think I'll implement some heuristics to prevent such extractions in some obvious cases.
Indeed, in cases where new_expr is a computationally intensive expression, computing it once and memoizing the result for later uses is wise, namely when those uses would otherwise recompute the same computationally intensive expression multiple times. Thanks for the fix.
@sgrechanik-h perhaps some of the discussion should go to #2588. Thanks for being patient. I think one goal would be moving all the state related to context information into the Analyzer, and moving everything related to expression simplification there. For now, I did not implement memoization, mainly because memoization can cause problems when we update context information about a variable x (previously computed transformations of expressions involving x may no longer hold). It would be great if we can consolidate everything into the new infra and have more specific discussions in #2588 on how things can be further improved. I hope by doing so we can have a common infra that we can improve upon, producing as little technical debt as possible.
@tqchen I think that it may be important to implement memoization as early as possible exactly for this reason: it may cause lots of problems requiring us to make completely different design decisions about the whole Analyzer infrastructure.
e = Substitute(e, vmap);
}

return CanonicalSimplify(Simplify(CanonicalSimplify(e, vranges), vranges), vranges);
Can you explain why simplify is called 3 times here?
It is simply the combination that gave the best result. I think I will rewrite this piece of code in the near future so that it doesn't use the Halide simplifier and uses the new simplifiers instead (and hopefully there won't be a need to make 3 calls).
Superseded by #5171
This PR is extracted from the tensor expression autodiff PR #2498. It contains transformations required for optimizing the code generated by the autodiff. Note that although the transformations are independent from the autodiff, they aren't used anywhere else yet.
The main goal of zero elimination is to remove summation over zeros. This is achieved by:
1. lifting the nonzeroness condition out of the expression (LiftNonzeronessCondition does this);
2. simplifying the reduction domain using this condition (SimplifyReductionDomain).

The facade function which combines both these steps is OptimizeAndLiftNonzeronessConditions.

A brief summary of functions added by the PR:

include/tvm/ir_operator.h
- is_const_value - check if an Expr is a const of the given value; works for floats as well as for ints.

src/op/op_util.h
- TensorFromExpr - create a tensor from an expr and itervars. Something like compute, but easier to use in some contexts since it automatically duplicates reductions.
- TransformBody - create a new tensor by transforming the body of the given tensor using the given expression-transforming functions.

src/pass/zero_elimination.h
- IsSumCombiner - check whether the given combiner is a sum.
- CanFactorZeroFromCombiner - check if we can factor out zero from the given combiner.
- LiftNonzeronessCondition - transform the expression into the form select(c, e, 0).
- InlineTailCall - if the given tensor is just a call to another tensor, inline it. This function is used in the autodiff.
- InlineTensors - transform the given expression or tensor by inlining calls to other tensors; also used in the autodiff.
- SolveSystemOfInequalities - perform Fourier-Motzkin elimination.
- SimplifyDomain - simplify the given iteration domain using Fourier-Motzkin elimination and several other transformations.
- SimplifyReductionDomain - simplify the iteration domain of the given reduction using the previous function.
- ExtractAsTensorMaybe - extract the given expr into a separate tensor if this seems reasonable.
- ExtractNonTopReductions - extract non-top-level reductions as separate tensors.
- ExtractReductions - almost the same thing, but may also extract the given expr itself.
- OptimizeAndLiftNonzeronessConditions - a more powerful version of LiftNonzeronessCondition which also works for reductions and tries to optimize the given tensor using the above transformations (mostly by simplifying iteration domains). This is the most important function and is used by the autodiff.

python/tvm/testing.py
- estimate_performance - statically estimate the performance of the given Tensor or Expr; used for regression testing.
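To illustrate the core step behind SolveSystemOfInequalities, here is a minimal standalone sketch of one round of Fourier-Motzkin elimination over double coefficients. The `Ineq` struct and `EliminateVar` function are illustrative assumptions only: TVM's implementation works on symbolic Exprs and performs much more (normalization, deduplication, equality handling).

```cpp
#include <cstddef>
#include <vector>

// A linear inequality: sum_i coef[i] * x[i] <= rhs.
struct Ineq {
  std::vector<double> coef;
  double rhs;
};

// Eliminate variable x[k]: inequalities where x[k] has a zero coefficient
// are kept as-is; every (positive, negative) coefficient pair is combined
// with positive weights chosen so that x[k] cancels out.
std::vector<Ineq> EliminateVar(const std::vector<Ineq>& sys, std::size_t k) {
  std::vector<Ineq> pos, neg, out;
  for (const Ineq& q : sys) {
    if (q.coef[k] > 0) pos.push_back(q);
    else if (q.coef[k] < 0) neg.push_back(q);
    else out.push_back(q);
  }
  for (const Ineq& p : pos) {
    for (const Ineq& n : neg) {
      // Weights: b = -n.coef[k] > 0 for p, a = p.coef[k] > 0 for n,
      // so the new coefficient of x[k] is b*p.coef[k] + a*n.coef[k] = 0.
      double a = p.coef[k], b = -n.coef[k];
      Ineq r;
      r.coef.resize(p.coef.size());
      for (std::size_t i = 0; i < r.coef.size(); ++i)
        r.coef[i] = b * p.coef[i] + a * n.coef[i];
      r.rhs = b * p.rhs + a * n.rhs;
      out.push_back(r);
    }
  }
  return out;
}
```

For example, eliminating x from {x + y <= 4, -x <= 0} yields the single constraint y <= 4, the projection of the system onto y.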