
[TVM] Zero elimination #2634

Closed
wants to merge 10 commits

Conversation

sgrechanik-h
Contributor

This PR is extracted from the tensor expression autodiff PR #2498. It contains transformations required for optimizing the code generated by the autodiff. Note that although the transformations are independent of the autodiff, they aren't used anywhere else yet.

The main goal of zero elimination is to remove summation over zeros. This is achieved by

  1. Factoring out conditions (LiftNonzeronessCondition does this)
  2. Fusing conditions with reductions and simplifying their iteration domains (SimplifyReductionDomain)

The facade function which combines both these steps is OptimizeAndLiftNonzeronessConditions.
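The core identity behind step 1 can be illustrated with a toy model in plain Python (illustrative only, not the TVM API): a reduction over the full domain of select(c, e, 0) equals a reduction of e over the restricted domain where c holds.

```python
# Toy model of zero elimination (plain Python, not the TVM API).
# Summing select(cond, val, 0) over the full domain is equivalent to
# summing val over the restricted domain where cond holds.

def sum_full(n, cond, val):
    # Reduction over the full iteration domain, with zeros outside cond.
    return sum(val(i) if cond(i) else 0 for i in range(n))

def sum_restricted(n, cond, val):
    # The same reduction after the iteration domain has been simplified.
    return sum(val(i) for i in range(n) if cond(i))

cond = lambda i: i % 3 == 0       # stands for the lifted condition c
val = lambda i: i * i             # stands for the nonzero part e
assert sum_full(100, cond, val) == sum_restricted(100, cond, val)
```

The second form skips the zero elements entirely, which is exactly the saving the pass aims for when the condition is sparse.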

A brief summary of functions added by the PR:

  • include/tvm/ir_operator.h
    • is_const_value - check if an Expr is a const of the given value, works for floats as well as for ints.
  • src/op/op_util.h
    • TensorFromExpr - create a tensor from an expr and itervars. Something like compute, but easier to use in some contexts since it automatically duplicates reductions.
    • TransformBody - create a new tensor by transforming the body of the given tensor using the given expression-transforming function.
  • src/pass/zero_elimination.h
    • IsSumCombiner - check whether the given combiner is a sum.
    • CanFactorZeroFromCombiner - check if we can factor out zero from the given combiner.
    • LiftNonzeronessCondition - transform the expression into the form select(c, e, 0).
    • InlineTailCall - if the given tensor is just a call to another tensor, inline it. This function is used in the autodiff.
    • InlineTensors - transform the given expression or tensor by inlining calls to other tensors, also used in the autodiff.
    • SolveSystemOfInequalities - perform Fourier-Motzkin elimination.
    • SimplifyDomain - simplify the given iteration domain using Fourier-Motzkin elimination and several other transformations.
    • SimplifyReductionDomain - simplify the iteration domain of the given reduction using the previous function.
    • ExtractAsTensorMaybe - extract the given expr into a separate tensor if this seems reasonable.
    • ExtractNonTopReductions - extract non-top-level reductions as separate tensors.
    • ExtractReductions - like ExtractNonTopReductions, but may also extract the given expr itself.
    • OptimizeAndLiftNonzeronessConditions - a more powerful version of LiftNonzeronessCondition which also works for reductions and tries to optimize the given tensor using the above transformations (mostly by simplifying iteration domains). This is the most important function which is used by the autodiff.
  • python/tvm/testing.py
    • estimate_performance - statically estimate performance of the given Tensor or Expr, used for regression testing.
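As a rough illustration of what SolveSystemOfInequalities does, here is a minimal Fourier-Motzkin elimination step over linear constraints written as (coeffs, const), meaning sum(coeffs[v] * v) + const >= 0. This is a toy representation, not the TVM data structures:

```python
# Toy Fourier-Motzkin elimination (illustrative, not the TVM implementation).
# A constraint is (coeffs, const) meaning sum(coeffs[v] * v) + const >= 0.

def fm_eliminate(constraints, x):
    """Eliminate variable x, returning constraints over the remaining vars."""
    lowers, uppers, rest = [], [], []
    for coeffs, const in constraints:
        a = coeffs.get(x, 0)
        (lowers if a > 0 else uppers if a < 0 else rest).append((coeffs, const))
    result = list(rest)
    for cl, kl in lowers:          # cl[x] > 0: a lower bound on x
        for cu, ku in uppers:      # cu[x] < 0: an upper bound on x
            a, b = cl[x], -cu[x]
            combined = {}
            for v in set(cl) | set(cu):
                if v == x:
                    continue
                c = b * cl.get(v, 0) + a * cu.get(v, 0)
                if c:
                    combined[v] = c
            result.append((combined, b * kl + a * ku))
    return result

# Example: from x - y >= 0 and 5 - x >= 0, eliminating x yields 5 - y >= 0.
system = [({"x": 1, "y": -1}, 0), ({"x": -1}, 5)]
print(fm_eliminate(system, "x"))   # [({'y': -1}, 5)]
```

Each pair of a lower and an upper bound on x is combined so that x cancels; repeating this per variable projects the system onto the remaining variables, which is how the iteration domain gets simplified.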

@tqchen
Member

tqchen commented Feb 23, 2019

@sgrechanik-h please request reviews

@sgrechanik-h
Contributor Author

@ZihengJiang @were @xqdan @derisavi @grwlf Please review.

Contributor

@sergei-mironov sergei-mironov left a comment


Here is a partial review containing a few questions.

Visit(op->buffer_var);
IRVisitor::Visit_(op);
}
};
Contributor


Not an issue, but a generic thought. How can we be sure that we really visit all IR operations involving variables? What if somebody adds a new operation containing a variable later? How do we let them know that they need to fix this place too? Maybe we should think about using a direct switch on IRNodeTypes rather than the visitor pattern in such cases, and instruct the C++ compiler to issue a warning if the case for some node is missing. @tqchen what do you think?

Contributor Author


If we want a compile time error/warning, then the simplest solution is to create an alternative of IRVisitor with every Visit_ function declared as pure virtual:

class IRVisitorNoDefault : public IRVisitor {
 public:
  virtual void Visit_(const Variable* op) = 0;
  virtual void Visit_(const LetStmt* op) = 0;
  //...
};

(Using a switch and configuring the compiler to raise an error on an incomplete switch could be an alternative, but it may be a bit compiler-dependent.)

However, I'm not sure if it's worth doing since new operations are rarely added into IR.
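The same fail-fast idea can be sketched in Python terms with abstract methods (a loose analogue for illustration, not TVM code): a visitor base class with every handler abstract forces subclasses to handle every node kind, failing at instantiation time instead of silently falling through to a default.

```python
import abc

# Loose Python analogue of an IRVisitor with no default handlers:
# every node kind must be handled, or instantiation fails with TypeError.
class VisitorNoDefault(abc.ABC):
    @abc.abstractmethod
    def visit_variable(self, op): ...

    @abc.abstractmethod
    def visit_let_stmt(self, op): ...

class IncompleteVisitor(VisitorNoDefault):
    # Handles variables, but forgets visit_let_stmt.
    def visit_variable(self, op):
        return op

try:
    IncompleteVisitor()   # fails: visit_let_stmt is still abstract
except TypeError as e:
    print("caught:", e)
```

The C++ pure-virtual version above has the stronger property of failing at compile time rather than at object construction.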

Contributor

@derisavi derisavi left a comment


Thanks for putting such a large effort into this work.
I haven't reviewed the whole code yet but these are my comments so far.

} else {
return false;
}
}
Contributor


This method's implementation is very similar to that of tvm::is_const_int(). I wonder whether it's possible to reuse one in the other.

One difference I see is in how UIntImm is handled. It seems to me that your handling is correct and is_const_int's is not. So maybe the latter needs to be fixed, and then it will be more similar to yours.

Contributor Author


Yes, I think it makes sense to express is_const_int through is_const_value.
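A sketch of that refactoring, with toy node classes standing in for TVM's immediate nodes (the class definitions below are illustrative stand-ins, not the TVM API):

```python
from dataclasses import dataclass

# Toy stand-ins for TVM's immediate nodes (illustrative only).
@dataclass
class IntImm:
    value: int

@dataclass
class UIntImm:
    value: int

@dataclass
class FloatImm:
    value: float

def is_const_value(expr, value):
    """True iff expr is a constant equal to value, for ints and floats alike."""
    return isinstance(expr, (IntImm, UIntImm, FloatImm)) and expr.value == value

def is_const_int(expr, value):
    """Expressed through is_const_value, restricted to integer immediates."""
    return isinstance(expr, (IntImm, UIntImm)) and is_const_value(expr, value)

assert is_const_value(FloatImm(0.0), 0)   # floats work too
assert is_const_int(IntImm(1), 1)
assert not is_const_int(FloatImm(1.0), 1) # not an integer immediate
```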

Contributor

@sergei-mironov sergei-mironov left a comment


Looks good to me, thanks.

@tqchen
Member

tqchen commented Mar 3, 2019

Thanks for putting great effort into this. Can we comment a bit on how some of the simplification routines might fit into the new infrastructure #2588?

Ideally, we would like to have a consolidated effort on index simplification, so that some of the work introduced here can benefit general cases.

I understand that such discussion and deliberation might cause a bit of a slowdown in bringing this in, but it could benefit us in the long term.

@sgrechanik-h
Contributor Author

@tqchen Honestly speaking, I don't quite understand the design and motivation behind the new Analyzer class. In zero elimination there are several internal analyses/transformations which are basically functions with the signature ResType(Expr, Map<Var, Range>), and they could be added as subanalyses by using a simple adapter from such functions to analyses. However, I may be missing the point.

Concerning the usefulness of adding these transformations to the analyzer infrastructure, I should think a bit more about it. There are analyses like FreeVariables which are probably useful, there are transformations which are too zero-elimination-specific (like LiftNonzeronessConditions), and there are some transformations about which I'm not sure (like div/mod elimination).

@tqchen
Member

tqchen commented Mar 5, 2019

It is not about the usefulness of the features; I want to discuss where best to put them, so that we have a clear collection of sub-analyzers and a limited amount of state.

Let me give some of the example questions we can ask:

  • Can Map<Var, Range> fold directly into a Sub-Analyzer that gives you the constant bound or symbolic bound?
  • Does div/mod fold into the Canonical/Rewrite Simplifier?
  • Reduction simplification might go into the Simplifier as well, now that we can define Visitors for Reduce.

@sgrechanik-h
Contributor Author

@tqchen,

  • Can Map<Var, Range> fold directly into a Sub-Analyzer that gives you the constant bound or symbolic bound?

Like replacing Map<Var, Range> with Analyzer and calling analyzer.const_int_bound(...) instead of looking up in the map? Yes, this is possible.

  • does div/mod fold into Canonical/Rewrite Simplifier?

Div/mod elimination is not really a simplification, it is a transformation into a form which can be manipulated more easily by certain algorithms. So I don't think it should be a part of any simplifier. It's actually quite common: many transformations don't really simplify expressions in the usual sense.
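For intuition, div/mod elimination can be thought of as replacing x / c and x % c with fresh variables q and r tied together by linear constraints, which inequality-based reasoning (such as Fourier-Motzkin) can then manipulate. A toy check of the underlying identity, not the TVM pass:

```python
# Toy check of the identity behind div/mod elimination (not the TVM pass):
# x / c and x % c are replaced by fresh variables q, r with the linear
# constraints  x == q * c + r  and  0 <= r < c.

def eliminate_divmod(x, c):
    q, r = x // c, x % c          # the fresh variables q and r
    assert x == q * c + r and 0 <= r < c
    return q, r

for x in range(50):               # the constraints hold across the range
    eliminate_divmod(x, 7)
print(eliminate_divmod(23, 7))    # (3, 2)
```

After this rewrite the expression contains only linear terms over q and r, at the price of a larger variable set and extra constraints, which is why it is a normalization step rather than a simplification.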

  • Reduction simplification might now go into the Simplifier as well now that we can define Visitors for Reduce.

This is possible.

Actually I have several doubts about the new Analyzer infrastructure that may be worth discussing, in particular:

  • The Analyzer class combines context tracking and managing subanalyzers. Or maybe analyzers combine context tracking and analysis itself. I have a feeling that these things should be separated, although I may be wrong.
  • The notion of a subanalyzer is not very well defined.
  • It seems very hard to add a new subanalyzer: it seems to involve modifying several classes in several different places, in a way that depends heavily on the particular analyzer, instead of just adding a class or a function to a list.
  • There is no example of memoization yet, and I'm not sure how it'll go. However, I believe that the problem of memoization should be separated from analysis/transformation itself, i.e. there should be some combinator to make arbitrary analysis memoized.
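The combinator suggested in the last point might look roughly like the following (a hypothetical sketch, not the TVM API; invalidate would be hooked to context updates, so stale results are dropped, which addresses the concern about changing variable information):

```python
# Hypothetical memoization combinator for an analysis (not the TVM API).
# Wrapping keeps the caching concern out of the analysis itself; the cache
# is cleared whenever the surrounding context changes.

class Memoized:
    def __init__(self, analysis):
        self.analysis = analysis
        self.cache = {}
        self.calls = 0            # counts actual (non-cached) evaluations

    def __call__(self, expr):
        if expr not in self.cache:
            self.calls += 1
            self.cache[expr] = self.analysis(expr)
        return self.cache[expr]

    def invalidate(self):
        """Call on context updates: previous results may no longer hold."""
        self.cache.clear()

analysis = Memoized(lambda e: len(e))    # a stand-in analysis
analysis("x + y"); analysis("x + y")
print(analysis.calls)                    # 1: the second call hit the cache
analysis.invalidate()
analysis("x + y")
print(analysis.calls)                    # 2: recomputed after invalidation
```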

const Array<Var>& outer_axis,
const Map<Var, Range>& vranges) {
// TODO(sgrechanik-h): We don't use divmod elimination here because of some performance problems
auto res = SimplifyDomain(cond, outer_axis, vranges, false);
Contributor


SimplifyDomain iterates over outer_axis in reverse order at line 1297 (as described by equations 22-26 in the paper?). As such, res.axis and res.conditions are in opposite orders (conditions are added at line 1394 in the original order). This can result in an unnecessary transpose copy at line 1468 when trying to create a TensorFromExpr with a res.axis whose axes are in reverse order from those of new_expr. Even if a simple fix in SimplifyDomain ensures the original order of the simplified axes in res (i.e. does not reverse them), an unnecessary copy can still result from TensorFromExpr. Any thoughts? Tensor copies can be costly to performance.

Contributor Author


@kaitingwang I think it makes sense to reverse the resulting res.axis to match the original order of variables; I'll fix it. However, although that helps in some cases, the real problem is in the ExtractAsTensorMaybe function, which doesn't check whether the expression is costly enough to make the extraction beneficial. Since "costly enough" is rather difficult to define precisely, I think the proper solution is to rely on schedules (compute_inline should help, I guess). However, this is quite naive (especially considering that we don't have an autoscheduler yet), so I'll implement some heuristics to prevent such extractions in the obvious cases.

Contributor

@kaitingwang kaitingwang Mar 13, 2019


Indeed, in cases where new_expr is a computationally intensive expression, computing it once and memoizing the result for later uses is wise, since those uses may otherwise recompute the same expensive expression multiple times. Thanks for the fix.

@tqchen
Member

tqchen commented Mar 25, 2019

@sgrechanik-h perhaps some of the discussion should go to #2588. Thanks for being patient. I think one goal would be moving all the state related to context information into the Analyzer, and moving everything related to expression simplification there.

For now, I did not implement memoization, mainly because memoization can cause problems when we update context information about a variable x (previously computed transformations of x-related expressions may no longer hold).

It would be great if we can consolidate everything into the new infra and have more specific discussions in #2588 on how things can be further improved. I hope that by doing so we can have a common infra that we can improve upon, and produce as little technical debt as possible.

@sgrechanik-h
Contributor Author

For now, I did not implement memoization, mainly because memoization can cause problems when we update context information about a variable x (previously computed transformations of x-related expressions may no longer hold).

@tqchen I think that it may be important to implement memoization as early as possible exactly for this reason: it may cause lots of problems requiring us to make completely different design decisions about the whole Analyzer infrastructure.

e = Substitute(e, vmap);
}

return CanonicalSimplify(Simplify(CanonicalSimplify(e, vranges), vranges), vranges);
Contributor


Can you explain why simplify is called 3 times here?

Contributor Author


It is simply the combination that gave the best results. I think I will rewrite this piece of code in the near future so that it doesn't use the Halide simplifier and uses the new simplifiers instead (and hopefully there won't be a need for 3 calls).

@tqchen
Member

tqchen commented Mar 30, 2020

Superseded by #5171

@tqchen tqchen closed this Mar 30, 2020