-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Advanced optimization #656
Comments
@yuanming-hu please assign me. It seems that I can't assign myself... |
Awesome!! This is vitally important for improving run-time performance & reducing compilation time. Thanks for taking charge of this. |
What if these if's contains statements with side-effect like if (cond) x++; We want to obtain: if (cond) { x++; x++; } and the duplicated x++ can be dealt in other lower passes.
What if the two local load is in different blocks? eg. if (cond) { What if a statement is shown once in IR, but ran for multiple times, should we optimize it? eg. while (cond) { We may move this out the while. First add a |
Exactly.
This is non-trivial. We could analyze the common code fragment of true-branch and the false-branch, and put them outside the
If
To merge identical local loads if no statements between them modify the variable, this is not necessary: I think directly searching for modifications when we find a local load fits the code frame better. Maybe we can add this pass later if necessary. |
No, it's just load and never used, will be opt-out by other lower passes. |
How about first make: if (cond) { to become: if (cond) print 'yes'; else print 'no'; since cond is aconstant IR value, and the second can be safely opt-out. |
I just thought about a situation:
I can't tell if the following is more efficient than the above:
(especially when the common code fragment is relatively short than the others) We can restrict this optimization to only the first statement and the last statement of the body of |
@yuanming-hu What do Line 1637 in aa90e31
May I just ignore them when merging two adjacent |
Quick answer for now: yes. I'll document this in greater detail later. You don't have to worry about that until we start doing vectorization. |
I just found a piece of IR:
I think we could optimize them all to $8. Currently There are two ways to do this optimization:
Which do you think is better? |
I think 2 is better. At compile time it's hard to judge whether $25 or $40 will be after $8, but it's sure that $8 is before $25 and $40. |
Shall this pass (identical |
Let's add a |
For checking if the first statements (which can be container statements) in both branches of |
Very good question. I need to think about this a little bit. One very important IR functionality is to test if two |
A few things to think about here
|
There are 3 kinds of solutions I thought about. Denote the number of statements in the container
To me, I prefer the 1st solution. I think it unacceptable to spend O(depth) more time whenever modifying statements, just to avoid the worst-case O(n) time finding if two If there is a stage that statements don't change anymore, we can build data structures for comparing |
Thanks for the detailed analysis. I agree with your decision and we should probably go with the 1st solution. Meanwhile, a very easy-to-implement (and slightly hacky) way to test if two statements are equivalent:
This should work for most cases (assuming the |
Thanks for the hacky way, but I want to implement a reject-fast solution. I think most of the queries will be of different |
Maybe I can implement a visitor to visit one of the |
Sounds good. I champion your decision :-)
Right, you have to use one IRNode to guide the other. |
I wonder if this IR is valid:
It causes simplify.cpp to crash because the taichi/taichi/transforms/simplify.cpp Line 479 in 24e76a1
is not an AllocaStmt when we are visiting $218 .
|
Good question. |
@yuanming-hu I found an issue when doing CSE for global pointers:
After(bad, with some debug output in
I think although the IRs in |
Final IR:
Bad(
|
Concisely describe the proposed feature
With new extensions introduced by #581, there are lots of space to optimize the IR. I also found some feasible optimizations that are not directly related to the new extension. For example, in this fragment of IR,
we could merge the two
if
's together, change$83
toconst [0]
, and then delete$5
.A list of optimizations I have done and going to do:
-1 & a
,0 | a
([opt] Algebraic simplification for bitwise operators #827)linearized
(Lowerlinearized
into a series of adds and muls. #509)alloca
's toconst [0]
([Opt] Dive into container statements to find local loads/stores for optimization, and optimize loads of new allocas to 0 #662)if
's with identical condition ([Opt] Merge adjacent if's with the identical condition #668)if
's (thanks for @archibate 's discussion) ([opt] Move common statements in both branches outside if's #727)WholeKernelCSE
pass ([opt] Move common statements in both branches outside if's #727, [opt] Improve the whole kernel CSE pass #1082)WhileControlStmt
withcond == const [1]
([opt] Eliminate WhileControlStmt with non-zero const conditions #829)DIE
forstack pop
([Opt] Dead store and stack-related operation elimination by control-flow graph #1324)Additional comments
For benchmarking, we may want to introduce a temporary boolean variable as the switch of optimization.
Some nice slides: https://courses.cs.washington.edu/courses/cse401/08wi/lecture/opt-mark.v2.pdf
The text was updated successfully, but these errors were encountered: