[Opt] Dead store and stack-related operation elimination by control-flow graph #1324
We can weaken an
We can do this optimization not only when
However, we can't do this optimization if it's parallel executed:
How to deal with the "load" in offloaded range for?
If we don't deal with it, the global stores will be eliminated. We need something like this: taichi/taichi/transforms/variable_optimization.cpp Lines 318 to 331 in 13462ff
@yuanming-hu Do you have any ideas? I think we probably need to traverse the IR to find a global temp with the same offset as offload->begin_offset, and add a std::vector to each CFGNode denoting the loads that are not representable in the node...
Are there better ways to support offloaded range fors without doing implicit loads...?
Codecov Report
@@           Coverage Diff           @@
##           master    #1324   +/-   ##
=======================================
  Coverage   85.48%   85.48%
=======================================
  Files          19       19
  Lines        3375     3375
  Branches      630      630
=======================================
  Hits         2885     2885
  Misses        358      358
  Partials      132      132
Awesome! LGTM. It's a lot of work. Thank you so much!
Related issue = #926 #656
Related PR = #932 #1248
Replaces #859 #857
Change in build_cfg:
final_node. Then assume there are reads to all global pointers after the final node (just like there are writes to all global pointers before the start node), for more convenient analysis on global pointers.
CFGNode if they are parallel executed (i.e. in an offloaded range_for/struct_for's body).
Change in CFGNode:
replace_with, deprecate erase_entire_node.
contain_variable for both reach_kill_variable (https://en.wikipedia.org/wiki/Reaching_definition) and live_kill_variable (https://en.wikipedia.org/wiki/Live_variable_analysis). Keep the function reach_kill_variable because reach_kill should contain definitions instead of variables by definition, but I store variables in the unordered_set for convenience.
Change in compile_to_offloads:
variable_optimization pass. (Faster compilation!)
Benchmark:
Compilation time (#926 (comment)):
6.872m -> 1.164m
(codegen_kernel_statements: 29087 -> 28573)
Number of statements:
The only case that becomes worse is:
Previously, we didn't consider global temp as global ptrs, so we could simplify the kernel to
ret[None] = 1
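For readers unfamiliar with the final-node idea in the description, here is a minimal sketch of dead store elimination by backward live-variable analysis on a toy control-flow graph. It is not Taichi's implementation: Stmt, Node, VarSet and eliminate_dead_stores are hypothetical names invented for this illustration, statements are reduced to plain loads and stores of named variables, and parallel execution is not modeled. The only idea taken from the description is that every global is assumed to be read after the final node, so the last store to a global is never treated as dead.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy statement: a store to `var` or a load from `var` (hypothetical IR).
struct Stmt {
  enum Kind { Store, Load } kind;
  std::string var;
};

// Toy basic block; this is NOT Taichi's CFGNode.
struct Node {
  std::vector<Stmt> stmts;
  std::vector<std::size_t> succ;  // indices of successor nodes
};

using VarSet = std::set<std::string>;

// Removes stores whose variable is not live right after the store.
// Nodes without successors flow into an implicit final node, after which
// every variable in `globals` is assumed to be read.
// Returns the number of stores removed.
std::size_t eliminate_dead_stores(std::vector<Node> &cfg,
                                  const VarSet &globals) {
  const std::size_t n = cfg.size();
  std::vector<VarSet> live_in(n + 1);
  live_in[n] = globals;  // slot n plays the role of the final node

  auto live_out_of = [&](std::size_t i) {
    VarSet out;
    if (cfg[i].succ.empty()) out = live_in[n];
    for (std::size_t s : cfg[i].succ) {
      assert(s < n);  // successor indices must refer to real nodes
      out.insert(live_in[s].begin(), live_in[s].end());
    }
    return out;
  };

  // Iterate the backward data-flow equations to a fixed point.
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = n; i-- > 0;) {
      VarSet live = live_out_of(i);
      for (auto it = cfg[i].stmts.rbegin(); it != cfg[i].stmts.rend(); ++it) {
        if (it->kind == Stmt::Store) live.erase(it->var);  // kill
        else live.insert(it->var);                         // use
      }
      if (live != live_in[i]) {
        live_in[i] = std::move(live);
        changed = true;
      }
    }
  }

  // Walk each node backwards again and drop stores to non-live variables.
  std::size_t removed = 0;
  for (std::size_t i = 0; i < n; ++i) {
    VarSet live = live_out_of(i);
    std::vector<Stmt> kept;  // collected in reverse order
    for (auto it = cfg[i].stmts.rbegin(); it != cfg[i].stmts.rend(); ++it) {
      if (it->kind == Stmt::Store) {
        if (!live.count(it->var)) { ++removed; continue; }
        live.erase(it->var);
      } else {
        live.insert(it->var);
      }
      kept.push_back(*it);
    }
    cfg[i].stmts.assign(kept.rbegin(), kept.rend());
  }
  return removed;
}
```

In this sketch, a single node containing stores to tmp, g, g, tmp2 with globals = {g} keeps only the second store to g. Adding a variable to the globals set makes its last store survive, which is the same effect as the regression noted above once global temps are treated as global pointers.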