
WIP: [NewOptimizer] The one SROA pass to rule them all #26778

Closed · wants to merge 1 commit

Conversation

@Keno (Member) commented Apr 11, 2018

This is more of a "where this is going" PR; I hope we won't need to finish it until after we tag 0.7 and have some time to clean it up. The current getfield-elimination pass in the new optimizer is good enough to handle everything the old optimizer did, as well as most simple cases of the new iteration protocol (basically where the variable is checked against `nothing` right after the phi). However, there are more complicated cases that the current getfield-elim pass does not handle. To demonstrate that this is only a temporary limitation, this branch implements the more advanced version of that pass, in a state sufficient for benchmarking (it probably gets corner cases wrong, so a fair amount of work remains to fix and clean it up). Here's a motivating example (due to @JeffBezanson). On current master:

using Base.Iterators: product
@noinline dosomething(x) = x
function mycount(itr)
    n = 0
    for x in itr
        dosomething(x)
        n+=1
    end
    return n
end
iter = product(ntuple(n->("a","b","c","d"), Val(9))...);
@benchmark mycount($iter)
BenchmarkTools.Trial:
  memory estimate:  59.29 MiB
  allocs estimate:  786438

With this PR and the new optimizer enabled:

BenchmarkTools.Trial:
  memory estimate:  20.00 MiB
  allocs estimate:  262145

(Timings are faster as well, but I ran them on different machines, so I've omitted them; they aren't a useful comparison.)

The loop has 262144 iterations, so this demonstrates that the only object allocated per iteration is the tuple that genuinely has to be allocated, while current master allocates more than 500000 useless intermediate objects. (And if `dosomething` were actually inlineable and did something with the string, the tuple allocation would likely go away on this branch as well.)
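For context, the "checked against nothing right after the phi" pattern comes from how `for` loops desugar under the 0.7 iteration protocol. A rough hand-lowered equivalent of `mycount` (a sketch, not the compiler's exact lowering) looks like this; the tuple returned by `iterate` is exactly the kind of short-lived intermediate object the improved SROA pass is meant to eliminate:

```julia
@noinline dosomething(x) = x   # same stub as in the example above

# Rough hand-lowering of `mycount` under the 0.7 iteration protocol
# (a sketch, not the compiler's exact output).
function mycount_lowered(itr)
    n = 0
    y = iterate(itr)              # Union{Nothing, Tuple{element, state}}
    while y !== nothing           # the check against `nothing` after the phi
        x = y[1]                  # getfield the pass must forward
        dosomething(x)
        n += 1
        y = iterate(itr, y[2])    # getfield of the state, then re-allocate y
    end
    return n
end
```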

@Keno (Member Author) commented Apr 11, 2018

code_typed on master: https://gist.github.com/Keno/f1146fd6c7f8688cb00dd419f48782ab
code_typed on this branch: https://gist.github.com/Keno/0422fe1d5973265b79bd9c905f36198f

It's fun to count the number of `:new` expressions and `Core.tuple` calls that got eliminated.

@ararslan ararslan added the compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) label Apr 11, 2018
Keno added a commit that referenced this pull request Apr 27, 2018
In #26778, I changed the way that the reverse affinity flag works
in order to support node insertion during compaction. In particular,
I had originally (for ease of implementation primarily) had that flag
simply change which BB an inserted node belongs to (i.e. if that flag
was set and there was a basic block boundary at the insertion site,
the new node would belong to the previous BB). In #26778, I changed
this to instead be a flag of whether to insert before or after a given
instruction. As it turns out, this change is also required for correctness.
Because domsorting can change the notion of "previous instruction" by
moving basic blocks, we had (in rare circumstances) instructions end
up in the wrong BB. Backporting those changes fixes that.
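To illustrate the semantic difference the commit message describes, here is a toy model over a plain statement list (`insert_at` is hypothetical, for illustration only; it is not the real `IncrementalCompact` API). Under the new meaning, the flag decides whether the node goes before or after the anchor statement, rather than merely changing which basic block the node is attributed to:

```julia
# Toy model of the new reverse-affinity semantics: `after=true` means
# "insert after the anchor statement". (Hypothetical helper, not real API.)
function insert_at(stmts::Vector{Symbol}, anchor::Int, node::Symbol; after::Bool=false)
    out = copy(stmts)
    insert!(out, after ? anchor + 1 : anchor, node)
    return out
end
```

With `stmts = [:a, :b, :c]` and anchor 2, the flag decides whether `:new` lands before or after `:b`, independent of where any basic-block boundary happens to fall.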
Keno added a commit that referenced this pull request May 1, 2018
[NewOptimizer] Backport reverse affinity changes from #26778
line::Int
end
function NewNode(nnode::NewNode; pos=nnode.pos, reverse_affinity=nnode.reverse_affinity,
typ=nnode.typ, node=nnode.node, line=nnode.line)
@vtjnash (Member) commented May 7, 2018

don't use named tuples for this – performance and compiler correctness will be really bad

@Keno (Member Author) replied:

This is already fixed on master - I just haven't rebased this change since.

function getindex(compact::IncrementalCompact, ssa::NewSSAValue)
return compact.new_new_nodes[ssa.id].node
end

function count_added_node!(compact, v)
Member commented:

for performance, add type-sig annotations


function resort_pending!(compact)
sort!(compact.pending_perm, DEFAULT_STABLE, Order.By(x->compact.pending_nodes[x].pos))
end
Member commented:

is this returning a value?

sort!(compact.pending_perm, DEFAULT_STABLE, Order.By(x->compact.pending_nodes[x].pos))
end

function insert_node!(compact::IncrementalCompact, before, @nospecialize(typ), @nospecialize(val), reverse_affinity::Bool=false)
@vtjnash (Member) commented May 7, 2018

what type is `before`? Looks like this should be written to use dispatch.

entry = compact.pending_nodes[pos - length(compact.ir.stmts) - length(compact.ir.new_nodes)]
pos, reverse_affinity = entry.pos, entry.reverse_affinity
end
line = 0 #compact.ir.lines[before.id]
Member commented:

TODO?

entry.pos == idx && entry.reverse_affinity
end

function process_newnode!(compact, new_idx, new_node_entry, idx, active_bb)
Member commented:

type signature?

if compact.new_nodes_idx <= length(compact.perm) && compact.ir.new_nodes[compact.perm[compact.new_nodes_idx]][1] == idx
if compact.new_nodes_idx <= length(compact.perm) &&
(entry = compact.ir.new_nodes[compact.perm[compact.new_nodes_idx]];
entry.reverse_affinity ? entry.pos == idx - 1 : entry.pos == idx)
Member commented:

let's not be clever with how much logic we can pack into the conditional

@@ -617,10 +758,12 @@ function next(compact::IncrementalCompact, (idx, active_bb)::Tuple{Int, Int})
return (old_result_idx, compact.result[old_result_idx]), (compact.idx, active_bb)
end

function maybe_erase_unused!(extra_worklist, compact, idx)
effect_free = stmt_effect_free(compact.result[idx], compact, compact.ir.mod)
function maybe_erase_unused!(extra_worklist, compact, idx, callback = x->nothing)
Member commented:

type signature?

bb = compact.result_bbs[end]
compact.result_bbs[end] = BasicBlock(bb,
StmtRange(first(bb.stmts), result_idx-1))
end
Member commented:

is the BasicBlock return value intended to be meaningful?

end

# Insert PhiNodes
lifted_phis = map(visited_phinodes) do item
Member commented:

since the front-end doesn't have domtree analysis, wrap this in a let block to explicitly capture variables (to avoid preventing inference from seeing the types of these variables everywhere in the containing function)
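The suggested rewrite looks roughly like this (`build_phis` and its body are hypothetical stand-ins for the surrounding function; the point is the `let` block that rebinds the captured variables):

```julia
# Rebinding captured variables in a `let` block gives the closure its own
# locals, so inference elsewhere in the containing function is not
# pessimized by the capture. (Hypothetical sketch; the real code inserts
# PhiNodes in the `do` body.)
function build_phis(visited_phinodes, ctx)
    let visited_phinodes = visited_phinodes, ctx = ctx
        map(visited_phinodes) do item
            (item, ctx)   # placeholder body
        end
    end
end
```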


function count_uses(stmt, uses)
for ur in userefs(stmt)
if isa(ur[], SSAValue)
Member commented:

lift ur[] to a variable

end
end
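The suggested lift looks like this over toy types (`UseRefStandin` and `SSAValueStandin` are illustrative stand-ins for the compiler's `UseRef`/`SSAValue`, not the real types):

```julia
struct SSAValueStandin          # toy stand-in for Core.Compiler's SSAValue
    id::Int
end
struct UseRefStandin            # toy stand-in for a use reference
    x::Any
end
Base.getindex(ur::UseRefStandin) = ur.x

function count_uses!(refs, uses::Vector{Int})
    for ur in refs
        val = ur[]              # dereference once, into a local variable
        if isa(val, SSAValueStandin)
            uses[val.id] += 1   # reuse `val` instead of calling `ur[]` again
        end
    end
    return uses
end
```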

function mark_phi_cycles(compact, safe_phis, phi)
Member commented:

type sig

changed = true
while changed
changed = false
safe_phis = IdSet{Int}()
Member commented:

IdSet{Int} are very slow. Use a BitSet.
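The suggested change is a drop-in for dense small-integer keys such as statement or phi indices:

```julia
# BitSet stores membership as bits in a word array, so insert and lookup on
# small dense Int keys avoid hashing entirely (an IdSet hashes every element
# by object-id).
safe_phis = BitSet()
push!(safe_phis, 3)
push!(safe_phis, 17)
```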

node = renumber_ssa!(isa(entry.node, PhiNode) ?
rename_phinode_edges(entry.node, 0, result_order, bb_rename) : entry.node,
inst_rename, true)
) for entry in ir.new_nodes]
Member commented:

avoid creating a closure over Box variables here (see elsewhere for how to re-write this)

@Keno (Member Author) commented May 7, 2018

As I said on Slack, this is not currently in a reviewable state. You're welcome to continue, but you'll likely have to redo the review when I pick this up again.

@Keno Keno mentioned this pull request May 13, 2018
Keno added a commit that referenced this pull request May 16, 2018
This is a rebased and fixed version of the improved SROA pass from #26778.
There's a decent piece of new infrastructure wrapped up in this: The ability
to insert new nodes during compaction. This is a bit tricky because it requires
tracking which version of the statements buffer a given SSAValue belongs to.
At the moment this is done mostly manually, but I'm hoping to clean that up
in the future. The idea of the new SROA pass is fairly straightforward:
Given a use of an interesting value, it traces through all phi nodes, finding
all leaves, applies whatever transformation to those leaves and then re-inserts
a phi nest corresponding to the phi nest of the original value.
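The leaf-walking idea described above can be sketched over a toy IR (`ToyPhi` and the `defs` table are illustrative, not `Core.Compiler` types; the real pass must also guard against phi cycles, which this sketch does not):

```julia
struct ToyPhi                  # stand-in for PhiNode
    values::Vector{Int}        # SSA ids of the incoming values
end

# Collect the non-phi leaves reachable from SSA value `v` by walking
# through phi nodes. The real pass applies a transformation (e.g. forwarding
# a getfield) to each leaf and then re-inserts a matching phi nest.
function walk_leaves(defs::Dict{Int,Any}, v::Int, leaves::Vector{Int}=Int[])
    def = defs[v]
    if isa(def, ToyPhi)
        for w in def.values
            walk_leaves(defs, w, leaves)   # no cycle check in this sketch
        end
    else
        push!(leaves, v)
    end
    return leaves
end
```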
@Keno (Member Author) commented May 16, 2018

Superseded by #27126.

@Keno Keno closed this May 16, 2018
@DilumAluthge DilumAluthge deleted the kf/bigsroa branch March 25, 2021 21:59
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/)
3 participants