WIP: [NewOptimizer] The one SROA pass to rule them all #26778
Conversation
code_typed on master: https://gist.github.com/Keno/f1146fd6c7f8688cb00dd419f48782ab It's fun to count the number of |
In #26778, I changed the way that the reverse affinity flag works in order to support node insertion during compaction. In particular, I had originally (for ease of implementation primarily) had that flag simply change which BB an inserted node belongs to (i.e. if that flag was set and there was a basic block boundary at the insertion site, the new node would belong to the previous BB). In #26778, I changed this to instead be a flag of whether to insert before or after a given instruction. As it turns out, this change is also required for correctness. Because domsorting can change the notion of "previous instruction" by moving basic blocks, we had (in rare circumstances) instructions end up in the wrong BB. Backporting those changes fixes that.
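To make the new semantics concrete, here is a self-contained toy model (`Stmt` and `insert_rel!` are illustrative names, not the actual ir.jl code): the flag selects placement relative to an anchor instruction, so the inserted node inherits the anchor's BB regardless of how domsorting reorders blocks.

```julia
# Toy model only: the real code operates on IRCode/IncrementalCompact.
struct Stmt
    bb::Int       # basic block this statement belongs to
    expr::Symbol  # stand-in for an IR node
end

# New semantics: attach relative to the anchor statement. The new node takes
# the anchor's BB, so moving whole blocks around cannot strand it.
function insert_rel!(stmts::Vector{Stmt}, anchor::Int, expr::Symbol; before::Bool=false)
    insert!(stmts, before ? anchor : anchor + 1, Stmt(stmts[anchor].bb, expr))
    return stmts
end

stmts = [Stmt(1, :a), Stmt(2, :b)]        # BB boundary between :a and :b
insert_rel!(stmts, 2, :new; before=true)  # :new lands in BB 2, just before :b
```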
[NewOptimizer] Backport reverse affinity changes from #26778
base/compiler/ssair/ir.jl
Outdated
    line::Int
end
function NewNode(nnode::NewNode; pos=nnode.pos, reverse_affinity=nnode.reverse_affinity,
                 typ=nnode.typ, node=nnode.node, line=nnode.line)
don't use named tuples for this – performance and compiler correctness will be really bad
This is already fixed on master - I just haven't rebased this change since.
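For reference, a keyword-free shape would look something like this (a sketch only; the field order is an assumption from the diff above, and not necessarily how master fixed it):

```julia
struct NewNode
    pos::Int
    reverse_affinity::Bool
    typ::Any
    node::Any
    line::Int
end

# "Copy with a new position" as a plain positional method avoids the
# NamedTuple/kwsorter machinery that the keyword constructor pulls in:
with_pos(n::NewNode, pos::Int) = NewNode(pos, n.reverse_affinity, n.typ, n.node, n.line)
```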
base/compiler/ssair/ir.jl
Outdated
function getindex(compact::IncrementalCompact, ssa::NewSSAValue)
    return compact.new_new_nodes[ssa.id].node
end

function count_added_node!(compact, v)
for performance, add type-sig annotations
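For readers following along, the request is roughly this shape (a toy; `Compact` and the `Int` argument are stand-ins, since the concrete types live in base/compiler):

```julia
struct Compact
    counts::Dict{Int,Int}
end

# With an annotated signature, callers (and the bootstrap compiler) see
# exactly what this helper accepts, rather than an untyped (compact, v).
function count_added_node!(compact::Compact, v::Int)
    compact.counts[v] = get(compact.counts, v, 0) + 1
    return nothing
end
```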
function resort_pending!(compact)
    sort!(compact.pending_perm, DEFAULT_STABLE, Order.By(x->compact.pending_nodes[x].pos))
end
is this returning a value?
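If the answer is "no", an explicit `return nothing` would make that clear. A sketch (using Base's keyword `sort!` rather than the compiler-internal DEFAULT_STABLE/Order.By, and plain vectors in place of the compact fields):

```julia
function resort_pending!(pending_perm::Vector{Int}, pending_pos::Vector{Int})
    sort!(pending_perm, by = x -> pending_pos[x])
    return nothing  # don't leak the sorted permutation as an implicit return
end

perm = [1, 2, 3]
resort_pending!(perm, [30, 10, 20])
perm  # => [2, 3, 1]
```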
    sort!(compact.pending_perm, DEFAULT_STABLE, Order.By(x->compact.pending_nodes[x].pos))
end

function insert_node!(compact::IncrementalCompact, before, @nospecialize(typ), @nospecialize(val), reverse_affinity::Bool=false)
What type is before? Looks like this should be written to use dispatch.
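What that might look like, as a toy (`SSAVal`/`NewSSAVal` are simplified stand-ins for the real value types; the point is splitting the method rather than branching on the value's type inside one untyped method):

```julia
struct SSAVal;    id::Int; end   # a statement already in the IR
struct NewSSAVal; id::Int; end   # a node added during compaction

insert_pos(before::SSAVal)    = (:old, before.id)
insert_pos(before::NewSSAVal) = (:new, before.id)

insert_pos(SSAVal(3))     # => (:old, 3)
insert_pos(NewSSAVal(7))  # => (:new, 7)
```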
        entry = compact.pending_nodes[pos - length(compact.ir.stmts) - length(compact.ir.new_nodes)]
        pos, reverse_affinity = entry.pos, entry.reverse_affinity
    end
    line = 0 #compact.ir.lines[before.id]
TODO?
base/compiler/ssair/ir.jl
Outdated
    entry.pos == idx && entry.reverse_affinity
end

function process_newnode!(compact, new_idx, new_node_entry, idx, active_bb)
type signature?
base/compiler/ssair/ir.jl
Outdated
if compact.new_nodes_idx <= length(compact.perm) && compact.ir.new_nodes[compact.perm[compact.new_nodes_idx]][1] == idx
if compact.new_nodes_idx <= length(compact.perm) &&
        (entry = compact.ir.new_nodes[compact.perm[compact.new_nodes_idx]];
         entry.reverse_affinity ? entry.pos == idx - 1 : entry.pos == idx)
let's not be clever with how much logic we can pack into the conditional
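One way to unpack it: hoist the compound condition into a named predicate so the `if` reads as a question (a sketch; `NewNodeEntry` is a stand-in for the entry type in this diff):

```julia
struct NewNodeEntry
    pos::Int
    reverse_affinity::Bool
end

# The predicate's name documents what the branch is asking: does this
# pending node attach at statement `idx`?
attaches_at(entry::NewNodeEntry, idx::Int) =
    entry.reverse_affinity ? entry.pos == idx - 1 : entry.pos == idx

attaches_at(NewNodeEntry(4, true), 5)   # => true
attaches_at(NewNodeEntry(4, false), 5)  # => false
```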
@@ -617,10 +758,12 @@ function next(compact::IncrementalCompact, (idx, active_bb)::Tuple{Int, Int})
    return (old_result_idx, compact.result[old_result_idx]), (compact.idx, active_bb)
end

function maybe_erase_unused!(extra_worklist, compact, idx)
    effect_free = stmt_effect_free(compact.result[idx], compact, compact.ir.mod)
function maybe_erase_unused!(extra_worklist, compact, idx, callback = x->nothing)
type signature?
    bb = compact.result_bbs[end]
    compact.result_bbs[end] = BasicBlock(bb,
        StmtRange(first(bb.stmts), result_idx-1))
end
Is the BasicBlock return value intended to be meaningful?
end

# Insert PhiNodes
lifted_phis = map(visited_phinodes) do item
Since the front-end doesn't have domtree analysis, wrap this in a let block to explicitly capture variables (to avoid preventing inference from seeing the types of these variables everywhere in the containing function).
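Generic illustration of the requested pattern (not the pass code): when a captured variable is ever reassigned in the enclosing function, the closure captures it as a shared Box; re-binding it with `let` gives the closure a private, type-stable copy.

```julia
function scaled(xs)
    factor = 1
    factor = 10                # reassignment would force a Box capture...
    f = let factor = factor    # ...so re-bind it; the closure now captures
        x -> x * factor        # a plain local Int
    end
    return map(f, xs)
end

scaled([1, 2, 3])  # => [10, 20, 30]
```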
function count_uses(stmt, uses)
    for ur in userefs(stmt)
        if isa(ur[], SSAValue)
Lift ur[] to a variable.
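I.e. read the accessor once into a local and use that binding everywhere after the isa check (toy data below; the real loop would bind `val = ur[]`):

```julia
for item in Any[1, "two", 3]
    val = item                         # in the real code: val = ur[]
    if isa(val, Int)
        println("integer use: ", val)  # use `val`, not a repeated accessor call
    end
end
```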
    end
end

function mark_phi_cycles(compact, safe_phis, phi)
type sig
changed = true
while changed
    changed = false
    safe_phis = IdSet{Int}()
IdSet{Int} is very slow. Use a BitSet instead.
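For reference, BitSet stores dense integer sets as a bitmask, so insertion and membership tests are plain bit operations with no hashing:

```julia
safe_phis = BitSet()
push!(safe_phis, 3)
push!(safe_phis, 42)
3 in safe_phis  # => true
7 in safe_phis  # => false
```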
base/compiler/ssair/slot2ssa.jl
Outdated
node = renumber_ssa!(isa(entry.node, PhiNode) ?
           rename_phinode_edges(entry.node, 0, result_order, bb_rename) : entry.node,
           inst_rename, true)
) for entry in ir.new_nodes]
Avoid creating a closure over Box variables here (see elsewhere for how to re-write this).
As I said on Slack, this is not currently in a reviewable state. You're welcome to continue, but you'll likely have to do it over when I pick this up again.
This is a rebased and fixed version of the improved SROA pass from #26778. There's a decent piece of new infrastructure wrapped up in this: the ability to insert new nodes during compaction. This is a bit tricky because it requires tracking which version of the statements buffer a given SSAValue belongs to. At the moment this is done mostly manually, but I'm hoping to clean that up in the future. The idea of the new SROA pass is fairly straightforward: given a use of an interesting value, it traces through all phi nodes to find the leaves, applies the desired transformation to those leaves, and then re-inserts a phi nest mirroring the phi nest of the original value.
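A toy sketch of the "trace to the leaves" step (simplified types only; the real pass walks PhiNodes in IRCode and must also handle cycles and undef edges):

```julia
struct Phi
    values::Vector{Any}   # each entry is another Phi or a leaf value
end

function collect_leaves!(leaves::Vector{Any}, v)
    if v isa Phi
        foreach(x -> collect_leaves!(leaves, x), v.values)
    else
        push!(leaves, v)  # the transformation is applied per-leaf, then a
    end                   # matching phi nest is re-inserted over the results
    return leaves
end

collect_leaves!(Any[], Phi(Any[Phi(Any[:a, :b]), :c]))  # => Any[:a, :b, :c]
```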
Superseded by #27126.
This is more of a "where this is going" PR. I hope we won't need to finish this until after we tag 0.7 and have some time to clean it up. The current getfield-elimination pass in the new optimizer is good enough to handle everything the old optimizer did, as well as most simple cases of the new iteration protocol (basically, where the variable is checked against nothing right after the phi). However, there are more complicated cases that the current getfield-elimination pass does not handle. To demonstrate that this is just a temporary limitation, this branch implements the more advanced version of that pass, in a state sufficient for benchmarking (though it probably gets corner cases wrong, so a bunch more work is required to fix and clean it up). Here's a motivating example (due to @JeffBezanson). On current master:
With this PR and the new optimizer enabled:
(Timings are faster as well, but I ran them on different machines, so I omitted them since they're not useful comparisons).
The loop has 262144 iterations, so this demonstrates that the only object that gets allocated is the tuple that is required to be allocated, while current master allocates more than 500000 useless intermediate objects (and of course, if dosomething were actually inlineable and did something with the string, the allocation of the tuple would likely go away on this branch as well).