maintain external alias set for more efficiency

aviatesk · Dec 25, 2021 · 8f1a6aa
1 parent 5ea6139
commit 8f1a6aa

Showing 2 changed files with 408 additions and 180 deletions.
diff --git a/README.md b/README.md
@@ -5,59 +5,89 @@
 
 This analysis works on a lattice called `x::EscapeLattice`, which holds the following properties:
 - `x.Analyzed::Bool`: not formally part of the lattice, indicates `x` has not been analyzed at all
-- `x.ReturnEscape::Bool`: indicates `x` may escape to the caller via return (possibly as a field),
-    where `x.ReturnEscape && 0 ∈ x.EscapeSites` has the special meaning that it's visible to
-    the caller simply because it's passed as call argument
-- `x.ThrownEscape::Bool`: indicates `x` may escape to somewhere through an exception (possibly as a field)
-- `x.EscapeSites::BitSet`: records program counters (SSA numbers) where `x` can escape
-- `x.FieldSets::Union{Vector{IdSet{Any}},Bool}`: maintains the sets of possible values of fields of `x`:
-  * `x.FieldSets === false` indicates the fields of `x` isn't analyzed yet
-  * `x.FieldSets === true` indicates the fields of `x` can't be analyzed, e.g. the type of `x`
-    is not concrete and thus the number of its fields can't known precisely
-  * otherwise `x.FieldSets::Vector{IdSet{Any}}` holds all the possible values of each field,
-    where `x.FieldSets[i]` keeps all possibilities that the `i`th field can be
-- `x.ArgEscape::Int` (not implemented yet): indicates it will escape to the caller through `setfield!` on argument(s)
-  * `-1` : no escape
-  * `0` : unknown or multiple
-  * `n` : through argument N
+- `x.ReturnEscape::Bool`: indicates `x` may escape to the caller via return
+- `x.ThrownEscape::Bool`: indicates `x` may escape to somewhere through an exception
+- `x.EscapeSites::BitSet`: records program counters (SSA numbers) where `x` can escape (via any of the escapes)
+- `x.FieldSets::Union{Vector{IdSet{Any}},Bool}`: maintains all possible values that impose
+  escape information on fields of `x`
+- `x.ArgEscape::Int` (not implemented yet): indicates it will escape to the caller through
+  `setfield!` on argument(s)
 
 These attributes can be combined to create a partial lattice that has a finite height, given
 that input program has a finite number of statements, which is assured by Julia's semantics.
 
-There are utility constructors to create common `EscapeLattice`s, e.g.,
-- `NoEscape()`: the bottom element of this lattice, meaning it won't escape to anywhere
-- `AllEscape()`: the topmost element of this lattice, meaning it will escape to everywhere
+The escape analysis implementation is based on the data-flow algorithm described in the paper[^MM02].
+The analysis works on the lattice of `EscapeLattice` and transitions lattice elements from the
+bottom to the top until every lattice element converges to a fixed point, maintaining a (conceptual)
+working set that contains program counters corresponding to the remaining SSA statements to be analyzed.
+The analysis manages only a single global state that tracks the `EscapeLattice` of each argument
+and SSA statement, but note that some flow-sensitivity is encoded as program
+counters recorded in the `EscapeSites` property of each lattice element, which can be
+combined with domination analysis to reason about flow-sensitivity if necessary.
 
-The escape analysis will transition these elements from the bottom to the top,
-in the same direction as Julia's native type inference routine.
-An abstract state will be initialized with the bottom(-like) elements:
-- the call arguments are initialized as `ArgumentReturnEscape()`, because they're visible from a caller immediately
-- the other states are initialized as `NotAnalyzed()`, which is a special lattice element that
-  is slightly lower than `NoEscape`, but at the same time doesn't represent any meaning
-  other than it's not analyzed yet (thus it's not formally part of the lattice).
+One distinctive design of this analysis is that escape information is propagated in a
+_backward_ way, i.e. data flows _from usages to definitions_.
+For example, in the code snippet below, EA first analyzes the statement `return obj` and
+imposes `ReturnEscape` on `obj`, and then it analyzes `obj = Expr(:new, Obj, val)` and
+propagates `ReturnEscape` imposed on `obj` to `val`:
+```julia
+obj = Expr(:new, Obj, val) # lowered from `Obj(val)`
+return obj
+```
+The key observation here is that this backward analysis allows escape information to flow
+naturally along the use-def chain rather than control-flow, which would otherwise be better
+handled by forward analysis. As a result, this scheme enables a very simple implementation of
+escape analysis; a `PhiNode`, for example, can be handled relatively easily since we just
+need to propagate escape information imposed on it to its predecessors.
 
-Escape analysis implementation is based on the data-flow algorithm described in the old paper [^MM02].
-The analysis works on the lattice of [`EscapeLattice`](@ref) and transitions lattice elements
-from the bottom to the top in a _backward_ way, i.e. data flows from usage cites to definitions,
-until every lattice gets converged to a fixed point by maintaining a (conceptual) working set
-that contains program counters corresponding to remaining SSA statements to be analyzed.
-The analysis only manages a single global state that tracks `EscapeLattice` of each argument
-and SSA statement, but also note that some flow-sensitivity is encoded as program counters
-recorded in the `EscapeSites` property of each each lattice element.
+It is also worth noting that the `FieldSets` property enables a backward field analysis.
+It tracks all possibilities that _can impose escape information on the fields of an object_, which can be analyzed at
+"usage" sites, and escape information imposed on those tracked possibilities is propagated
+to the actual field values later at the "definition" site. Specifically, the analysis records a
+value that may impose escape information on a field of an object at a `getfield` call, and then it
+propagates that escape information to the field when analyzing `Expr(:new)` or `setfield!`
+expressions.
+```julia
+obj = Expr(:new, Obj, val)
+v = getfield(obj, :val)
+return v
+```
+In the example above, `ReturnEscape` imposed on `v` is _not_ directly propagated to `obj`.
+Rather, the identity of `v` is recorded in `obj`'s `FieldSets[1]`, and then `v`'s escape
+information is propagated to `val` when `obj = Expr(:new, Obj, val)` is analyzed.
+
+Finally, the analysis also needs to track which values can be aliased to each other. This is
+needed because in Julia IR, the same object is sometimes represented by different IR elements.
+Since the analysis maintains an `EscapeLattice` per IR element, we need to make sure that different
+IR elements that actually represent the same object share the same escape information.
+Program constructs that return the same object as their operand(s), like `PiNode` and
+`typeassert`, are obvious examples that require this escape information aliasing.
+But escape information equalization between aliased values is needed in other cases as
+well; most notably, it is necessary for correctly reasoning about mutations on `PhiNode`.
+Consider the following example: `ϕ1` and `ϕ2` are aliased, and thus `ReturnEscape`
+imposed on `y = ϕ1[]` needs to be propagated to `ϕ2[] = x`. The escape information can be
+propagated if the escape states of _predecessors_ of `ϕ1` and `ϕ2` (i.e. those two
+`RefValue` objects) are shared ("equalized"):
+```julia
+if cond::Bool
+    ϕ2 = ϕ1 = Ref("foo")
+else
+    ϕ2 = ϕ1 = Ref("bar")
+end
+ϕ2[] = x::String
+y = ϕ1[]
+return y
+```
 
-The analysis also collects alias information using an approach, which is inspired by
-the escape analysis algorithm explained in yet another old paper [^JVM05].
-In addition to managing escape lattice elements, the analysis state also maintains an "alias set",
-which is implemented as a disjoint set of aliased arguments and SSA statements.
-When the fields of object `x` are known precisely (i.e. `x.FieldSets isa Vector{IdSet{Any}}` holds),
-the alias set is updated each time `z = getfield(x, y)` is encountered in a way that `z` is
-aliased to all values of `x.FieldSets[y]`, so that escape information imposed on `z` will be
-propagated to all the aliased values and `z` can be replaced with an aliased value later.
-Note that in a case when the fields of object `x` can't known precisely (i.e. `x.FieldSets` is `true`),
-when `z = getfield(x, y)` is analyzed, escape information of `z` is propagated to `x` rather
-than any of `x`'s fields, which is the most conservative propagation since escape information
-imposed on `x` will end up being propagated to all of its fields anyway at definitions of `x`
-(i.e. `:new` expression or `setfield!` call).
+However, one interesting property of such alias information is that it is not known at the "usage"
+site but can be derived at the "definition" site (as aliasing is conceptually equivalent to assignment),
+and thus it doesn't naturally flow in a backward way. This means it can be inefficient to
+update escape information whenever we see new aliasing. Rather, in order to handle aliasing
+effectively, EscapeAnalysis.jl uses an approach inspired by the escape analysis algorithm
+explained in this old JVM paper[^JVM05]. That is, in addition to managing escape lattice
+elements, the analysis also maintains an "equi-alias set", a disjoint set of aliased
+arguments and SSA statements, which allows escape information between newly aliased values
+to be equalized efficiently.
 
 [^MM02]: _A Graph-Free approach to Data-Flow Analysis_.
          Markus Mohnen, 2002, April.