a simple and flow-insensitive alias analysis

This commit implements a simple, flow-insensitive alias analysis using an approach inspired by the escape analysis algorithm explained in the old JVM paper [^JVM05].

`EscapeLattice` is extended so that it also keeps track of possible field values. In more detail, `x::EscapeLattice` has a new field `x.FieldSets::Union{Vector{IdSet{Any}},Bool}`, where:
- `x.FieldSets === false` indicates that the fields of `x` haven't been analyzed yet
- `x.FieldSets === true` indicates that the fields of `x` can't be analyzed, e.g. the type of `x` is not concrete and thus the number of its fields can't be known precisely
- otherwise `x.FieldSets::Vector{IdSet{Any}}` holds all the possible values of each field, where `x.FieldSets[i]` keeps all the possible values of the `i`th field

In addition to managing escape lattice elements, the analysis state now also maintains an "alias set" `state.aliasset::IntDisjointSet{Int}`, which is implemented as a disjoint set of aliased arguments and SSA statements. When the fields of object `x` are known precisely (i.e. `x.FieldSets isa Vector{IdSet{Any}}` holds), the alias set is updated each time `z = getfield(x, y)` is encountered, so that `z` is aliased to all values of `x.FieldSets[y]`. Escape information imposed on `z` is then propagated to all the aliased values, and `z` can be replaced with an aliased value later.

Note that if the fields of object `x` can't be known precisely (i.e. `x.FieldSets` is `true`), then when `z = getfield(x, y)` is analyzed, escape information of `z` is propagated to `x` rather than to any of `x`'s fields. This is the most conservative propagation, since escape information imposed on `x` will end up being propagated to all of its fields anyway at definitions of `x` (i.e. `:new` expression or `setfield!` call).

[^JVM05]: Escape Analysis in the Context of Dynamic Compilation and Deoptimization. Thomas Kotzmann and Hanspeter Mössenböck, 2005, June. <https://dl.acm.org/doi/10.1145/1064979.1064996>.

Now this alias analysis should allow us to implement a "stronger" SROA, which eliminates the allocation of `r` within the following code:

```julia
julia> result = analyze_escapes((String,)) do s
           r = Ref(s)
           broadcast(identity, r)
       end
#3(_2::String *, _3::Base.RefValue{String} ◌) in Main at REPL[2]:2
2 ↓  1 ─ %1 = %new(Base.RefValue{String}, _2)::Base.RefValue{String}   │╻╷╷ Ref
3 ✓  │   %2 = Core.tuple(%1)::Tuple{Base.RefValue{String}}              │╻ broadcast
  ↓  │   %3 = Core.getfield(%2, 1)::Base.RefValue{String}               ││
  ◌  └── goto #3 if not true                                            ││╻╷ materialize
  ◌  2 ─ nothing::Nothing                                               │
  *  3 ┄ %6 = Base.getfield(%3, :x)::String                             │││╻╷╷╷╷ copy
  ◌  └── goto #4                                                        ││││┃ getindex
  ◌  4 ─ goto #5                                                        ││││
  ◌  5 ─ goto #6                                                        │││
  ◌  6 ─ goto #7                                                        ││
  ◌  7 ─ return %6                                                      │

julia> EscapeAnalysis.get_aliases(result.state.aliasset, Core.SSAValue(6), result.ir)
2-element Vector{Union{Core.Argument, Core.SSAValue}}:
 Core.Argument(2)
 :(%6)
```

Note that the allocation `%1` isn't analyzed as `ReturnEscape`, whereas `_2` still is.
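The alias update described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code added by this commit: `AliasSet`, `alias!`, `isaliased`, `FieldInfo` and `on_getfield!` are hypothetical names, values are identified by plain integer ids instead of `Core.Argument`/`Core.SSAValue`, a hand-rolled union-find stands in for `IntDisjointSet{Int}`, and the "not analyzed yet" case (`false`) is treated the same way as the "can't be analyzed" case (`true`).

```julia
# Hypothetical stand-in for the alias set: a tiny union-find over integer ids.
struct AliasSet
    parents::Vector{Int}
end
AliasSet(n::Int) = AliasSet(collect(1:n))  # every id starts in its own class

# Find the representative of `i`, shortening the path along the way.
function find_root!(s::AliasSet, i::Int)
    p = s.parents
    while p[i] != i
        p[i] = p[p[i]]  # path halving
        i = p[i]
    end
    return i
end

# Merge the alias classes of `a` and `b`.
function alias!(s::AliasSet, a::Int, b::Int)
    ra, rb = find_root!(s, a), find_root!(s, b)
    ra == rb || (s.parents[rb] = ra)
    return s
end

isaliased(s::AliasSet, a::Int, b::Int) = find_root!(s, a) == find_root!(s, b)

# Simplified analogue of `FieldSets`: a `Bool`, or one set of candidate ids per field.
const FieldInfo = Union{Bool,Vector{Set{Int}}}

# Handle `z = getfield(x, field)` and return the ids that escape information
# imposed on `z` should be propagated to.
function on_getfield!(s::AliasSet, fields::FieldInfo, x::Int, z::Int, field::Int)
    if fields isa Vector{Set{Int}}
        # Fields are known precisely: alias `z` to every possible field value.
        for v in fields[field]
            alias!(s, z, v)
        end
        return sort!(collect(fields[field]))
    else
        # Fields are unknown: conservatively propagate to `x` itself.
        return [x]
    end
end

# Ids loosely mirroring the example above: 1 = `_2`, 2 = `%1`, 3 = `%6`.
s = AliasSet(3)
targets = on_getfield!(s, [Set([1])], 2, 3, 1)
```

In this sketch, `%6` ends up in the same alias class as `_2` but not in that of the allocation `%1`, which matches the shape of the `get_aliases` result shown above.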
aviatesk · Nov 19, 2021 · e2046b9
1 parent 23a0439
Showing 4 changed files with 844 additions and 107 deletions.
diff --git a/README.md b/README.md
@@ -10,6 +10,12 @@ This analysis works on a lattice called `x::EscapeLattice`, which holds the foll
     the caller simply because it's passed as call argument
 - `x.ThrownEscape::Bool`: indicates `x` may escape to somewhere through an exception (possibly as a field)
 - `x.EscapeSites::BitSet`: records program counters (SSA numbers) where `x` can escape
+- `x.FieldSets::Union{Vector{IdSet{Any}},Bool}`: maintains the sets of possible values of fields of `x`:
+  * `x.FieldSets === false` indicates the fields of `x` haven't been analyzed yet
+  * `x.FieldSets === true` indicates the fields of `x` can't be analyzed, e.g. the type of `x`
+    is not concrete and thus the number of its fields can't be known precisely
+  * otherwise `x.FieldSets::Vector{IdSet{Any}}` holds all the possible values of each field,
+    where `x.FieldSets[i]` keeps all possibilities that the `i`th field can be
 - `x.ArgEscape::Int` (not implemented yet): indicates it will escape to the caller through `setfield!` on argument(s)
   * `-1` : no escape
   * `0` : unknown or multiple
@@ -30,7 +36,7 @@ An abstract state will be initialized with the bottom(-like) elements:
   is slightly lower than `NoEscape`, but at the same time doesn't represent any meaning
   other than it's not analyzed yet (thus it's not formally part of the lattice).
 
-Escape analysis implementation is based on the data-flow algorithm described in the paper [^MM02].
+Escape analysis implementation is based on the data-flow algorithm described in the old paper [^MM02].
 The analysis works on the lattice of [`EscapeLattice`](@ref) and transitions lattice elements
 from the bottom to the top in a _backward_ way, i.e. data flows from usage sites to definitions,
 until every lattice gets converged to a fixed point by maintaining a (conceptual) working set
@@ -39,6 +45,24 @@ The analysis only manages a single global state that tracks `EscapeLattice` of e
 and SSA statement, but also note that some flow-sensitivity is encoded as program counters
 recorded in the `EscapeSites` property of each lattice element.
 
+The analysis also collects alias information using an approach inspired by
+the escape analysis algorithm explained in yet another old paper [^JVM05].
+In addition to managing escape lattice elements, the analysis state also maintains an "alias set",
+which is implemented as a disjoint set of aliased arguments and SSA statements.
+When the fields of object `x` are known precisely (i.e. `x.FieldSets isa Vector{IdSet{Any}}` holds),
+the alias set is updated each time `z = getfield(x, y)` is encountered in a way that `z` is
+aliased to all values of `x.FieldSets[y]`, so that escape information imposed on `z` will be
+propagated to all the aliased values and `z` can be replaced with an aliased value later.
+Note that if the fields of object `x` can't be known precisely (i.e. `x.FieldSets` is `true`),
+then when `z = getfield(x, y)` is analyzed, escape information of `z` is propagated to `x` rather
+than any of `x`'s fields, which is the most conservative propagation since escape information
+imposed on `x` will end up being propagated to all of its fields anyway at definitions of `x`
+(i.e. `:new` expression or `setfield!` call).
+
 [^MM02]: _A Graph-Free approach to Data-Flow Analysis_.
          Markus Mohnen, 2002, April.
          <https://api.semanticscholar.org/CorpusID:28519618>.
+
+[^JVM05]: _Escape Analysis in the Context of Dynamic Compilation and Deoptimization_.
+          Thomas Kotzmann and Hanspeter Mössenböck, 2005, June.
+          <https://dl.acm.org/doi/10.1145/1064979.1064996>.